#10008 closed defect (can't reproduce)
Reading PDB file: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte
| Reported by: | | Owned by: | Eric Pettersen |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | Input/Output | Version: | |
| Keywords: | | Cc: | Tom Goddard |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
The following bug report has been submitted:
Platform: Windows-10-10.0.19045
ChimeraX Version: 1.6.1 (2023-05-09 17:57:07 UTC)
Description
(Describe the actions that caused this problem to occur here)
Log:
Startup Messages
---
note | available bundle cache has not been initialized yet
You can double click a model's Name or ID in the model panel to edit those fields
UCSF ChimeraX version: 1.6.1 (2023-05-09)
© 2016-2023 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX
> open "C:/Users/beatr/Downloads/PDB TP4/1a6m.pdb" "C:/Users/beatr/Downloads/PDB TP4/1a19.pdb" "C:/Users/beatr/Downloads/PDB TP4/1jug.pdb" "C:/Users/beatr/Downloads/PDB TP4/1neu.pdb" "C:/Users/beatr/Downloads/PDB TP4/1shs.pdb" "C:/Users/beatr/Downloads/PDB TP4/2bnh.pdb" "C:/Users/beatr/Downloads/PDB TP4/4zhp.pdb" "C:/Users/beatr/Downloads/PDB TP4/6hit.pdb"
1a6m.pdb title:
Oxy-myoglobin, atomic resolution
Chain information for 1a6m.pdb #1
---
Chain | Description | UniProt
A | myoglobin | MYG_PHYCA 1-151
Non-standard residues in 1a6m.pdb #1
---
HEM — protoporphyrin IX containing Fe (HEME)
OXY — oxygen molecule
SO4 — sulfate ion
1a19.pdb title:
Barstar (free), C82A mutant
Chain information for 1a19.pdb #2
---
Chain | Description | UniProt
A B | barstar | BARS_BACAM 1-89
1jug.pdb title:
Lysozyme from echidna milk (tachyglossus aculeatus)
Chain information for 1jug.pdb #3
---
Chain | Description | UniProt
A | lysozyme | LYSC1_TACAC 1-125
Non-standard residues in 1jug.pdb #3
---
CA — calcium ion
1neu.pdb title:
Structure of myelin membrane adhesion molecule P0
Chain information for 1neu.pdb #4
---
Chain | Description | UniProt
A | myelin P0 protein | MYP0_RAT 1-124
1shs.pdb title:
Small heat shock protein from methanococcus jannaschii
Chain information for 1shs.pdb #5
---
Chain | Description | UniProt
A B C D E F G H | small heat shock protein | HSPS_METJA 1-147
2bnh.pdb title:
Porcine ribonuclease inhibitor
Chain information for 2bnh.pdb #6
---
Chain | Description | UniProt
A | ribonuclease inhibitor | RINI_PIG 1-456
Non-standard residues in 2bnh.pdb #6
---
ACE — acetyl group
4zhp.pdb title:
The crystal structure of potato ferredoxin I with 2FE-2S cluster
Chain information for 4zhp.pdb #7
---
Chain | Description | UniProt
A | potato ferredoxin I | Q93XJ9_SOLTU 1-98
Non-standard residues in 4zhp.pdb #7
---
FES — FE2/S2 (inorganic) cluster
6hit.pdb title:
The crystal structure of haemoglobin from atlantic cod
Chain information for 6hit.pdb #8
---
Chain | Description | UniProt
A C E G | hemoglobin α chain | B3F9D9_GADMO 2-143
B D F H | hemoglobin β chain | B3F9D7_GADMO 2-146
Non-standard residues in 6hit.pdb #8
---
ACE — acetyl group
HEM — protoporphyrin IX containing Fe (HEME)
> close session
> open "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/1nqu.pdb" "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/1rvv.pdb" "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/psicrofilico.pdb"
Traceback (most recent call last):
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\dialog.py", line 162, in _qt_safe
    run(session, "open " + " ".join([FileNameArg.unparse(p) for p in paths]) + (""
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\run.py", line 38, in run
    results = command.run(text, log=log, return_json=return_json)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\cli.py", line 2897, in run
    result = ci.function(session, **kw_args)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 119, in cmd_open
    models = Command(session, registry=registry).run(provider_cmd_text, log=log)[0]
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\cli.py", line 2897, in run
    result = ci.function(session, **kw_args)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 194, in provider_open
    models, status = collated_open(session, None, [data], data_format, _add_models,
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 464, in collated_open
    return remember_data_format()
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 435, in remember_data_format
    models, status = func(*func_args, **func_kw)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\pdb\__init__.py", line 34, in open
    return pdb.open_pdb(session, data, file_name, **kw)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\pdb\pdb.py", line 78, in open_pdb
    pointers = _pdbio.read_pdb_file(stream, session.logger, not coordsets, atomic, segid_chains,
  File "C:\Program Files\ChimeraX\bin\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte
See log for complete Python traceback.
OpenGL version: 3.3.0 - Build 27.20.100.8682
OpenGL renderer: Intel(R) HD Graphics 500
OpenGL vendor: Intel
Python: 3.9.11
Locale: es_AR.cp1252
Qt version: PyQt6 6.4.2, Qt 6.4.2
Qt runtime version: 6.4.3
Qt platform: windows
Manufacturer: ASUSTeK COMPUTER INC.
Model: X541NA
OS: Microsoft Windows 10 Home Single Language (Build 19045)
Memory: 4,151,885,824
MaxProcessMemory: 137,438,953,344
CPU: 2 Intel(R) Celeron(R) CPU N3350 @ 1.10GHz
OSLanguage: es-ES
Installed Packages:
alabaster: 0.7.13
appdirs: 1.4.4
asttokens: 2.2.1
Babel: 2.12.1
backcall: 0.2.0
beautifulsoup4: 4.11.2
blockdiag: 3.0.0
build: 0.10.0
certifi: 2023.5.7
cftime: 1.6.2
charset-normalizer: 3.1.0
ChimeraX-AddCharge: 1.5.9.1
ChimeraX-AddH: 2.2.5
ChimeraX-AlignmentAlgorithms: 2.0.1
ChimeraX-AlignmentHdrs: 3.3.1
ChimeraX-AlignmentMatrices: 2.1
ChimeraX-Alignments: 2.9.3
ChimeraX-AlphaFold: 1.0
ChimeraX-AltlocExplorer: 1.0.3
ChimeraX-AmberInfo: 1.0
ChimeraX-Arrays: 1.1
ChimeraX-Atomic: 1.43.10
ChimeraX-AtomicLibrary: 10.0.6
ChimeraX-AtomSearch: 2.0.1
ChimeraX-AxesPlanes: 2.3.2
ChimeraX-BasicActions: 1.1.2
ChimeraX-BILD: 1.0
ChimeraX-BlastProtein: 2.1.2
ChimeraX-BondRot: 2.0.1
ChimeraX-BugReporter: 1.0.1
ChimeraX-BuildStructure: 2.8
ChimeraX-Bumps: 1.0
ChimeraX-BundleBuilder: 1.2.2
ChimeraX-ButtonPanel: 1.0.1
ChimeraX-CageBuilder: 1.0.1
ChimeraX-CellPack: 1.0
ChimeraX-Centroids: 1.3.2
ChimeraX-ChangeChains: 1.0.2
ChimeraX-CheckWaters: 1.3.1
ChimeraX-ChemGroup: 2.0.1
ChimeraX-Clashes: 2.2.4
ChimeraX-ColorActions: 1.0.3
ChimeraX-ColorGlobe: 1.0
ChimeraX-ColorKey: 1.5.3
ChimeraX-CommandLine: 1.2.5
ChimeraX-ConnectStructure: 2.0.1
ChimeraX-Contacts: 1.0.1
ChimeraX-Core: 1.6.1
ChimeraX-CoreFormats: 1.1
ChimeraX-coulombic: 1.4.2
ChimeraX-Crosslinks: 1.0
ChimeraX-Crystal: 1.0
ChimeraX-CrystalContacts: 1.0.1
ChimeraX-DataFormats: 1.2.3
ChimeraX-Dicom: 1.2
ChimeraX-DistMonitor: 1.4
ChimeraX-DockPrep: 1.1.1
ChimeraX-Dssp: 2.0
ChimeraX-EMDB-SFF: 1.0
ChimeraX-ESMFold: 1.0
ChimeraX-FileHistory: 1.0.1
ChimeraX-FunctionKey: 1.0.1
ChimeraX-Geometry: 1.3
ChimeraX-gltf: 1.0
ChimeraX-Graphics: 1.1.1
ChimeraX-Hbonds: 2.4
ChimeraX-Help: 1.2.1
ChimeraX-HKCage: 1.3
ChimeraX-IHM: 1.1
ChimeraX-ImageFormats: 1.2
ChimeraX-IMOD: 1.0
ChimeraX-IO: 1.0.1
ChimeraX-ItemsInspection: 1.0.1
ChimeraX-Label: 1.1.7
ChimeraX-ListInfo: 1.1.1
ChimeraX-Log: 1.1.5
ChimeraX-LookingGlass: 1.1
ChimeraX-Maestro: 1.8.2
ChimeraX-Map: 1.1.4
ChimeraX-MapData: 2.0
ChimeraX-MapEraser: 1.0.1
ChimeraX-MapFilter: 2.0.1
ChimeraX-MapFit: 2.0
ChimeraX-MapSeries: 2.1.1
ChimeraX-Markers: 1.0.1
ChimeraX-Mask: 1.0.2
ChimeraX-MatchMaker: 2.0.12
ChimeraX-MDcrds: 2.6
ChimeraX-MedicalToolbar: 1.0.2
ChimeraX-Meeting: 1.0.1
ChimeraX-MLP: 1.1.1
ChimeraX-mmCIF: 2.12
ChimeraX-MMTF: 2.2
ChimeraX-Modeller: 1.5.9
ChimeraX-ModelPanel: 1.3.7
ChimeraX-ModelSeries: 1.0.1
ChimeraX-Mol2: 2.0
ChimeraX-Mole: 1.0
ChimeraX-Morph: 1.0.2
ChimeraX-MouseModes: 1.2
ChimeraX-Movie: 1.0
ChimeraX-Neuron: 1.0
ChimeraX-Nifti: 1.0
ChimeraX-NRRD: 1.0
ChimeraX-Nucleotides: 2.0.3
ChimeraX-OpenCommand: 1.10.1
ChimeraX-PDB: 2.7.2
ChimeraX-PDBBio: 1.0
ChimeraX-PDBLibrary: 1.0.2
ChimeraX-PDBMatrices: 1.0
ChimeraX-PickBlobs: 1.0.1
ChimeraX-Positions: 1.0
ChimeraX-PresetMgr: 1.1
ChimeraX-PubChem: 2.1
ChimeraX-ReadPbonds: 1.0.1
ChimeraX-Registration: 1.1.1
ChimeraX-RemoteControl: 1.0
ChimeraX-RenderByAttr: 1.1
ChimeraX-RenumberResidues: 1.1
ChimeraX-ResidueFit: 1.0.1
ChimeraX-RestServer: 1.1
ChimeraX-RNALayout: 1.0
ChimeraX-RotamerLibMgr: 3.0
ChimeraX-RotamerLibsDunbrack: 2.0
ChimeraX-RotamerLibsDynameomics: 2.0
ChimeraX-RotamerLibsRichardson: 2.0
ChimeraX-SaveCommand: 1.5.1
ChimeraX-SchemeMgr: 1.0
ChimeraX-SDF: 2.0.1
ChimeraX-Segger: 1.0
ChimeraX-Segment: 1.0.1
ChimeraX-SelInspector: 1.0
ChimeraX-SeqView: 2.8.3
ChimeraX-Shape: 1.0.1
ChimeraX-Shell: 1.0.1
ChimeraX-Shortcuts: 1.1.1
ChimeraX-ShowSequences: 1.0.1
ChimeraX-SideView: 1.0.1
ChimeraX-Smiles: 2.1
ChimeraX-SmoothLines: 1.0
ChimeraX-SpaceNavigator: 1.0
ChimeraX-StdCommands: 1.10.3
ChimeraX-STL: 1.0.1
ChimeraX-Storm: 1.0
ChimeraX-StructMeasure: 1.1.2
ChimeraX-Struts: 1.0.1
ChimeraX-Surface: 1.0.1
ChimeraX-SwapAA: 2.0.1
ChimeraX-SwapRes: 2.2.1
ChimeraX-TapeMeasure: 1.0
ChimeraX-Test: 1.0
ChimeraX-Toolbar: 1.1.2
ChimeraX-ToolshedUtils: 1.2.1
ChimeraX-Topography: 1.0
ChimeraX-Tug: 1.0.1
ChimeraX-UI: 1.28.4
ChimeraX-uniprot: 2.2.2
ChimeraX-UnitCell: 1.0.1
ChimeraX-ViewDockX: 1.2
ChimeraX-VIPERdb: 1.0
ChimeraX-Vive: 1.1
ChimeraX-VolumeMenu: 1.0.1
ChimeraX-VTK: 1.0
ChimeraX-WavefrontOBJ: 1.0
ChimeraX-WebCam: 1.0.2
ChimeraX-WebServices: 1.1.1
ChimeraX-Zone: 1.0.1
colorama: 0.4.6
comm: 0.1.3
comtypes: 1.1.14
contourpy: 1.0.7
cxservices: 1.2.2
cycler: 0.11.0
Cython: 0.29.33
debugpy: 1.6.7
decorator: 5.1.1
docutils: 0.19
executing: 1.2.0
filelock: 3.9.0
fonttools: 4.39.3
funcparserlib: 1.0.1
grako: 3.16.5
h5py: 3.8.0
html2text: 2020.1.16
idna: 3.4
ihm: 0.35
imagecodecs: 2022.9.26
imagesize: 1.4.1
importlib-metadata: 6.6.0
ipykernel: 6.21.1
ipython: 8.10.0
ipython-genutils: 0.2.0
ipywidgets: 8.0.6
jedi: 0.18.2
Jinja2: 3.1.2
jupyter-client: 8.0.2
jupyter-core: 5.3.0
jupyterlab-widgets: 3.0.7
kiwisolver: 1.4.4
line-profiler: 4.0.2
lxml: 4.9.2
lz4: 4.3.2
MarkupSafe: 2.1.2
matplotlib: 3.6.3
matplotlib-inline: 0.1.6
msgpack: 1.0.4
nest-asyncio: 1.5.6
netCDF4: 1.6.2
networkx: 2.8.8
nibabel: 5.0.1
nptyping: 2.5.0
numexpr: 2.8.4
numpy: 1.23.5
openvr: 1.23.701
packaging: 23.1
ParmEd: 3.4.3
parso: 0.8.3
pep517: 0.13.0
pickleshare: 0.7.5
Pillow: 9.3.0
pip: 23.0
pkginfo: 1.9.6
platformdirs: 3.5.0
prompt-toolkit: 3.0.38
psutil: 5.9.4
pure-eval: 0.2.2
pycollada: 0.7.2
pydicom: 2.3.0
Pygments: 2.14.0
pynrrd: 1.0.0
PyOpenGL: 3.1.5
PyOpenGL-accelerate: 3.1.5
pyparsing: 3.0.9
pyproject-hooks: 1.0.0
PyQt6-commercial: 6.4.2
PyQt6-Qt6: 6.4.3
PyQt6-sip: 13.4.1
PyQt6-WebEngine-commercial: 6.4.0
PyQt6-WebEngine-Qt6: 6.4.3
python-dateutil: 2.8.2
pytz: 2023.3
pywin32: 305
pyzmq: 25.0.2
qtconsole: 5.4.0
QtPy: 2.3.1
RandomWords: 0.4.0
requests: 2.28.2
scipy: 1.9.3
setuptools: 67.4.0
sfftk-rw: 0.7.3
six: 1.16.0
snowballstemmer: 2.2.0
sortedcontainers: 2.4.0
soupsieve: 2.4.1
sphinx: 6.1.3
sphinx-autodoc-typehints: 1.22
sphinxcontrib-applehelp: 1.0.4
sphinxcontrib-blockdiag: 3.0.0
sphinxcontrib-devhelp: 1.0.2
sphinxcontrib-htmlhelp: 2.0.1
sphinxcontrib-jsmath: 1.0.1
sphinxcontrib-qthelp: 1.0.3
sphinxcontrib-serializinghtml: 1.1.5
stack-data: 0.6.2
tables: 3.7.0
tcia-utils: 1.2.0
tifffile: 2022.10.10
tinyarray: 1.2.4
tomli: 2.0.1
tornado: 6.3.1
traitlets: 5.9.0
typing-extensions: 4.5.0
tzdata: 2023.3
urllib3: 1.26.15
wcwidth: 0.2.6
webcolors: 1.12
wheel: 0.38.4
wheel-filename: 1.4.1
widgetsnbextension: 4.0.7
WMI: 1.5.1
zipp: 3.15.0
Change History (5)
comment:1 by , 2 years ago
| Cc: | added |
|---|---|
| Component: | Unassigned → Input/Output |
| Owner: | set to |
| Platform: | → all |
| Project: | → ChimeraX |
| Status: | new → assigned |
| Summary: | ChimeraX bug report submission → Reading PDB file: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte |
comment:2 by , 2 years ago
| Resolution: | → can't reproduce |
|---|---|
| Status: | assigned → closed |
Yeah, ChimeraX already tries UTF-8, -16, and -32 automatically, so this must be some kind of Windows codec. If I had the file as an example, I might add some Windows codecs as well. Yes, PDB standard-conformant files are strictly ASCII, but good luck having your program used by anyone if it only accepts standard-conformant files! :-)
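A minimal sketch of the "add some Windows codecs" idea, assuming the reader has the raw file bytes in hand: try strict UTF-8 first, then Windows-1252 (the codepage in this report's `Locale: es_AR.cp1252` line), then Latin-1, which maps every byte value and therefore never fails. The name `decode_pdb_bytes` and the codec order are hypothetical, not ChimeraX's actual reader. UTF-16 and UTF-32 are deliberately left out of the sketch because, without a byte-order mark, decoding arbitrary bytes as UTF-16 frequently "succeeds" and returns garbage.

```python
from typing import Tuple

# Hypothetical codec order: strict UTF-8 first, then Windows-1252 (the codepage
# in this report's locale), then Latin-1, which maps every byte value and so
# never raises.
FALLBACK_CODECS = ("utf-8", "cp1252", "latin-1")


def decode_pdb_bytes(data: bytes) -> Tuple[str, str]:
    """Return (text, codec) for the first codec that decodes `data` strictly."""
    for codec in FALLBACK_CODECS:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue  # try the next, more permissive codec
    raise AssertionError("unreachable: latin-1 accepts any byte sequence")


# Hypothetical sample line: 0xE9 is 'é' in cp1252, but in UTF-8 it starts a
# three-byte sequence, and the following space is not a valid continuation byte.
print(decode_pdb_bytes(b"REMARK   1 caf\xe9 au lait\n"))
# → ('REMARK   1 café au lait\n', 'cp1252')
```

A successful cp1252 decode only shows that the bytes are *valid* cp1252, not that cp1252 was the writer's encoding, so a real reader would probably want to log which codec it fell back to.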
comment:3 by , 2 years ago
I hear ya. Maybe it is worth catching UnicodeDecodeError and giving an easier-to-grasp error message without the traceback, like "Your PDB file X.pdb has an invalid character at line N, column M and cannot be read." On the other hand, if this is rarely reported, it may not be worth the small effort to implement a better message.
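A sketch of that friendlier message, assuming the whole file is read as bytes and decoded in a single call. That assumption matters: the `codecs.py` frame at the bottom of the traceback suggests the text was decoded incrementally, in which case the reported "position 94" may be an offset into the current chunk rather than into the file. The names `describe_decode_error` and `read_pdb_text` are hypothetical; only `UnicodeDecodeError` and its standard `object`, `start`, and `encoding` attributes come from Python itself.

```python
from pathlib import Path


def describe_decode_error(err: UnicodeDecodeError, file_name: str) -> str:
    """Build a one-line, user-facing message from a UnicodeDecodeError."""
    data = err.object           # the bytes that were being decoded
    pos = err.start             # byte offset of the first undecodable byte
    line_no = data.count(b"\n", 0, pos) + 1
    line_start = data.rfind(b"\n", 0, pos) + 1  # rfind returns -1 on line 1
    column = pos - line_start + 1
    return (f"Your PDB file {file_name} has an invalid character "
            f"(byte 0x{data[pos]:02x}) at line {line_no}, column {column} "
            f"and cannot be read as {err.encoding}.")


def read_pdb_text(path: str) -> str:
    """Hypothetical reader: decode the whole file at once so offsets are file-relative."""
    data = Path(path).read_bytes()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Suppress the chained traceback and surface only the readable message.
        raise ValueError(describe_decode_error(err, Path(path).name)) from None
```

For a file whose second line is `REMARK   1 caf\xe9 au lait`, the message would point at line 2, column 15, byte 0xe9.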
comment:4 by , 2 years ago
Leaving it as an error gives some chance of getting actual feedback and an example file...
comment:5 by , 2 years ago
OK. It can be an error and still have an understandable error message. But it may not be worth the effort.
The PDB file was not valid UTF-8.
My guess would be that it has some special characters encoded with another codec, and we are reading it as if it were UTF-8.
Maybe we should catch this error and say the file has special characters that prevent it from being read. Or we could try using another codec to read it.
What encoding does the PDB file specification require? ASCII? I recall citations sometimes have accented characters.
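To find out where the "special characters" actually sit (for example, whether they are in citation or remark records), a small diagnostic could list every byte above 0x7F together with its record name from columns 1 to 6. This is a sketch: the names `NonAsciiByte` and `find_non_ascii` are hypothetical, and showing each byte as Windows-1252 is an assumption drawn from the `Locale: es_AR.cp1252` line, not something the file itself declares.

```python
from typing import Iterator, NamedTuple


class NonAsciiByte(NamedTuple):
    line_no: int     # 1-based line number
    column: int      # 1-based byte column within the line
    record: str      # record name from columns 1-6, e.g. "REMARK"
    value: int       # the offending byte value (> 0x7F)
    as_cp1252: str   # the byte read as Windows-1252 (an assumption, not a fact)


def find_non_ascii(data: bytes) -> Iterator[NonAsciiByte]:
    """Yield every byte outside 7-bit ASCII, with its PDB record name."""
    for line_no, line in enumerate(data.splitlines(), start=1):
        record = line[:6].decode("ascii", errors="replace").strip()
        for column, value in enumerate(line, start=1):
            if value > 0x7F:
                guess = bytes([value]).decode("cp1252", errors="replace")
                yield NonAsciiByte(line_no, column, record, value, guess)


sample = b"HEADER    TEST\nREMARK   1 caf\xe9 au lait\nEND\n"
for hit in find_non_ascii(sample):
    print(hit)
# → NonAsciiByte(line_no=2, column=15, record='REMARK', value=233, as_cp1252='é')
```

Run over the failing file, a report like this would show which records contain the non-ASCII bytes and what they most likely were.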