#10008 closed defect (can't reproduce)
Reading PDB file: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte
Reported by: | | Owned by: | pett |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | | Cc: | Tom Goddard |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
The following bug report has been submitted:
Platform: Windows-10-10.0.19045
ChimeraX Version: 1.6.1 (2023-05-09 17:57:07 UTC)
Description
(Describe the actions that caused this problem to occur here)

Log:
Startup Messages
---
note | available bundle cache has not been initialized yet

You can double click a model's Name or ID in the model panel to edit those fields

UCSF ChimeraX version: 1.6.1 (2023-05-09)
© 2016-2023 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX

> open "C:/Users/beatr/Downloads/PDB TP4/1a6m.pdb" "C:/Users/beatr/Downloads/PDB TP4/1a19.pdb" "C:/Users/beatr/Downloads/PDB TP4/1jug.pdb" "C:/Users/beatr/Downloads/PDB TP4/1neu.pdb" "C:/Users/beatr/Downloads/PDB TP4/1shs.pdb" "C:/Users/beatr/Downloads/PDB TP4/2bnh.pdb" "C:/Users/beatr/Downloads/PDB TP4/4zhp.pdb" "C:/Users/beatr/Downloads/PDB TP4/6hit.pdb"

1a6m.pdb title: Oxy-myoglobin, atomic resolution [more info...]

Chain information for 1a6m.pdb #1
---
Chain | Description | UniProt
A | myoglobin | MYG_PHYCA 1-151

Non-standard residues in 1a6m.pdb #1
---
HEM — protoporphyrin IX containing Fe (HEME)
OXY — oxygen molecule
SO4 — sulfate ion

1a19.pdb title: Barstar (free), C82A mutant [more info...]

Chain information for 1a19.pdb #2
---
Chain | Description | UniProt
A B | barstar | BARS_BACAM 1-89

1jug.pdb title: Lysozyme from echiDNA milk (tachyglossus aculeatus) [more info...]

Chain information for 1jug.pdb #3
---
Chain | Description | UniProt
A | lysozyme | LYSC1_TACAC 1-125

Non-standard residues in 1jug.pdb #3
---
CA — calcium ion

1neu.pdb title: Structure of myelin membrane adhesion molecule P0 [more info...]

Chain information for 1neu.pdb #4
---
Chain | Description | UniProt
A | myelin P0 protein | MYP0_RAT 1-124

1shs.pdb title: Small heat shock protein from methanococcus jannaschii [more info...]

Chain information for 1shs.pdb #5
---
Chain | Description | UniProt
A B C D E F G H | small heat shock protein | HSPS_METJA 1-147

2bnh.pdb title: Porcine ribonuclease inhibitor [more info...]

Chain information for 2bnh.pdb #6
---
Chain | Description | UniProt
A | ribonuclease inhibitor | RINI_PIG 1-456

Non-standard residues in 2bnh.pdb #6
---
ACE — acetyl group

4zhp.pdb title: The crystal structure of potato ferredoxin I with 2FE-2S cluster [more info...]

Chain information for 4zhp.pdb #7
---
Chain | Description | UniProt
A | potato ferredoxin I | Q93XJ9_SOLTU 1-98

Non-standard residues in 4zhp.pdb #7
---
FES — FE2/S2 (inorganic) cluster

6hit.pdb title: The crystal structure of haemoglobin from atlantic cod [more info...]

Chain information for 6hit.pdb #8
---
Chain | Description | UniProt
A C E G | hemoglobin α chain | B3F9D9_GADMO 2-143
B D F H | hemoglobin β chain | B3F9D7_GADMO 2-146

Non-standard residues in 6hit.pdb #8
---
ACE — acetyl group
HEM — protoporphyrin IX containing Fe (HEME)

> close session

> open "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/1nqu.pdb" "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/1rvv.pdb" "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/psicrofilico.pdb"

Traceback (most recent call last):
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\dialog.py", line 162, in _qt_safe
run(session, "open " + " ".join([FileNameArg.unparse(p) for p in paths]) + (""
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\run.py", line 38, in run
results = command.run(text, log=log, return_json=return_json)
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\cli.py", line 2897, in run
result = ci.function(session, **kw_args)
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 119, in cmd_open
models = Command(session, registry=registry).run(provider_cmd_text, log=log)[0]
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\cli.py", line 2897, in run
result = ci.function(session, **kw_args)
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 194, in provider_open
models, status = collated_open(session, None, [data], data_format, _add_models,
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 464, in collated_open
return remember_data_format()
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 435, in remember_data_format
models, status = func(*func_args, **func_kw)
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\pdb\__init__.py", line 34, in open
return pdb.open_pdb(session, data, file_name, **kw)
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\pdb\pdb.py", line 78, in open_pdb
pointers = _pdbio.read_pdb_file(stream, session.logger, not coordsets, atomic, segid_chains,
File "C:\Program Files\ChimeraX\bin\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte

File "C:\Program Files\ChimeraX\bin\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)

See log for complete Python traceback.

OpenGL version: 3.3.0 - Build 27.20.100.8682
OpenGL renderer: Intel(R) HD Graphics 500
OpenGL vendor: Intel

Python: 3.9.11
Locale: es_AR.cp1252
Qt version: PyQt6 6.4.2, Qt 6.4.2
Qt runtime version: 6.4.3
Qt platform: windows

Manufacturer: ASUSTeK COMPUTER INC.
Model: X541NA
OS: Microsoft Windows 10 Home Single Language (Build 19045)
Memory: 4,151,885,824
MaxProcessMemory: 137,438,953,344
CPU: 2 Intel(R) Celeron(R) CPU N3350 @ 1.10GHz
OSLanguage: es-ES

Installed Packages:
alabaster: 0.7.13 appdirs: 1.4.4 asttokens: 2.2.1 Babel: 2.12.1 backcall: 0.2.0 beautifulsoup4: 4.11.2 blockdiag: 3.0.0 build: 0.10.0 certifi: 2023.5.7 cftime: 1.6.2 charset-normalizer: 3.1.0 ChimeraX-AddCharge: 1.5.9.1 ChimeraX-AddH: 2.2.5 ChimeraX-AlignmentAlgorithms: 2.0.1 ChimeraX-AlignmentHdrs: 3.3.1 ChimeraX-AlignmentMatrices: 2.1 ChimeraX-Alignments: 2.9.3 ChimeraX-AlphaFold: 1.0 ChimeraX-AltlocExplorer: 1.0.3 ChimeraX-AmberInfo: 1.0 ChimeraX-Arrays: 1.1 ChimeraX-Atomic: 1.43.10 ChimeraX-AtomicLibrary: 10.0.6 ChimeraX-AtomSearch: 2.0.1 ChimeraX-AxesPlanes: 2.3.2 ChimeraX-BasicActions: 1.1.2 ChimeraX-BILD: 1.0 ChimeraX-BlastProtein: 2.1.2 ChimeraX-BondRot: 2.0.1 ChimeraX-BugReporter: 1.0.1 ChimeraX-BuildStructure: 2.8 ChimeraX-Bumps: 1.0 ChimeraX-BundleBuilder: 1.2.2 ChimeraX-ButtonPanel: 1.0.1 ChimeraX-CageBuilder: 1.0.1 ChimeraX-CellPack: 1.0 ChimeraX-Centroids: 1.3.2 ChimeraX-ChangeChains: 1.0.2 ChimeraX-CheckWaters: 1.3.1 ChimeraX-ChemGroup: 2.0.1 ChimeraX-Clashes: 2.2.4 ChimeraX-ColorActions: 1.0.3 ChimeraX-ColorGlobe: 1.0 ChimeraX-ColorKey: 1.5.3 ChimeraX-CommandLine: 1.2.5 ChimeraX-ConnectStructure: 2.0.1 ChimeraX-Contacts: 1.0.1 ChimeraX-Core: 1.6.1 ChimeraX-CoreFormats: 1.1 ChimeraX-coulombic: 1.4.2 ChimeraX-Crosslinks: 1.0 ChimeraX-Crystal: 1.0 ChimeraX-CrystalContacts: 1.0.1 ChimeraX-DataFormats: 1.2.3 ChimeraX-Dicom: 1.2 ChimeraX-DistMonitor: 1.4 ChimeraX-DockPrep: 1.1.1 ChimeraX-Dssp: 2.0 ChimeraX-EMDB-SFF: 1.0 ChimeraX-ESMFold: 1.0 ChimeraX-FileHistory: 1.0.1 ChimeraX-FunctionKey: 1.0.1 ChimeraX-Geometry: 1.3 ChimeraX-gltf: 1.0 ChimeraX-Graphics: 1.1.1 ChimeraX-Hbonds: 2.4 ChimeraX-Help: 1.2.1 ChimeraX-HKCage: 1.3 ChimeraX-IHM: 1.1 ChimeraX-ImageFormats: 1.2 ChimeraX-IMOD: 1.0 ChimeraX-IO: 1.0.1 ChimeraX-ItemsInspection: 1.0.1
ChimeraX-Label: 1.1.7 ChimeraX-ListInfo: 1.1.1 ChimeraX-Log: 1.1.5 ChimeraX-LookingGlass: 1.1 ChimeraX-Maestro: 1.8.2 ChimeraX-Map: 1.1.4 ChimeraX-MapData: 2.0 ChimeraX-MapEraser: 1.0.1 ChimeraX-MapFilter: 2.0.1 ChimeraX-MapFit: 2.0 ChimeraX-MapSeries: 2.1.1 ChimeraX-Markers: 1.0.1 ChimeraX-Mask: 1.0.2 ChimeraX-MatchMaker: 2.0.12 ChimeraX-MDcrds: 2.6 ChimeraX-MedicalToolbar: 1.0.2 ChimeraX-Meeting: 1.0.1 ChimeraX-MLP: 1.1.1 ChimeraX-mmCIF: 2.12 ChimeraX-MMTF: 2.2 ChimeraX-Modeller: 1.5.9 ChimeraX-ModelPanel: 1.3.7 ChimeraX-ModelSeries: 1.0.1 ChimeraX-Mol2: 2.0 ChimeraX-Mole: 1.0 ChimeraX-Morph: 1.0.2 ChimeraX-MouseModes: 1.2 ChimeraX-Movie: 1.0 ChimeraX-Neuron: 1.0 ChimeraX-Nifti: 1.0 ChimeraX-NRRD: 1.0 ChimeraX-Nucleotides: 2.0.3 ChimeraX-OpenCommand: 1.10.1 ChimeraX-PDB: 2.7.2 ChimeraX-PDBBio: 1.0 ChimeraX-PDBLibrary: 1.0.2 ChimeraX-PDBMatrices: 1.0 ChimeraX-PickBlobs: 1.0.1 ChimeraX-Positions: 1.0 ChimeraX-PresetMgr: 1.1 ChimeraX-PubChem: 2.1 ChimeraX-ReadPbonds: 1.0.1 ChimeraX-Registration: 1.1.1 ChimeraX-RemoteControl: 1.0 ChimeraX-RenderByAttr: 1.1 ChimeraX-RenumberResidues: 1.1 ChimeraX-ResidueFit: 1.0.1 ChimeraX-RestServer: 1.1 ChimeraX-RNALayout: 1.0 ChimeraX-RotamerLibMgr: 3.0 ChimeraX-RotamerLibsDunbrack: 2.0 ChimeraX-RotamerLibsDynameomics: 2.0 ChimeraX-RotamerLibsRichardson: 2.0 ChimeraX-SaveCommand: 1.5.1 ChimeraX-SchemeMgr: 1.0 ChimeraX-SDF: 2.0.1 ChimeraX-Segger: 1.0 ChimeraX-Segment: 1.0.1 ChimeraX-SelInspector: 1.0 ChimeraX-SeqView: 2.8.3 ChimeraX-Shape: 1.0.1 ChimeraX-Shell: 1.0.1 ChimeraX-Shortcuts: 1.1.1 ChimeraX-ShowSequences: 1.0.1 ChimeraX-SideView: 1.0.1 ChimeraX-Smiles: 2.1 ChimeraX-SmoothLines: 1.0 ChimeraX-SpaceNavigator: 1.0 ChimeraX-StdCommands: 1.10.3 ChimeraX-STL: 1.0.1 ChimeraX-Storm: 1.0 ChimeraX-StructMeasure: 1.1.2 ChimeraX-Struts: 1.0.1 ChimeraX-Surface: 1.0.1 ChimeraX-SwapAA: 2.0.1 ChimeraX-SwapRes: 2.2.1 ChimeraX-TapeMeasure: 1.0 ChimeraX-Test: 1.0 ChimeraX-Toolbar: 1.1.2 ChimeraX-ToolshedUtils: 1.2.1 ChimeraX-Topography: 1.0 
ChimeraX-Tug: 1.0.1 ChimeraX-UI: 1.28.4 ChimeraX-uniprot: 2.2.2 ChimeraX-UnitCell: 1.0.1 ChimeraX-ViewDockX: 1.2 ChimeraX-VIPERdb: 1.0 ChimeraX-Vive: 1.1 ChimeraX-VolumeMenu: 1.0.1 ChimeraX-VTK: 1.0 ChimeraX-WavefrontOBJ: 1.0 ChimeraX-WebCam: 1.0.2 ChimeraX-WebServices: 1.1.1 ChimeraX-Zone: 1.0.1 colorama: 0.4.6 comm: 0.1.3 comtypes: 1.1.14 contourpy: 1.0.7 cxservices: 1.2.2 cycler: 0.11.0 Cython: 0.29.33 debugpy: 1.6.7 decorator: 5.1.1 docutils: 0.19 executing: 1.2.0 filelock: 3.9.0 fonttools: 4.39.3 funcparserlib: 1.0.1 grako: 3.16.5 h5py: 3.8.0 html2text: 2020.1.16 idna: 3.4 ihm: 0.35 imagecodecs: 2022.9.26 imagesize: 1.4.1 importlib-metadata: 6.6.0 ipykernel: 6.21.1 ipython: 8.10.0 ipython-genutils: 0.2.0 ipywidgets: 8.0.6 jedi: 0.18.2 Jinja2: 3.1.2 jupyter-client: 8.0.2 jupyter-core: 5.3.0 jupyterlab-widgets: 3.0.7 kiwisolver: 1.4.4 line-profiler: 4.0.2 lxml: 4.9.2 lz4: 4.3.2 MarkupSafe: 2.1.2 matplotlib: 3.6.3 matplotlib-inline: 0.1.6 msgpack: 1.0.4 nest-asyncio: 1.5.6 netCDF4: 1.6.2 networkx: 2.8.8 nibabel: 5.0.1 nptyping: 2.5.0 numexpr: 2.8.4 numpy: 1.23.5 openvr: 1.23.701 packaging: 23.1 ParmEd: 3.4.3 parso: 0.8.3 pep517: 0.13.0 pickleshare: 0.7.5 Pillow: 9.3.0 pip: 23.0 pkginfo: 1.9.6 platformdirs: 3.5.0 prompt-toolkit: 3.0.38 psutil: 5.9.4 pure-eval: 0.2.2 pycollada: 0.7.2 pydicom: 2.3.0 Pygments: 2.14.0 pynrrd: 1.0.0 PyOpenGL: 3.1.5 PyOpenGL-accelerate: 3.1.5 pyparsing: 3.0.9 pyproject-hooks: 1.0.0 PyQt6-commercial: 6.4.2 PyQt6-Qt6: 6.4.3 PyQt6-sip: 13.4.1 PyQt6-WebEngine-commercial: 6.4.0 PyQt6-WebEngine-Qt6: 6.4.3 python-dateutil: 2.8.2 pytz: 2023.3 pywin32: 305 pyzmq: 25.0.2 qtconsole: 5.4.0 QtPy: 2.3.1 RandomWords: 0.4.0 requests: 2.28.2 scipy: 1.9.3 setuptools: 67.4.0 sfftk-rw: 0.7.3 six: 1.16.0 snowballstemmer: 2.2.0 sortedcontainers: 2.4.0 soupsieve: 2.4.1 sphinx: 6.1.3 sphinx-autodoc-typehints: 1.22 sphinxcontrib-applehelp: 1.0.4 sphinxcontrib-blockdiag: 3.0.0 sphinxcontrib-devhelp: 1.0.2 sphinxcontrib-htmlhelp: 2.0.1 sphinxcontrib-jsmath: 1.0.1 
sphinxcontrib-qthelp: 1.0.3 sphinxcontrib-serializinghtml: 1.1.5 stack-data: 0.6.2 tables: 3.7.0 tcia-utils: 1.2.0 tifffile: 2022.10.10 tinyarray: 1.2.4 tomli: 2.0.1 tornado: 6.3.1 traitlets: 5.9.0 typing-extensions: 4.5.0 tzdata: 2023.3 urllib3: 1.26.15 wcwidth: 0.2.6 webcolors: 1.12 wheel: 0.38.4 wheel-filename: 1.4.1 widgetsnbextension: 4.0.7 WMI: 1.5.1 zipp: 3.15.0
Change History (5)
comment:1 by , 2 years ago
Cc: | added |
---|---|
Component: | Unassigned → Input/Output |
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → assigned |
Summary: | ChimeraX bug report submission → Reading PDB file: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte |
comment:2 by , 2 years ago
Resolution: | → can't reproduce |
---|---|
Status: | assigned → closed |
Yeah, ChimeraX already tries UTF-8, -16, and -32 automatically, so this must be some kind of Windows codec. If I had the file as an example, I might add some Windows codecs as well. Yes, PDB standard-conformant files are strictly ASCII, but good luck having your program used by anyone if it only accepts standard-conformant files! :-)
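A minimal sketch of what adding a Windows codec as a fallback could look like, assuming the whole file has already been read into memory as bytes. The function name `decode_with_fallback`, the choice of cp1252 (the code page in the reporter's `es_AR.cp1252` locale) and the candidate order are illustrative assumptions, not ChimeraX's actual reader. UTF-16 and UTF-32 are only tried when a byte-order mark is present, because Python's `utf-16` codec will decode almost any even-length byte string without complaint, producing meaningless characters instead of an error.

```python
import codecs

# Byte-order marks are checked first. UTF-32 must come before UTF-16 because
# the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
BOM_ENCODINGS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

# Hypothetical order for data without a UTF-16/32 BOM: strict UTF-8 (with or
# without a UTF-8 BOM), then Windows-1252 as the assumed "Windows codec".
FALLBACK_ENCODINGS = ("utf-8-sig", "cp1252")


def decode_with_fallback(data: bytes) -> tuple[str, str]:
    """Return (text, encoding_used) for the raw bytes of a file."""
    for bom, encoding in BOM_ENCODINGS:
        if data.startswith(bom):
            # The "utf-16"/"utf-32" codecs read and strip the BOM themselves.
            return data.decode(encoding), encoding
    for encoding in FALLBACK_ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    # Latin-1 maps every byte to a character, so this last resort cannot fail.
    return data.decode("latin-1"), "latin-1"
```

Latin-1 is kept as the final step because cp1252 leaves a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined and Python raises on them; whether silently accepting such files is desirable is a policy question the sketch does not settle.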
comment:3 by , 2 years ago
I hear ya. Maybe it is worth catching UnicodeDecodeError and giving an easier-to-grasp error message without the traceback, like "Your PDB file X.pdb has an invalid character at line X, column Y and cannot be read." On the other hand, if this is rarely reported, it may not be worth the small effort to implement a better message.
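One way to produce the suggested message, sketched under the assumption that the whole file is decoded in a single call. Python's `UnicodeDecodeError` carries the bytes being decoded (`object`) and the offset of the first bad byte (`start`), which is enough to recover a line and column. The helper names are hypothetical. Inside ChimeraX the natural exception to raise would presumably be `chimerax.core.errors.UserError`, which is reported without a traceback; the sketch uses `ValueError` so it runs on its own. Note that the traceback in this report comes from an incremental decoder (`codecs.py`, `_buffer_decode`), so its "position 94" may be relative to the chunk being decoded rather than to the start of the file.

```python
def describe_decode_error(file_name: str, err: UnicodeDecodeError) -> str:
    """Build a one-line, user-facing message from a UnicodeDecodeError."""
    data = err.object                            # the bytes that were being decoded
    pos = err.start                              # offset of the first undecodable byte
    line = data.count(b"\n", 0, pos) + 1         # 1-based line number
    line_start = data.rfind(b"\n", 0, pos) + 1   # 0 when on the first line
    column = pos - line_start + 1                # 1-based byte column
    return (f"Your PDB file {file_name} has an invalid character (byte 0x{data[pos]:02x}) "
            f"at line {line}, column {column} and cannot be read.")


def read_pdb_text(file_name: str, data: bytes) -> str:
    """Hypothetical reader entry point: decode as UTF-8 or fail with a clear message."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # 'from None' keeps the original UnicodeDecodeError out of the printed chain.
        raise ValueError(describe_decode_error(file_name, err)) from None
```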
comment:4 by , 2 years ago
Leaving it as an error gives some chance of getting actual feedback and an example file...
comment:5 by , 2 years ago
Ok. It can be an error and still have an understandable error message. But it may not be worth the effort.
The PDB file was not valid UTF-8.
My guess is that it has some special characters encoded with another codec and we are reading it as if it were UTF-8.
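The guess is consistent with the byte in the error. In Windows-1252 and Latin-1, 0xE9 is "é". In UTF-8, a byte in the range 0xE0-0xEF starts a three-byte sequence and must be followed by continuation bytes in the range 0x80-0xBF; when an ordinary ASCII letter follows instead, Python reports exactly the "invalid continuation byte" error seen in this ticket. The sample word below is made up for illustration; the actual content of the reporter's file is unknown.

```python
def utf8_error(raw: bytes):
    """Return Python's UTF-8 decoding error message for raw, or None if it decodes."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return str(err)
    return None


# Hypothetical sample: "también" saved by an editor using Windows-1252,
# where "é" is the single byte 0xE9.
cp1252_bytes = "también".encode("cp1252")
print(cp1252_bytes)                 # b'tambi\xe9n'
print(utf8_error(cp1252_bytes))
# 'utf-8' codec can't decode byte 0xe9 in position 5: invalid continuation byte

# Encoded as UTF-8, "é" becomes two bytes (0xC3 0xA9) and decodes cleanly.
utf8_bytes = "también".encode("utf-8")
print(utf8_bytes)                   # b'tambi\xc3\xa9n'
print(utf8_error(utf8_bytes))       # None
```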
Maybe we should catch this error and say the file has special characters that prevent it from being read. Or we could try using another codec to read it.
What encoding does the PDB file specification require? ASCII? I recall citations sometimes have accented characters.
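Standard-conformant PDB files are strictly ASCII (see comment:2), so any byte above 0x7F marks a non-conformant character, such as an accented name in a citation. A hypothetical diagnostic helper, not part of ChimeraX, that lists every such byte with its line, column and record name (the first six columns of a PDB line) might look like this:

```python
def non_ascii_bytes(data: bytes):
    """Yield (line_number, column, record_name, byte) for every byte above 0x7F.

    line_number and column are 1-based; record_name is the first six columns
    of the line, where a PDB record name such as AUTHOR or JRNL sits.
    """
    for line_number, line in enumerate(data.splitlines(), start=1):
        for index, byte in enumerate(line):  # iterating over bytes yields ints
            if byte > 0x7F:
                record = line[:6].decode("ascii", errors="replace").rstrip()
                yield line_number, index + 1, record, byte
```

Passing the result of reading the file with `open(path, "rb")` lists every offending byte, not just the first one a decoder trips over, which shows whether the problem is confined to, for example, citation records.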