Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#10008 closed defect (can't reproduce)

Reading PDB file: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte

Reported by: chimerax-bug-report@…
Owned by: pett
Priority: normal
Milestone:
Component: Input/Output
Version:
Keywords:
Cc: Tom Goddard
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        Windows-10-10.0.19045
ChimeraX Version: 1.6.1 (2023-05-09 17:57:07 UTC)
Description
(Describe the actions that caused this problem to occur here)

Log:
Startup Messages  
---  
note | available bundle cache has not been initialized yet  
  
You can double click a model's Name or ID in the model panel to edit those
fields  
UCSF ChimeraX version: 1.6.1 (2023-05-09)  
© 2016-2023 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> open "C:/Users/beatr/Downloads/PDB TP4/1a6m.pdb"
> "C:/Users/beatr/Downloads/PDB TP4/1a19.pdb" "C:/Users/beatr/Downloads/PDB
> TP4/1jug.pdb" "C:/Users/beatr/Downloads/PDB TP4/1neu.pdb"
> "C:/Users/beatr/Downloads/PDB TP4/1shs.pdb" "C:/Users/beatr/Downloads/PDB
> TP4/2bnh.pdb" "C:/Users/beatr/Downloads/PDB TP4/4zhp.pdb"
> "C:/Users/beatr/Downloads/PDB TP4/6hit.pdb"

1a6m.pdb title:  
Oxy-myoglobin, atomic resolution [more info...]  
  
Chain information for 1a6m.pdb #1  
---  
Chain | Description | UniProt  
A | myoglobin | MYG_PHYCA 1-151  
  
Non-standard residues in 1a6m.pdb #1  
---  
HEM — protoporphyrin IX containing Fe (HEME)  
OXY — oxygen molecule  
SO4 — sulfate ion  
  
1a19.pdb title:  
Barstar (free), C82A mutant [more info...]  
  
Chain information for 1a19.pdb #2  
---  
Chain | Description | UniProt  
A B | barstar | BARS_BACAM 1-89  
  
1jug.pdb title:  
Lysozyme from echiDNA milk (tachyglossus aculeatus) [more info...]  
  
Chain information for 1jug.pdb #3  
---  
Chain | Description | UniProt  
A | lysozyme | LYSC1_TACAC 1-125  
  
Non-standard residues in 1jug.pdb #3  
---  
CA — calcium ion  
  
1neu.pdb title:  
Structure of myelin membrane adhesion molecule P0 [more info...]  
  
Chain information for 1neu.pdb #4  
---  
Chain | Description | UniProt  
A | myelin P0 protein | MYP0_RAT 1-124  
  
1shs.pdb title:  
Small heat shock protein from methanococcus jannaschii [more info...]  
  
Chain information for 1shs.pdb #5  
---  
Chain | Description | UniProt  
A B C D E F G H | small heat shock protein | HSPS_METJA 1-147  
  
2bnh.pdb title:  
Porcine ribonuclease inhibitor [more info...]  
  
Chain information for 2bnh.pdb #6  
---  
Chain | Description | UniProt  
A | ribonuclease inhibitor | RINI_PIG 1-456  
  
Non-standard residues in 2bnh.pdb #6  
---  
ACE — acetyl group  
  
4zhp.pdb title:  
The crystal structure of potato ferredoxin I with 2FE-2S cluster [more
info...]  
  
Chain information for 4zhp.pdb #7  
---  
Chain | Description | UniProt  
A | potato ferredoxin I | Q93XJ9_SOLTU 1-98  
  
Non-standard residues in 4zhp.pdb #7  
---  
FES — FE2/S2 (inorganic) cluster  
  
6hit.pdb title:  
The crystal structure of haemoglobin from atlantic cod [more info...]  
  
Chain information for 6hit.pdb #8  
---  
Chain | Description | UniProt  
A C E G | hemoglobin α chain | B3F9D9_GADMO 2-143  
B D F H | hemoglobin β chain | B3F9D7_GADMO 2-146  
  
Non-standard residues in 6hit.pdb #8  
---  
ACE — acetyl group  
HEM — protoporphyrin IX containing Fe (HEME)  
  

> close session

> open "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/1nqu.pdb"
> "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/1rvv.pdb"
> "C:/Users/beatr/Downloads/Lumazina sintasa PDB TP4/psicrofilico.pdb"

Traceback (most recent call last):  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\dialog.py", line 162, in _qt_safe  
run(session, "open " + " ".join([FileNameArg.unparse(p) for p in paths]) + (""  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\run.py", line 38, in run  
results = command.run(text, log=log, return_json=return_json)  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\cli.py", line 2897, in run  
result = ci.function(session, **kw_args)  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 119, in cmd_open  
models = Command(session, registry=registry).run(provider_cmd_text, log=log)[0]  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\cli.py", line 2897, in run  
result = ci.function(session, **kw_args)  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 194, in provider_open  
models, status = collated_open(session, None, [data], data_format, _add_models,  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 464, in collated_open  
return remember_data_format()  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\open_command\cmd.py", line 435, in remember_data_format  
models, status = func(*func_args, **func_kw)  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\pdb\__init__.py", line 34, in open  
return pdb.open_pdb(session, data, file_name, **kw)  
File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\pdb\pdb.py", line 78, in open_pdb  
pointers = _pdbio.read_pdb_file(stream, session.logger, not coordsets, atomic, segid_chains,  
File "C:\Program Files\ChimeraX\bin\lib\codecs.py", line 322, in decode  
(result, consumed) = self._buffer_decode(data, self.errors, final)  
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte  
  
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte  
  
File "C:\Program Files\ChimeraX\bin\lib\codecs.py", line 322, in decode  
(result, consumed) = self._buffer_decode(data, self.errors, final)  
  
See log for complete Python traceback.  
  




OpenGL version: 3.3.0 - Build 27.20.100.8682
OpenGL renderer: Intel(R) HD Graphics 500
OpenGL vendor: Intel

Python: 3.9.11
Locale: es_AR.cp1252
Qt version: PyQt6 6.4.2, Qt 6.4.2
Qt runtime version: 6.4.3
Qt platform: windows

Manufacturer: ASUSTeK COMPUTER INC.
Model: X541NA
OS: Microsoft Windows 10 Home Single Language (Build 19045)
Memory: 4,151,885,824
MaxProcessMemory: 137,438,953,344
CPU: 2 Intel(R) Celeron(R) CPU N3350 @ 1.10GHz
OSLanguage: es-ES

Installed Packages:
    alabaster: 0.7.13
    appdirs: 1.4.4
    asttokens: 2.2.1
    Babel: 2.12.1
    backcall: 0.2.0
    beautifulsoup4: 4.11.2
    blockdiag: 3.0.0
    build: 0.10.0
    certifi: 2023.5.7
    cftime: 1.6.2
    charset-normalizer: 3.1.0
    ChimeraX-AddCharge: 1.5.9.1
    ChimeraX-AddH: 2.2.5
    ChimeraX-AlignmentAlgorithms: 2.0.1
    ChimeraX-AlignmentHdrs: 3.3.1
    ChimeraX-AlignmentMatrices: 2.1
    ChimeraX-Alignments: 2.9.3
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.0.3
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.1
    ChimeraX-Atomic: 1.43.10
    ChimeraX-AtomicLibrary: 10.0.6
    ChimeraX-AtomSearch: 2.0.1
    ChimeraX-AxesPlanes: 2.3.2
    ChimeraX-BasicActions: 1.1.2
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 2.1.2
    ChimeraX-BondRot: 2.0.1
    ChimeraX-BugReporter: 1.0.1
    ChimeraX-BuildStructure: 2.8
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.2.2
    ChimeraX-ButtonPanel: 1.0.1
    ChimeraX-CageBuilder: 1.0.1
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.3.2
    ChimeraX-ChangeChains: 1.0.2
    ChimeraX-CheckWaters: 1.3.1
    ChimeraX-ChemGroup: 2.0.1
    ChimeraX-Clashes: 2.2.4
    ChimeraX-ColorActions: 1.0.3
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.5.3
    ChimeraX-CommandLine: 1.2.5
    ChimeraX-ConnectStructure: 2.0.1
    ChimeraX-Contacts: 1.0.1
    ChimeraX-Core: 1.6.1
    ChimeraX-CoreFormats: 1.1
    ChimeraX-coulombic: 1.4.2
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0.1
    ChimeraX-DataFormats: 1.2.3
    ChimeraX-Dicom: 1.2
    ChimeraX-DistMonitor: 1.4
    ChimeraX-DockPrep: 1.1.1
    ChimeraX-Dssp: 2.0
    ChimeraX-EMDB-SFF: 1.0
    ChimeraX-ESMFold: 1.0
    ChimeraX-FileHistory: 1.0.1
    ChimeraX-FunctionKey: 1.0.1
    ChimeraX-Geometry: 1.3
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1.1
    ChimeraX-Hbonds: 2.4
    ChimeraX-Help: 1.2.1
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.2
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ItemsInspection: 1.0.1
    ChimeraX-Label: 1.1.7
    ChimeraX-ListInfo: 1.1.1
    ChimeraX-Log: 1.1.5
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.8.2
    ChimeraX-Map: 1.1.4
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0.1
    ChimeraX-MapFilter: 2.0.1
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1.1
    ChimeraX-Markers: 1.0.1
    ChimeraX-Mask: 1.0.2
    ChimeraX-MatchMaker: 2.0.12
    ChimeraX-MDcrds: 2.6
    ChimeraX-MedicalToolbar: 1.0.2
    ChimeraX-Meeting: 1.0.1
    ChimeraX-MLP: 1.1.1
    ChimeraX-mmCIF: 2.12
    ChimeraX-MMTF: 2.2
    ChimeraX-Modeller: 1.5.9
    ChimeraX-ModelPanel: 1.3.7
    ChimeraX-ModelSeries: 1.0.1
    ChimeraX-Mol2: 2.0
    ChimeraX-Mole: 1.0
    ChimeraX-Morph: 1.0.2
    ChimeraX-MouseModes: 1.2
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nifti: 1.0
    ChimeraX-NRRD: 1.0
    ChimeraX-Nucleotides: 2.0.3
    ChimeraX-OpenCommand: 1.10.1
    ChimeraX-PDB: 2.7.2
    ChimeraX-PDBBio: 1.0
    ChimeraX-PDBLibrary: 1.0.2
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-PickBlobs: 1.0.1
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0.1
    ChimeraX-Registration: 1.1.1
    ChimeraX-RemoteControl: 1.0
    ChimeraX-RenderByAttr: 1.1
    ChimeraX-RenumberResidues: 1.1
    ChimeraX-ResidueFit: 1.0.1
    ChimeraX-RestServer: 1.1
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 3.0
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-SaveCommand: 1.5.1
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0.1
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0.1
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.8.3
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0.1
    ChimeraX-Shortcuts: 1.1.1
    ChimeraX-ShowSequences: 1.0.1
    ChimeraX-SideView: 1.0.1
    ChimeraX-Smiles: 2.1
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.10.3
    ChimeraX-STL: 1.0.1
    ChimeraX-Storm: 1.0
    ChimeraX-StructMeasure: 1.1.2
    ChimeraX-Struts: 1.0.1
    ChimeraX-Surface: 1.0.1
    ChimeraX-SwapAA: 2.0.1
    ChimeraX-SwapRes: 2.2.1
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1.2
    ChimeraX-ToolshedUtils: 1.2.1
    ChimeraX-Topography: 1.0
    ChimeraX-Tug: 1.0.1
    ChimeraX-UI: 1.28.4
    ChimeraX-uniprot: 2.2.2
    ChimeraX-UnitCell: 1.0.1
    ChimeraX-ViewDockX: 1.2
    ChimeraX-VIPERdb: 1.0
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0.1
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0.2
    ChimeraX-WebServices: 1.1.1
    ChimeraX-Zone: 1.0.1
    colorama: 0.4.6
    comm: 0.1.3
    comtypes: 1.1.14
    contourpy: 1.0.7
    cxservices: 1.2.2
    cycler: 0.11.0
    Cython: 0.29.33
    debugpy: 1.6.7
    decorator: 5.1.1
    docutils: 0.19
    executing: 1.2.0
    filelock: 3.9.0
    fonttools: 4.39.3
    funcparserlib: 1.0.1
    grako: 3.16.5
    h5py: 3.8.0
    html2text: 2020.1.16
    idna: 3.4
    ihm: 0.35
    imagecodecs: 2022.9.26
    imagesize: 1.4.1
    importlib-metadata: 6.6.0
    ipykernel: 6.21.1
    ipython: 8.10.0
    ipython-genutils: 0.2.0
    ipywidgets: 8.0.6
    jedi: 0.18.2
    Jinja2: 3.1.2
    jupyter-client: 8.0.2
    jupyter-core: 5.3.0
    jupyterlab-widgets: 3.0.7
    kiwisolver: 1.4.4
    line-profiler: 4.0.2
    lxml: 4.9.2
    lz4: 4.3.2
    MarkupSafe: 2.1.2
    matplotlib: 3.6.3
    matplotlib-inline: 0.1.6
    msgpack: 1.0.4
    nest-asyncio: 1.5.6
    netCDF4: 1.6.2
    networkx: 2.8.8
    nibabel: 5.0.1
    nptyping: 2.5.0
    numexpr: 2.8.4
    numpy: 1.23.5
    openvr: 1.23.701
    packaging: 23.1
    ParmEd: 3.4.3
    parso: 0.8.3
    pep517: 0.13.0
    pickleshare: 0.7.5
    Pillow: 9.3.0
    pip: 23.0
    pkginfo: 1.9.6
    platformdirs: 3.5.0
    prompt-toolkit: 3.0.38
    psutil: 5.9.4
    pure-eval: 0.2.2
    pycollada: 0.7.2
    pydicom: 2.3.0
    Pygments: 2.14.0
    pynrrd: 1.0.0
    PyOpenGL: 3.1.5
    PyOpenGL-accelerate: 3.1.5
    pyparsing: 3.0.9
    pyproject-hooks: 1.0.0
    PyQt6-commercial: 6.4.2
    PyQt6-Qt6: 6.4.3
    PyQt6-sip: 13.4.1
    PyQt6-WebEngine-commercial: 6.4.0
    PyQt6-WebEngine-Qt6: 6.4.3
    python-dateutil: 2.8.2
    pytz: 2023.3
    pywin32: 305
    pyzmq: 25.0.2
    qtconsole: 5.4.0
    QtPy: 2.3.1
    RandomWords: 0.4.0
    requests: 2.28.2
    scipy: 1.9.3
    setuptools: 67.4.0
    sfftk-rw: 0.7.3
    six: 1.16.0
    snowballstemmer: 2.2.0
    sortedcontainers: 2.4.0
    soupsieve: 2.4.1
    sphinx: 6.1.3
    sphinx-autodoc-typehints: 1.22
    sphinxcontrib-applehelp: 1.0.4
    sphinxcontrib-blockdiag: 3.0.0
    sphinxcontrib-devhelp: 1.0.2
    sphinxcontrib-htmlhelp: 2.0.1
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.3
    sphinxcontrib-serializinghtml: 1.1.5
    stack-data: 0.6.2
    tables: 3.7.0
    tcia-utils: 1.2.0
    tifffile: 2022.10.10
    tinyarray: 1.2.4
    tomli: 2.0.1
    tornado: 6.3.1
    traitlets: 5.9.0
    typing-extensions: 4.5.0
    tzdata: 2023.3
    urllib3: 1.26.15
    wcwidth: 0.2.6
    webcolors: 1.12
    wheel: 0.38.4
    wheel-filename: 1.4.1
    widgetsnbextension: 4.0.7
    WMI: 1.5.1
    zipp: 3.15.0

Change History (5)

comment:1 by Tom Goddard, 2 years ago

Cc: Tom Goddard added
Component: Unassigned → Input/Output
Owner: set to pett
Platform: all
Project: ChimeraX
Status: new → assigned
Summary: ChimeraX bug report submission → Reading PDB file: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 94: invalid continuation byte

The PDB file was not valid UTF-8.

My guess is that it contains special characters in some other encoding, and we are reading it as if it were UTF-8.

Maybe we should catch this error and say the file has special characters that prevent it from being read. Or we could try reading it with another codec.

What encoding does the PDB file specification require? ASCII? I recall citations sometimes have accented characters.
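For illustration only, here is a minimal Python sketch of how a single byte written in a Windows code page produces this exact error. The file contents are made up (the reporter's file was never attached), and `build_sample` and `utf8_error` are hypothetical helper names, not ChimeraX code. In cp1252, the code page named in the reporter's locale, byte 0xe9 is "é"; in UTF-8 the same byte announces a three-byte sequence, so a following ASCII character is an invalid continuation byte.

```python
def build_sample(offset: int = 94) -> bytes:
    """Return made-up bytes with a lone 0xe9 at the given offset."""
    # Hypothetical content; the real file from the report is not available.
    prefix = b"REMARK   1  AUTH   "
    return prefix.ljust(offset, b" ") + b"\xe9 and so on\n"


def utf8_error(raw: bytes):
    """Return the UnicodeDecodeError raised by a UTF-8 decode, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err
    return None


sample = build_sample()
err = utf8_error(sample)
print(err)                          # same wording as the message in the ticket
print(sample.decode("cp1252")[94])  # the byte read as cp1252 instead
```

If the file really was saved in cp1252, the offending character is probably an accented letter such as "é", but that is a guess without the file.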

comment:2 by pett, 2 years ago

Resolution: → can't reproduce
Status: assigned → closed

Yeah, ChimeraX already tries UTF-8, -16, and -32 automatically, so this must be some kind of Windows codec. If I had the file as an example, I might add some Windows codecs as well. Yes, PDB standard-conformant files are strictly ASCII, but good luck having your program used by anyone if it only accepts standard-conformant files! :-)
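A rough sketch of what adding a Windows codec to the fallback chain could look like in plain Python. This is not the existing ChimeraX reader; `decode_pdb_bytes` is a hypothetical name, and the choice of cp1252 is an assumption based on the `es_AR.cp1252` locale in the report. UTF-16 and UTF-32 are only accepted when a byte-order mark is present, because decoding arbitrary single-byte text as UTF-16 can "succeed" and return garbage.

```python
import codecs

# UTF-32 BOMs must be checked before UTF-16: the UTF-32-LE BOM starts
# with the same two bytes as the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_pdb_bytes(raw: bytes) -> tuple[str, str]:
    """Decode file bytes, returning (text, name of the encoding used)."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding), encoding
    # No BOM: try strict UTF-8 first (ASCII files pass here), then cp1252.
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails.
    return raw.decode("latin-1"), "latin-1"


text, used = decode_pdb_bytes(b"AUTH   J.-P. R\xe9my\n")
print(used, text)
```

The trade-off is that a single-byte fallback never reports an error, so a file in some other code page would load with wrong characters instead of failing loudly.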

comment:3 by Tom Goddard, 2 years ago

I hear ya. Maybe it is worth catching UnicodeDecodeError and giving an easier-to-grasp error message without the traceback, like "Your PDB file X.pdb has an invalid character at line X, column Y and cannot be read." On the other hand, if this is rarely reported, then it may not be worth the small effort to implement a better message.
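As a sketch of how the line and column could be recovered, assuming the raw bytes of the file are available: `UnicodeDecodeError.start` gives the byte offset of the bad byte, and counting newlines before it gives the line. `find_invalid_utf8` and `friendly_message` are hypothetical names; the actual PDB reader works on a stream in compiled code, so this is not a drop-in patch.

```python
def find_invalid_utf8(raw: bytes):
    """Return (line, column), both 1-based, of the first byte that is
    not valid UTF-8, or None if the data decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        line = raw.count(b"\n", 0, err.start) + 1
        # rfind returns -1 on the first line, which makes line_start 0.
        line_start = raw.rfind(b"\n", 0, err.start) + 1
        # Column is counted in bytes; for PDB files, which are ASCII
        # apart from the bad byte, this matches the character column.
        column = err.start - line_start + 1
        return line, column
    return None


def friendly_message(file_name: str, raw: bytes):
    """Build the short message suggested above, or None if no error."""
    location = find_invalid_utf8(raw)
    if location is None:
        return None
    line, column = location
    return (f"Your PDB file {file_name} has an invalid character at "
            f"line {line}, column {column} and cannot be read.")


# Made-up two-line example with a cp1252 "é" on the second line.
print(friendly_message("example.pdb", b"HEADER\nAUTH  R\xe9my\n"))
```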

comment:4 by pett, 2 years ago

Leaving it as an error gives some chance of getting actual feedback and an example file...

comment:5 by Tom Goddard, 2 years ago

Ok. It can be an error and still have an understandable error message. But it may not be worth the effort.
