Opened 10 years ago

Closed 10 years ago

Last modified 7 years ago

#121 closed defect (fixed)

String too long error opening mmCIF file

Reported by: Tom Goddard Owned by: Greg Couch
Priority: major Milestone:
Component: Input/Output Version:
Keywords: Cc: pett@…
Blocked By: Blocking:
Notify when closed: Platform: all
Project: chimera

Description

Opening the mmcif file 1voq_1vor_1vos_1vou_1vov_1vow_1vox_1voy_1voz_1vp0 found on the PDB mmCIF examples web page produces a "String too long" error in Chimera.

http://mmcif.wwpdb.org/docs/large-pdbx-examples/

This is a 100 Mbyte file so I am not going to attach it to the ticket.

Not clear what string is too long, maybe a chain name, or residue name or some other fixed length string.

We should use C++ std::string to avoid string length issues. The use of fixed length strings was to save memory, but from what I recall it saved a negligible 2%.

Traceback (most recent call last):
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/gui.py", line 313, in <lambda>
self.Bind(wx.EVT_MENU, lambda evt, ses=session: self.on_open(evt, ses),
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/gui.py", line 205, in on_open
session.models.open(paths)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/models.py", line 256, in open
models, status = io.open_multiple_data(session, filenames, kw)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/io.py", line 611, in open_multiple_data
models, status = open_data(session, fspec, kw)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/io.py", line 574, in open_data
models, status = open_func(session, stream, name, kw)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/mmcif.py", line 25, in open_mmcif
pointers = _mmcif.parse_mmCIF_file(filename, session.logger)
TypeError: String too long

Change History (13)

comment:1 by Eric Pettersen, 10 years ago

Cc: pett@… added
Owner: changed from Eric Pettersen to Greg Couch
Status: new → assigned

Apparently this is an "old style" mmCIF file where the column justification makes it impossible to figure out which columns belong to which table components. The mmCIF parsing code doesn't handle the file right and tries to make a chain ID named "J 1". So I guess it's actually kind of good that the fixed-length string object picks this up as an error, since it would just lead to heartache later.

I'm re-assigning this to Greg in case he wants to do something about it, or complain to the RCSB about it.

--Eric

comment:2 by goddard@…, 10 years ago

It is legal to have a long chain id in mmCIF.  So even though the nice RCSB files don’t have those it seems unreasonable that Chimera will crash in that scenario.

comment:3 by Eric Pettersen, 10 years ago

It doesn't crash for me -- the error appears in the reply log.

comment:4 by goddard@…, 10 years ago

My mistake, it doesn’t crash for me, just spews a Python traceback with the error message “String too long”. I still think long chain ids should be handled. But I admit that is a rare case. If we aren’t going to handle it, at least the error should say “Cannot handle chain id longer than N characters”.


comment:5 by Eric Pettersen, 10 years ago

If we decide that longer chain IDs are needed it is totally trivial to change them over to a longer fixed length or to std::string, since ChainID is a typedef in atomstruct/string_types.h, and CString is API compatible with std::string. Though like I said, allowing arbitrary length chain IDs would just have produced some other later error in this case.

On the other hand, it is totally non-trivial to generate the error message you suggest, since the length checking occurs in CString, and that class has no idea what the semantic meaning is of the string it contains.

--Eric

comment:6 by Tom Goddard, 10 years ago

Summary: String to long error opening mmCIF file → String too long error opening mmCIF file

4gob is an official mmcif file that gives a "String too long" error when I try to open it:

open mmcif:4gob
Traceback (most recent call last):
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/cmd_line/gui.py", line 113, in on_enter
cmd.execute()
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/cli.py", line 1345, in execute
results.append(ci.function(session, kwargs))
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/commands.py", line 51, in open
return session.models.open(filename, id=id, as_=as_)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/models.py", line 255, in open
models, status = io.open_multiple_data(session, filenames, kw)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/io.py", line 611, in open_multiple_data
models, status = open_data(session, fspec, kw)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/io.py", line 574, in open_data
models, status = open_func(session, stream, name, kw)
File "/Users/goddard/ucsf/chimera2/Chimera2.app/Contents/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/chimera/core/mmcif.py", line 25, in open_mmcif
pointers = _mmcif.parse_mmCIF_file(filename, session.logger)
TypeError: String too long

comment:7 by Tom Goddard, 10 years ago

I don't see any long chain ids or residue names in mmCIF 4gob and Chimera 1 opens it and I saw nothing unusual.

We need a better error message than "String too long". It should say something like 'Residue name "abcsdfee" too long, maximum 4 characters'.

comment:8 by Eric Pettersen, 10 years ago

Upon further reflection, since all uses of CString are through typedefs, it's actually easy to generate such an error message.  I'll try to get that in tomorrow.

--Eric
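
The idea in this comment can be illustrated with a minimal Python sketch. It is not the C++ CString code, which is not shown in this ticket. It only assumes that each length-limited string type is declared in one place, so a descriptive label can be attached there and used in the error message. Only the name ChainID comes from the ticket; FixedLengthString, ResName, the length limits, and the choice of ValueError are hypothetical.

```python
class FixedLengthString(str):
    """Length-checked string base class (hypothetical, for illustration only).

    Subclasses play the role of the typedefs mentioned above: each one
    sets its own limit and a label used in the error message.
    """

    max_len = 0             # overridden by each subclass
    description = "String"  # overridden by each subclass

    def __new__(cls, value):
        if len(value) > cls.max_len:
            # The label lets the message say what kind of string overflowed.
            raise ValueError(
                '%s "%s" too long, maximum %d characters'
                % (cls.description, value, cls.max_len)
            )
        return super().__new__(cls, value)


class ChainID(FixedLengthString):
    max_len = 4  # assumed limit, not taken from the real code
    description = "Chain ID"


class ResName(FixedLengthString):
    max_len = 4  # assumed limit, matching the example message in comment:7
    description = "Residue name"


if __name__ == "__main__":
    print(ResName("CR8"))
    try:
        ResName("HIS, TYR, GLY")
    except ValueError as err:
        print(err)
```

Because the check lives in the shared base class and only the label differs per type, the generic string code still needs no knowledge of what a chain ID or residue name is.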

comment:9 by Eric Pettersen, 10 years ago

Well, the change was far harder than I anticipated due to a pitched battle with template syntax, but the better error reporting is in there. The 4gob error is due to not using the 3-letter name for non-standard residue CR8, and instead using "HIS, TYR, GLY".

--Eric

comment:10 by Tom Goddard, 10 years ago

Eric found that the offending string is in a chemical component template file, for example in

~/Downloads/Chimera/CCD/CR8.cif

is the line

_chem_comp.mon_nstd_parent_comp_id "HIS, TYR, GLY"

Our mmCIF reader uses that field. Perhaps it should be using instead the "three_letter_code" field:

_chem_comp.three_letter_code CR8
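
The difference between the two fields can be checked with a small Python sketch. It is a deliberately simplified reader, not the project's mmCIF parser: it assumes every item is a single "tag value" line and ignores loops and multi-line values. The function names and the 4-character limit are hypothetical; the two sample lines are the ones quoted above.

```python
# The two lines from CR8.cif quoted in this comment.
CR8_LINES = '''
_chem_comp.mon_nstd_parent_comp_id "HIS, TYR, GLY"
_chem_comp.three_letter_code CR8
'''

MAX_RESIDUE_NAME = 4  # assumed limit, for illustration only


def read_chem_comp_items(text):
    """Return {item_name: value} for single-line _chem_comp items."""
    prefix = "_chem_comp."
    items = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # tag, then the rest of the line
        if len(parts) != 2 or not parts[0].startswith(prefix):
            continue
        tag, value = parts[0], parts[1].strip()
        # Strip one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        items[tag[len(prefix):]] = value
    return items


def fits_residue_name(value):
    """True if the value fits in the assumed fixed-length residue name."""
    return len(value) <= MAX_RESIDUE_NAME


if __name__ == "__main__":
    items = read_chem_comp_items(CR8_LINES)
    for name, value in items.items():
        print(name, repr(value), fits_residue_name(value))
```

Under these assumptions the parent-component field holds a comma-separated list that cannot fit in a short fixed-length name, while the three-letter code does.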

I changed the mmCIF code trying that and it didn't get any "String too long" errors in about 10,000 mmCIF files. Before the fix I saw about 30 mmCIF files that gave the error out of 7000: 2a48, 2a50, 2a52, 2a53, 2a54, 2a56, 3a8s, 3adf, 2ah8, 2aha, 5ahj, 3ai4, 3ai5, 3ako, 4anj, 4arl, 4ar7, 4as8, 2awk, 2awl, 2awm, 2b3p, 2b3q, 4b30, 4b5y, 1b8f, 1b9c, 4baa, 4bab, 4bdu, ...
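
A survey like the one described here could be scripted roughly as follows. This is a sketch under stated assumptions, not the script that was actually used: open_entry is a hypothetical callable standing in for whatever opens one mmCIF entry and raises on failure, and the error is recognized only by the text of its message.

```python
def find_failing_entries(entry_ids, open_entry, needle="String too long"):
    """Return the entry ids whose open attempt fails with the given message.

    open_entry is a caller-supplied function (hypothetical here) that
    opens a single entry and raises an exception on failure.
    """
    failures = []
    for entry_id in entry_ids:
        try:
            open_entry(entry_id)
        except Exception as err:
            # Other errors are ignored so that only this problem is counted.
            if needle in str(err):
                failures.append(entry_id)
    return failures


if __name__ == "__main__":
    def demo_open(entry_id):
        # Stand-in opener: pretends two entries hit the error.
        if entry_id in ("2a48", "4gob"):
            raise TypeError("String too long")

    print(find_failing_entries(["1abc", "2a48", "4gob"], demo_open))
```

Running the same list of ids before and after a parser change gives a direct comparison of how many entries still fail.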

comment:11 by Greg Couch, 10 years ago

Resolution: fixed
Status: assigned → closed

Agree that three_letter_code is better. Removed commented out code.

comment:12 by Eric Pettersen, 10 years ago

Component: Molecular Data → mmCIF Reader/Writer

comment:13 by Eric Pettersen, 7 years ago

Component: mmCIF Reader/Writer → Input/Output

reducing # of categories, so lumping into I/O category
