Opened 8 years ago
Closed 8 years ago
#881 closed defect (fixed)
mmCIF writer writing uninitialised values into _atom_site.label_alt_ID
| Reported by: | Tristan Croll | Owned by: | Eric Pettersen |
|---|---|---|---|
| Priority: | blocker | Milestone: | |
| Component: | Core | Version: | |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
open 2r4O
save 2r4O_test.cif #1
open 2r4O_test.cif
Traceback (most recent call last):
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/cmd_line/tool.py", line 188, in execute
    cmd.run(cmd_text)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/commands/cli.py", line 2499, in run
    result = ci.function(session, **kw_args)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/commands/open.py", line 104, in open
    models = handle_unknown_kw(session.models.open, paths, format=format, name=name, **kw)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/commands/open.py", line 57, in handle_unknown_kw
    return f(*args, **kw)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/models.py", line 439, in open
    session, filenames, format=format, name=name, **kw)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/io.py", line 439, in open_multiple_data
    models, status = open_data(session, fspec, format=format, name=name, **kw)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/io.py", line 399, in open_data
    models, status = open_func(*args, **kw)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/atomic/mmcif.py", line 44, in open_mmcif
    pointers = _mmcif.parse_mmCIF_file(path, _additional_categories, session.logger, coordsets, atomic)
_mmcif.error: not enough data values near line 25

_mmcif.error: not enough data values near line 25

  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/atomic/mmcif.py", line 44, in open_mmcif
    pointers = _mmcif.parse_mmCIF_file(path, _additional_categories, session.logger, coordsets, atomic)
(in Bash shell)
head -30 2r4O_test.cif
data_2r4O.pdb
#
loop_
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_PDB_model_num
1 N N ° ALA A 1 1 -30.445 -3.477 17.238 1.00 37.77 1
2 C CA ALA A 1 1 -29.600 -4.080 16.174 1.00 37.77 1
3 C C ALA A 1 1 -29.118 -5.476 16.573 1.00 37.77 1
ALA A 1 1 -28.345 -5.621 17.519 1.00 37.77 1
5 C CB ALA A 1 1 -28.400 -3.169 15.894 1.00 20.66 1
6 H H1 ALA A 1 1 -30.519 -4.119 18.014 1.00 0.00 1
7 H H2 ALA A 1 1 -31.366 -3.287 16.869 1.00 0.00 1
8 H H3 ALA A 1 1 -30.025 -2.615 17.553 1.00 0.00 1
9 H HA ° ALA A 1 1 -30.194 -4.164 15.264 1.00 0.00 1
10 H HB1 ALA A 1 1 -27.781 -3.612 15.114 1.00 0.00 1
11 H HB2 ALA A 1 1 -27.811 -3.055 16.804 1.00 0.00 1
ALA A 1 1 -28.754 -2.192 15.566 1.00 0.00 1
13 N N GLY A 1 2 -29.573 -6.500 15.857 1.00 0.47 1
... so the label_alt_id column is filled with garbage, and the occasional line seems to be missing the first four columns entirely. Text editors refuse to open the file at all (gedit hangs, geany decides it's not a text file).
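For anyone wanting to locate the damaged rows without a text editor, a rough check is to count the whitespace-separated values on each data row and compare against the number of `_atom_site` fields declared in the loop header. The sketch below is only an illustration: `find_short_rows` is a hypothetical helper, not part of ChimeraX, and it assumes a single `loop_` block, one row per line, and no quoted values containing whitespace.

```python
def find_short_rows(lines):
    """Return (line_number, n_values) for loop rows with too few values.

    Simplified: assumes a single loop_ block, one row per line, and no
    quoted values containing whitespace.
    """
    n_fields = 0
    in_header = False
    short = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if text == 'loop_':
            # Start counting the field names that follow.
            in_header = True
            n_fields = 0
        elif in_header and text.startswith('_'):
            n_fields += 1
        elif text and not text.startswith('#') and n_fields:
            # First non-field line ends the header; treat it as a data row.
            in_header = False
            n_values = len(text.split())
            if n_values < n_fields:
                short.append((number, n_values))
    return short


example = [
    'loop_',
    '_atom_site.id',
    '_atom_site.type_symbol',
    '_atom_site.label_atom_id',
    '_atom_site.label_alt_id',
    '1 N N .',
    '2 C CA',      # label_alt_id value missing
]
print(find_short_rows(example))  # [(7, 3)]
```

Note that a row containing a printable garbage character in label_alt_id still has the right number of values, so this check only catches rows where values have gone missing.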
Attachments (1)
Change History (5)
by , 8 years ago
| Attachment: | 2r4O_test.cif added |
|---|
comment:1 by , 8 years ago
comment:2 by , 8 years ago
| Component: | Input/Output → Core |
|---|---|
| Owner: | changed from to |
OK, not an IO problem. Atoms.alt_locs is bugged out:
m.atoms.alt_locs Out[2]: bytearray(b'\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97')
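The output above repeats a single 8-byte unit rather than holding one character per atom. As a quick sanity check (an interpretation, not something confirmed in the ticket), that unit can be decoded as a little-endian 64-bit integer; the result looks like a memory address, which would be consistent with 8-byte values being copied into the bytearray instead of 1-byte characters.

```python
# One 8-byte unit of the repeating pattern shown in the output above.
unit = b'\xb0\x8a\x10\r\x97\x7f\x00\x00'

# Interpret the unit as an unsigned little-endian 64-bit integer.
value = int.from_bytes(unit, byteorder='little')
print(hex(value))  # 0x7f970d108ab0
```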
comment:3 by , 8 years ago
Digging into this a bit more, it would appear that your convention of numpy.object = string in molc.py doesn't work for char arrays: NumPy seems to lose track of the character encoding, so the bytearray conversion gets garbled. It needs to be explicitly cast somewhere. If you remove the astype=bytearray argument at line 551 of molarray.py, then Atoms.alt_locs returns a usable NumPy array of strings, but mmCIF output falls over because it's expecting a bytearray.
Replacing line 90 of mmcif_write.py with:
aloc = [a if a != ' ' else '.' for a in atoms.alt_locs]
and doing away with the call to alt_loc_text at line 101 seems to work (I can save and successfully reload an mmCIF file, and the results look sensible). If you really want a bytearray, it looks like the NumPy array has to be explicitly cast first, e.g.:
aloc = bytearray(aloc.astype('|S1'))
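The two options above can be sketched in plain Python, without NumPy, to show the intended result. This is a minimal illustration and not the committed fix: `alt_loc_column` and `alt_locs_to_bytearray` are hypothetical names, and the input list stands in for `atoms.alt_locs`, assuming every entry is a single ASCII character with `' '` meaning no alt loc.

```python
def alt_loc_column(alt_locs):
    """Return mmCIF label_alt_id values: '.' where the alt loc is blank."""
    return [a if a != ' ' else '.' for a in alt_locs]


def alt_locs_to_bytearray(alt_locs):
    """Pack alt loc strings into a bytearray, one byte per atom.

    Assumes each entry is exactly one ASCII character.
    """
    return bytearray(''.join(alt_locs).encode('ascii'))


alt_locs = [' ', 'A', 'B', ' ']   # stand-in for atoms.alt_locs
print(alt_loc_column(alt_locs))          # ['.', 'A', 'B', '.']
print(alt_locs_to_bytearray(alt_locs))   # bytearray(b' AB ')
```

Either way, each atom contributes exactly one character to the label_alt_id column, so every row keeps the full number of values.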
comment:4 by , 8 years ago
| Description: | modified (diff) |
|---|---|
| Resolution: | → fixed |
| Status: | assigned → closed |
Alt locs changed from bytes to strings when I implemented the add-hydrogen code, and Atoms.alt_locs didn't get updated correctly. Thanks for the diagnosis and fixes. Have committed them.
Not sure why this got created twice, sorry. Trac decided I was logged out, and duplicated it when I logged in.