Opened 8 years ago

Closed 8 years ago

#881 closed defect (fixed)

mmCIF writer writing uninitialised values into _atom_site.label_alt_ID

Reported by: Tristan Croll Owned by: Eric Pettersen
Priority: blocker Milestone:
Component: Core Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description (last modified by Eric Pettersen)

open 2r4O
save 2r4O_test.cif #1
open 2r4O_test.cif

Traceback (most recent call last):
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/cmd_line/tool.py", line 188, in execute
cmd.run(cmd_text)
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/commands/cli.py", line 2499, in run
result = ci.function(session, **kw_args)
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/commands/open.py", line 104, in open
models = handle_unknown_kw(session.models.open, paths, format=format, name=name, **kw)
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/commands/open.py", line 57, in handle_unknown_kw
return f(*args, **kw)
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/models.py", line 439, in open
session, filenames, format=format, name=name, **kw)
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/io.py", line 439, in open_multiple_data
models, status = open_data(session, fspec, format=format, name=name, **kw)
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/io.py", line 399, in open_data
models, status = open_func(*args, **kw)
File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/atomic/mmcif.py", line 44, in open_mmcif
pointers = _mmcif.parse_mmCIF_file(path, _additional_categories, session.logger, coordsets, atomic)
_mmcif.error: not enough data values near line 25

(in Bash shell)
head -30 2r4O_test.cif

data_2r4O.pdb
#
loop_
_atom_site.id 
_atom_site.type_symbol 
_atom_site.label_atom_id 
_atom_site.label_alt_id 
_atom_site.label_comp_id 
_atom_site.label_asym_id 
_atom_site.label_entity_id 
_atom_site.label_seq_id 
_atom_site.Cartn_x 
_atom_site.Cartn_y 
_atom_site.Cartn_z 
_atom_site.occupancy 
_atom_site.B_iso_or_equiv 
_atom_site.pdbx_PDB_model_num
1 N N ° ALA A 1 1 -30.445 -3.477 17.238 1.00 37.77 1
2 C CA Š ALA A 1 1 -29.600 -4.080 16.174 1.00 37.77 1
3 C C  ALA A 1 1 -29.118 -5.476 16.573 1.00 37.77 1
 ALA A 1 1 -28.345 -5.621 17.519 1.00 37.77 1
5 C CB — ALA A 1 1 -28.400 -3.169 15.894 1.00 20.66 1
6 H H1 ALA A 1 1 -30.519 -4.119 18.014 1.00 0.00 1
7 H H2  ALA A 1 1 -31.366 -3.287 16.869 1.00 0.00 1
8 H H3  ALA A 1 1 -30.025 -2.615 17.553 1.00 0.00 1
9 H HA ° ALA A 1 1 -30.194 -4.164 15.264 1.00 0.00 1
10 H HB1 Š ALA A 1 1 -27.781 -3.612 15.114 1.00 0.00 1
11 H HB2  ALA A 1 1 -27.811 -3.055 16.804 1.00 0.00 1
 ALA A 1 1 -28.754 -2.192 15.566 1.00 0.00 1
13 N N — GLY A 1 2 -29.573 -6.500 15.857 1.00 0.47 1

... so the label_alt_id column is filled with garbage, and the occasional line seems to be missing the first four columns entirely. Text editors refuse to open the file at all (gedit hangs, geany decides it's not a text file).
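Rather than relying on a text editor, you can confirm that the written file contains bytes that do not belong in it by scanning the raw bytes directly. The sketch below assumes the writer is meant to emit only printable ASCII, tabs and line endings. `find_suspicious_bytes` is a hypothetical helper for illustration, not part of ChimeraX.

```python
def find_suspicious_bytes(data, max_hits=10):
    """Hypothetical helper: report (line, column, byte) for bytes that should not
    appear in a plain-ASCII mmCIF file, i.e. control characters other than tab
    (including NUL and stray carriage returns) and any non-ASCII byte."""
    hits = []
    for lineno, line in enumerate(data.split(b'\n'), start=1):
        line = line.rstrip(b'\r')  # tolerate CRLF line endings
        for col, byte in enumerate(line, start=1):  # iterating bytes yields ints
            if byte == 0x09:  # tab is ordinary whitespace
                continue
            if byte < 0x20 or byte >= 0x7f:
                hits.append((lineno, col, byte))
                if len(hits) >= max_hits:
                    return hits
    return hits
```

Pass it the contents of 2r4O_test.cif read in binary mode (`open(path, 'rb').read()`). It returns the position of each offending byte without ever trying to decode the file as text.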

Attachments (1)

2r4O_test.cif (375.1 KB) - added by Tristan Croll 8 years ago.

Change History (5)

by Tristan Croll, 8 years ago

Attachment: 2r4O_test.cif added

comment:1 by Tristan Croll, 8 years ago

Not sure why this got created twice, sorry. Trac decided I was logged out, and duplicated it when I logged in.

comment:2 by Tristan Croll, 8 years ago

Component: changed from Input/Output to Core
Owner: changed from Greg Couch to Eric Pettersen

OK, this is not an I/O problem after all. Atoms.alt_locs itself returns garbage:

m.atoms.alt_locs
Out[2]: bytearray(b'\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97\x7f\x00\x00\xb0\x8a\x10\r\x97')
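The bytes above are not random: they are a single 8-byte group repeated. The following is our own reading, not something stated in the ticket. Decoded as a little-endian 64-bit integer, the group has the shape of a 64-bit Linux user-space address, which is what you would expect if the array held pointers to Python objects rather than characters. If the atoms take consecutive bytes of that stream in order, the head output above is reproduced:

- Atoms 4 and 12 get a carriage return. The terminal jumps back to the start of the row, and the rest of the row overwrites the first columns.
- Atoms 7 and 8 get NUL bytes, a plausible reason gedit and geany treat the file as binary.
- 0xb0, 0x8a and 0x97 render as °, Š and an em dash in Windows-1252.

`garbage_alt_loc` is a hypothetical name used only for this check.

```python
import struct

# One repeating 8-byte group copied from the Atoms.alt_locs output above.
CHUNK = b'\xb0\x8a\x10\r\x97\x7f\x00\x00'

# Interpret the group as a little-endian unsigned 64-bit integer (x86-64 byte order).
(value,) = struct.unpack('<Q', CHUNK)
print(hex(value))  # 0x7f970d108ab0

def garbage_alt_loc(atom_index):
    """Byte that 0-based atom_index receives if atoms take consecutive bytes
    of the repeating group (a hypothesis that matches the head output)."""
    return bytes([CHUNK[atom_index % len(CHUNK)]])

for i in range(13):
    raw = garbage_alt_loc(i)
    # Windows-1252 is used only to mimic how the bytes were rendered in the ticket.
    print(i + 1, repr(raw), repr(raw.decode('cp1252')))
```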

comment:3 by Tristan Croll, 8 years ago

Digging into this a bit more, it appears that your convention of numpy.object = string in molc.py doesn't work for char arrays. NumPy seems to lose track of the character encoding, so the bytearray conversion produces garbage, and the array needs to be explicitly cast somewhere. If you remove the astype=bytearray argument at line 551 of molarray.py, Atoms.alt_locs returns a usable NumPy array of strings, but mmCIF output then falls over because it expects a bytearray.
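The following minimal NumPy sketch illustrates the distinction described here. The values are made up, and this is not ChimeraX code. An object-dtype array stores a pointer to each Python str, so its raw buffer holds addresses. Casting to the fixed-width `'|S1'` dtype stores one byte per atom, which is what a bytearray conversion needs.

```python
import numpy as np

# Made-up alt locs stored the way the ticket describes: Python str objects
# in an object-dtype array (one element per atom).
alt_locs = np.array([' ', 'A', 'B', ' '], dtype=object)

# Each element is a pointer to a Python object, not a character.
print(alt_locs.dtype, alt_locs.itemsize)  # prints "object 8" on a 64-bit build

# Cast to 1-byte fixed-width strings first; each str is encoded as ASCII.
as_s1 = alt_locs.astype('|S1')
print(as_s1.dtype, as_s1.itemsize)  # |S1 1

# Now the raw buffer really is one character per atom.
packed = bytearray(as_s1)
print(packed)  # bytearray(b' AB ')
```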

Replacing line 90 of mmcif_write.py with:

aloc = [a if a != ' ' else '.' for a in atoms.alt_locs]

and doing away with the call to alt_loc_text at line 101 seems to work (I can save and successfully reload an mmCIF file, and the results look sensible). If you really want a bytearray, it looks like the NumPy array has to be explicitly cast first, e.g.:

aloc = bytearray(aloc.astype('|S1'))
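The sketch below is a more defensive variant of the line 90 replacement. It accepts either convention: a sequence of str, or a bytes/bytearray, whose iteration yields ints. It also refuses to write anything that is not a single printable ASCII character. `label_alt_id_tokens` is a hypothetical name, not the ChimeraX function. Writing '.' when there is no alternate location follows the fix above and the mmCIF use of '.' for an inapplicable value.

```python
def label_alt_id_tokens(alt_locs):
    """Hypothetical helper: turn per-atom alt locs into mmCIF
    _atom_site.label_alt_id tokens, using '.' where there is no alt loc."""
    tokens = []
    for a in alt_locs:
        if isinstance(a, int):  # iterating a bytes/bytearray yields ints
            a = chr(a)
        elif isinstance(a, (bytes, bytearray)):
            a = a.decode('latin-1')
        a = a.strip(' ')  # strip only spaces so '\r' or '\t' are still caught
        if not a:
            tokens.append('.')
        elif len(a) == 1 and a.isascii() and a.isprintable():
            tokens.append(a)
        else:
            raise ValueError(f'suspicious alt loc {a!r}')
    return tokens
```

With a check like this, the corrupted alt locs would most likely have raised an error at save time instead of producing an unreadable file.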

comment:4 by Eric Pettersen, 8 years ago

Description: modified
Resolution: fixed
Status: changed from assigned to closed

Alt locs changed from bytes to strings when I implemented the add-hydrogen code, and Atoms.alt_locs didn't get updated correctly. Thanks for the diagnosis and fixes. Have committed them.
