Opened 8 years ago

Closed 8 years ago

#827 closed defect (fixed)

Structure file I/O issues

Reported by: Tristan Croll
Owned by: Eric Pettersen
Priority: major
Milestone:
Component: Input/Output
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Simple-but-important one first: mmcif_write.py currently crashes due to a reference to atoms.occupancy rather than atoms.occupancies at line 91.

Another issue on loading: if a residue has all its atoms but the geometry is "aggressively" bad, ChimeraX will sometimes miss a bond. This is cropping up for me when I use Coot's "Add hydrogens using Refmac" command. It's the only currently-available hydrogen-addition tool I know of that (a) most structural biologists already have, and (b) provides all the hydrogens necessary for OpenMM to work (including N-terminal hydrogens). Unfortunately, apart from those advantages it's *really* bad. It seems to work OK at adding hydrogens to a "perfect" high-resolution structure, but at lower resolution it doesn't seem even to attempt decent geometry for some hydrogens, and a handful end up 2-3 Angstroms away from the atom they should be bonded to! I guess they rely on subsequent refinement to fix them, and OpenMM's minimiser can indeed pull them into line in a snap. But in these cases I inevitably run into a handful of residues where ChimeraX has failed to form a heavy atom-hydrogen bond on loading, so the simulation fails to launch.
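One way to find the offending hydrogens before launching a simulation is to list every hydrogen with no heavy atom within a bonding cutoff. The sketch below is a hypothetical standalone helper, not part of ChimeraX; it takes plain (name, element, coordinates) tuples, and its 1.31 Angstrom default borrows the C-H figure Eric quotes in comment 8, so it is only illustrative for other bond types.

```python
import math

def stray_hydrogens(atoms, cutoff=1.31):
    """Return names of hydrogens with no heavy atom within `cutoff` angstroms.

    `atoms` is a list of (name, element, (x, y, z)) tuples. The default cutoff
    is the C-H bonding distance quoted in comment 8 and is only illustrative.
    """
    heavy = [xyz for _, elem, xyz in atoms if elem != "H"]
    return [
        name
        for name, elem, xyz in atoms
        if elem == "H" and all(math.dist(xyz, h) > cutoff for h in heavy)
    ]

# HA has been pushed 3 A away from CA, like the badly placed Coot/Refmac hydrogens.
atoms = [
    ("N", "N", (0.0, 0.0, 0.0)),
    ("CA", "C", (1.46, 0.0, 0.0)),
    ("H", "H", (0.0, 1.0, 0.0)),
    ("HA", "H", (1.46, 3.0, 0.0)),
]
print(stray_hydrogens(atoms))  # ['HA']
```

Flagged atoms could then be deleted or moved before reopening, which matches the observation in comment 3 that adjusting their coordinates makes the bonds appear.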

I presume ChimeraX has some form of geometric test to decide whether or not to form a given bond when loading a structure. I don't think this is advisable, to be honest; or at least, there should be a "permissive" option that forms all bonds according to the residue dictionary definitions for each residue, no matter what the geometry. Bad geometry happens for all sorts of reasons, after all.
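To make the proposal concrete, here is a minimal sketch contrasting a geometry-filtered approach, like the one presumed above, with the suggested "permissive" mode that forms every dictionary bond whose atoms are present. The template dictionary, the single 1.9 Angstrom cutoff, and the function names are all hypothetical illustrations; they are not ChimeraX's data structures or API, and real code would use element-dependent bond lengths.

```python
import math

# Hypothetical template fragment: pairs of atom names the residue
# dictionary says are bonded (a real template lists every bond).
TEMPLATE_BONDS = {"ALA": [("N", "CA"), ("N", "H")]}

def bonds_by_distance(res_name, coords, cutoff):
    """Geometry-filtered: keep a template bond only if its atoms are within cutoff."""
    return [
        (a, b)
        for a, b in TEMPLATE_BONDS[res_name]
        if a in coords and b in coords and math.dist(coords[a], coords[b]) <= cutoff
    ]

def bonds_permissive(res_name, coords):
    """Permissive: form every template bond whose two atoms are present."""
    return [(a, b) for a, b in TEMPLATE_BONDS[res_name] if a in coords and b in coords]

# Backbone H displaced 2.5 A from N, as with the Coot/Refmac output described above.
coords = {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0), "H": (0.0, 2.5, 0.0)}
print(bonds_by_distance("ALA", coords, cutoff=1.9))  # N-H is dropped
print(bonds_permissive("ALA", coords))               # N-H is kept
```

The permissive version only works when the residue has a dictionary entry and its atoms carry the names the dictionary expects, which is the crux of the later discussion in this ticket.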

Attachments (2)

2b9r-coot-0.pdb (721.6 KB ) - added by Tristan Croll 8 years ago.
pdb with bad hydrogens
2ajq.tar.gz (544.3 KB ) - added by tic20@… 8 years ago.
Added by email2trac


Change History (11)

comment:1 by Tom Goddard, 8 years ago

Owner: changed from Tom Goddard to Eric Pettersen

Issues for Eric.

comment:2 by Eric Pettersen, 8 years ago

Status: changed from assigned to accepted

Fixed the mmcif_write problem.

Are these problematic-hydrogen-connectivity files mmCIF files or PDB files? Do the hydrogens have their standard PDB names?

--Eric

in reply to:  3 ; comment:3 by tic20@…, 8 years ago

Yes, they're PDB files with standard nomenclature and all the atoms grouped by residue (hydrogens at the end of the residue). If I adjust the coordinates of the offending hydrogens and re-open, then the bonds appear. I'll find/make an example case tomorrow.

Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY

by Tristan Croll, 8 years ago

Attachment: 2b9r-coot-0.pdb added

pdb with bad hydrogens

comment:4 by Tristan Croll, 8 years ago

OK, I've uploaded an example. Plenty of examples of terrible geometry throughout, but in this particular case ChimeraX still originally assigned the bonds correctly. I was able to reproduce the issue by moving the H of the first residue (A 7) by 3 Angstroms. There does seem to be at least some nomenclature aspect here: this is an N-terminal residue with three hydrogens (H, H2, H3) attached to the N. If I rename the H to H1, then it correctly bonds even with the big move.

I've previously seen failure-to-bond issues with sidechain atoms on other residues (methionines for sure, but not sure about others) with structures coming from Coot's hydrogen addition. Can't seem to replicate it when I want to, though. Will pass along any case I find.

comment:5 by Eric Pettersen, 8 years ago

Not sure what to say here. The PDB-standard names for N-terminal hydrogens are H1, H2, H3, not H, H2, H3. Look at any deposited NMR ensemble (e.g. 1mtx, 1jwe). So you've got a hydrogen with a non-standard name several angstroms away from the nearest heavy atom. The software isn't psychic. 'H' isn't the only possible variant here either -- I've seen HN, HT1, and others.
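The naming effect from comment 4 can be reproduced with a toy model: hydrogens whose names appear in the template are bonded to N wherever they sit, while any other name falls back to a distance test. The template set, the 1.4 Angstrom cutoff, and the helper below are hypothetical; the model simply mirrors the observed behavior (renaming H to H1 restored the bond even after a 3 Angstrom move) and is not ChimeraX's implementation.

```python
import math

# Hypothetical template: the PDB-standard N-terminal hydrogen names on N.
TEMPLATE_H_ON_N = {"H1", "H2", "H3"}

def hydrogens_bonded_to_n(n_xyz, hydrogens, cutoff):
    """Return the hydrogen names that get bonded to the terminal N.

    Names found in the template are bonded wherever they sit; any other
    name (e.g. "H", "HN", "HT1") falls back to a distance test.
    """
    return [
        name
        for name, xyz in hydrogens.items()
        if name in TEMPLATE_H_ON_N or math.dist(n_xyz, xyz) <= cutoff
    ]

n = (0.0, 0.0, 0.0)
moved = (3.0, 0.0, 0.0)  # first hydrogen moved 3 A away, as in comment 4
others = {"H2": (0.0, 1.0, 0.0), "H3": (0.0, 0.0, 1.0)}
print(hydrogens_bonded_to_n(n, {"H": moved, **others}, cutoff=1.4))   # ['H2', 'H3']
print(hydrogens_bonded_to_n(n, {"H1": moved, **others}, cutoff=1.4))  # ['H1', 'H2', 'H3']
```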

Another unmentioned problem is that that residue isn't even the true N-terminus -- there is missing structure, so it is unclear if it is even right to have H2/H3 atoms on that nitrogen.

I *am* pretty close to having "simple" hydrogen addition (non-H-bond-guided) in ChimeraX. Another week or two most likely. I don't know if that would resolve this issue for you or if you would still need these problematic external PDB files to work. If the latter, then you might simply nuke their hydrogens and let ChimeraX add them back once it can.

in reply to:  7 comment:6 by tic20@…, 8 years ago

You're preaching to the converted here. As I said, given that Coot and Refmac are two of the most heavily used tools in all of experimental structural biology, I was shocked to see how badly they handle this task. I guess the reason is that historically hydrogens have been considered "uninteresting" and even now are usually left out... Looking forward to having your tool available!

On 2017-09-13 19:13, ChimeraX wrote:

in reply to:  8 ; comment:7 by tic20@…, 8 years ago

There does appear to be a real bug here after all. The attached file has two instances of the residue 2DT (residue 822 on chains X and P). Both are at the chain terminus, both have heavy atom and hydrogen names consistent with the PDB, and both have all their atoms present. Yet the one on chain P doesn't bond H3'1 and H5'', whereas the one on chain X has all bonds present and correct.

On 2017-09-13 19:22, ChimeraX wrote:


by tic20@…, 8 years ago

Attachment: 2ajq.tar.gz added

Added by email2trac

comment:8 by Eric Pettersen, 8 years ago

Well, 2DT isn't a standard nucleic acid, so Chimera has no built-in connectivity template for it. If this were an mmCIF file, there is some chance that those protons would get connected, because the mmCIF code will fetch the connectivity template for 2DT from the RCSB, whereas the PDB code will not. The PDB code looks for atoms within "bonding distance" of each other. For a C-H bond that would be 0.91 angstroms plus a 0.4 angstrom "slop", for a total of 1.31 angstroms. Those protons, for whatever reason, are 1.346 and 1.378 angstroms away from the corresponding carbon.
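The arithmetic above can be checked with a few lines. This is a sketch of the rule as described in this comment (bond length plus slop), using only the C-H numbers quoted here; it is not ChimeraX's source, and the function name is hypothetical.

```python
# Values quoted above for a C-H bond.
CH_BOND_LENGTH = 0.91  # angstroms
SLOP = 0.4             # angstroms

def within_bonding_distance(distance, bond_length=CH_BOND_LENGTH, slop=SLOP):
    """True if two atoms are close enough to bond under the length + slop rule."""
    return distance <= bond_length + slop

cutoff = CH_BOND_LENGTH + SLOP
for d in (1.346, 1.378):  # the two 2DT C-H distances reported for 2ajq chain P
    print(f"{d:.3f} A vs cutoff {cutoff:.2f} A -> bonded: {within_bonding_distance(d)}")
```

Both distances exceed the 1.31 Angstrom cutoff by a few hundredths of an angstrom, which is enough for the PDB reader to leave those two protons unbonded.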

I'm hoping the new ChimeraX addh capability will reduce the need for me to fight this fight.

--Eric

comment:9 by Eric Pettersen, 8 years ago

Resolution: fixed
Status: changed from accepted to closed

The new addh code protonates the 2DT residues of 2ajq correctly.
