#5601 closed enhancement (fixed)
Log templates from mmCIF files for theoretical models
Reported by: | | Owned by: | pett
---|---|---|---
Priority: | moderate | Milestone: |
Component: | Structure Comparison | Version: |
Keywords: | | Cc: | Tom Goddard
Blocked By: | | Blocking: |
Notify when closed: | | Platform: | all
Project: | ChimeraX | |
Description
Ben is interested in being able to show template sequence alignments, template structures, and scores for theoretical models from ModBase and possibly from other sources, such as the ModelArchive database and AlphaFold models. Perhaps the information in these mmCIF tables, with links, could be logged as a table when the file is opened.
Begin forwarded message:
From: Ben Webb via ChimeraX-users <chimerax-users@…>
Subject: [chimerax-users] Any interest in reading ModelArchive metadata from mmCIF files?
Date: November 12, 2021 at 10:20:19 AM PST
To: ChimeraX Users Help <chimerax-users@…>
Reply-To: Ben Webb
Do you have any plans to extend ChimeraX's mmCIF reader to parse and display metadata on theoretical models, such as quality scores or the alignments to template structures?
The folks at PDB have recently done a lot of work to standardize this metadata in the MA mmCIF dictionary:
https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/
The dictionary has already been adopted by ModelArchive (e.g. AlphaFold2 models) and by ModBase (Modeller models), and I believe that other repositories such as SwissModel are also moving in that direction. See, e.g., the mmCIF downloads at
https://www.modelarchive.org/doi/10.5452/ma-bak-cepc-0250
https://modbase.compbio.ucsf.edu/modbase-cgi/model_search.cgi?databaseID=Q12321
(My ulterior motive: we've previously built Chimera web data files to download a ModBase model and the accompanying alignment, and display them in Chimera; now that this data is embedded in the mmCIF file, in principle ChimeraX could do this itself in a less clunky and not ModBase-specific fashion.)
Ben
Change History (12)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
Status: | assigned → accepted |
---|
comment:3 by , 4 years ago
I have a question. For any particular model structure, is there only one template? Looking at the mmCIF dictionary, it seems like it has to be, but I want to make sure I'm not misunderstanding something.
comment:4 by , 4 years ago
Status: | accepted → feedback |
---|
To get the ball rolling, tomorrow's build will show the template-target alignment if you open a ModBase mmCIF file. In addition to my previous question, I'd like to know the immediate next steps you'd want to see to get to the information you are interested in, plus any desired improvements to what I've already done. Also, should the alignment show up immediately (the current behavior), or should there be a link in the log to show the alignment?
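For the template-target alignment display, here is a minimal Python sketch of two checks one might run on `_ma_alignment` rows: stripping gaps from an aligned sequence, and computing percent identity over gap-free columns. In the ModBase excerpt quoted in Tom Goddard's email, the row with `target_template_flag` 2 ungaps to the 3nc1 sequence in `_ma_template_poly`, so 2 appears to mark the template row and 1 the target. The function names and the short sequences below are hypothetical, and identity over gap-free columns is only one of several common definitions.

```python
def ungap(aligned: str) -> str:
    """Return an aligned sequence with its '-' gap characters removed."""
    return aligned.replace("-", "")


def percent_identity(template: str, target: str) -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(template) != len(target):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(template, target) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned rows (flag 2 = template, flag 1 = target, as assumed above).
template_row = "ACDE--FG"
target_row = "ACDQKLF-"
print(ungap(template_row))                         # ACDEFG
print(percent_identity(template_row, target_row))  # 80.0
```

With the real rows, comparing the ungapped template row against `seq_one_letter_code` for the matching template is a cheap consistency check before labeling that row with the template's accession code.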
follow-up: 5 comment:5 by , 4 years ago
On 1/19/22 4:45 PM, ChimeraX wrote:
> I have a question. For any particular model structure, is there only one

Disclaimer: I didn't develop the MA mmCIF dictionary (this was a collaboration between PDB and the MA folks), although I did review it. You can certainly have multiple templates for a given alignment (although all ModBase models are single-chain, single-template models). This would result in multiple lines in the _ma_alignment_details loop with the same alignment_id but different template_segment_id/target_asym_id pairs. The only issue is that the _ma_alignment table doesn't provide enough information to uniquely identify the template in this case (see https://github.com/ihmwg/MA-dictionary/issues/4).

BTW, my understanding is that the SwissModel repository folks plan to also use this dictionary for their mmCIF models in the near future.

Ben
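The multiple-template case Ben describes can be detected with a small grouping step. This is a minimal Python sketch, assuming the `_ma_alignment_details` rows have already been read into dictionaries keyed by column name; only the column names `alignment_id`, `template_segment_id`, and `target_asym_id` come from the comment above, while the function name and row values are hypothetical.

```python
from collections import defaultdict


def templates_per_alignment(detail_rows):
    """Group _ma_alignment_details rows by alignment_id.

    Returns {alignment_id: [(template_segment_id, target_asym_id), ...]};
    an alignment with more than one pair uses multiple templates.
    """
    groups = defaultdict(list)
    for row in detail_rows:
        groups[row["alignment_id"]].append(
            (row["template_segment_id"], row["target_asym_id"])
        )
    return dict(groups)


# Hypothetical rows: alignment 1 uses two template segments for chain A.
rows = [
    {"alignment_id": "1", "template_segment_id": "1", "target_asym_id": "A"},
    {"alignment_id": "1", "template_segment_id": "2", "target_asym_id": "A"},
    {"alignment_id": "2", "template_segment_id": "3", "target_asym_id": "B"},
]
groups = templates_per_alignment(rows)
multi = [aid for aid, pairs in groups.items() if len(pairs) > 1]
print(multi)  # ['1']
```

Grouping tells you *that* an alignment has several templates, but, as noted above, not which `_ma_alignment` row belongs to which template.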
comment:6 by , 4 years ago
It was exactly the lack of a template_segment_id in the _ma_alignment data that misled me to think that there could only be one template. Without that info, the only name I can put next to the sequence is "template" -- not very informative.
follow-up: 7 comment:7 by , 4 years ago
Looks good to me just the way it is, thanks!

Next steps? Ideally I'd like to see any quality scores for the models (ma_qa_metric* tables), perhaps in the log or as model attributes. ModBase models only have per-model scores, but AlphaFold models have a bunch of per-residue (and also, I think, per-residue-pair) scores too.

BTW, I shared this ticket with folks from SwissModel and AlphaFold, so they may have additional suggestions. Also, if it helps at all with interpretation of the various tables, my own Python code for parsing MA mmCIF is at https://github.com/ihmwg/python-ma.

Ben
comment:8 by , 4 years ago
Thanks for the info. I'll work on making the score information available and will let you know when something happens. Until the _ma_alignment table deficiency gets addressed, I'll do _something_ about files with multiple templates, probably resorting to just the name "template" instead of the more informative name I can use when I know which template is involved.
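The naming fallback described here could look like the following minimal sketch. It assumes alignments have already been grouped into `(template_segment_id, target_asym_id)` pairs per `alignment_id`, and that a lookup from segment ID to a readable name (such as a PDB code) is available; how a segment ID resolves to that name is an assumption, as are the function and argument names.

```python
def template_label(alignment_id, groups, segment_names, fallback="template"):
    """Choose the name shown next to the template sequence of an alignment.

    groups: {alignment_id: [(template_segment_id, target_asym_id), ...]}
    segment_names: {template_segment_id: readable name, e.g. a PDB code}
    """
    pairs = groups.get(alignment_id, [])
    # Only a single-template alignment can be tied to one specific template.
    if len(pairs) == 1:
        return segment_names.get(pairs[0][0], fallback)
    return fallback


groups = {"1": [("1", "A")], "2": [("2", "A"), ("3", "A")]}
names = {"1": "3nc1"}
print(template_label("1", groups, names))  # 3nc1
print(template_label("2", groups, names))  # template
```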
follow-up: 9 comment:9 by , 4 years ago
On 1/20/22 4:41 PM, ChimeraX wrote:

Sounds good. FWIW, the MA folks pointed out that the main advantages of using per-residue scores from the _ma_qa_metric_local table (if available) instead of relying on the B-factors are:
"a) the type of metric used is described in _ma_qa_metric (i.e. instead of guessing, one can know that this is a "pLDDT" score)
b) there can be multiple per-residue scores (e.g. I could imagine to have separate predictions for accuracy and prob. for a residue being disordered)"

Ben
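Both advantages show up when the local scores are keyed by metric name rather than stored as a single number per residue. A minimal Python sketch, assuming the loop rows are already parsed into dictionaries keyed by column name: it looks up each score's name in `_ma_qa_metric` and keeps one per-residue dictionary per metric. The pLDDT values come from the AlphaFold excerpt in Tom Goddard's email; the metric names, the `disorder_prob` metric with ID 5, and its value are hypothetical.

```python
from collections import defaultdict


def local_scores_by_metric(metric_rows, local_rows):
    """Map each metric name to {(label_asym_id, label_seq_id): value}.

    metric_rows: _ma_qa_metric rows as dicts with "id" and "name".
    local_rows: _ma_qa_metric_local rows as dicts keyed by column name.
    """
    names = {m["id"]: m["name"] for m in metric_rows}
    scores = defaultdict(dict)
    for row in local_rows:
        # Name the score from _ma_qa_metric instead of guessing its meaning.
        name = names.get(row["metric_id"], "metric_" + row["metric_id"])
        key = (row["label_asym_id"], int(row["label_seq_id"]))
        scores[name][key] = float(row["metric_value"])
    return dict(scores)


# Assumed metric table: "pLDDT" for ID 2 and a hypothetical disorder score.
metric_rows = [{"id": "2", "name": "pLDDT"}, {"id": "5", "name": "disorder_prob"}]
local_rows = [
    {"label_asym_id": "A", "label_seq_id": "1", "metric_id": "2", "metric_value": "91.95"},
    {"label_asym_id": "A", "label_seq_id": "2", "metric_id": "2", "metric_value": "96.89"},
    {"label_asym_id": "A", "label_seq_id": "1", "metric_id": "5", "metric_value": "0.12"},
]
scores = local_scores_by_metric(metric_rows, local_rows)
print(scores["pLDDT"][("A", 2)])  # 96.89
print(sorted(scores))             # ['disorder_prob', 'pLDDT']
```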
comment:10 by , 4 years ago
Status: | feedback → accepted |
---|
Okay, in tomorrow's build, clicking the "more info..." link for the model in the Log will show a table that includes the global scores.
Local scores are next, but it could be a while, since I will likely work on some other things first.
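A global-score table like this could be assembled roughly as in the following minimal Python sketch, which joins `_ma_qa_metric` and `_ma_qa_metric_global` on the metric ID. It assumes both loops are already parsed into dictionaries; the function name is hypothetical and the code is not taken from ChimeraX's implementation. The values are those in the ModBase excerpt quoted in Tom Goddard's email.

```python
def global_score_table(metric_rows, global_rows, model_id="1"):
    """Join _ma_qa_metric and _ma_qa_metric_global on metric_id.

    Returns [(name, type, value), ...] for one model, in file order.
    """
    metrics = {m["id"]: m for m in metric_rows}
    table = []
    for row in global_rows:
        if row["model_id"] != model_id:
            continue
        metric = metrics[row["metric_id"]]
        table.append((metric["name"], metric["type"], float(row["metric_value"])))
    return table


# Values from the ModBase excerpt quoted in this ticket.
metric_rows = [
    {"id": "1", "name": "MPQS", "type": "other"},
    {"id": "2", "name": "zDOPE", "type": "zscore"},
    {"id": "3", "name": "TSVMod RMSD", "type": "distance"},
    {"id": "4", "name": "TSVMod NO35", "type": "other"},
]
global_rows = [
    {"model_id": "1", "metric_id": "1", "metric_value": "0.665346"},
    {"model_id": "1", "metric_id": "2", "metric_value": "-0.11"},
    {"model_id": "1", "metric_id": "3", "metric_value": "14.527"},
    {"model_id": "1", "metric_id": "4", "metric_value": "0.036"},
]
table = global_score_table(metric_rows, global_rows)
for name, kind, value in table:
    print(f"{name:12} {kind:9} {value}")
```

The MPQS description in that file says values above 1.1 are considered reliable, so this model's MPQS of 0.665346 falls below that threshold.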
comment:11 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Okay, I implemented local scores. Each score becomes a residue attribute (saved in sessions), and each score produces a log entry that makes it easy to color by that score, e.g.:
Color ma-bak-cepc-0250.cif by residue attribute pLDDT_score
color byattribute r:pLDDT_score #1 palette red:yellow:green
4185 atoms, 548 residues, atom pLDDT_score range 20.3 to 98.4
The "Color" link runs the color byattribute command, and the "attribute" link goes to the help page that describes what attributes are.
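One way to picture what `palette red:yellow:green` does with the logged range is this minimal Python sketch. It assumes the three colors sit evenly spaced at the minimum, midpoint, and maximum of the attribute values (20.3 and 98.4 in the log above), with linear RGB interpolation between neighbors. That spacing is an assumption for illustration, not a description of ChimeraX's implementation, and the function name is hypothetical.

```python
def palette_color(value, vmin, vmax, colors):
    """Linearly interpolate an RGB color for value over evenly spaced colors."""
    if vmax <= vmin:
        return colors[0]
    # Clamp into the range, then find the position along the palette.
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    pos = t * (len(colors) - 1)
    i = min(int(pos), len(colors) - 2)
    f = pos - i
    c0, c1 = colors[i], colors[i + 1]
    return tuple(round(a + (b - a) * f, 3) for a, b in zip(c0, c1))


RED, YELLOW, GREEN = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)
palette = [RED, YELLOW, GREEN]
print(palette_color(20.3, 20.3, 98.4, palette))   # (1.0, 0.0, 0.0)
print(palette_color(59.35, 20.3, 98.4, palette))  # (1.0, 1.0, 0.0)
print(palette_color(98.4, 20.3, 98.4, palette))   # (0.0, 1.0, 0.0)
```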
comment:12 by , 3 years ago
If you want additional functionality let me know and I will re-open this ticket.
From: Tom Goddard
Subject: Re: [chimerax-users] Any interest in reading ModelArchive metadata from mmCIF files?
Date: November 12, 2021 at 2:56:49 PM PST
To: Ben Webb
Cc: ChimeraX Users Help <chimerax-users@…>
Hi Ben,
I believe AlphaFold2 finds the 20 best-matching structures in the PDB and uses 4 of them (not sure how they are selected). I've run AlphaFold many times, and the log output says what the 20 matches are but does not appear to say which 4 structures it actually used, which is pretty unfortunate.

The AlphaFold per-residue confidence scores are in the mmCIF table _ma_qa_metric_local:
#
loop_
_ma_qa_metric_local.label_asym_id
_ma_qa_metric_local.label_comp_id
_ma_qa_metric_local.label_seq_id
_ma_qa_metric_local.metric_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.model_id
_ma_qa_metric_local.ordinal_id
A MET 1 2 91.95 1 1
A PHE 2 2 96.89 1 1
A GLU 3 2 98.01 1 1
A ALA 4 2 98.08 1 1
A ARG 5 2 97.76 1 1
A LEU 6 2 96.16 1 1
..
Currently ChimeraX colors AlphaFold models by confidence using the same scores, taken from the B-factor column of the atom_site table.
#
loop_
_ma_template_ref_db_details.template_id
_ma_template_ref_db_details.db_name
_ma_template_ref_db_details.db_accession_code
1 PDB 3nc1
#
loop_
_ma_template_poly.template_id
_ma_template_poly.seq_one_letter_code
_ma_template_poly.seq_one_letter_code_can
1 DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLIDYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEEYPEHRTNFFLLLQAVNSHCFPAFLAIPPAQFKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLIDYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEEYPEHRTNFFLLLQAVNSHCFPAFLAIPPAQFKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN
#
loop_
_ma_alignment.ordinal_id
_ma_alignment.alignment_id
_ma_alignment.target_template_flag
_ma_alignment.sequence
1 1 2 DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLI---------DYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEE---------YPEHRTNFFLLLQAVNSHCFPAFLAIPPAQ---FKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN
2 1 1 DSYVETLDSMIELFKDYKPGSITLENITRLCQTL-GLESFTEELSNELSR--LSTASKIIVIDVDYNKKQDRIQDVKLVLASNFDNFDYFNQRDGEHEKSNILLNSLTKYPDLKAFHNNLKFLYLLDAYSHIESDSTSHNNGSSDKSLDSSNASFNNQGKLDLFKYFTELSHYIRQCFQDNCCDFKVRTNLNDKFGIYILTQGINGKEVPLAKIYLEENKSDSQYRFYEYIYSQETKSWINESAENFSNGISLVMEIVANAKESNYTDLIWFPEDFISPELIIDKVTCSSNSSSSPPIIDLFSNNNYNSRIQLMNDFTTKLINIKKFDISNDNLDLISEILKWV------------QWSRIVLQNVFKLVSTPSSNSNSSELEPDYQAPFSTSTKDKNSSTSNTE
#
loop_
_ma_qa_metric.id
_ma_qa_metric.name
_ma_qa_metric.description
_ma_qa_metric.type
_ma_qa_metric.mode
_ma_qa_metric.other_details
_ma_qa_metric.software_group_id
1 MPQS 'ModPipe Quality Score' other global
'composite score, values >1.1 are considered reliable' 1
2 zDOPE 'Normalized DOPE' zscore global . 2
3 'TSVMod RMSD' 'TSVMod predicted RMSD (MSALL)' distance global . .
4 'TSVMod NO35' 'TSVMod predicted native overlap (MSALL)' other global . .
#
loop_
_ma_qa_metric_global.ordinal_id
_ma_qa_metric_global.model_id
_ma_qa_metric_global.metric_id
_ma_qa_metric_global.metric_value
1 1 1 0.665346
2 1 2 -0.11
3 1 3 14.527
4 1 4 0.036
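The `_ma_qa_metric` loop above shows that one mmCIF row can wrap onto a second line (the MPQS row), so a reader has to tokenize the loop body as a whole rather than line by line. Below is a minimal Python sketch for simple loops only: it uses `shlex.split` as an approximation of mmCIF tokenization, which handles the quoted values here but not `;`-delimited text fields or every mmCIF quoting rule. The function name is hypothetical, and `.` (not applicable) is kept as a literal string. For real files, a dedicated mmCIF parser, such as the python-ma code Ben links above, is the safer choice.

```python
import shlex

SAMPLE = """
loop_
_ma_qa_metric.id
_ma_qa_metric.name
_ma_qa_metric.description
_ma_qa_metric.type
_ma_qa_metric.mode
_ma_qa_metric.other_details
_ma_qa_metric.software_group_id
1 MPQS 'ModPipe Quality Score' other global
'composite score, values >1.1 are considered reliable' 1
2 zDOPE 'Normalized DOPE' zscore global . 2
3 'TSVMod RMSD' 'TSVMod predicted RMSD (MSALL)' distance global . .
4 'TSVMod NO35' 'TSVMod predicted native overlap (MSALL)' other global . .
"""


def parse_simple_loop(text):
    """Parse one simple mmCIF loop_ into a list of {column: value} dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    lines = [ln for ln in lines if ln and ln != "#"]
    if not lines or lines[0] != "loop_":
        raise ValueError("expected text starting with loop_")
    columns = []
    i = 1
    # Header lines look like _category.item; keep only the item name.
    while i < len(lines) and lines[i].startswith("_"):
        columns.append(lines[i].split(".", 1)[1])
        i += 1
    # Tokenize all remaining lines together, so wrapped rows are handled.
    tokens = shlex.split(" ".join(lines[i:]))
    n = len(columns)
    if n == 0 or len(tokens) % n:
        raise ValueError("value count is not a multiple of the column count")
    return [dict(zip(columns, tokens[k:k + n])) for k in range(0, len(tokens), n)]


rows = parse_simple_loop(SAMPLE)
print(len(rows))                 # 4
print(rows[0]["other_details"])  # composite score, values >1.1 are considered reliable
print(rows[2]["name"])           # TSVMod RMSD
```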