[chimerax-users] Any interest in reading ModelArchive metadata from mmCIF files?
Tom Goddard
goddard at sonic.net
Fri Nov 12 14:56:49 PST 2021
Hi Ben,
It would be nice to show the templates and sequence alignments used for predicted models from AlphaFold and Modeller. We could output an HTML table in the log that lists the templates, with a link to show the sequence alignment and a link to load and align the template if it is from the PDB.
The AlphaFold models in the EBI AlphaFold database don't appear to say what template structures were used. For instance, I looked at AF-P12004-F1-model_v1.cif
https://alphafold.ebi.ac.uk/entry/P12004
I believe AlphaFold2 finds the 20 best matching structures in the PDB and uses 4 (not sure how they are selected). I've run AlphaFold many times and the log output says what the 20 matches are but does not appear to say which 4 structures it actually used -- pretty unfortunate. The AlphaFold per-residue confidence scores are in an mmCIF table _ma_qa_metric_local:
#
loop_
_ma_qa_metric_local.label_asym_id
_ma_qa_metric_local.label_comp_id
_ma_qa_metric_local.label_seq_id
_ma_qa_metric_local.metric_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.model_id
_ma_qa_metric_local.ordinal_id
A MET 1 2 91.95 1 1
A PHE 2 2 96.89 1 1
A GLU 3 2 98.01 1 1
A ALA 4 2 98.08 1 1
A ARG 5 2 97.76 1 1
A LEU 6 2 96.16 1 1
..
Currently ChimeraX colors AlphaFold models by confidence using the same scores, taken from the B-factor column of the atom_site table.
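As a rough illustration (plain Python, not ChimeraX code), the per-residue table above could be read like this. The read_loop and local_scores helpers are hypothetical names made up for this sketch, and the parser assumes a simple whitespace-separated loop like the excerpt; it does not handle semicolon-delimited text fields or other parts of the full CIF syntax.

```python
import shlex

# Excerpt of the table from the AlphaFold file, shortened to three rows.
CIF = """\
loop_
_ma_qa_metric_local.label_asym_id
_ma_qa_metric_local.label_comp_id
_ma_qa_metric_local.label_seq_id
_ma_qa_metric_local.metric_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.model_id
_ma_qa_metric_local.ordinal_id
A MET 1 2 91.95 1 1
A PHE 2 2 96.89 1 1
A GLU 3 2 98.01 1 1
#
"""


def read_loop(cif_text, category):
    """Return the rows of one loop_ table as a list of dicts (hypothetical helper)."""
    prefix = category + "."
    names, values = [], []
    state = "seek"  # seek -> header -> data
    for raw in cif_text.splitlines():
        line = raw.strip()
        if state == "seek":
            if not line.startswith(prefix):
                continue
            state = "header"
        if state == "header":
            if line.startswith(prefix):
                names.append(line[len(prefix):])
                continue
            state = "data"
        # The table ends at a blank line, a comment, or the next loop/data item.
        if not line or line.startswith(("#", "loop_", "_")):
            break
        # shlex handles single-quoted values; rows may wrap across lines.
        values.extend(shlex.split(line))
    if not names:
        return []
    n = len(names)
    return [dict(zip(names, values[i:i + n]))
            for i in range(0, len(values) - n + 1, n)]


def local_scores(cif_text):
    """Map (chain id, residue number) to the per-residue score."""
    rows = read_loop(cif_text, "_ma_qa_metric_local")
    return {(r["label_asym_id"], int(r["label_seq_id"])): float(r["metric_value"])
            for r in rows}


scores = local_scores(CIF)
below_95 = [key for key, value in scores.items() if value < 95]
print(scores)
print(below_95)
```

Keying the result on (chain id, residue number) is only one possible choice; it makes it easy to look up the score for a given residue without relying on the B-factor column.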
The ModelArchive example you gave (https://www.modelarchive.org/doi/10.5452/ma-bak-cepc-0250) has no template sequences or alignments in the mmCIF file and no per-residue scores, but does have some global scores.
Your ModBase example (https://modbase.compbio.ucsf.edu/modbase-cgi/model_search.cgi?databaseID=Q12321) does have a template sequence, an alignment, and global scores, but no per-residue scores:
#
loop_
_ma_template_ref_db_details.template_id
_ma_template_ref_db_details.db_name
_ma_template_ref_db_details.db_accession_code
1 PDB 3nc1
#
loop_
_ma_template_poly.template_id
_ma_template_poly.seq_one_letter_code
_ma_template_poly.seq_one_letter_code_can
1 DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLIDYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEEYPEHRTNFFLLLQAVNSHCFPAFLAIPPAQFKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLIDYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEEYPEHRTNFFLLLQAVNSHCFPAFLAIPPAQFKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN
#
loop_
_ma_alignment.ordinal_id
_ma_alignment.alignment_id
_ma_alignment.target_template_flag
_ma_alignment.sequence
1 1 2 DMACDTFIKIAQKCRRHFVQVQVGEVMPFIDEILNNINTIICDLQPQQVHTFYEAVGYMIGAQTDQTVQEHLIEKYMLLPNQVWDSIIQQATKNVDILKDPETVKQLGSILKTNVRACKAVGHPFVIQLGRIYLDMLNVYKCLSENISAAIQANGEMVTKQPLIRSMRTVKRETLKLISGWVSRSNDPQMVAENFVPPLLDAVLI---------DYQRNVPAAREPEVLSTMAIIVNKLGGHITAEIPQIFDAVFECTLNMINKDFEE---------YPEHRTNFFLLLQAVNSHCFPAFLAIPPAQ---FKLVLDSIIWAFKHTMRNVADTGLQILFTLLQNVAQEEAAAQSFYQTYFCDILQHIFSVVTDTSHTAGLTMHASILAYMFNLVEEGKISTPLNPN
2 1 1 DSYVETLDSMIELFKDYKPGSITLENITRLCQTL-GLESFTEELSNELSR--LSTASKIIVIDVDYNKKQDRIQDVKLVLASNFDNFDYFNQRDGEHEKSNILLNSLTKYPDLKAFHNNLKFLYLLDAYSHIESDSTSHNNGSSDKSLDSSNASFNNQGKLDLFKYFTELSHYIRQCFQDNCCDFKVRTNLNDKFGIYILTQGINGKEVPLAKIYLEENKSDSQYRFYEYIYSQETKSWINESAENFSNGISLVMEIVANAKESNYTDLIWFPEDFISPELIIDKVTCSSNSSSSPPIIDLFSNNNYNSRIQLMNDFTTKLINIKKFDISNDNLDLISEILKWV------------QWSRIVLQNVFKLVSTPSSNSNSSELEPDYQAPFSTSTKDKNSSTSNTE
#
loop_
_ma_qa_metric.id
_ma_qa_metric.name
_ma_qa_metric.description
_ma_qa_metric.type
_ma_qa_metric.mode
_ma_qa_metric.other_details
_ma_qa_metric.software_group_id
1 MPQS 'ModPipe Quality Score' other global
'composite score, values >1.1 are considered reliable' 1
2 zDOPE 'Normalized DOPE' zscore global . 2
3 'TSVMod RMSD' 'TSVMod predicted RMSD (MSALL)' distance global . .
4 'TSVMod NO35' 'TSVMod predicted native overlap (MSALL)' other global . .
#
loop_
_ma_qa_metric_global.ordinal_id
_ma_qa_metric_global.model_id
_ma_qa_metric_global.metric_id
_ma_qa_metric_global.metric_value
1 1 1 0.665346
2 1 2 -0.11
3 1 3 14.527
4 1 4 0.036
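To make the two score tables above easier to read together, here is a small sketch in plain Python (not ChimeraX code) that joins _ma_qa_metric_global to _ma_qa_metric on the metric id. The read_loop helper is a hypothetical name for this example, and it assumes simple whitespace-separated loops with optional single-quoted values, which is all the excerpt needs.

```python
import shlex

# The two global score tables from the ModBase file, copied from above.
CIF = """\
loop_
_ma_qa_metric.id
_ma_qa_metric.name
_ma_qa_metric.description
_ma_qa_metric.type
_ma_qa_metric.mode
_ma_qa_metric.other_details
_ma_qa_metric.software_group_id
1 MPQS 'ModPipe Quality Score' other global
'composite score, values >1.1 are considered reliable' 1
2 zDOPE 'Normalized DOPE' zscore global . 2
3 'TSVMod RMSD' 'TSVMod predicted RMSD (MSALL)' distance global . .
4 'TSVMod NO35' 'TSVMod predicted native overlap (MSALL)' other global . .
#
loop_
_ma_qa_metric_global.ordinal_id
_ma_qa_metric_global.model_id
_ma_qa_metric_global.metric_id
_ma_qa_metric_global.metric_value
1 1 1 0.665346
2 1 2 -0.11
3 1 3 14.527
4 1 4 0.036
"""


def read_loop(cif_text, category):
    """Return the rows of one loop_ table as a list of dicts (hypothetical helper)."""
    prefix = category + "."
    names, values = [], []
    state = "seek"  # seek -> header -> data
    for raw in cif_text.splitlines():
        line = raw.strip()
        if state == "seek":
            if not line.startswith(prefix):
                continue
            state = "header"
        if state == "header":
            if line.startswith(prefix):
                names.append(line[len(prefix):])
                continue
            state = "data"
        if not line or line.startswith(("#", "loop_", "_")):
            break
        # Rows can wrap over several lines, so collect tokens and chunk later.
        values.extend(shlex.split(line))
    if not names:
        return []
    n = len(names)
    return [dict(zip(names, values[i:i + n]))
            for i in range(0, len(values) - n + 1, n)]


def global_scores(cif_text, model_id="1"):
    """Map metric name to its global value for one model."""
    metric_names = {r["id"]: r["name"] for r in read_loop(cif_text, "_ma_qa_metric")}
    global_rows = read_loop(cif_text, "_ma_qa_metric_global")
    return {metric_names[r["metric_id"]]: float(r["metric_value"])
            for r in global_rows if r["model_id"] == model_id}


scores = global_scores(CIF)
for name, value in scores.items():
    print(name, value)
```

For this model the MPQS value is 0.665346, which is below the 1.1 threshold that the metric's own other_details field gives for reliable models.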
So it looks like only ModBase would currently benefit from ChimeraX reading template sequences and alignments. I don't think it would be too hard to implement. I've made a ChimeraX feature request for it:
https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/5601
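As a sketch of what could be done with the _ma_alignment rows once they are read, the following plain Python pairs up target and template positions and computes percent identity over the aligned columns. In the ModBase excerpt the row with target_template_flag 2 matches the template sequence in _ma_template_poly, so I am assuming flag 1 is the target and flag 2 the template. The align_map function and the short sequences are made up for illustration, and positions are counted along each ungapped sequence starting at 1, which is not necessarily the residue numbering in the template PDB entry.

```python
def align_map(target, template, gap="-"):
    """Pair target and template positions from two gapped, aligned strings.

    Returns (mapping, identity) where mapping[target_pos] = template_pos for
    columns in which both sequences have a residue, and identity is the
    percentage of those columns with identical residues.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    identical = 0
    target_pos = template_pos = 0
    for t, p in zip(target, template):
        if t != gap:
            target_pos += 1
        if p != gap:
            template_pos += 1
        if t != gap and p != gap:
            mapping[target_pos] = template_pos
            if t == p:
                identical += 1
    identity = 100.0 * identical / len(mapping) if mapping else 0.0
    return mapping, identity


# Made-up toy alignment, not taken from the ModBase file.
mapping, identity = align_map("MK-LVA", "MRTL-A")
print(mapping)
print(identity)
```

A mapping like this is what would be needed to associate target residues with template residues after loading the template, for example to show which parts of the model have no template coverage.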
Tom
> On Nov 12, 2021, at 10:20 AM, Ben Webb via ChimeraX-users <chimerax-users at cgl.ucsf.edu> wrote:
>
> Do you have any plans to extend ChimeraX's mmCIF reader to parse and display metadata on theoretical models, such as quality scores or the alignments to template structures?
>
> The folks at PDB have recently done a lot of work to standardize this metadata in the MA mmCIF dictionary:
> https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/
>
> The dictionary has already been adopted by ModelArchive (e.g. AlphaFold2 models) and by ModBase (Modeller models) and I believe that other repositories such as SwissModel are also moving in that direction. See e.g. mmCIF downloads at
> https://www.modelarchive.org/doi/10.5452/ma-bak-cepc-0250
> https://modbase.compbio.ucsf.edu/modbase-cgi/model_search.cgi?databaseID=Q12321
>
> (My ulterior motive: we've previously built Chimera web data files to download a ModBase model and the accompanying alignment, and display them in Chimera; now that this data is embedded in the mmCIF file, in principle ChimeraX could do this itself in a less clunky and not ModBase-specific fashion.)
>
> Ben
> --
> ben at salilab.org https://salilab.org/~ben/
> "It is a capital mistake to theorize before one has data."
> - Sir Arthur Conan Doyle