Opened 4 years ago

Last modified 4 years ago

#4993 assigned enhancement

Method to identify structure as AlphaFold model

Reported by: Tristan Croll Owned by: Tom Goddard
Priority: moderate Milestone:
Component: Structure Analysis Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

OK, one more thing. :)

It would be nice to have a supported programmatic way to identify if a given AtomicStructure hails from the AlphaFold database, and to get its UniProt accession if so (for example, to make it easy for my code to fetch the matching PAE matrix). Looks like I can do it by parsing the metadata:

m.metadata['citation data'][-2] gives the title of the AlphaFold manuscript, and
m.metadata['entry data'][0] gives 'AF-{UniProt ID}-F1'

... but that's the sort of thing that could be unpredictably broken.

Change History (6)

comment:1 by Tom Goddard, 4 years ago

It seems like you know how to get the AlphaFold info. It won't become any more reliable if I put it in ChimeraX, though I don't object to putting it in the alphafold module if you provide the code. I don't think others are likely to use it. I suspect in the future we will have an AlphaFold computation service that takes a sequence. A file from that service may have similar citation metadata to the EBI database files, and yet the PAE matrix from EBI probably won't match the computed non-EBI structure. Although ChimeraX could try to mark files it fetches from EBI AlphaFold, the user can of course download those files themselves, and those would not have our special EBI fetch flag, so your code would not work with them if you relied on such a flag. So I am not sure what you want to do. In general it is a problem trying to associate two files (structure and error matrix) when they are not downloaded or fetched together.

in reply to:  2 ; comment:2 by Tristan Croll, 4 years ago

I was just thinking of specifically marking structures fetched via the AlphaFold plugin, so my code will know it can get the associated PAE matrix. I agree that this can’t be the only code path, and I’ll have to provide a separate mechanism for user-provided data.

comment:3 by Tom Goddard, 4 years ago

The AlphaFold fetch code could add something to the structure metadata (that way it would get saved in sessions). But as you point out, the metadata already contains the entry id from the mmCIF file

_entry.id AF-A2ABV5-F1

which can be obtained as

structure.metadata['entry data'][0]

I think that is a reasonable indicator (prefix "AF-" and suffix "-F1") that it is from the AlphaFold database that will work even if the user opens the AlphaFold file they downloaded instead of using ChimeraX fetch. So I suggest going with that.
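A minimal sketch of that check, for illustration only. The function name is hypothetical, and it takes a plain dict shaped like structure.metadata (an 'entry data' key whose first item is the entry id) so that it runs outside ChimeraX. It accepts only the "AF-" prefix and "-F1" suffix described above and assumes the accession between them is alphanumeric.

```python
import re

# Matches entry ids of the form AF-<accession>-F1, per the check suggested above.
# Assumption: the accession consists only of letters and digits.
_AF_ENTRY = re.compile(r"AF-(?P<accession>[A-Za-z0-9]+)-F1")


def alphafold_uniprot_accession(metadata):
    """Return the UniProt accession if the entry id looks like an
    AlphaFold database id, otherwise None.

    `metadata` is any dict-like object with an 'entry data' list,
    as described for structure.metadata in this ticket.
    """
    entry = metadata.get('entry data')
    if not entry:
        return None
    match = _AF_ENTRY.fullmatch(str(entry[0]).strip())
    return match.group('accession') if match else None
```

With a structure open in ChimeraX this would be called as alphafold_uniprot_accession(structure.metadata). A None result means the entry id did not match, not that the model is known to come from elsewhere.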

There is also a _database_2 table in the mmCIF file, for example,

_database_2.database_code AF-A2ABV5-F1
_database_2.database_id AF

that is probably a better source, although I don't think our mmCIF reader currently reads that table. But we could add that table, since PDB just announced they are implementing the extended PDB id codes (longer than 4 characters) and the _database_2 table will list the longer id codes.
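Until the mmCIF reader exposes that table, the two items could be read straight from the file text. This is a rough sketch, not ChimeraX code: the function name is hypothetical, and it handles only the simple one-item-per-line form shown above, not a loop_ table with several rows or quoted values.

```python
def database_2_code(mmcif_text, database_id="AF"):
    """Return _database_2.database_code if _database_2.database_id equals
    `database_id`, otherwise None.

    Assumption: the table is written as plain "name value" lines,
    as in the example above, not as a loop_ block.
    """
    prefix = "_database_2."
    items = {}
    for line in mmcif_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefix):
            # Store the item name without the category prefix.
            items[parts[0][len(prefix):]] = parts[1]
    if items.get("database_id") == database_id:
        return items.get("database_code")
    return None


example = """\
_entry.id AF-A2ABV5-F1
_database_2.database_code AF-A2ABV5-F1
_database_2.database_id AF
"""
code = database_2_code(example)
```

A real implementation would use a proper mmCIF parser, because a _database_2 table may list more than one database.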

Last edited 4 years ago by Tom Goddard

comment:4 by pett, 4 years ago

Component: Unassigned → Structure Analysis

comment:5 by Tom Goddard, 4 years ago

Cc: Tom Goddard added

Testing if adding cc works.

comment:6 by Tom Goddard, 4 years ago

Cc: Tom Goddard removed