Opened 6 years ago
Closed 6 years ago
#1972 closed enhancement (fixed)
Structure factor .cif files: open command extension/disambiguation
Reported by: | Tristan Croll | Owned by: | Tristan Croll |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | Cc: | Elaine Meng, greg, pett, Tom Goddard, Conrad Huang | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
I've implemented a structure factor .cif reader for the Clipper plugin, using Greg's CIF code to parse the file then converting to Clipper's data structures. The main use of this will be to allow users to optionally fetch (and automatically generate maps from) the experimental x-ray data at the same time as they pull a structure from the PDB. While next to nobody uses .cif format for structure factors in day-to-day work (all the experimental pipelines work primarily with the binary .mtz format), it nevertheless needs to be supported since it is the only format actually archived on the wwPDB. I'll implement a save mtz
command so users can switch to that format if they desire. Two questions:
- I already have an 'open' command for .mtz files:
open sf_file.mtz structureModel <Modelspec> [overSampling <float>]
, but this isn't going to work for .cif without some code to disambiguate from mmCIF. Any suggestions?
- For pulling directly from the PDB, the sensible thing to do would be to overload the
open xxxx fromDatabase pdb
. What syntax would you recommend for this, and could you give me some pointers to get there?
Attachments (8)
Change History (62)
follow-up: 2 comment:2 by , 6 years ago
Structure factor files are a little different from the EDS etc. in that they *must* be associated with an atomic model to be useful, since they contain only the measured amplitudes of each reflection. Without the phases (usually calculated from an FFT of a simulated map generated from the model) they're essentially meaningless. That's why my first thought was to make loading them an optional extra argument when pulling the PDB file itself from the PDB, but I suppose a separate command like open 1a0m from sfactor structureModel #1 ... would do the job just as well. On 2019-05-28 20:01, ChimeraX wrote:
follow-up: 3 comment:3 by , 6 years ago
Yes, it could be convenient to fetch the PDB coordinates and structure factors together. I can't think of an example where we do that, but you could use "open 1a0m structurefactors true" adding some extra option that the PDB fetcher would use to tell it to get the extra data.
comment:4 by , 6 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Tristan, you can't really assign multiple owners to the same ticket. I have become aware of several tickets where I am "co-owner" only because by default I am cc'ed on all mail to chimerax-bugs and other people on the tickets commented on them by mailing chimerax-bugs. You need to assign one owner and cc other people. If you don't really get the owner right, believe me it will get re-assigned. :-)
--Eric
comment:5 by , 6 years ago
Cc: | added |
---|---|
Owner: | changed from | to
comment:6 by , 6 years ago
Sorry about the "multiple owners" mistake - will remember that in future!
The open sf.cif format sfcif
approach doesn't appear to be working correctly (or I'm doing something wrong). If I define the data format and a non-default open command in bundle_info.xml:
<ChimeraXClassifier>ChimeraX :: DataFormat :: Structure factor CIF :: sfcif :: Reflection data :: .cif :: :: :: :: :: Crystallographic reflection file :: </ChimeraXClassifier> <ChimeraXClassifier>ChimeraX :: Open :: Structure factor CIF :: sfcif :: false :: structure_model:Structure, over_sampling:Float</ChimeraXClassifier>
(and double-checking, the mmcif bundle_info.xml
*does* set itself as the default opener for .cif files), then
chimerax-daily 6g2g.cif
... still attempts (and fails) to open the model using my code, unless I force it to use the mmcif reader with:
chimerax-daily --cmd "open 6g2g.cif format mmcif"
The method in my BundleAPI
is below (I of course need to rename the open_mtz()
function to something more general now).
@staticmethod def open_file(session, path, format_name, structure_model=None, over_sampling=1.5): if format_name in ('mtz', 'sfcif'): if structure_model is None: from chimerax.core.errors import UserError raise UserError('Must specify a structure model to associate with crystallographic data') from .cmd import open_mtz return open_mtz(session, path, structure_model=structure_model, over_sampling=over_sampling)
comment:7 by , 6 years ago
Hmm... turns out that the is_default
portion of the Open and Save classifiers currently does nothing. It's parsed in toolshed/installed.py
, but the result is never actually used. Will raise a ticket for that.
comment:8 by , 6 years ago
I guess what I can do for now is define a clipper open ...
command to do the job until the is_default
option works.
comment:9 by , 6 years ago
I've made a little progress on this, to the point where I can successfully fetch a structure factor file from the PDB and create a Clipper data structure - but I'm still not clear on how best to interface it to the command-line. If I define a wrapper function as:
def fetch_wrapper(fetch_func): def _fetch(session, pdb_id, fetch_source='rcsb', ignore_cache=False, structure_factors=False, over_sampling=2.0, **kw): models, status = fetch_func(session, pdb_id, fetch_source=fetch_source, ignore_cache=ignore_cache, **kw) if structure_factors: if len(models) != 1: raise UserError('Structure factors can only be used with a single model!') m = models[0] sf_file = fetch_structure_factors(session, pdb_id, fetch_source=fetch_source, ignore_cache=ignore_cache, **kw) from chimerax.clipper.symmetry import get_map_mgr mmgr = get_map_mgr(m, create=True) mmgr.add_xmapset_from_mtz(sf_file, oversampling_rate = over_sampling) return [mmgr.crystal_mgr], status return _fetch
... then I can test it by hacking around in the shell:
from chimerax.core.fetch import fetch_databases pdb = fetch_databases()['pdb'] from chimerax.core.commands import open as cxopen result = cxopen.open(session, '6g2g', structure_factors=True) # Raises a TypeError because of the unexpected keyword, but needs to be run once to # ensure the standard fetch function is initialized from chimerax.clipper.io.fetch_cif import fetch_wrapper mmcif_fetch = pdb.fetch_function['mmcif'] pdb.fetch_function['mmcif']=fetch_wrapper(mmcif) result = cxopen.open(session, '6g2g', structure_factors=True) # Correctly loads model and generates maps
... but what is the most elegant *correct* way to achieve the same?
comment:10 by , 6 years ago
Minor modification (at least for mmCIF, the fetch_source
argument shouldn't be passed through to the atomic model fetcher or it will fail on non-standard databases):
def fetch_wrapper(fetch_func): def _fetch(session, pdb_id, fetch_source='rcsb', ignore_cache=False, structure_factors=False, over_sampling=2.0, **kw): models, status = fetch_func(session, pdb_id, ignore_cache=ignore_cache, **kw) if structure_factors: if len(models) != 1: raise UserError('Structure factors can only be used with a single model!') m = models[0] sf_file = fetch_structure_factors(session, pdb_id, fetch_source=fetch_source, ignore_cache=ignore_cache, **kw) from chimerax.clipper.symmetry import get_map_mgr mmgr = get_map_mgr(m, create=True) mmgr.add_xmapset_from_mtz(sf_file, oversampling_rate = over_sampling) return [mmgr.crystal_mgr], status return _fetch
Also had a typo in the test code:
pdb.fetch_function['mmcif']=fetch_wrapper(mmcif)
should of course be
pdb.fetch_function['mmcif']=fetch_wrapper(mmcif_fetch)
follow-up: 11 comment:11 by , 6 years ago
Definitely make a ticket if the is_default bundle_info.xml flag is being ignored. Then you can easily work around the problem by adding a custom initialization function to your bundle that is run at startup. To do this add to the bundle_info.xml first line the customInit variable <BundleInfo ... "customInit=true"> and then in yourbundle/src/__init__.py add an initialize() method to your BundleAPI class that will be run at startup. In that initialize function register your fetch handler. An example that does exactly this is the bundle chimerax/src/bundles/map. Take a look at that code.
follow-up: 12 comment:12 by , 6 years ago
Modify the current PDB fetching code to add the structure_factors argument. The routine is in the chimerax/src/bundles/pdb bundle called fetch_pdb(). Add structure_factors to the list of open command options in the pdb bundle bundle_info.xml open pdb classifier line. Then we can put this patch into the ChimeraX distribution. This option will need ISOLDE or at least Clipper so in the fetch code you add you should test whether the user actually has those installed (try except ImportError around import) and if they don't issue a UserError saying that using the structure_factors option requires installing ISOLDE.
comment:13 by , 6 years ago
This seems like an ugly solution. I prefer the "load separately" solution (like coordinate sets) that Tristan mentioned earlier:
open 1a0m from sfactor structureModel #1
This isolates all the code and documentation into the bundle that's actually implementing it and avoids "test imports".
follow-up: 14 comment:14 by , 6 years ago
The x-ray structure factors are extra data associated with a pdb model, not really a separate database. I think open 1a0m structureFactors true to open the PDB and x-ray data is easier on the user and makes sense to the user. The alternative is more cumbersome open 1a0m open 1a0m from sfactor structureModel #1 The primary consideration should be what works best for the user, not what is most convenient for the developer.
comment:15 by , 6 years ago
Cc: | added |
---|
Yes, I understand the convenience argument. However, there are a lot of possible data sources that you could associate with PDB or PDB-like structures. You don't want to wedge them all into the PDB code. For a fairly infrequently used capability I feel its better to have a separate command than to join the PDB and Clipper/ISOLDE bundles at the hip.
Wouldn't a separate load also possibly allow the user to specify their own structure-factors file?
follow-up: 16 comment:16 by , 6 years ago
This is a data source from the PDB database, not some third-party database, so putting the code to fetch structure factors from the PDB with the code that fetches atomic models from the PDB seems reasonable. The open command structureFactors option is to load extra data that cannot be visualized without the associated atomic model (because phases are needed derived from the atomic model), so loading when the atomic model is fetched is sensible. While you say this is an "infrequently used capability", for Tristan's x-ray refinement users this is basic stuff, and to make them type an extra command with extra arguments to fetch some extra data is not reasonable if you care at all about those users. It is a bit untidy that the PDB fetch code can only show that extra data if ISOLDE is available but it is not complex to do that import check.
follow-up: 17 comment:17 by , 6 years ago
Can crystallographers open their own (undeposited) structure factors + atomic coordinates, or do you think that that is an unlikely use case?
follow-up: 18 comment:18 by , 6 years ago
Crystallographers will mostly be using their own structure factors. So this fetch option does not replace the need for open mystructurefactors.mtz structureModel #1 The standard format for structure factors is mtz, not the PDB cif structure factors file.
comment:19 by , 6 years ago
Okay, if the dividing line is "data provided directly by the PDB" and there is also a means for using local data, I guess I am persuaded.
comment:21 by , 6 years ago
OK, I'll take Tom's approach. Happy to revisit later if someone comes up with a neat way to do it from entirely within my own bundle.
comment:22 by , 6 years ago
How about the following changes to __init__.py
for the PDB and mmCIF bundles (of course excluding the auto-copyright statement, which I really need to turn off)? That, plus the addition of ,structure_factors:Bool,over_sampling:Float
to each bundle's bundle_info.xml
seems to do the job quite neatly.
PDB:
diff --git a/src/bundles/pdb/src/__init__.py b/src/bundles/pdb/src/__init__.py index af759f8..121f83b 100644 --- a/src/bundles/pdb/src/__init__.py +++ b/src/bundles/pdb/src/__init__.py @@ -1,3 +1,13 @@ +# @Author: Tristan Croll <tic20> +# @Date: 18-Jan-2019 +# @Email: tic20@cam.ac.uk +# @Last modified by: tic20 +# @Last modified time: 30-May-2019 +# @License: Free for non-commercial use (see license.pdf) +# @Copyright: 2017-2018 Tristan Croll + + + # vim: set expandtab shiftwidth=4 softtabstop=4: # === UCSF ChimeraX Copyright === @@ -17,6 +27,16 @@ from .pdb import process_chem_name, format_nonstd_res_info from chimerax.core.toolshed import BundleAPI +def _fetch_wrapper_dummy(fetch_func): + def fetch(session, pdb_id, fetch_source='rcsb', ignore_cache=False, + structure_factors=False, over_sampling=1.5, **kw): + if structure_factors: + from chimerax.core.errors import UserError + raise UserError('Loading structure factors requires the ChimeraX-Clipper plugin ' + 'available from the Tool Shed.') + return fetch_func(session, pdb_id, ignore_cache=ignore_cache, **kw) + return fetch + class _PDBioAPI(BundleAPI): from chimerax.core.commands import EnumOf @@ -29,7 +49,13 @@ class _PDBioAPI(BundleAPI): # returns (list of models, status message) from . import pdb fetcher = pdb.fetch_pdb_pdbe if database_name == "pdbe" else pdb.fetch_pdb - return fetcher(session, identifier, ignore_cache=ignore_cache, **kw) + try: + from chimerax.clipper.io import fetch_cif + return fetch_cif.fetch_wrapper(fetcher)(session, identifier, ignore_cache=ignore_cache, **kw) + except ImportError: + return _fetch_wrapper_dummy(fetcher)(session, identifier, ignore_cache=ignore_cache, **kw) + + # return fetcher(session, identifier, ignore_cache=ignore_cache, **kw) @staticmethod def open_file(session, stream, file_name, *, auto_style=True, coordsets=False, atomic=True,
mmCIF:
diff --git a/src/bundles/mmcif/src/__init__.py b/src/bundles/mmcif/src/__init__.py index e68440e..2d3c93e 100644 --- a/src/bundles/mmcif/src/__init__.py +++ b/src/bundles/mmcif/src/__init__.py @@ -1,3 +1,13 @@ +# @Author: Tristan Croll <tic20> +# @Date: 18-Mar-2019 +# @Email: tic20@cam.ac.uk +# @Last modified by: tic20 +# @Last modified time: 30-May-2019 +# @License: Free for non-commercial use (see license.pdf) +# @Copyright: 2017-2018 Tristan Croll + + + # vim: set expandtab shiftwidth=4 softtabstop=4: # === UCSF ChimeraX Copyright === @@ -21,6 +31,16 @@ from .mmcif import ( from chimerax.core.toolshed import BundleAPI +def _fetch_wrapper_dummy(fetch_func): + def fetch(session, pdb_id, fetch_source='rcsb', ignore_cache=False, + structure_factors=False, over_sampling=1.5, **kw): + if structure_factors: + from chimerax.core.errors import UserError + raise UserError('Loading structure factors requires the ChimeraX-Clipper plugin ' + 'available from the Tool Shed.') + return fetch_func(session, pdb_id, ignore_cache=ignore_cache, **kw) + return fetch + class _PDBioAPI(BundleAPI): @@ -42,7 +62,11 @@ class _PDBioAPI(BundleAPI): from chimerax.core.errors import UserError raise UserError("Unknown database for fetching mmCIF: '%s'. Known databases are: %s" % (database_name, ", ".join(list(fetchers.keys())))) - return fetcher(session, identifier, ignore_cache=ignore_cache, **kw) + try: + from chimerax.clipper.io import fetch_cif + return fetch_cif.fetch_wrapper(fetcher)(session, identifier, ignore_cache=ignore_cache, **kw) + except ImportError: + return _fetch_wrapper_dummy(fetcher)(session, identifier, ignore_cache=ignore_cache, **kw) @staticmethod def open_file(session, path, file_name, *, auto_style=True, coordsets=False, atomic=True,
follow-up: 23 comment:23 by , 6 years ago
Tristan, attach a patch file to the ticket (can just email with the file as an attachment). Putting the patch inline in a comment is subject to line mangling like wrapping lines so that is not usable.
follow-up: 25 comment:25 by , 6 years ago
OK, diffs are attached. I added a little extra code to the standard open_file() command for each bundle to raise a UserError if the extra arguments are used there. On 2019-05-30 19:25, ChimeraX wrote:
comment:26 by , 6 years ago
The implementation in the patches is too convoluted, passing the original fetcher to the Clipper fetcher makes the code hard to follow. Please add the structure_factors and oversampling options directly to fetch_mmcif in mmcif.py and have that function import clipper if structure_factors is true and call to get the structure factors. Do the same for fetch_pdb.
comment:27 by , 6 years ago
OK. My reasoning behind that was to avoid increasing complexity of the main code, and keep things as agnostic as possible to Clipper's implementation. As requested, I've instead directly modified fetch_mmcif() in mmcif.py. This requires a couple of minor changes I just pushed to the Clipper Git. On 2019-05-31 18:55, ChimeraX wrote:
comment:28 by , 6 years ago
That looks good. Having the code in fetch_mmcif() that handles the structure factors option I think makes it much easier to follow the code when debugging is needed.
When will Clipper versions be on Toolshed to make this option work? I'll put in your code once there is a Clipper toolshed package I can test it against.
Also do you want the equivalent option for PDB format files (in fetch_pdb())?
Is the goal to get this into ChimeraX 0.9? That is just about to be released and has been in code freeze for a week, meaning no new features, only bug fixes. But if it is very important to you we could delay the 0.9 release a few days.
comment:29 by , 6 years ago
Would be great to have this available in the 0.9 release! I was about to upload some test builds for you, but I've run into a minor snag and it's a little late in the day to fix it: until today I've been building against the daily builds, unaware that they were substantially different from the release candidates. The specific problem is that in the daily builds Model.__init__() now adds a TriggerSet to the model, and I've already adapted to this in my various Model subclasses that rely on triggers. Will have to backtrack that before I can give you a working build. On 2019-06-03 22:44, ChimeraX wrote:
comment:30 by , 6 years ago
With no clipper builds on the toolshed to test this new feature it seems unlikely it will make it into 0.9. We could be releasing 0.9 on Thursday -- we will meet then and decide if there are any remaining problems holding up release.
follow-up: 27 comment:31 by , 6 years ago
That’s no big hardship on my part - I can always provide a menu-driven option in the ISOLDE GUI for now. But I *should* be able to put a working Clipper build up tomorrow.
comment:32 by , 6 years ago
Mac and Linux builds are up, and Windows should follow in half an hour or so. On 2019-06-04 20:58, ChimeraX wrote:
follow-up: 29 comment:34 by , 6 years ago
Some notes: - various commands available prefixed with "clipper": - "clipper open {xxxx-sf.cif|xxxx-sf.mtz} structureModel #1" loads the pre-downloaded structure factors and creates maps. Repeated calls will create extra maps (housed under the 'Map Manager' model in the Model tree). - "clipper assoc {VolumeSpec} to {ModelSpec}" will capture the volume object, move it to the 'Data manager' tree for the model, and convert it to an object that behaves the same way as the crystallographic maps with respect to visualisation etc. ... and others as documented by "usage clipper" For maps calculated from structure factors, any change to the model (coordinates, B-factors, occupancies, or addition/removal of atoms) triggers a background recalculation of new maps. At the moment Clipper does *not* support non-identity scene transforms. When first initialising a model and/or adding a real-space map, any existing scene transform is applied to the map and atoms, and all model position properties reset to identity. I'm open to changing this if there's a compelling reason, but given that (a) for x-ray structures the coordinate frame is *not* arbitrary, and (b) the added complexity of working between scene and absolute coordinates gives me a splitting headache, I'd really prefer to keep it this way. On 2019-06-05 18:15, ChimeraX wrote:
comment:35 by , 6 years ago
I tried "open 1a0m structureFactors true" with your mmcif patches and got the space group error reported in ticket #2002.
follow-up: 31 comment:36 by , 6 years ago
Your mmcif.py.diff patch adds an over_sampling argument to fetch_mmcif() but it is not used in the routine.
follow-up: 32 comment:37 by , 6 years ago
Once the over_sampling option is fixed I'm ready to commit the mmcif patch and could get it into 0.9 if 0.9 is not already released.
You did not provide and fetch_pdb patch. If you want structure_factors and over_sampling options for PDB format fetching I'll need a patch for that.
follow-up: 33 comment:38 by , 6 years ago
Argh! The over_sampling argument belongs to the map generation code, not fetch_mmcif. Will fix immediately and make the matching modifications to fetch_pdb. On 2019-06-05 19:13, ChimeraX wrote:
follow-up: 34 comment:39 by , 6 years ago
Amended mmcif.py.diff and pdb.py.diff attached. On 2019-06-05 19:21, ChimeraX wrote:
comment:40 by , 6 years ago
The mmcif patch does not work
$ patch -p4 < ~/Downloads/mmcif.py-1.diff
patching file src/mmcif.py
patch: malformed patch at line 43: + return [mmgr.crystal_mgr], status
Not sure what is wrong although web posts suggest this often happens from copy and pasting a patch. A patch should be made by "diff -c orig fixed > blah.patch".
comment:41 by , 6 years ago
Sorry, my bad - hand-edited the diff file without realising that would break it.
follow-up: 39 comment:44 by , 6 years ago
I was just using “git diff” (without added switches). I’m still a bit of a n00b when it comes to this.
comment:45 by , 6 years ago
When opening with structure factors (e.g. open 6fk0 structureFactors true) it brings up a dialog titled "Choose the dataset to use for map calculations" with a menu of choices. This is unexpected. Generally commands don't throw up modal dialogs. The point of using a typed command is often to avoid the use of the dialogs and mousing. Also the command can of course not show a dialog in "no gui" mode which allows users to run scripts non-interactively. Can a default value for this choice be made? Or can an additional option specify the choice? I see from opening different structures I get different choices. If the choice cannot be made without user interaction then perhaps this "structureFactors" option is just does not make sense because the user has to go through a GUI to make a choice.
One immediate issue with showing this modal dialog is that the log output from opening the mmCIF model is messed up by the modal dialog, it appears in the "Notes" section of open command output instead of at the top level of the log. Not sure how that is happening -- maybe it is not the modal dialog, more likely Clipper is incorrectly adding models to the session. The fetch and open Python functions return models that have not yet been added to the session.
comment:46 by , 6 years ago
Yes, I suppose I could arrange some heuristics to make a default choice when opening from the command line. Yes, Clipper’s top-level data manager does add itself to the session at the end of its __init__() - should be straightforward to rearrange so that happens from the calling code instead.
comment:47 by , 6 years ago
OK, I've rearranged things so that the top-level model doesn't auto-add itself to the session on creation. #2005 and #2006 (along with a couple of others that came up in the course of handling them) are fixed. Still need to quash the GUI dialogs that pop up when there are multiple datasets (also if for some reason there are multiple R-free flag arrays, but that will never happen with a .cif structure factor file), and to provide a save mtz
command. Won't be able to get to those this evening, I'm afraid.
follow-up: 41 comment:48 by , 6 years ago
I've just put a Mac wheel at https://drive.google.com/open?id=1slHQZcf_6TL9ffkwtjwZxw4Pidu3sTBz if you want a new version for testing. Will put a new build on the Tool Shed once the last few issues are tackled.
follow-up: 42 comment:49 by , 6 years ago
I committed your mmcif and pdb patches to ChimeraX daily and 0.9 branches.
comment:51 by , 6 years ago
Using the command-line to open a structure factor file with multiple data arrays will no longer spawn a GUI. I've also implemented a save filename.mtz
with optional boolean arguments preserveInput
(to pass through the original data into the new file) and saveMapCoeffs
(to save the amplitudes and phases for all generated maps). The minimum it will save is the free flag array and the amplitudes/sigmas (F/sigF) that the live map calculations are based on. If the atomspec covers multiple XmapSet
objects, one MTZ file will be written for each (with a -{number} suffix for all but the first).
I think that takes care of everything in this thread. One relatively minor issue remains: open xxx.mtz structureModel #1
creates a record in the file history that can't be re-opened since it only knows about the MTZ file and not the model it was associated with.
follow-up: 44 comment:52 by , 6 years ago
Updated bundles for Linux and Mac are on the Tool Shed. Will add Windows later tonight, and hopefully get a new ISOLDE version out next week.
comment:53 by , 6 years ago
I have changed file history so it will not make an entry when opening an MTZ file "open file.mtz structureModel #1" because such an entry will not work correctly if a different model #1 is present. The change makes file history not remember any files when an option is given whose value is not a boolean, int, float, or string. These are the types that can saved in the history file. Formerly the "structureModel #1" option was simply not saved in history because the value is a Model and the former behavior created a history entry but just dropped this option. That leads to a file history entry that gives an error when clicked. Seems better not to create broken history entries.
follow-up: 46 comment:54 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Seems all of this is working nicely now.