Opened 6 months ago
Last modified 6 months ago
#17550 assigned enhancement
Improve Boltz structure prediction usability — at Version 1
| Reported by: | Tom Goddard | Owned by: | Tom Goddard | 
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Structure Prediction | Version: | |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
I added a Boltz structure prediction tool to ChimeraX described here
https://www.rbvi.ucsf.edu/chimerax/data/boltz-apr2025/boltz_help.html
There are many ways it could be improved.  This ticket is to list and prioritize the improvements.  Here's an initial list.
1) Provide prediction time estimates.  I thought this would be easy, but it now appears hard, particularly for large complexes, which on Windows appear to split the computation between GPU and CPU, and on Mac and Linux run into large slowdowns, probably due to memory paging.  An estimate based on the user's previous runs might still be possible.  Known memory requirements could also be used, although these may change if newer Boltz versions optimize memory use.
2) Warn if prediction is likely to run out of memory.  This is related to estimating the time.  The goal is to avoid wasting the user's time trying to run predictions that will fail after 30 minutes.
3) Allow common post-translational modifications such as phosphorylation.
4) Allow reloading a previous prediction to rerun or make a modification and rerun.  This might work by keeping a history of previous runs that can be chosen.
5) Make sure the user is aware that a running prediction will be terminated if ChimeraX is quit.  Can I make it an option to keep the prediction running if ChimeraX exits?
6) Make a prediction history panel that lists each previous prediction so the structures can easily be opened or the prediction can be rerun, possibly after modifications.
7) Currently the descriptions of the chains are lost by Boltz, so ChimeraX only knows the chain identifiers when it opens the predicted structure.  Would be great to get the chain descriptions into the mmCIF.  Could make a Boltz pull request that does this.  Could probably easily put the descriptions into the input yaml file, where they would be ignored by Boltz but ChimeraX could use them.
8) Allow choosing chain ids for a prediction.  Currently they are automatically assigned, but the user may want to match another structure's chain ids.
9) Torch warns about setting the matrix multiplication floating point mode to use tensor cores when an Nvidia GPU is available.  Apparently this can speed up matrix multiplications many fold.  Test if this speeds up predictions without degrading prediction quality, for instance, using 16-bit float.
10) Profile Boltz with the torch memory profiler to identify the parts with high memory use and see if Boltz can be made more memory efficient to allow larger predictions, e.g. by switching from float32 to float16 for the biggest tensors, or by assigning those big memory hog layers to the CPU even when a GPU is available if the prediction is sufficiently large.  Ticket #17555.
11) Figure out if there are any speed optimizations for Torch / Boltz on Intel CPUs on Windows.  Not too likely, but this platform is really slow.
12) Make installing Torch with CUDA an option, or maybe better do it automatically if we detect Nvidia graphics.  Can probably just check the OpenGL driver version in ChimeraX to look for Nvidia.
13) Allow setting up multiple predictions and queueing them to run one at a time.  The history panel proposed in item (6) could list the queued jobs, say whether they are still waiting to run, and allow you to select which one to run next, overriding the submission order.
14) Allow specifying bonds between ligands and proteins, or maybe between any two molecular components.
15) Enable setting up all-by-all protein dimer predictions.  Would work like the current "alphafold dimers" command, might even just be an enhancement to that command.
16) Enable setting up runs of a protein or complex against a series of N ligands, like what we did with the macromethods course in January 2025 running all FDA compounds against virus targets.
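
As a rough illustration of the "estimate based on the user's previous runs" idea in item (1), here is a minimal sketch that fits a power law to earlier (size, run time) pairs and extrapolates to a new size.  Everything in it is an assumption for illustration: the function names are hypothetical, "size" might be the total residue count, the history values are made up, and the power-law form is only a guess at how run time scales.  A fit like this would not capture the sudden slowdowns from memory paging or GPU/CPU splitting described in item (1).

```python
import math

def fit_power_law(runs):
    """Fit seconds ~ a * size**b by least squares in log-log space.

    runs: list of (size, seconds) pairs from earlier predictions.
    The power-law form is an assumption, not a measured property of Boltz.
    """
    if len(runs) < 2:
        raise ValueError("need at least two previous runs")
    xs = [math.log(size) for size, _ in runs]
    ys = [math.log(seconds) for _, seconds in runs]
    n = len(runs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("previous runs must differ in size")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # fitted exponent
    a = math.exp(mean_y - b * mean_x)    # fitted prefactor
    return a, b

def estimate_seconds(runs, size):
    """Extrapolate the fitted power law to a new prediction size."""
    a, b = fit_power_law(runs)
    return a * size ** b

# Hypothetical history of (size, seconds) pairs, invented for this example.
history = [(100, 10.0), (200, 40.0), (400, 160.0)]
estimate = estimate_seconds(history, 800)
```

The same history could feed the out-of-memory warning in item (2), for example by refusing to extrapolate far beyond the largest size that has previously succeeded on the user's machine.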
