#14863 closed enhancement (fixed)
fitmap search variable fitting results
| Reported by: | | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Volume Data | Version: | |
| Keywords: | | Cc: | Elaine Meng |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Roden Luo reports that fitmap search sometimes produces variable results and sometimes identical results when doing multiple runs.
Reported on ChimeraX github issues here
Change History (9)
comment:1 by , 19 months ago
comment:2 by , 19 months ago
I am not sure what change should be made here. You suggest that each time fitmap search is run it should randomize the seed (e.g. based on the time). The other alternative is that each time it is run it should initialize the seed to the same value (e.g. 0) so it produces identical results to the previous run. I'm inclined to think the reproducible result is the better behavior, and that a new "seed" option to the command should be added so that you can get different results by specifying an integer seed value.
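A minimal sketch of the reproducible alternative, assuming a hypothetical `search_placements` function that stands in for the search and draws from Python's global `random` module. The function name and its return values are illustrative only and are not the fitmap code.

```python
import random

def search_placements(count, seed=0):
    """Hypothetical stand-in for a search that draws random placements.

    Seeding the global generator on every call makes repeated calls
    with the same seed return the same values.
    """
    random.seed(seed)
    return [random.uniform(-1.0, 1.0) for _ in range(count)]

default_a = search_placements(3)              # default seed 0
default_b = search_placements(3)              # same seed, same values
other = search_placements(3, seed=15125)      # explicit seed, different values
```

With the seed fixed by default, a user gets identical results on every run and opts in to variation by passing a different integer.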
comment:3 by , 19 months ago
I also second the latter now. It not only is easier to sort out but also makes more sense in terms of reproducibility. I imagine the "seed" option can receive an integer so the user can take a note or share that number when interested. It might also be good to allow the "seed" to receive "T" for time or "R" for random, and to report that seed at run time in the log so that if the user finds the result interesting, it can be reproduced. Cheers,
comment:4 by , 19 months ago
| Cc: | added |
|---|---|
| Resolution: | → fixed |
| Status: | assigned → closed |
| Type: | defect → enhancement |
Ok, I added a "seed" option to the fitmap command used only if the search option is used and with default value 0. It can take an integer value or the string "random". This allows reproducing fitmap search results.
An important note is that the search code uses the current atom positions, so you will get different results even if you use the same random seed if the model has been moved. Since fitmap search moves the model to the best fit location, if you run it twice in a row you will get different results even with the seed = 0 both times. To get the same results you would need to reset the atomic model to the original scene position for example using the "view initial" command.
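The effect of the starting position can be illustrated with a toy one-dimensional search. This is only a sketch: `toy_search`, the target value, and the scoring rule are made up for illustration and do not reflect how fitmap scores fits.

```python
import random

def toy_search(start, seed=0, tries=5):
    """Toy seeded search: sample offsets around `start`, keep the best.

    The random offsets are identical for a given seed, but the candidates
    depend on where the search starts.
    """
    random.seed(seed)
    candidates = [start + random.uniform(-1.0, 1.0) for _ in range(tries)]
    # "Best" here means closest to an arbitrary target position of 10.0.
    return min(candidates, key=lambda x: abs(x - 10.0))

initial = 0.0
run1 = toy_search(initial)   # first run from the original position
run2 = toy_search(run1)      # second run starts where the first one ended
run3 = toy_search(initial)   # position reset before running again
```

Here `run1` and `run3` agree because both start from the same position with the same seed, while `run2` differs even though the seed is unchanged.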
I did not have it print out the seed used if you specify "seed random" because the Python random.seed() call, which uses a random initial seed when no argument is given, does not tell you what seed was chosen. I don't think this is an important feature -- if the user wants reproducible results they should either not specify seed so the default of 0 is used, or they should specify an explicit seed number ("seed 15125").
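The limitation can be seen directly in Python: `random.seed()` with no argument returns `None`, so the chosen seed is never exposed. A script that needs a reportable seed can draw one itself and pass it explicitly. The sketch below assumes `secrets.randbits` as the entropy source and uses a hypothetical helper named `seed_and_report`; it is not something the fitmap command does.

```python
import random
import secrets

# Seeding without an argument gives no way to recover the seed.
result = random.seed()   # result is None

def seed_and_report():
    """Hypothetical helper: pick a seed from OS entropy, apply it, return it."""
    seed = secrets.randbits(32)
    random.seed(seed)
    return seed

seed = seed_and_report()
values = [random.random() for _ in range(3)]

# Replaying with the recorded seed reproduces the same draws.
random.seed(seed)
replay = [random.random() for _ in range(3)]
```

The recorded integer could then be passed back as an explicit seed value to repeat the run.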
comment:5 by , 19 months ago
Wow! Very fast response! I think the solution is already very good. Just that when I thought of the string "random", I was thinking of the use case where the user is a bit "lazy" to change the seed and wants to explore the random space. And then if a good result shows up, he or she would be satisfied and proceed to save the structure. But maybe he or she is also interested in saving the seed to reproduce the whole search. If you think it is a valid use scenario, one way to bypass the issue and implement it might be to get a random number first, by time for example, and then log it and pass it to the "seed" option. But this is really an edge case and I am not sure if there will be users who want it or not. Cheers, Roden
comment:6 by , 19 months ago
To guarantee better reproducibility, I would suggest calling "view initial" automatically rather than leaving it to the user.
comment:7 by , 19 months ago
I understood your reason for wanting "random" to report the seed used. And I tried to do it, but the Python mechanism for choosing a random seed (which uses built-in hardware level random sources when available) does not provide the seed, and I think that rolling my own poor solution like using the current time is not ideal -- for instance if a script is running this and I used time reported to the second, it may do multiple runs in one second and get the same random seed. Given the extremely narrow use case for reporting the random seed, I'm not going to do it.
I also looked into making the "view initial" unnecessary. I looked at making the code not use the current scene position of the atoms and instead use their local model coordinates. But the fitting code computes transformations from the current position, and making this change would require changing too much code for an extremely narrow use. So, once again, this is just not worth the effort. We have hundreds of things to work on that will be far more widely used, so I can't spend the time. Even adding the seed option is somewhat crazy given that I expect you are the only one who will ever use it. We need to focus ChimeraX development on things hundreds or thousands of people will use, and there are many such things.
comment:8 by , 19 months ago
Many thanks, Tom! I thought it was only one or two lines. The current is already great!
comment:9 by , 19 months ago
How much code is needed is not important -- what is relevant is how many people will use the feature.
The fitmap code uses the Python random module random() function. That random number generator uses the random.seed() function to set a random seed. But the fitmap code is not setting the seed. So the results from one run to the next of fitmap with the search option are likely to be different, but could be the same if the Python random.seed() has been set by other ChimeraX code to an identical value.
I tried PDB 8gam and EMDB 29900 with
When I started ChimeraX two times and ran this it gave identical results. That makes me suspect that some code in ChimeraX has set the random seed to an initial value. If I start python3 two times and call random.random() I get different values. So it appears that Python randomizes the initial seed. I know there are routines in ChimeraX that set the random seed, for instance code that assigns random colors to the proteins based on their sequence during initial display. So if you open a structure where such colors are assigned the seed will get set to a reproducible value.
If I run fitmap as above twice in the same session I get different results (14 fit instead of 11) as expected.
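Both observations follow from the global generator being shared state. The sketch below imitates them in a single process, assuming a hypothetical `assign_colors` function that seeds the generator from a sequence string (via `zlib.crc32`, chosen only for illustration) and a hypothetical `unseeded_search` that never sets the seed. Neither is actual ChimeraX code.

```python
import random
import zlib

def assign_colors(sequence):
    """Hypothetical stand-in for code that seeds the global generator."""
    random.seed(zlib.crc32(sequence.encode()))
    return tuple(random.random() for _ in range(3))  # an RGB color

def unseeded_search(count=3):
    """Hypothetical search that draws from the global generator as it finds it."""
    return [random.random() for _ in range(count)]

# "Session 1": opening a structure seeds the generator, then two searches run.
assign_colors("MKTAYIAK")
first = unseeded_search()
second = unseeded_search()   # same session, generator state has advanced

# "Session 2": the same structure is opened again, then one search runs.
assign_colors("MKTAYIAK")
again = unseeded_search()
```

The first search of each simulated session matches (`first == again`) because earlier code left the generator in the same state, while the second search in a session differs from the first.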