Opened 5 years ago
Last modified 4 years ago
#3571 assigned enhancement
Periodic cache purging?
Reported by: | Tristan Croll | Owned by: | Greg Couch |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | Cc: | pett, Tom Goddard | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
The following bug report has been submitted:

Platform: Linux-3.10.0-1127.13.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
ChimeraX Version: 1.0 (2020-06-04 23:15:07 UTC)

Description

I think it would be wise for ChimeraX to occasionally purge old files from its caches in ~/Downloads. To save disk space (can get quite large for people who look at lots of maps), but more importantly to reduce the instance of people looking at out-of-date versions.

Case in point: the models of the SARS-CoV-2 RNA polymerase with the C-terminal register shift. Most of these have now been corrected with new versions in the PDB under the original accession ID, but when I opened them in ChimeraX with "open {PDB ID}" I was still getting the old versions until I manually went and deleted them from ~/Downloads/ChimeraX/PDB.

Log:

UCSF ChimeraX version: 1.0 (2020-06-04)
© 2016-2020 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX

OpenGL version: 3.3.0 NVIDIA 450.51.05
OpenGL renderer: TITAN Xp/PCIe/SSE2
OpenGL vendor: NVIDIA Corporation
Manufacturer: Dell Inc.
Model: Precision T5600
OS: CentOS Linux 7 Core
Architecture: 64bit ELF
CPU: 32 Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
Cache Size: 20480 KB
Memory:
      total  used  free  shared  buff/cache  available
Mem:  62G    3.9G  51G   140M    7.9G        58G
Swap: 4.9G   0B    4.9G
Graphics:
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:11df]
Kernel driver in use: nvidia
PyQt version: 5.12.3
Compiled Qt version: 5.12.4
Runtime Qt version: 5.12.8
Change History (8)
comment:1 by , 5 years ago
Cc: | added |
---|---|
Component: | Unassigned → Input/Output |
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → assigned |
Summary: | ChimeraX bug report submission → Periodic cache purging? |
follow-up: 2 comment:2 by , 5 years ago
On the flip side, it is a real pain waiting for large downloads of EM maps if you delete the ChimeraX downloads directory. I did this once or twice in the past, erasing all of Downloads thinking I'd free up some disk space, and regretted it.

I believe Greg in the past had code in ChimeraX that could check whether a newer database file was available. I think we disabled that because we don't want to impair performance when opening your cached data by first checking for a newer file every time. But I could see an improved version of that being the best approach: opening a cached file opens it immediately, but also starts a check in a separate thread for whether a newer file is available. If it finds a newer file, it logs a message like "A newer version of PDB 6xcm is available than the cached one you are using. Get the newer version with command "open 6xcm ignoreCache true"", and that command or the word Get could be a link that runs the command, making it easy to grab the newer version.

A problem with checking for newer versions is of course that every database fetch that does this will need a new API call to make the check. But if we do it for just the main databases like PDB and EMDB, that would go a long way toward improving the out-of-date data situation.
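For illustration, the "open immediately, check in a separate thread" idea might look roughly like the sketch below. This is not ChimeraX code: `check_for_newer_version`, `fetch_remote_mtime` and `log` are hypothetical names, and the remote timestamp lookup is passed in as a callable because each database would need its own API call for it.

```python
import os
import threading
from typing import Callable, Optional


def check_for_newer_version(
    cached_path: str,
    entry_id: str,
    fetch_remote_mtime: Callable[[str], Optional[float]],  # hypothetical per-database lookup
    log: Callable[[str], None],                            # hypothetical logging hook
) -> threading.Thread:
    """Start a background check and return its thread; the caller does not wait."""

    def worker() -> None:
        try:
            remote_mtime = fetch_remote_mtime(entry_id)
        except Exception as err:
            # A failed check must never interfere with opening the cached file.
            log(f"Could not check for a newer version of {entry_id}: {err}")
            return
        if remote_mtime is None:
            return  # the server gave no timestamp, so nothing can be concluded
        if remote_mtime > os.path.getmtime(cached_path):
            log(
                f"A newer version of PDB {entry_id} is available than the cached "
                f'one you are using. Get it with "open {entry_id} ignoreCache true"'
            )

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The cached file would be opened as usual right after this call returns; the log message only appears later if the check finds something newer.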
comment:3 by , 5 years ago
Type: | defect → enhancement |
---|
comment:4 by , 5 years ago
I like the separate thread idea. We could also enhance the ignoreCache open command option from a boolean to a tristate option that optionally updates. Or add another open command option.
A possible issue with the separate thread is for files that are fetched indirectly. For example, it is not easy to change the ignoreCache option for the residue templates that are fetched during mmCIF file reading. And being notified that there is a newer version would mean that the user would need to remove the cached file by hand. In this case, a global setting that turned on/off checking for updates for cached files would be best.
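To make the tristate idea concrete, here is a rough sketch of how the decision could be expressed. The names `CachePolicy`, `should_download` and `check_updates_enabled` are made up for this example and are not existing ChimeraX options; the only behaviour taken from this ticket is that "ignoreCache true" forces a fresh download.

```python
from enum import Enum


class CachePolicy(Enum):
    """Hypothetical tristate replacement for a boolean ignoreCache option."""
    USE = "use"        # open the cached copy without contacting the server
    IGNORE = "ignore"  # always download again, like "ignoreCache true"
    UPDATE = "update"  # download again only if the server copy is newer


def should_download(
    policy: CachePolicy,
    is_cached: bool,
    remote_is_newer: bool,
    check_updates_enabled: bool = True,  # hypothetical global on/off setting
) -> bool:
    """Decide whether a fetch should hit the network."""
    if not is_cached or policy is CachePolicy.IGNORE:
        return True
    if policy is CachePolicy.UPDATE and check_updates_enabled:
        return remote_is_newer
    return False
```

Indirect fetches such as residue templates could then follow the global setting rather than a per-command option.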
comment:5 by , 5 years ago
I doubt that residue templates change enough in relevant ways for us to care about getting "newer" versions.
comment:6 by , 4 years ago
This has just come up in the context of residue templates, courtesy of a bug report by Alexis Rohou. The CCD entry for PEE (di-oleyl-phosphatidylethanolamine) has been wrong forever (fatty acid tails modelled as fully saturated instead of omega-9 unsaturated), but they've just fixed it. Apparently *literally* just - the PDBeChem server still provides the old, wrong version, but the RCSB server doesn't. Since (a) this is a rather common ligand and is likely to be in a lot of people's cache directories, and (b) ISOLDE relies on the CCD templates for adding ligands, it could become a problem if there isn't some mechanism to periodically refresh the cache.
comment:7 by , 4 years ago
Cc: | added; removed |
---|---|
Owner: | changed from | to
The problem is of course that you don't want to degrade performance every time you fetch a structure from PDB which needs CCD templates just to check for the extremely rare situation where a template changed. So we have never had a clear idea about how to fix this in a way that would do more good than harm. Even a periodic check every week or two would be more annoying to users than valuable. Implementing the check in a separate thread, running it only at weekly intervals, figuring out which ligands to test (every single one the user has ever fetched?), and ensuring it all works seems almost sure to make ChimeraX less useful. Maybe to handle these cases, a command that does the check and purges any out-of-date entries could be made, which a user can run if they know of a problem.
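A user-run purge command along those lines could be sketched as follows. This is only an outline under assumptions: `purge_out_of_date` and `fetch_remote_mtime` are hypothetical names, the cache is treated as a flat directory of files, and "out of date" is taken to mean that the server's modification time is later than the local file's.

```python
from pathlib import Path
from typing import Callable, List, Optional


def purge_out_of_date(
    cache_dir: Path,
    fetch_remote_mtime: Callable[[str], Optional[float]],  # hypothetical server lookup
    dry_run: bool = False,
) -> List[Path]:
    """Delete cached files whose server copy is newer; return the affected paths."""
    stale = []
    for path in sorted(cache_dir.iterdir()):
        if not path.is_file():
            continue
        remote_mtime = fetch_remote_mtime(path.name)
        if remote_mtime is None:
            continue  # unknown upstream state: keep the cached copy
        if remote_mtime > path.stat().st_mtime:
            if not dry_run:
                path.unlink()
            stale.append(path)
    return stale
```

Because it only runs on request, it costs nothing during normal fetches, and `dry_run=True` lets the user see what would be removed first.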
follow-up: 8 comment:8 by , 4 years ago
I see your point. A few thoughts:
- It would probably be good to have some form of cache-management GUI tool - basically just a glorified file browser - allowing the user to easily see how much space is being used, purge old/unneeded files, etc. without them needing to find its location for themselves. Particularly for heavy users, the disk usage can get rather large.
- If you kept track of the “downloaded” and “last accessed” dates for each cache file, you could run a quick check at startup and print some messages to the log. Something like: “xxx cached files were downloaded more than a month ago and may be out of date. Click here to check for updates” and “yyy cached files totalling zzz MB have not been used in over a month. Click here to delete them.”
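As a rough sketch of the startup check, assuming the "downloaded" and "last accessed" dates are recorded somewhere per file: `CacheRecord` and `startup_report` are invented names, a month is taken as 30 days, and how the records get stored is left open.

```python
from dataclasses import dataclass
from typing import Iterable, List

MONTH_SECONDS = 30 * 24 * 3600  # assumption: "a month" means 30 days


@dataclass
class CacheRecord:
    """Hypothetical per-file bookkeeping entry."""
    name: str
    size_bytes: int
    downloaded: float     # POSIX timestamp of the download
    last_accessed: float  # POSIX timestamp of the most recent open


def startup_report(records: Iterable[CacheRecord], now: float) -> List[str]:
    """Build the log messages for old and unused cache files."""
    records = list(records)
    old = [r for r in records if now - r.downloaded > MONTH_SECONDS]
    unused = [r for r in records if now - r.last_accessed > MONTH_SECONDS]
    messages = []
    if old:
        messages.append(
            f"{len(old)} cached files were downloaded more than a month ago "
            "and may be out of date."
        )
    if unused:
        total_mb = sum(r.size_bytes for r in unused) / 1e6
        messages.append(
            f"{len(unused)} cached files totalling {total_mb:.1f} MB "
            "have not been used in over a month."
        )
    return messages
```

Keeping explicit records rather than relying on filesystem access times avoids problems on volumes mounted without access-time updates.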
Large disk space usage and people unknowingly looking at out-of-date entries certainly are issues. Just addressing the "manual deletion" part, you can instead add "ignoreCache true" to your "open" command.