Opened 4 years ago
Last modified 4 years ago
#5578 assigned enhancement
Run AlphaFold on UCSF Wynton cluster
Reported by: | Tom Goddard | Owned by: | Tom Goddard
---|---|---|---
Priority: | moderate | Milestone: |
Component: | Structure Prediction | Version: |
Keywords: | | Cc: |
Blocked By: | | Blocking: |
Notify when closed: | | Platform: | all
Project: | ChimeraX | |
Description
Would like to allow ChimeraX to submit AlphaFold jobs to the UCSF Wynton cluster. This would allow using the full AlphaFold databases and handling longer sequences on Wynton's Nvidia A40 GPUs with 48 GB of memory, compared to our current ChimeraX Google Colab AlphaFold, which uses GPUs with only 16 GB of memory (typically Nvidia K80 or T4).
Allowing all ChimeraX users to submit jobs would likely consume too many compute resources (a very rough estimate is about 8-16 full-time dedicated GPUs per day), which would not be an acceptable use of the UCSF cluster. Possibly our lab can buy GPU nodes for the cluster if we can get funding for it.
Another idea to limit AlphaFold computations is to require that all ChimeraX AlphaFold jobs run on Wynton go immediately into a public database that anyone can run sequence searches against. Many users will probably be reluctant to reveal the sequence they are working on, and this would limit use. At the same time it would maximize the value of the expensive computation by ensuring the computed structures can be reused by anyone.
Attachments (4)
Change History (13)
comment:1 by , 4 years ago
by , 4 years ago
Script to run singularity image to do an AlphaFold prediction
comment:2 by , 4 years ago
Wynton admin Jason Shi installed the 2.2 TB of AlphaFold databases in
/wynton/group/databases/alphafold_CASP14
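Before submitting a job it is worth confirming the database tree is complete. The helper below is a hypothetical sketch, not part of the ticket; the expected subdirectory names are taken from the database-path flags used in the run commands later in this thread.

```python
import os

# Top-level database directories that the AlphaFold flags in this ticket
# point at (uniref90, mgnify, pdb_mmcif, pdb70, uniclust30, bfd).
EXPECTED_DBS = ["uniref90", "mgnify", "pdb_mmcif", "pdb70", "uniclust30", "bfd"]

def missing_databases(data_dir, expected=EXPECTED_DBS):
    """Return the expected database subdirectories absent from data_dir."""
    return [d for d in expected
            if not os.path.isdir(os.path.join(data_dir, d))]
```

For example, `missing_databases("/wynton/group/databases/alphafold_CASP14")` should return an empty list on a complete installation.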
comment:3 by , 4 years ago
I am working on running AlphaFold-Multimer (released Nov 2, 2021) on Wynton. It can make predictions for assemblies. It requires an update to the AlphaFold databases, which Jason Shi is doing now. I will make a new singularity image for AlphaFold 2.1 that contains the multimer capability and test it.
comment:4 by , 4 years ago
Sam Li has used my AlphaFold 2.0.1 singularity image to predict single proteins. This required some debugging: his sequence file was not under his home directory, so "-B $PWD" needed to be added to the singularity command to mount the current directory.
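The fix generalizes: any directory holding input files must be bind-mounted into the container, since singularity only mounts a few paths by default. A small hypothetical helper (for illustration only, not from the ticket's scripts) could collect the extra -B options from the FASTA paths:

```python
import os

def bind_options(fasta_paths):
    """Build singularity -B options covering the directories of all inputs.

    A sequence file outside the default-mounted paths (such as the home
    directory) is invisible inside the container unless its directory is
    bound with -B.
    """
    dirs = []
    for path in fasta_paths:
        d = os.path.dirname(os.path.abspath(path))
        if d not in dirs:          # keep one -B per directory
            dirs.append(d)
    return [arg for d in dirs for arg in ("-B", d)]
```

For example, `bind_options(["/scratch/sam/test.fasta"])` yields `["-B", "/scratch/sam"]`, which can be spliced into the singularity command line.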
comment:5 by , 4 years ago
I made an AlphaFold 2.1 singularity image, using a different procedure than for my previous AlphaFold 2.0.1 image. For the 2.1 image I first made a Docker container following the instructions in the AlphaFold github readme.md file; this is the main supported installation method for AlphaFold. I included CUDA support in this Docker container. The image was built on my home Ubuntu 20.04 system with Intel graphics; an Nvidia GPU was not needed to add the CUDA support. The following commands built the Docker container, converted it to a tar file, then converted that to a singularity image and copied it to the Wynton cluster.
$ sudo docker build -f docker/Dockerfile -t alphafold .
$ sudo docker images
REPOSITORY    TAG                               IMAGE ID      CREATED         SIZE
alphafold     latest                            b9c1f8e0a8a9  30 minutes ago  10.1GB
nvidia/cuda   11.0-cudnn8-runtime-ubuntu18.04   848be2582b0a  13 months ago   3.6GB
nvidia/cuda   11.0-base                         2ec708416bb8  14 months ago   122MB
$ sudo docker save b9c1f8e0a8a9 -o alphafold_docker.tar
$ sudo singularity build alphafold21.sif docker-archive://alphafold_docker.tar
$ rsync -av alphafold21.sif plato.cgl.ucsf.edu:alphafold_singularity
Then I modified the AlphaFold docker/run_alphafold.py script to produce the singularity command to run a prediction instead of using docker. That modified script is attached to this ticket. For now the script just outputs the singularity command instead of running it; it should be changed to actually run it.
$ python3 run_af21.py --fasta_paths=test.fasta
$ env CUDA_VISIBLE_DEVICES=0 TF_FORCE_UNIFIED_MEMORY=1 XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 \
    singularity run --nv \
    -B "/wynton/group/databases/alphafold_CASP14" \
    -B "/wynton/home/ferrin/goddard/alphafold_singularity" \
    /wynton/home/ferrin/goddard/alphafold_singularity/alphafold21.sif \
    --fasta_paths=test.fasta \
    --uniref90_database_path=/wynton/group/databases/alphafold_CASP14/uniref90/uniref90.fasta \
    --mgnify_database_path=/wynton/group/databases/alphafold_CASP14/mgnify/mgy_clusters_2018_12.fa \
    --data_dir=/wynton/group/databases/alphafold_CASP14 \
    --template_mmcif_dir=/wynton/group/databases/alphafold_CASP14/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/wynton/group/databases/alphafold_CASP14/pdb_mmcif/obsolete.dat \
    --pdb70_database_path=/wynton/group/databases/alphafold_CASP14/pdb70/pdb70 \
    --uniclust30_database_path=/wynton/group/databases/alphafold_CASP14/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path=/wynton/group/databases/alphafold_CASP14/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --output_dir=/wynton/home/ferrin/goddard/alphafold_singularity/output \
    --max_template_date=2100-01-01 \
    --db_preset=full_dbs \
    --model_preset=monomer \
    --benchmark=False \
    --use_precomputed_msas=False \
    --logtostderr
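To make the script execute the command it currently only prints, the generated command line could be handed to subprocess. This is a sketch under the assumption that the command is built as a single shell-style string, as in the printed output above:

```python
import shlex
import subprocess

def run_prediction(command):
    """Run a generated singularity command line and return its exit code.

    The string is split with shlex rather than passed through a shell, so
    quoted arguments like -B "/wynton/..." survive as single tokens.  A
    leading `env VAR=value ...` prefix also works, since env is an
    ordinary program.
    """
    result = subprocess.run(shlex.split(command))
    return result.returncode
```

In the launch script the final `print(cmd)` would become `sys.exit(run_prediction(cmd))` or similar, so qsub sees the prediction's exit status.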
by , 4 years ago
Attachment: run_af21.py added
Script to produce singularity command to run AlphaFold on wynton cluster.
comment:6 by , 4 years ago
I ran a prediction of a dimer, PDB 7e40 chains A and B, using the new AlphaFold 2.1 databases installed by Jason Shi; it completed successfully in 2.5 hours using qsub. I was getting memory errors within the first few minutes until I set NVIDIA_VISIBLE_DEVICES in addition to the CUDA_VISIBLE_DEVICES environment variable; it seemed like the job might have been trying to use the wrong GPU. I also needed to add a mount (singularity -B option) of the older AlphaFold 2.0 databases, since the newer ones have some symbolic links to the older ones.
#!/bin/bash
# Specify which GPU to use. Important for Wynton cluster queue submissions.
GPU_ID="${SGE_GPU:-0}"
# Load the version of CUDA we want.
module load cuda/11.0
env CUDA_VISIBLE_DEVICES=$GPU_ID NVIDIA_VISIBLE_DEVICES=$GPU_ID TF_FORCE_UNIFIED_MEMORY=1 XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 \
    singularity run --nv \
    -B "/wynton/group/databases/alphafold_CASP14_v2.1.1" \
    -B /wynton/group/databases/alphafold_CASP14 \
    -B "/wynton/home/ferrin/goddard/alphafold_singularity" \
    /wynton/home/ferrin/goddard/alphafold_singularity/alphafold21.sif \
    --fasta_paths=7e40_ab.fasta \
    --uniref90_database_path=/wynton/group/databases/alphafold_CASP14_v2.1.1/uniref90/uniref90.fasta \
    --mgnify_database_path=/wynton/group/databases/alphafold_CASP14_v2.1.1/mgnify/mgy_clusters_2018_12.fa \
    --data_dir=/wynton/group/databases/alphafold_CASP14_v2.1.1 \
    --template_mmcif_dir=/wynton/group/databases/alphafold_CASP14_v2.1.1/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/wynton/group/databases/alphafold_CASP14_v2.1.1/pdb_mmcif/obsolete.dat \
    --uniprot_database_path=/wynton/group/databases/alphafold_CASP14_v2.1.1/uniprot/uniprot.fasta \
    --pdb_seqres_database_path=/wynton/group/databases/alphafold_CASP14_v2.1.1/pdb_seqres/pdb_seqres.txt \
    --uniclust30_database_path=/wynton/group/databases/alphafold_CASP14_v2.1.1/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path=/wynton/group/databases/alphafold_CASP14_v2.1.1/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --output_dir=/wynton/home/ferrin/goddard/alphafold_singularity/output \
    --max_template_date=2021-11-01 \
    --db_preset=full_dbs \
    --model_preset=multimer \
    --benchmark=False \
    --use_precomputed_msas=False \
    --logtostderr \
    --is_prokaryote_list=false
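The environment setup at the top of the script can be mirrored in a python launch script. This is a hypothetical sketch (variable names taken from the script above; SGE_GPU is the variable the Wynton scheduler sets for the assigned GPU):

```python
import os

def gpu_environment(base_env=None, default_gpu="0"):
    """Return an environment pinning AlphaFold to the scheduler-assigned GPU.

    Both CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES are set, since
    setting only the former still allowed the job to touch the wrong GPU.
    """
    env = dict(os.environ if base_env is None else base_env)
    gpu_id = env.get("SGE_GPU", default_gpu)
    env["CUDA_VISIBLE_DEVICES"] = gpu_id
    env["NVIDIA_VISIBLE_DEVICES"] = gpu_id
    env["TF_FORCE_UNIFIED_MEMORY"] = "1"
    env["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"
    return env
```

The returned dictionary can then be passed as the `env=` argument of `subprocess.run()` when launching the singularity command.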
comment:7 by , 4 years ago
I cleaned up my AlphaFold 2.1 python launch script and am now running a homodimer prediction for PDB 6wxq, submitted with
$ cd ~/alphafold_singularity
$ qsub -q gpu.q -cwd -l h_rt=12:00:00 -l compute_cap=37,gpu_mem=10000M \
    -S /usr/bin/python3 ./run_af21.py --fasta_paths=6wxq.fasta \
    --model_preset=multimer --is_prokaryote_list=true
It failed after an hour due to an AlphaFold Docker bug: the image needs CUDA 11.1 but only requests CUDA 11.0 (https://github.com/deepmind/alphafold/issues/224). I will rebuild the singularity image to fix this as suggested in the AlphaFold bug report.
comment:8 by , 4 years ago
Adrian Apelin in Nevan Krogan's lab is running 30 AlphaFold jobs on Wynton, apparently on virus vector vaccine structures judging by his publications. I sent him email to see how he has set up AlphaFold on Wynton.
comment:9 by , 4 years ago
I cleaned up the Wynton job submission script. It no longer requires Google's absl python module for argument parsing, using standard argparse instead, and it includes default qsub options via "#$" comment lines. I ran a test dimer computation:
$ qsub run_alphafold21.py --fasta_paths=seq_7k3s.fasta --model_preset=multimer --is_prokaryote_list=false
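The absl-to-argparse change can be sketched as follows. This is a minimal illustration, not the attached script: only a few of the real flags are shown, and the "#$" lines are the embedded qsub defaults mentioned above (SGE reads them when the script itself is submitted).

```python
#!/usr/bin/env python3
#$ -q gpu.q
#$ -cwd
#$ -l h_rt=12:00:00
import argparse

def parse_args(argv=None):
    """Parse AlphaFold launch options with standard argparse (no absl)."""
    p = argparse.ArgumentParser(description="Run AlphaFold on Wynton")
    p.add_argument("--fasta_paths", required=True)
    p.add_argument("--db_preset", default="full_dbs",
                   choices=["full_dbs", "reduced_dbs"])
    p.add_argument("--model_preset", default="monomer",
                   choices=["monomer", "monomer_casp14", "monomer_ptm",
                            "multimer"])
    p.add_argument("--is_prokaryote_list", default=None)
    return p.parse_args(argv)
```

Because argparse is in the standard library, the script runs under the cluster's stock /usr/bin/python3 with nothing extra installed outside the container.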
I made a singularity image for running AlphaFold 2.0.1 about a month ago, following this github repository: git@…:hyoo/alphafold_singularity.git.
I've attached the alphafold.def and base.def singularity definition files I used to make the image, and a run_af.sh script used to submit Wynton GPU queue jobs with commands like