Opened 20 months ago

Closed 20 months ago

Last modified 19 months ago

#14771 closed defect (not a bug)

AlphaFold2 colab notebook error

Reported by: bregman3@… Owned by: Tom Goddard
Priority: normal Milestone:
Component: Structure Prediction Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

Hello
I've been trying to use the AlphaFold Colab notebook from ChimeraX. I used it earlier this week and it worked perfectly fine, but I'm now getting the following error:
2024-03-14 18:44:43,264 Starting prediction on 2024-03-14 UTC time
2024-03-14 18:44:43,264 Installing ColabFold on Google Colab virtual machine.
Using Tesla T4 graphics processor
2024-03-14 18:44:43,493 Running on GPU
2024-03-14 18:44:43,496 Found 5 citations for tools or databases
2024-03-14 18:44:43,497 Query 1/1: af42 (length 42)
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]
2024-03-14 18:44:44,734 Could not generate input features af42: Invalid character in the sequence:  
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/colabfold/batch.py", line 1512, in run
    = generate_input_feature(query_seqs_unique, query_seqs_cardinality, unpaired_msa, paired_msa,
  File "/usr/local/lib/python3.10/dist-packages/colabfold/batch.py", line 1039, in generate_input_feature
    feature_dict = build_monomer_feature(
  File "/usr/local/lib/python3.10/dist-packages/colabfold/batch.py", line 892, in build_monomer_feature
    **pipeline.make_sequence_features(
  File "/usr/local/lib/python3.10/dist-packages/alphafold/data/pipeline.py", line 40, in make_sequence_features
    features['aatype'] = residue_constants.sequence_to_onehot(
  File "/usr/local/lib/python3.10/dist-packages/alphafold/common/residue_constants.py", line 580, in sequence_to_onehot
    raise ValueError(f'Invalid character in the sequence: {aa_type}')
ValueError: Invalid character in the sequence:  
2024-03-14 18:44:44,736 Done
Downloading structure predictions to directory Downloads/ChimeraX/AlphaFold
cp: cannot stat '*_relaxed_rank_001_*.pdb': No such file or directory
cp: cannot stat '*_scores_rank_001_*.json': No such file or directory

When I put the same sequence into the Colab notebook directly, not through ChimeraX, it works. I've tried restarting everything and trying different amino acid sequences, and I just get the same error.
Here is the actual amino acid input sequence: MTTTLPEGVSHRVGFKPHLRVEIVRGEAVYLLSERGTTALQ
I'm not sure how to go about fixing this. 
Thank you
Miriam Bregman
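The traceback ends in AlphaFold's `residue_constants.sequence_to_onehot`, which raises `ValueError(f'Invalid character in the sequence: {aa_type}')` for any character it cannot map. Below is a simplified sketch written for this note, not AlphaFold's actual code. The `RESTYPES` string, the `MAPPING` dict and the list-of-lists output are illustrative assumptions. It shows why a whitespace-like character, such as a no-break space (U+00A0), produces a message that looks empty after the colon.

```python
# Simplified sketch of a one-hot check in the style of AlphaFold's
# residue_constants.sequence_to_onehot. NOT AlphaFold's code; names are illustrative.
RESTYPES = "ARNDCQEGHILKMFPSTWYV"  # assumed: the 20 standard one-letter codes
MAPPING = {aa: i for i, aa in enumerate(RESTYPES)}


def sequence_to_onehot(sequence: str, mapping: dict[str, int]) -> list[list[int]]:
    """Encode each residue as a one-hot row; reject unmapped characters."""
    one_hot = []
    for aa_type in sequence:
        if aa_type not in mapping:
            # Same message format as the traceback: the offending character is
            # appended after the colon, so an invisible one prints as "nothing".
            raise ValueError(f"Invalid character in the sequence: {aa_type}")
        row = [0] * len(mapping)
        row[mapping[aa_type]] = 1
        one_hot.append(row)
    return one_hot


if __name__ == "__main__":
    # Q (glutamine) is in the mapping, so this sequence encodes without error.
    print(len(sequence_to_onehot("MTTTLPEGVSHRVGFKPHLRVEIVRGEAVYLLSERGTTALQ", MAPPING)))
    try:
        sequence_to_onehot("MTTQ\u00a0", MAPPING)  # trailing no-break space
    except ValueError as err:
        print(err)        # looks like nothing follows the colon
        print(repr(err))  # repr() escapes the hidden character as \xa0
```

Printing `repr()` of the message, rather than the message itself, is a cheap way to make such characters visible in a log.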

Change History (7)

comment:1 by Eric Pettersen, 20 months ago

Component: Unassigned → Structure Prediction
Owner: set to Tom Goddard
Platform: all
Project: ChimeraX
Status: new → assigned

Reported by Miriam Bregman

comment:2 by Tom Goddard, 20 months ago

Resolution: not a bug
Status: assigned → closed

Your sequence ends in Q, which is not a one-letter code for one of the 20 standard amino acids that AlphaFold allows.

comment:3 by bregman3@…, 20 months ago

Hi
I’m sorry I’m confused. Q is the code for glutamine. What letter code am I supposed to use?
Thanks
Miriam Bregman


comment:4 by goddard@…, 20 months ago

You are right, Q is fine. The error message says

    Invalid character in the sequence:

I do not see any invalid character. Also, the message is supposed to report the invalid character, but it shows nothing after the colon. So I suspect you pasted the sequence into ChimeraX and it contains an invisible character. That could happen if the source you are pasting from does something strange. Try pasting your sequence from some other application, or type it by hand.
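One way to check for an invisible character before pasting is to list every character that is not one of the 20 standard one-letter codes, with its position, code point and Unicode name. The helper below is hypothetical, not part of ChimeraX or ColabFold, and the accepted set `STANDARD_AA` is an assumption. It is deliberately strict, so lowercase letters and ambiguity codes such as X are reported too.

```python
import unicodedata

# Assumed set of accepted characters: the 20 standard one-letter codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_nonstandard_characters(sequence: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, code point, Unicode name) for each character
    that is not a standard one-letter amino-acid code."""
    problems = []
    for pos, ch in enumerate(sequence, start=1):
        if ch not in STANDARD_AA:
            # Control characters such as "\n" have no Unicode name.
            name = unicodedata.name(ch, "<unnamed>")
            problems.append((pos, f"U+{ord(ch):04X}", name))
    return problems


if __name__ == "__main__":
    # Hypothetical pasted text with a zero-width space at the end.
    pasted = "MTTTLPEGVSHRVGFKPHLRVEIVRGEAVYLLSERGTTALQ\u200b"
    for pos, code, name in find_nonstandard_characters(pasted):
        print(f"position {pos}: {code} {name}")
```

An empty result means every character is a standard code; anything reported, visible or not, can be deleted or retyped by hand.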

comment:5 by bregman3@…, 20 months ago

Hello
I tried it again without using two sequences, and it still does not work.
This is the error I'm still getting:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/colabfold/batch.py", line 1512, in run
    = generate_input_feature(query_seqs_unique, query_seqs_cardinality, unpaired_msa, paired_msa,
  File "/usr/local/lib/python3.10/dist-packages/colabfold/batch.py", line 1039, in generate_input_feature
    feature_dict = build_monomer_feature(
  File "/usr/local/lib/python3.10/dist-packages/colabfold/batch.py", line 892, in build_monomer_feature
    **pipeline.make_sequence_features(
  File "/usr/local/lib/python3.10/dist-packages/alphafold/data/pipeline.py", line 40, in make_sequence_features
    features['aatype'] = residue_constants.sequence_to_onehot(
  File "/usr/local/lib/python3.10/dist-packages/alphafold/common/residue_constants.py", line 580, in sequence_to_onehot
    raise ValueError(f'Invalid character in the sequence: {aa_type}')
ValueError: Invalid character in the sequence: 
2024-03-15 16:50:32,023 Done
this is the sequence used: MSTALTNARPDVESANAVALANDHRIALLTARTALEPALAQRYTEDPRSLLAEFGLVAVEPAYAAWGTEDDTHLLIEDLDRTGSGGEGFSIVFTKSDVPFPSVGTARR
There are no breaks, no spaces, and no non-amino-acid characters. This sequence works perfectly fine when I use it directly in the Colab notebook in Chrome. When I try to use AlphaFold from ChimeraX, it also crashes the application.
Thanks
Miriam Bregman

comment:6 by Tom Goddard, 19 months ago

ChimeraX AlphaFold prediction is broken due to a change in Google Colab. Sorry it took a while to figure out; I am on vacation and only have a phone. I am not sure when it will get fixed: maybe in a few days, or maybe in April when I am back at work. The problem is that sending the sequence to Google Colab no longer works, and I don't see any simple fix.

comment:7 by Tom Goddard, 19 months ago

Fixed in the daily build.

More details in ticket #14777.
