Opened 18 hours ago

Last modified 11 hours ago

#19410 assigned enhancement

Make a ChimeraX help chatbot using Google NotebookLM

Reported by: Tom Goddard Owned by: Tom Goddard
Priority: moderate Milestone:
Component: Documentation Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

In August 2025 Matthias Vorländer made a ChimeraX AI chatbot that answered questions about how to use ChimeraX, drawing on ChimeraX documentation supplied to the chatbot as sources. Matthias used Google's NotebookLM. He did not provide many input documents, but it worked quite well in Elaine's and my testing.

We should make an improved version that uses all the ChimeraX documentation sources: User Guide, tutorials, presentations, mailing list messages, Programmer's Guide, Python code recipes, and so on.

Attachments (1)

chimerax_notebooklm.png (871.4 KB) - added by Tom Goddard 18 hours ago.
Screenshot of ChimeraX Help NotebookLM web interface.


Change History (15)

comment:1 by Tom Goddard, 18 hours ago

Here is Matthias' description of what he did:

Begin forwarded message:

From: Vorländer,Matthias Kopano
Subject: ChimeraX help chatbot
Date: August 14, 2025 at 2:24:49 AM PDT
To: Elaine Meng

Hi Elaine,

For my lab, I uploaded some of the ChimeraX documentation to a Google notebook (until I hit the upload limit of the free version), inspired by the Phenix chatbot (Phenix: https://notebooklm.google.com/notebook/8d07beb9-a0c1-4488-8406-16af381d94cc). Even with the upload limit of a free notebook, I find it gives very good answers, unlike ChatGPT. I wondered if your team would consider maintaining such a resource, kept in sync with the latest docs? One could also train it on the ChimeraX mailing list archive. This could handle many of the simpler mailing list requests, and I still find it helpful in my day-to-day work.

If you would like to check my basic version, please test it here: https://notebooklm.google.com/notebook/bbe7fbb1-8eaf-4077-988e-6c034b291877. For now, please refrain from sharing this via the mailing list as the uploaded documentation is incomplete and I cannot keep it up to date.

Best,
Matthias

comment:2 by Tom Goddard, 18 hours ago

Here Matthias describes in more detail how he made the chatbot:

Begin forwarded message:

From: Vorländer,Matthias Kopano
Subject: Re: ChimeraX help chatbot
Date: August 18, 2025 at 12:11:25 AM PDT
To: Tom Goddard
Cc: Chimera Staff

Hi Tom,

Thanks for the mail, happy to hear you like the idea!

Regarding your questions: I haven’t tried GPT-5 much yet (although it seems to be better than its predecessors), but earlier versions (using our lab's pro account, i.e. the latest models) often hallucinated answers and got basic selection syntax wrong, frequently mixing up Chimera and ChimeraX. I even tried re-training my own GPT, using the mailing list archive, ChimeraX recipes, the source code, and the documentation, but the results were very disappointing.

I then used those materials to upload for the Google Notebook. The big difference there is that the Notebooks don’t have any pre-existing knowledge as far as I understand, but only use the sources you provide. Due to the limit of 50 items, I tried to concatenate materials such as the individual help pages and the mailing list archives, using the material I had gathered for GPT retraining. These were however not processed, presumably due to word count limits. I then simply uploaded the help pages for the commands I personally use the most. So in a way it is a quick and dirty training that took about 20 minutes to do.

And yes, Boltz is missing because I had gathered the docs before you implemented it in ChimeraX. I am not familiar with Notebook automations, but in an ideal world the sources would be auto-updated with the latest ChimeraX release!

Fingers crossed for the grant application, I think this would be a great addition to the fantastic support that you all provide to the community! I would be happy to do some real-world testing once things are put together.

Best wishes,
Matthias

comment:3 by Tom Goddard, 18 hours ago

I made a NotebookLM ChimeraX Help chatbot using more data sources than the initial version from August. Here is my version:

https://notebooklm.google.com/notebook/56fd2d00-eb20-4b52-93ba-1ffe36a87a7c

I included the following sources:

ChimeraX User's Guide (286K words, 266 html files)
ChimeraX tutorials and presentations (216K words, 144 html files)
ChimeraX mailing list archives 2020-2025 (1105K words, 9000 messages)
ChimeraX Programmer's Guide (195K words, 78 html files)
ChimeraX Recipes example code (22K words, 59 markdown files)

I've only tested it with a dozen questions about how to do selections, coloring, and zones, and asked it to write a bit of Python code. The only question it had trouble with was when I asked it to write Python for a ChimeraX command that counts the number of cysteine residues given a residue specifier. It made one small mistake (trying to print residues.atomspec in the output message), which it fixed when I told it the error that was raised. The answers are always lengthy, a page or two. Each user has a configuration setting to request shorter answers, which I have not tried.

I don't know how this is going to hold up to real-world testing. If you try it, let me know what you think. I assembled the input files quickly (about a half day of work). I did not include images, did not try to preserve links (e.g. the extensive cross-linking in the User Guide is lost), and did not include linked text files such as command scripts or Python code. NotebookLM is a pain. The free version accepts only 50 data files, each up to 500K words. The "pro" version for $20 per month allows only 300 input files. So I can't give it separate html files because there are too many. Instead I converted all html to markdown (NotebookLM understands markdown formatting, but it can only convert html to plain text, ruining tables and other formatting), and I concatenated the markdown files to stay within the limits. I have used only 7 data files so far (3 for the mailing list archives, 1 each for the other sources).
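The concatenation step can be sketched roughly as follows. This is a minimal illustration, not the script actually used for the notebook: it assumes the html has already been converted to markdown files on disk, and the function names, the output file naming, and the directory names in the usage note are hypothetical. It packs pages, in order, into bundles that stay under the per-source word limit quoted above, heading each page with "# File <name>" so its origin stays visible.

```python
# Sketch only: assumes html was already converted to .md files on disk.
# Function names and output naming are hypothetical.
from pathlib import Path

WORD_LIMIT = 500_000  # free-tier per-source word limit quoted in this ticket
MAX_SOURCES = 50      # free-tier limit on the number of sources quoted in this ticket


def load_markdown(directory):
    """Read every .md file under directory as (relative name, text) pairs, sorted by path."""
    root = Path(directory)
    return [(p.relative_to(root).as_posix(), p.read_text(encoding="utf-8"))
            for p in sorted(root.rglob("*.md"))]


def bundle_markdown(docs, word_limit=WORD_LIMIT):
    """Pack (name, text) pairs, in order, into bundles of at most word_limit words.

    Each page gets a "# File <name>" heading so its source stays identifiable.
    A single page longer than word_limit ends up alone in an oversized bundle
    and would still need to be split by hand.
    """
    bundles, current, current_words = [], [], 0
    for name, text in docs:
        section = f"# File {name}\n\n{text}"
        words = len(section.split())
        # Close the current bundle if this page would push it over the limit.
        if current and current_words + words > word_limit:
            bundles.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(section)
        current_words += words
    if current:
        bundles.append("\n\n".join(current))
    return bundles


def write_bundles(bundles, out_dir, prefix="user_guide"):
    """Write bundles as prefix_01.md, prefix_02.md, ... and return their paths."""
    if len(bundles) > MAX_SOURCES:
        raise ValueError(f"{len(bundles)} bundles exceeds the {MAX_SOURCES}-source limit")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(bundles, start=1):
        path = out / f"{prefix}_{i:02d}.md"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_bundles(bundle_markdown(load_markdown("user_guide_md")), "notebooklm_sources")` would turn a hypothetical directory of converted User Guide pages into as few sources as the word limit allows. Splitting only at page boundaries keeps each page intact, at the cost of bundles that are somewhat under the limit.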

The Phenix team has a Phenix Help chatbot using NotebookLM created by Tom Terwilliger. I spent time with Nigel Moriarty from the Phenix team at the all-day Bay Area cryoEM meeting at Genentech Hall last Friday, and he encouraged me to try NotebookLM for ChimeraX. The chatbot has the virtue that users will learn more about how to operate ChimeraX, whereas if you tell the Claude AI agent to do things for you, you will likely ignore how they are done.

I've attached an image of ChimeraX Help NotebookLM.

by Tom Goddard, 18 hours ago

Attachment: chimerax_notebooklm.png added

Screenshot of ChimeraX Help NotebookLM web interface.

comment:4 by Tom Goddard, 18 hours ago

Eric tried the chatbot and says:

Begin forwarded message:

From: Eric Pettersen
Subject: Re: ChimeraX help chatbot
Date: November 19, 2025 at 5:19:20 PM PST
To: Tom Goddard
Cc: "Vorländer,Matthias Kopano" , Chimera Staff

Gave it a little bit of a spin. Seemed to do a good job of thoroughly answering a question like "How do I analyze an MD trajectory?". In some cases perhaps too thorough a job. When I asked "How do I hide solvent?" it immediately said how to do it with a command and the menus (though it did refer to the Select menu as a "context menu"), but then droned on about how to hide ions, how to hide solvent that had drifted away during MD, ad nauseam. Also, because you were forced to concatenate input to NotebookLM, clicking through to the reference links got you to a page with many, many more things than the thing you were actually interested in.

--Eric

comment:5 by Tom Goddard, 18 hours ago

My follow-up chatbot tests exploring problems identified by Eric:

Hi Eric,

Thanks for trying it! When I asked "How do I hide solvent?" it didn't talk about solvent drifting away during MD because I had never asked it about MD. Like other AI chatbots, it remembers your previous questions, and its answer to the current question takes into account what you asked before.

My hide water answer was 2 pages long with a lot of alternatives (command, GUI, deleting, solvent vs :HOH, ... on and on). Using the settings icon (3 little sliders at the top right of the chat panel) you can ask for "Short" answer length. That didn't make it much shorter for me. Then I tried "Custom style" and told it "Brief answers". It still gave over a page. Then I tried "Explain only the most common method for doing a task." -- still a page-long answer. I'm not sure if any trick will convince it to be succinct.

It is unfortunate that the numbered links in the answers go to these massive files containing, for instance, the whole User Guide. But at least it goes to the relevant place and highlights the text it was referring to, so I think that is not too bad. But the links within those files don't work (for instance User Guide cross-reference links), which is annoying but possibly fixable by converting the links to absolute URLs pointing to the original source html pages. I also observe that it loves to link to mailing list messages, which often have a more specialized focus, rather than to the User Guide. That makes getting more info (the kind found in the User Guide) harder.

I tried the chatbot on today's user question about how to find the lumen volume when there are holes -- a rather difficult problem. The chatbot found my discussion of this in both the mailing list and recipes files from 2021 and summarized it pretty accurately. Nice. But again it could not link to my recipe page with the image demonstrating it.

Tom

comment:6 by Tom Goddard, 18 hours ago

I corrected a problem with the presentations and tutorials input material for the chatbot where it had included hundreds of html files listing ChimeraX citations that were in plato directory www/data/homePage/. I removed those from the chatbot input since I don't think they will help it answer questions. That reduced the number of presentation and tutorial html pages from 889 to 144.
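Excluding a directory like this when gathering inputs can be sketched as below. This is a minimal illustration under assumptions: the directory path comes from the comment above, but the root of the source tree and the function names are hypothetical. The check compares whole path components, so a sibling directory whose name merely starts with "homePage" is not excluded by accident.

```python
# Sketch only: the excluded path is the one named in this ticket; the tree
# root and function names are hypothetical. Requires Python 3.9+.
from pathlib import Path, PurePosixPath

EXCLUDE_DIRS = ("www/data/homePage",)  # citation listings, not useful to the chatbot


def keep_path(rel_path, exclude_dirs=EXCLUDE_DIRS):
    """True if the POSIX relative path is not inside any excluded directory."""
    p = PurePosixPath(rel_path)
    return not any(p.is_relative_to(d) for d in exclude_dirs)


def collect_html(root, exclude_dirs=EXCLUDE_DIRS):
    """List .html files under root, skipping the excluded subdirectories."""
    root = Path(root)
    return [p for p in sorted(root.rglob("*.html"))
            if keep_path(p.relative_to(root).as_posix(), exclude_dirs)]
```

The same exclusion list could later take other directories that should not reach the chatbot.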

comment:7 by Tom Goddard, 14 hours ago

I added absolute https links to the User Guide markdown used by the ChimeraX Help NotebookLM. Unfortunately, when you click on such a link it shows a new page that says "Redirecting to https:..." but then does not actually redirect, and you have to click the link in that new Redirecting page to get to the linked location. This is tedious and seems like broken behavior.

I also made the header of each html page concatenated in the User Guide markdown give a link to the original html: "Source file [html-file-name]". Formerly I had it name the source file without a link, like "# File [html-file-name]", where the # makes it a markdown title so it appears in bigger bold text. But I could not make it both a title and contain a link: when I tried, NotebookLM did not render the link.
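Both changes, absolute links and a linked "Source file" header, can be sketched as follows. This is a minimal illustration, not the code actually used: BASE_URL is a placeholder for the real online location of the docs, the function names are hypothetical, and the regular expression handles only simple inline links of the form [text](target). Relative targets and "#anchor" fragments are resolved against each page's own URL with the standard `urllib.parse.urljoin`, which leaves already-absolute URLs unchanged.

```python
# Sketch only: BASE_URL is a placeholder and the regex covers simple inline
# links [text](target), not every markdown link form.
import re
from urllib.parse import urljoin

BASE_URL = "https://example.org/docs/user/"  # hypothetical; use the real docs location

# Inline links [text](target); the lookbehind skips images ![alt](src).
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")


def absolutize_links(markdown, page_url):
    """Rewrite relative link targets as absolute URLs resolved against page_url."""
    def fix(match):
        text, target = match.group(1), match.group(2)
        # urljoin keeps absolute URLs and resolves "#anchor" and "../x.html".
        return f"[{text}]({urljoin(page_url, target)})"
    return LINK_RE.sub(fix, markdown)


def page_with_header(rel_path, markdown, base_url=BASE_URL):
    """Prefix a page with a plain "Source file" line linking to its original html.

    The header is deliberately not a "#" title, since NotebookLM did not
    render a link inside a markdown title.
    """
    page_url = urljoin(base_url, rel_path)
    header = f"Source file [{rel_path}]({page_url})"
    return header + "\n\n" + absolutize_links(markdown, page_url)
```

Resolving against the page's own URL rather than the site root matters because User Guide pages in subdirectories use paths like "../index.html".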

I could also add the links to the presentations and tutorials markdown and the Programmer's Guide markdown. I haven't done those yet because following the links is so cumbersome.


comment:8 by Tom Goddard, 14 hours ago

NotebookLM puts links in its answers that are circles with numbers in them, e.g. (1). Hovering over one shows a popup window with a page of text from a source document that is being cited. That displays correctly. Clicking on a circle link shows the full original source document in a side panel. Unfortunately the full source document panel does not scroll to the location of the reference. Since the source documents are huge (e.g. the entire user guide), this side panel is basically useless. That is the behavior if the source document is markdown. If the source document is text then it does scroll to the correct location in the side panel and highlights the cited text in the source.

This failure to show the location in the source document looks like another NotebookLM bug. I wondered if this failure was because I had deleted and updated different versions of the user guide (without links, with links). I tried a new NotebookLM notebook with just the user guide, and it was broken there as well. Online searches also mention this failure to show the correct location in the side panel, even though the cited text is shown correctly in the pop-up panel.

I checked how the Phenix chatbot avoids this. It does not show any circle citations and has no sources side panel. Online searches say this is a "chat only" feature available only to "premium users" (paid subscriptions) and is enabled when creating a sharing link. I do not see that option in my free Google NotebookLM.

I think not showing the web links would make the ChimeraX help much less useful, as it gives you no way to get more precise, detailed info. But since the sources side panel does not work with markdown and the link redirection is very cumbersome, the whole use of links is seriously compromised.

comment:9 by Tom Goddard, 13 hours ago

I made the presentations and tutorials and the Programmer's Guide markdown sources use https links to the original material.

It would be helpful to also convert the mailing list text to include links to the HyperKitty mailing list message archive. That way attachments and the whole message thread are more easily viewed. Those links will require figuring out the hashes used in the URLs for each mailing list message.
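Computing those hashes can be sketched as below, under a loudly stated assumption: that the archive is Mailman 3 with HyperKitty, which to my understanding names each message page by the base32-encoded SHA-1 of the Message-ID header with its angle brackets removed, at a URL of the form <archive>/list/<list address>/message/<hash>/. ARCHIVE_BASE, LIST_ADDRESS, and the function names are hypothetical placeholders, and a generated URL should be checked against a real archive page before relying on it.

```python
# Sketch only: assumes Mailman 3 / HyperKitty message-ID hashing and URL layout.
# ARCHIVE_BASE and LIST_ADDRESS are hypothetical placeholders.
import base64
import hashlib

ARCHIVE_BASE = "https://example.org/archives"  # hypothetical
LIST_ADDRESS = "chimerax-users@example.org"     # hypothetical


def message_id_hash(message_id):
    """Base32-encoded SHA-1 of the Message-ID, with surrounding <> removed."""
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def message_url(message_id, archive_base=ARCHIVE_BASE, list_address=LIST_ADDRESS):
    """Build a HyperKitty-style URL: <base>/list/<list>/message/<hash>/."""
    mid_hash = message_id_hash(message_id)
    return f"{archive_base.rstrip('/')}/list/{list_address}/message/{mid_hash}/"


def with_link(message_id, body):
    """Prefix a plain-text message with its archive URL."""
    return f"{message_url(message_id)}\n\n{body}"
```

A SHA-1 digest is 20 bytes, so its base32 form is exactly 32 characters with no padding. The Message-ID of each archived message could be read with Python's standard `mailbox` module, e.g. `msg["Message-ID"]` for each message in a `mailbox.mbox` file.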

It would also be helpful to make the ChimeraX recipes markdown include links to the GitHub source pages. That way the user can see the images illustrating the recipes.

comment:10 by Tom Goddard, 12 hours ago

I added online source URLs for the ChimeraX recipes at the top of each recipe in the input markdown. I also converted the internal relative markdown links to online links.

comment:11 by Tom Goddard, 12 hours ago

I added online links to ChimeraX mailing list messages at the top of each text message in the 3 notebook mailing list source files. Unfortunately, in text format NotebookLM just shows the links as plain text, so you would have to copy and paste them into a browser to get to the page. In the Mac Safari web browser, ctrl-click on the text URL gives a popup menu that will open that URL in a new tab, so slightly easier than copy and paste.

comment:12 by Tom Goddard, 12 hours ago

I have online URLs in all the ChimeraX Help NotebookLM source materials now. Unfortunately it is a bit cumbersome to show those linked pages because Google instead takes you to a redirect page, and also because links in email messages appear just as plain text. Still these links to the source online material are better than nothing.

comment:13 by Tom Goddard, 11 hours ago

The presentations and tutorials markdown file was about 27 Mbytes, but 25 Mbytes of that came from a Matthias Vorländer tutorial, apparently from some generated html. So I removed that directory, reducing the markdown file to 1.4 Mbytes, in the hope that NotebookLM will be a bit more responsive now that each source is under 3 Mbytes.
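Finding which directory dominates a concatenated source, and flagging sources over a size budget, can be sketched as follows. This is a minimal illustration with hypothetical function names: the 3 Mbyte budget comes from the comment above, and treating it as 3 × 1024 × 1024 bytes is an assumption.

```python
# Sketch only: function names are hypothetical; the 3 Mbyte budget is from this
# ticket, and interpreting it as 3 * 1024 * 1024 bytes is an assumption.
from collections import Counter
from pathlib import Path

MAX_BYTES = 3 * 1024 * 1024


def totals_by_top_dir(sizes):
    """Sum byte sizes (keyed by POSIX relative path) per top-level directory.

    Files directly in the root are grouped under ".". Largest totals come first.
    """
    totals = Counter()
    for rel_path, size in sizes.items():
        parts = rel_path.split("/")
        top = parts[0] if len(parts) > 1 else "."
        totals[top] += size
    return totals.most_common()


def html_sizes(root):
    """Map each .html file's POSIX path relative to root to its size in bytes."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): p.stat().st_size
            for p in root.rglob("*.html")}


def oversized(sizes, max_bytes=MAX_BYTES):
    """Names whose size exceeds max_bytes, largest first."""
    big = [name for name, size in sizes.items() if size > max_bytes]
    return sorted(big, key=lambda name: sizes[name], reverse=True)
```

For example, `totals_by_top_dir(html_sizes("tutorials_html"))` on a hypothetical input directory would put a large generated tutorial at the top of the list, and `oversized({p.name: p.stat().st_size for p in Path("notebooklm_sources").glob("*.md")})` would list any finished source still over budget.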

comment:14 by Tom Goddard, 11 hours ago

"How to hide solvent in less than 20 words" produces a very terse answer.

The most common method is using the command: hide solvent or hide :HOH.
