Opened 6 years ago

Last modified 5 years ago

#2871 assigned enhancement

Add speech recognition for running commands in VR

Reported by: Tom Goddard Owned by: Tom Goddard
Priority: moderate Milestone:
Component: VR Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

Conrad has added the ChimeraX speech bundle, which uses the PyPI SpeechRecognition module, PyAudio, and the Google Speech Recognition service as an initial try at speech input. He reports that it works acceptably in a completely silent environment.

Testing in the VizVault with the VivePro VR headset microphone produced poor results. It recognized "open" and "close" one time and then mostly did not log any response to spoken input. Sometimes it logs a phrase from commands attempted over a 60 second period. Sometimes it logs "cannot recognize speech". To debug further, the bundle would need to report when audio is being submitted and play back the recorded audio.
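
A minimal sketch of that debugging aid follows. It is not the speech bundle's code. The helper names (save_wav, recognize_with_debug, listen_forever) and the speech_clips directory are hypothetical. Each captured clip is written to a WAV file so it can be played back, and a log line is emitted when the audio is submitted and when a result or an error comes back. The recognizer is passed in as a callable, so the logging can be exercised without a microphone. listen_forever shows one assumed way to wire it to the SpeechRecognition package.

```python
import logging
import time
import wave
from pathlib import Path

log = logging.getLogger("speech_debug")


def save_wav(pcm_bytes, path, sample_rate=16000, sample_width=2):
    """Write raw mono PCM samples to a WAV file so the clip can be played back."""
    with wave.open(str(path), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm_bytes)


def recognize_with_debug(pcm_bytes, recognize, out_dir,
                         sample_rate=16000, sample_width=2):
    """Save the clip, log that it is being submitted, then log the outcome.

    `recognize` is any callable that takes the PCM bytes and returns text,
    raising an exception on failure. Returns (text or None, saved clip path).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    clip = out_dir / f"clip_{time.time_ns()}.wav"
    save_wav(pcm_bytes, clip, sample_rate, sample_width)
    seconds = len(pcm_bytes) / (sample_rate * sample_width)
    log.info("submitting %.2f s of audio, saved to %s", seconds, clip)
    try:
        text = recognize(pcm_bytes)
    except Exception as err:
        log.warning("recognition failed for %s: %s", clip, err)
        return None, clip
    log.info("recognized %r from %s", text, clip)
    return text, clip


def listen_forever(out_dir="speech_clips"):
    """Assumed wiring to the SpeechRecognition package; needs a microphone."""
    import speech_recognition as sr  # third-party; Microphone requires PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        while True:
            audio = recognizer.listen(source)
            recognize_with_debug(
                audio.get_raw_data(),
                lambda _pcm: recognizer.recognize_google(audio),
                out_dir,
                audio.sample_rate,
                audio.sample_width,
            )
```

With logging set to INFO, a silent stretch with no "submitting" lines would suggest that no phrase is being detected at all, while saved clips that sound clear but fail recognition would point at the recognition service instead.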

PyAudio failed to install from PyPI due to a compilation problem on Windows 10 with Microsoft Visual Studio 2015, so a wheel from a separate site was used.

Change History (2)

comment:1 by Tom Goddard, 6 years ago

Molecular Zoo has fairly robust speech input, implemented in Unity by Ray Altenberg while an intern at UCSF, in this file:

https://github.com/alanbrilliant/MolecularZoo/blob/master/Assets/Scripts/VoiceRecog.cs

It uses UnityEngine.Windows.Speech, specifically two classes. The KeywordRecognizer recognizes fixed words like Oxygen or Reset and does not require an internet connection. The DictationRecognizer transcribes free speech after the word "create" is said in MolZoo, and the transcript is then used for a PubChem search. Both work quite well. The keyword recognizer uses a vocabulary of about 25 words or phrases and is especially robust.

Here is the documentation for the Unity KeywordRecognizer and DictationRecognizer classes:

https://docs.unity3d.com/ScriptReference/Windows.Speech.KeywordRecognizer.html

https://docs.unity3d.com/ScriptReference/Windows.Speech.DictationRecognizer.html

More details about these Unity capabilities are given by Microsoft here:

https://docs.microsoft.com/en-us/windows/mixed-reality/voice-input-in-unity

It looks likely that the dictation recognizer uses the Windows 10 online speech recognition capabilities that are also used by Cortana and described here:

https://support.microsoft.com/en-us/help/4468250/windows-10-speech-voice-activation-inking-typing-privacy

Possibly the keyword recognizer, which does not use an internet connection, uses Windows Speech Recognition:

https://en.wikipedia.org/wiki/Windows_Speech_Recognition
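
The fixed-vocabulary idea can also be approximated on top of a free-form transcript, which may help when a general recognizer returns near misses. The sketch below is an illustration only: it is not how the Unity KeywordRecognizer works internally, and the function name match_keyword, the example VOCABULARY list, and the 0.8 cutoff are assumptions. It snaps a transcript to the closest entry in a small vocabulary using Python's difflib, and returns None when nothing is close enough.

```python
import difflib

# Hypothetical vocabulary built from words mentioned in this ticket;
# a real one would list the commands to be spoken in VR.
VOCABULARY = ["open", "close", "reset", "oxygen", "create"]


def match_keyword(transcript, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the vocabulary entry closest to the transcript, or None.

    Matching ignores case and extra whitespace. `cutoff` is the minimum
    difflib similarity ratio (0 to 1) required to accept a match.
    """
    cleaned = " ".join(transcript.lower().split())
    lowered = {entry.lower(): entry for entry in vocabulary}
    hits = difflib.get_close_matches(cleaned, list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None
```

A transcript that does not match any entry yields None, so unrelated speech can be ignored rather than run as a command.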

comment:2 by Elaine Meng, 5 years ago

Cc: Tom Goddard removed
Owner: changed from Conrad Huang to Tom Goddard