Movies and TV shows love to depict robots that can understand humans and talk back to them. Shows like Westworld and movies like Star Wars and I, Robot are filled with such marvels. But this technology is not just fiction; it exists today. You can write a program that understands what you say and responds to it.
All of this is possible with the help of speech recognition. Using speech recognition in Python, you can create programs that pick up audio and understand what is being said. In this tutorial titled ‘Everything You Need to Know About Speech Recognition in Python’, you will learn the basics of speech recognition.
What is Speech Recognition?
Speech recognition combines computer science and linguistics to identify spoken words and convert them into text. It allows computers to understand human language.
Figure 1: Speech Recognition
Speech recognition is a machine's ability to listen to spoken words and identify them. You can use speech recognition in Python to convert spoken words into text, make a query, or give a reply. You can even program some devices to respond to spoken words. In Python, this is done with programs that take input from the microphone, process it, and convert it into a suitable form.
Speech recognition may seem futuristic, but it is already all around you. Automated phone systems let you speak your query aloud, and virtual assistants like Siri and Alexa use speech recognition to talk to you seamlessly.
How Does Speech Recognition Work?
Speech recognition in Python works with algorithms that perform linguistic and acoustic modeling. Acoustic modeling recognizes phonemes, the basic units of sound in speech, so that they can be assembled into larger units of speech such as words and sentences.
Figure 2: Working of Speech Recognition
Speech recognition starts when a microphone converts the sound energy produced by the speaker into an electrical signal. This signal is then converted from analog to digital form, and finally to text.
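As a rough illustration of the analog-to-digital step, the sketch below samples a pure tone at fixed intervals and rounds each sample to a 16-bit integer, which is the kind of data a recognizer eventually receives. The 440 Hz tone, the 16 kHz sample rate, and the function name digitize_tone are assumptions chosen for the example, not part of the original tutorial.

```python
import math


def digitize_tone(frequency_hz=440.0, sample_rate=16000, duration_s=0.01, bits=16):
    """Sample a sine wave and quantize each sample to a signed integer."""
    max_amplitude = 2 ** (bits - 1) - 1            # 32767 for 16-bit audio
    n_samples = round(sample_rate * duration_s)    # number of samples to take
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                        # time of the n-th sample, in seconds
        analog_value = math.sin(2 * math.pi * frequency_hz * t)  # "analog" value in [-1, 1]
        samples.append(round(analog_value * max_amplitude))      # quantized digital value
    return samples


samples = digitize_tone()
print(len(samples), samples[:5])
```

A real microphone driver does this in hardware; the point here is only that continuous sound ends up as a list of integers.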
It breaks the audio data down into sounds, and it analyzes the sounds using algorithms to find the most probable word that fits that audio. All of this is done using Natural Language Processing and Neural Networks. Hidden Markov models can be used to find temporal patterns in speech and improve accuracy.
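To make the hidden Markov model idea a little more concrete, here is a toy sketch of Viterbi decoding, a standard way to find the most probable sequence of hidden states behind a sequence of observations. The two states, the "quiet"/"loud" observations, and every probability below are invented for illustration; a real recognizer would use phoneme states and acoustic features instead.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the state that path came from)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Made-up model: is each audio frame silence or speech?
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.3, "loud": 0.7},
}

decoded = viterbi(["quiet", "loud", "loud", "quiet"], states, start_p, trans_p, emit_p)
print(decoded)
```

Note how the last "quiet" frame is still labeled as speech: the model has learned that speech tends to continue, which is the kind of temporal pattern the paragraph above refers to.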
Picking and Installing a Speech Recognition Package
To perform speech recognition in Python, you need to install a speech recognition package. Several packages are available online. The table below outlines some of them and highlights their specialties.
Package | Functionality | Installation |
Apiai | Includes natural language processing for identifying a speaker’s intent | pip install apiai |
Google-cloud-speech | Offers basic speech-to-text conversion | pip install virtualenv, then virtualenv <your-env>, then <your-env>\Scripts\activate, then <your-env>\Scripts\pip.exe install google-cloud-speech |
Speech Recognition | Offers easy audio processing and microphone accessibility | pip install SpeechRecognition |
Watson-developer-cloud | An Artificial Intelligence API that makes creating, debugging, running, and deploying APIs easy. It can be used to perform basic speech recognition tasks. | pip install --upgrade watson-developer-cloud |
Table 1: Picking and installing a speech recognition package
For this implementation, you will use the Speech Recognition package. It lets you:
- Recognize speech from the microphone easily.
- Transcribe an audio file.
- Save audio data into an audio file.
- View recognition results in an easy-to-understand format.
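As a hedged sketch of two of these capabilities, the functions below transcribe an audio file and save captured audio to disk. The function names and file paths are hypothetical; the calls used (Recognizer, AudioFile, record, recognize_google, get_wav_data) come from the SpeechRecognition package, which is imported inside the function so the example loads even where the package is not installed.

```python
def transcribe_file(path):
    """Transcribe a WAV, AIFF, or FLAC file using the Google Web Speech API."""
    import speech_recognition as sr  # requires: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)      # read the whole file into memory
    return recognizer.recognize_google(audio)  # needs an internet connection


def save_audio(audio, path):
    """Write captured audio (an AudioData-like object) to a WAV file."""
    with open(path, "wb") as wav_file:
        wav_file.write(audio.get_wav_data())


# Example usage (hypothetical file name):
# print(transcribe_file("recording.wav"))
```

Keeping the two steps in separate functions makes it easy to record once, save the audio, and try different recognizers on the same file later.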
Speech Recognition in Python: Converting Speech to Text
Now, create a program that takes in audio as input and converts it to text. Start by importing the necessary modules.
Figure 3: Importing necessary modules
Let’s create a function that takes in the audio as input and converts it to text.
Figure 4: Converting speech to text
Now, use the microphone to get audio input from the user in real-time, recognize it, and print it in text.
Figure 5: Converting audio input to text
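Since the code in the figures above is shown only as images, here is a minimal sketch of what these steps could look like with the SpeechRecognition package. The function names convert_to_text and listen_once are assumptions for this example, and the microphone part additionally needs the PyAudio package.

```python
def convert_to_text(recognizer, audio):
    """Send captured audio to the Google Web Speech API and return the text."""
    return recognizer.recognize_google(audio)


def listen_once():
    """Record one utterance from the default microphone and transcribe it."""
    import speech_recognition as sr  # requires: pip install SpeechRecognition PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something...")
        audio = recognizer.listen(source)  # blocks until the speaker pauses
    return convert_to_text(recognizer, audio)


# Example usage (needs a microphone and an internet connection):
# print("You said:", listen_once())
```

If nothing intelligible is heard, recognize_google raises an exception, so a production version would wrap the call in a try/except block.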
As you can see, the program used speech recognition in Python to access the microphone and a function to convert the audio into text. Can you guess what the user said?
Opening a URL With Speech
Now that you know how to convert speech to text using speech recognition in Python, use it to open a URL in the browser. The user has to say the name of the site out loud. You can start by importing the necessary modules.
Figure 6: Importing modules
Now, take input from the microphone and convert it into text: the microphone captures what the user says, and Google's speech recognition converts the audio into text. Then, using a get function in the web module, make a browser request for the site you want to open.
Figure 7: Opening a website using speech recognition
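One way this could look is sketched below, using Python's standard webbrowser module for the browser request. The function names and the rule that a spoken name maps to https://www.<name>.com are assumptions made for this example; the original code may build the URL differently.

```python
import webbrowser


def build_url(site_name):
    """Turn a spoken site name such as 'Stack Overflow' into a URL (assumed .com)."""
    cleaned = site_name.strip().lower().replace(" ", "")
    return f"https://www.{cleaned}.com"


def open_site(site_name):
    """Open the site in the default browser and return the URL that was requested."""
    url = build_url(site_name)
    webbrowser.get().open(url)  # get() returns a controller for the default browser
    return url


def open_site_by_voice():
    """Listen for a site name on the microphone, then open that site."""
    import speech_recognition as sr  # requires: pip install SpeechRecognition PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Which site should I open?")
        audio = recognizer.listen(source)
    return open_site(recognizer.recognize_google(audio))
```

Separating build_url from the microphone code keeps the URL logic easy to check without speaking into a microphone or launching a browser.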
Now, run the function and get the output.
Figure 8: Opening a website using speech recognition
As you can see from the figure above, the query ran successfully; otherwise, an error message would have been shown. Can you guess which website was opened?
Speech Recognition in Python Demo: Guess a Word Game
Now, use speech recognition to create a guess-a-word game. The computer will pick a random word, and you have to guess what it is. You start by importing the necessary packages.
Figure 9: Importing packages
Now, create a function to recognize what is being said from the microphone. The function works the same way as before, but this time it includes exception handling.
Figure 10: Handling microphone exceptions
Now, initialize your recognizer class and take in the microphone input. You will also check whether the audio was intelligible and whether the API call failed.
Figure 11: Converting speech to text
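The steps above can be sketched as a single function. This is an approximation of what the figures likely show, not the original code: the name recognize_speech_from_mic, the error messages, and the shape of the returned dictionary are assumptions, while RequestError and UnknownValueError are the exceptions the SpeechRecognition package raises for a failed API call and for unintelligible audio.

```python
def recognize_speech_from_mic(recognizer, microphone):
    """Capture one utterance and report the outcome in a dictionary.

    Keys: "success" (False only if the API call failed), "error"
    (a message or None), and "transcription" (the text or None).
    """
    import speech_recognition as sr  # requires: pip install SpeechRecognition PyAudio

    # Reject arguments of the wrong type before touching the microphone.
    if not isinstance(recognizer, sr.Recognizer):
        raise TypeError("`recognizer` must be a `Recognizer` instance")
    if not isinstance(microphone, sr.Microphone):
        raise TypeError("`microphone` must be a `Microphone` instance")

    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)

    response = {"success": True, "error": None, "transcription": None}
    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # The API was unreachable or did not respond.
        response["success"] = False
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        # The audio was captured but could not be understood.
        response["error"] = "Unable to recognize speech"
    return response
```

Returning a dictionary instead of raising lets the caller decide how to react, for example by asking the user to repeat the word.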
Now, initialize the microphone. You will also create a list that contains the various words from which the user will have to guess. You will also give the user the instructions for this game.
Figure 12: Setting up the microphone
Now, create a function that takes in microphone input three times, checks each guess against the selected word, and prints the results.
Figure 13: Setting up the game
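A minimal sketch of the game loop is shown below. To keep it runnable without a microphone, the guess comes from a get_guess callable that returns the recognized text, or None when nothing was understood. The word list (apart from 'apple'), the messages, and the names play_game and get_guess are assumptions for this example. In this sketch an inaudible attempt uses up a guess, which may differ from the original code.

```python
import random

WORDS = ["apple", "banana", "grape", "orange", "mango", "lemon"]  # illustrative list
NUM_GUESSES = 3


def play_game(get_guess, word=None, num_guesses=NUM_GUESSES):
    """Play one round; return True if the word was guessed, False otherwise."""
    word = word or random.choice(WORDS)
    print(f"I'm thinking of one of these words: {', '.join(WORDS)}")
    print(f"You have {num_guesses} tries to guess which one.")

    for attempt in range(1, num_guesses + 1):
        guess = get_guess()  # recognized text, or None if the speech was unclear
        if guess is None:
            print(f"Guess {attempt}: I didn't catch that.")
            continue
        print(f"Guess {attempt}: you said {guess!r}")
        if guess.strip().lower() == word.lower():
            print("Correct! You win!")
            return True
        print("Incorrect.")

    print(f"Sorry, you lose! I was thinking of {word!r}.")
    return False
```

To play with a real microphone, pass in a function that records one utterance and returns its transcription, or None when recognition fails.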
The image below shows the various output messages and the output of the program.
Figure 14: Game output
From the output, you can see that the word chosen was ‘apple’. The user got three guesses and was wrong. You can also see the error message which appeared because the user wasn’t audible.
Conclusion
In this Speech Recognition in Python tutorial, you first learned what speech recognition is and how it works. You then looked at various speech recognition packages, their uses, and their installation steps. Finally, you used Speech Recognition, a Python package, to convert speech to text using the microphone, open a URL by speech, and create a guess-a-word game.
We hope this helped you understand the basics of Speech Recognition.
Happy learning!