Movies and TV shows love to depict robots that can understand humans and talk back to them. Shows like Westworld and movies like Star Wars and I, Robot are filled with such marvels. But what if all of this already existed today? It certainly does: you can write a program that understands what you say and responds to it.

All of this is possible with the help of speech recognition. Using speech recognition in Python, you can create programs that pick up audio and understand what is being said. In this tutorial titled ‘Everything You Need to Know About Speech Recognition in Python’, you will learn the basics of speech recognition.

What is Speech Recognition?

Speech recognition combines computer science and linguistics to identify spoken words and convert them into text. It allows computers to understand human language.

Figure 1: Speech Recognition

Speech recognition is a machine's ability to listen to spoken words and identify them. You can use speech recognition in Python to convert spoken words into text, make a query, or give a reply. You can even program some devices to respond to these spoken words. In Python, speech recognition is done with programs that take input from the microphone, process it, and convert it into a suitable form, such as text.

Speech recognition seems highly futuristic, but it is all around you. Automated phone systems let you speak your query or describe what you need help with, and virtual assistants like Siri and Alexa use speech recognition to talk to you seamlessly.

How Does Speech Recognition Work?

Speech recognition in Python relies on algorithms that perform acoustic and linguistic modeling. Acoustic modeling recognizes the phonemes (the distinct units of sound) in speech, which are then assembled into larger units such as words and sentences; linguistic (language) modeling helps decide which sequence of words is most likely.

Figure 2: Working of Speech Recognition
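
Linguistic modeling can be illustrated with a toy bigram language model. The sketch below is an assumption-based illustration, not how any particular package works: the training sentences are made up, add-one smoothing is just one simple choice, and the two candidates stand in for transcripts an acoustic model might find equally plausible.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real language model is trained on vastly more text.
CORPUS = (
    "it is easy to recognize speech with python . "
    "we recognize speech every day . "
    "the waves wreck a boat near the beach ."
).split()

UNIGRAMS = Counter(CORPUS)                   # how often each word occurs
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))   # how often each word pair occurs
VOCAB_SIZE = len(UNIGRAMS)


def log_score(sentence: str) -> float:
    """Log-probability of a word sequence under an add-one-smoothed bigram model."""
    words = sentence.lower().split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        total += math.log((BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE))
    return total


def most_likely(candidates: list[str]) -> str:
    """Pick the candidate transcript that the language model scores highest."""
    return max(candidates, key=log_score)


print(most_likely(["wreck a nice beach", "recognize speech"]))  # → recognize speech
```

"Recognize speech" and "wreck a nice beach" sound alike, so acoustics alone may not separate them; the language model prefers the word sequence it has seen before. Note that this simple score also tends to favor candidates with fewer words.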

Speech recognition starts with the sound energy produced by the speaker, which a microphone converts into electrical energy. This analog electrical signal is then converted into a digital one and, finally, into text.
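
The analog-to-digital step can be sketched with Python's standard library. As an assumption for illustration, the sketch samples a synthetic 440 Hz sine wave instead of a real microphone signal, at 16,000 samples per second, quantizes each sample to a 16-bit integer, and stores the result as WAV data, which is the kind of digital audio a recognizer receives.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # samples per second; 16 kHz is a common rate for speech
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit signed integers


def sample_tone(freq_hz: float, duration_s: float, amplitude: float = 0.5) -> list[int]:
    """Sample a continuous sine wave at discrete instants and quantize each value."""
    max_int = 2 ** (8 * SAMPLE_WIDTH - 1) - 1  # 32767 for 16-bit audio
    n_samples = int(SAMPLE_RATE * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE  # sampling: only these time points are kept
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(value * max_int))  # quantization: real number -> integer
    return samples


def to_wav_bytes(samples: list[int]) -> bytes:
    """Store the integer samples as mono, 16-bit PCM WAV data in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


samples = sample_tone(440.0, 0.5)  # half a second of a 440 Hz tone
wav_data = to_wav_bytes(samples)
print(len(samples), "samples,", len(wav_data), "bytes of WAV data")
```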

The recognizer breaks the digital audio down into individual sounds and analyzes them with algorithms that find the words most likely to fit the audio. Modern systems typically rely on natural language processing and neural networks for this, and hidden Markov models can be used to find temporal patterns in speech and improve accuracy.
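
Hidden Markov models are commonly decoded with the Viterbi algorithm, which finds the most probable sequence of hidden states behind a sequence of observations. The minimal sketch below uses a hypothetical two-state model ("silence" and "speech") with made-up probabilities and audio frames labeled only as low or high energy; real recognizers use far richer, learned models over sounds and words.

```python
import math

# Hypothetical two-state model: is the speaker silent or talking in each audio frame?
STATES = ("silence", "speech")
START_P = {"silence": 0.8, "speech": 0.2}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
# Each frame is observed only as "low" or "high" energy.
EMIT_P = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}
OBSERVATIONS = ["low", "low", "high", "high", "low"]


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable sequence of hidden states for the observations.

    Scores are kept as log-probabilities to avoid underflow on long recordings.
    """
    # best[t][s] = (best log-probability of a path ending in state s at frame t,
    #               the state at frame t - 1 on that path)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for t in range(1, len(observations)):
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][observations[t]]), prev)
        best.append(column)

    # Start from the most probable final state and follow the back-pointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


print(viterbi(OBSERVATIONS, STATES, START_P, TRANS_P, EMIT_P))
# → ['silence', 'silence', 'speech', 'speech', 'silence']
```

The decoder labels the two high-energy frames as speech even though the model starts out favoring silence, because the transition and emission probabilities together make that path the most likely one.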

Picking and Installing a Speech Recognition Package

To perform speech recognition in Python, you need to install a speech recognition package. Several are available; the table below outlines some of them and highlights what each one specializes in.

Package | Functionality | Installation
--- | --- | ---
apiai | Includes natural language processing for identifying a speaker’s intent | $ pip install apiai
google-cloud-speech | Offers basic speech-to-text conversion | $ pip install virtualenv; virtualenv <your-env>; <your-env>\Scripts\activate; <your-env>\Scripts\pip.exe install google-cloud-speech (Windows paths)
SpeechRecognition | Offers easy audio processing and microphone access | $ pip install SpeechRecognition
watson-developer-cloud | Provides access to IBM Watson's artificial intelligence services; it can be used to perform basic speech recognition tasks | $ pip install --upgrade watson-developer-cloud

Table 1: Picking and installing a speech recognition package
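
After installing, you can check which of the packages in Table 1 Python can find. This sketch assumes the usual import names for each package (for example, SpeechRecognition is imported as speech_recognition and google-cloud-speech as google.cloud.speech) and uses importlib.util.find_spec() to locate them.

```python
import importlib.util

# Assumed import names for the packages in Table 1 (pip name -> module you import).
PACKAGES = {
    "apiai": "apiai",
    "google-cloud-speech": "google.cloud.speech",
    "SpeechRecognition": "speech_recognition",
    "watson-developer-cloud": "watson_developer_cloud",
}


def installed(packages: dict[str, str] = PACKAGES) -> dict[str, bool]:
    """Report, for each pip package, whether its module can be located."""
    status = {}
    for pip_name, module_name in packages.items():
        try:
            status[pip_name] = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent package first
            # (e.g. "google" for "google.cloud.speech") and raises if it is missing.
            status[pip_name] = False
    return status


for name, ok in installed().items():
    print(f"{name:25} {'installed' if ok else 'missing'}")
```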

For this implementation, you will use the SpeechRecognition package. It offers:

  • Easy speech recognition from the microphone.
  • Simple transcription of audio files.
  • Saving audio data to an audio file.
  • Recognition results in an easy-to-understand format.
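
Transcribing an audio file and saving audio data might look like the following minimal sketch. The function names are hypothetical; transcribe_file() assumes the SpeechRecognition package is installed and that Google's free Web Speech API (recognize_google()) is reachable. It imports the package inside the function so that save_audio(), which only needs the get_wav_data() method of an AudioData object, works on its own.

```python
from pathlib import Path


def transcribe_file(path):
    """Transcribe a WAV, AIFF or FLAC file with Google's free Web Speech API."""
    import speech_recognition as sr  # pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(path)) as source:
        audio = recognizer.record(source)  # read the entire file into an AudioData object
    return recognizer.recognize_google(audio)


def save_audio(audio, path):
    """Save captured audio (anything with get_wav_data(), e.g. sr.AudioData) as a WAV file."""
    path = Path(path)
    path.write_bytes(audio.get_wav_data())
    return path
```

For example, print(transcribe_file("recording.wav")) would print the transcript of a hypothetical recording.wav file. AudioFile accepts WAV, AIFF and FLAC files.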

Speech Recognition in Python: Converting Speech to Text

Now, create a program that takes in the audio as input and converts it to text.

Figure 3: Importing necessary modules

Let’s create a function that takes in the audio as input and converts it to text.

Figure 4: Converting speech to text
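
A function that takes audio and converts it to text could look like the minimal sketch below (the name audio_to_text is hypothetical). It calls recognize_google() with show_all=True, which, as assumed here, returns the raw result: an empty list when nothing was recognized, or a dictionary whose "alternative" list holds candidate transcripts.

```python
def audio_to_text(recognizer, audio, language="en-US"):
    """Convert captured audio to text, returning (best transcript, all candidates).

    `recognizer` is expected to be a speech_recognition.Recognizer and `audio`
    an AudioData object. The raw result is assumed to be [] or a dict with an
    "alternative" list of {"transcript": ..., "confidence": ...} entries, where
    usually only the top entry carries a confidence score.
    """
    result = recognizer.recognize_google(audio, language=language, show_all=True)
    alternatives = result.get("alternative", []) if isinstance(result, dict) else []
    if not alternatives:
        return None, []
    candidates = [alt["transcript"] for alt in alternatives]
    # Entries without a confidence score are treated as 0.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))["transcript"]
    return best, candidates
```

With the real package, audio would come from recognizer.listen() or recognizer.record(), for example text, options = audio_to_text(sr.Recognizer(), audio) after import speech_recognition as sr. Returning the alternatives as well as the best guess lets you show the user what else the recognizer considered.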

Now, use the microphone to get audio input from the user in real-time, recognize it, and print it in text.

Figure 5: Converting audio input to text
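
Capturing a phrase from the microphone and converting it to text can be sketched as follows. The function name and the timeout values are assumptions; the recognizer and microphone are passed in, so the function expects speech_recognition.Recognizer and speech_recognition.Microphone objects (Microphone needs the PyAudio package).

```python
def listen_and_transcribe(recognizer, microphone, timeout=5, phrase_time_limit=10):
    """Record one phrase from the microphone and return Google's transcription of it."""
    with microphone as source:
        # Sample background noise briefly so the energy threshold suits the room.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        print("Say something...")
        # Wait up to `timeout` seconds for speech to start and record at most
        # `phrase_time_limit` seconds of it.
        audio = recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)
    return recognizer.recognize_google(audio)
```

Usage: after import speech_recognition as sr, call print(listen_and_transcribe(sr.Recognizer(), sr.Microphone())). If no speech starts within timeout seconds, listen() raises sr.WaitTimeoutError, and recognize_google() raises sr.UnknownValueError when the speech cannot be understood.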

As you can see, you have used speech recognition in Python to capture audio from the microphone and a function to convert that audio into text. Can you guess what the user said?

Opening a URL With Speech

Now that you know how to convert speech to text using speech recognition in Python, use it to open a URL in the browser. The user has to say the name of the site out loud. You can start by importing the necessary modules.

Figure 6: Importing modules

Next, take the user's speech from the microphone and convert it into text using Google's speech recognition service. Then, using the get() function of the webbrowser module, obtain a browser and open the site the user named.

Figure 7: Opening a website using speech recognition
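
Opening a site by voice can be sketched with the standard webbrowser module. The rule for turning a spoken name into a URL (dropping spaces, reading "dot" as ".", and appending ".com" to a bare name) is a hypothetical convention, as are the function names. The browser opener is passed in and defaults to webbrowser.open(); webbrowser.get().open(url) likewise opens the URL with a browser controller returned by get().

```python
import webbrowser


def spoken_site_to_url(transcript):
    """Turn a spoken site name into a URL (hypothetical naming convention)."""
    site = transcript.strip().lower().replace(" dot ", ".").replace(" ", "")
    if "." not in site:
        site += ".com"  # a bare name such as "YouTube" becomes "youtube.com"
    return "https://" + site


def open_site_by_voice(recognizer, microphone, open_url=webbrowser.open):
    """Listen for a site name, convert it to a URL and open it in a browser."""
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    url = spoken_site_to_url(recognizer.recognize_google(audio))
    open_url(url)
    return url
```

For example, saying "wikipedia dot org" would open https://wikipedia.org. Passing the opener in also makes it easy to test the function without launching a browser.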

Now, run the function and get the output.

Figure 8: Opening a website using speech recognition

As the figure above shows, the query ran successfully; otherwise, an error message would have been displayed. Can you guess which website was opened?

Speech Recognition in Python Demo: Guess a Word Game

Now, use speech recognition to create a guess-a-word game. The computer will pick a random word, and you have to guess what it is. You start by importing the necessary packages.

Figure 9: Importing packages

Now, create a function that recognizes what is said into the microphone. It works like the earlier function, but this time it must include exception handling.

Figure 10: Handling microphone exceptions

Next, initialize the Recognizer class and capture the microphone input. You will also check whether the audio was intelligible and whether the API call failed.

Figure 11: Converting speech to text
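
A recognition function with exception handling might look like this sketch, which reports problems in a dictionary rather than letting exceptions escape. It assumes the SpeechRecognition package (plus PyAudio for the microphone): RequestError signals that the API call failed, and UnknownValueError that the speech was not intelligible. The function name and the dictionary keys are illustrative choices, not the tutorial's own code.

```python
def recognize_speech_from_mic(recognizer, microphone):
    """Transcribe one utterance and report the outcome instead of raising.

    Returns a dict with:
      "success": False if the API could not be reached, otherwise True
      "error": None, or a message describing what went wrong
      "transcription": None, or the recognized text
    """
    import speech_recognition as sr  # pip install SpeechRecognition (plus PyAudio)

    # Reject arguments of the wrong type early, with a clear message.
    if not isinstance(recognizer, sr.Recognizer):
        raise TypeError("`recognizer` must be a speech_recognition.Recognizer instance")
    if not isinstance(microphone, sr.Microphone):
        raise TypeError("`microphone` must be a speech_recognition.Microphone instance")

    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    response = {"success": True, "error": None, "transcription": None}
    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # The API was unreachable or returned an error.
        response["success"] = False
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        # Audio was captured, but the speech was not intelligible.
        response["error"] = "Unable to recognize speech"
    return response
```

Call it as recognize_speech_from_mic(sr.Recognizer(), sr.Microphone()) after import speech_recognition as sr. The import sits inside the function so the definition can be loaded even where the package is missing; a top-of-file import is the more common style.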

Now, initialize the microphone and create a list of words from which the user will have to guess. You will also display the game instructions to the user.

Figure 12: Setting up the microphone

Now, create a function that takes microphone input three times, compares each guess with the selected word, and prints the results.

Figure 13: Setting up the game
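
The game loop can be sketched so that it does not depend on a real microphone: it takes any callable that returns a dictionary with "success", "error" and "transcription" keys. The word list, the five re-prompts per guess, and the function name are hypothetical; the three guesses match the description above.

```python
import random

WORDS = ["apple", "banana", "grape", "orange", "mango", "lemon"]  # hypothetical word list
NUM_GUESSES = 3   # the user gets three guesses
PROMPT_LIMIT = 5  # re-prompts allowed per guess when nothing was understood


def play_game(get_response, words=WORDS, num_guesses=NUM_GUESSES,
              prompt_limit=PROMPT_LIMIT, rng=random):
    """Play one round of guess-a-word and return (word, won, messages).

    `get_response` is any zero-argument callable that returns a dict with the
    keys "success", "error" and "transcription".
    """
    word = rng.choice(words)
    messages = [f"I'm thinking of one of these words: {', '.join(words)}. "
                f"You have {num_guesses} tries to guess which one."]
    for attempt in range(1, num_guesses + 1):
        # Ask again while nothing was transcribed, unless the API itself failed.
        for _ in range(prompt_limit):
            response = get_response()
            if response["transcription"] or not response["success"]:
                break
            messages.append("I didn't catch that. What did you say?")

        if not response["transcription"]:
            messages.append(f"ERROR: {response['error'] or 'no speech recognized'}")
            break

        guess = response["transcription"].strip().lower()
        messages.append(f"You said: {guess}")
        if guess == word:
            messages.append("Correct! You win!")
            return word, True, messages
        if attempt < num_guesses:
            messages.append("Incorrect. Try again.")

    messages.append(f"Sorry, you lose! I was thinking of '{word}'.")
    return word, False, messages
```

In a real program, get_response would record from the microphone and return a dictionary of that shape, and you would print each entry in messages. Returning the messages instead of printing them keeps the game logic testable without audio hardware.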

The figure below shows the program's output messages.

Figure 14: Game output

From the output, you can see that the word chosen was ‘apple’. The user had three guesses and got all of them wrong. You can also see the error message that appeared because the user was not audible.

Conclusion 

In this Speech Recognition in Python tutorial, you first learned what speech recognition is and how it works. You then looked at various speech recognition packages, their uses, and their installation steps. Finally, you used SpeechRecognition, a Python package, to convert speech from the microphone into text, open a URL by voice, and create a guess-a-word game.

We hope this helped you understand the basics of speech recognition. To learn more about deep learning and machine learning, check out Simplilearn's Caltech Coding Bootcamp.

If you need any clarifications on this Speech Recognition in Python tutorial, do share them with us by mentioning them in this page's comments section. We will have our experts review them and reply to your comments at the earliest!

Happy learning!