Nowadays, videos have become an integral part of our lives: they educate us and keep us informed. In this article, we will learn how to extract the speech from a video as text using Python.
Extract Speech Text from the Video
To extract speech text from a video in Python, we need to install the following modules. Here we use pip to install them.
Moviepy Module
The moviepy module in Python is used to perform basic operations on videos. It is used in video editing to perform tasks such as cutting clips, adding text, and merging videos. You can install the moviepy module by running the following command in your terminal:
pip install moviepy
Note: Installing this module usually installs FFmpeg automatically (through its imageio-ffmpeg dependency). However, in some cases you may be prompted to install FFmpeg yourself. If so, refer to the official FFmpeg installation instructions for Linux or Windows.
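If you are not sure whether FFmpeg is available, a quick check can save a confusing error later. The sketch below is an assumption-based helper, not part of moviepy: the name find_ffmpeg is hypothetical, and it assumes moviepy locates FFmpeg through imageio-ffmpeg first, falling back to whatever ffmpeg is on the system PATH.

```python
import shutil


def find_ffmpeg():
    """Return the path of an FFmpeg executable, or None if none can be found.

    Hypothetical helper. Assumption: moviepy obtains FFmpeg through its
    imageio-ffmpeg dependency, so that package is checked first and the
    system PATH is used as a fallback.
    """
    try:
        import imageio_ffmpeg  # normally pulled in when moviepy is installed

        return imageio_ffmpeg.get_ffmpeg_exe()
    except Exception:
        # imageio-ffmpeg is missing or could not locate a bundled binary
        return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    print(path if path else "FFmpeg not found; install it before using moviepy.")
```

A None result means neither source provided FFmpeg, which is the case where you would follow the manual installation instructions.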
SpeechRecognition
The SpeechRecognition module in Python (imported as speech_recognition) provides an easy way to work with speech and audio files. You can install it using the following command:
pip install SpeechRecognition
Steps to Extract Speech Text from Video in Python
Step 1: Import the required modules
The first step is to import the required modules, i.e., moviepy and speech_recognition.
import moviepy.editor as mp
import speech_recognition as sr
Step 2: Load the video
The next step is to load the video whose speech we want to extract. For this, we use the VideoFileClip class of the moviepy module.
video = mp.VideoFileClip("file_path")
Step 3: Extract audio from the video
Then extract the audio from the loaded video clip through its audio attribute, and write that audio to a '.wav' file using the write_audiofile() method.
audio = video.audio
audio.write_audiofile("neveropen.wav")
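A video file without a sound track has no audio to write: in that case moviepy sets the clip's audio attribute to None, and calling write_audiofile() on it fails with an AttributeError. A small guard makes the failure explicit. The helper name extract_audio is hypothetical, and it only assumes the clip behaves like a moviepy VideoFileClip.

```python
def extract_audio(clip, wav_path):
    """Write the audio track of a moviepy clip to wav_path and return the path.

    Hypothetical helper: `clip` is assumed to be a moviepy VideoFileClip,
    whose `audio` attribute is None when the video has no audio track.
    """
    if clip.audio is None:
        raise ValueError("The video has no audio track to transcribe.")
    # moviepy infers the output format from the .wav extension
    clip.audio.write_audiofile(wav_path)
    return wav_path
```

For example, extract_audio(video, "neveropen.wav") writes the file exactly like the two lines above, but raises a clear ValueError for silent videos.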
Step 4: Load audio
Load the newly converted audio file using the AudioFile class of the speech_recognition module, and read its contents with the record() method of a Recognizer instance.
r = sr.Recognizer()
with sr.AudioFile("neveropen.wav") as source:
    data = r.record(source)
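Long recordings can fail or time out when sent to recognize_google() in a single request, so a common workaround is to transcribe them in pieces. The sketch below only plans the pieces, using Python's standard wave module to read the file's duration; the helper name plan_chunks and the 30-second default are assumptions, not a documented limit of the API.

```python
import wave


def plan_chunks(wav_path, chunk_seconds=30.0):
    """Split a WAV file's duration into (offset, duration) pairs in seconds.

    Hypothetical helper; the 30-second default is an assumption chosen to
    keep each recognition request short.
    """
    with wave.open(wav_path, "rb") as wav:
        total = wav.getnframes() / wav.getframerate()  # length in seconds
    chunks = []
    offset = 0.0
    while offset < total:
        # The last chunk is shorter if the audio does not divide evenly
        chunks.append((offset, min(chunk_seconds, total - offset)))
        offset += chunk_seconds
    return chunks
```

Each pair can then be passed to r.record(source, offset=offset, duration=duration) inside its own with sr.AudioFile("neveropen.wav") as source: block, and the partial transcripts joined together.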
Step 5: Convert audio to text
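The final step is to convert the recorded audio data to text. This can be done with the recognize_google() method of the recognizer, passing the recorded data as the argument. Note that recognize_google() sends the audio to Google's Web Speech API, so an internet connection is required.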
text = r.recognize_google(data)
Code Implementation:
Now, let us see the full implementation of the code to extract speech text from a video in Python. We will take neveropen.mp4 as an example video for this problem statement. Make sure the video is present in the directory you run the script from.
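# moviepy 1.x import; with moviepy 2.x, use "from moviepy import VideoFileClip" and call VideoFileClip(...) directly
import moviepy.editor as mp
import speech_recognition as sr

# Load the video
video = mp.VideoFileClip("neveropen.mp4")

# Extract the audio from the video and save it as a WAV file
audio_file = video.audio
audio_file.write_audiofile("neveropen.wav")
video.close()  # release the file handles held by moviepy

# Initialize the recognizer
r = sr.Recognizer()

# Load the audio file
with sr.AudioFile("neveropen.wav") as source:
    data = r.record(source)

# Convert speech to text (uses Google's Web Speech API, so an internet connection is required)
text = r.recognize_google(data)

# Print the text
print("\nThe resultant text from video is: \n")
print(text)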
Output: