
OpenAI Whisper

Today, data is available in many forms: tables, images, text, audio, and video. We use this data to gain insights and make predictions with various machine learning and deep learning techniques. While many techniques exist for working with tables, images, text, and video, relatively few work on audio, and extracting information directly from audio data remains difficult. Fortunately, audio can be converted to textual data, which makes that information accessible. Many tools convert audio to text; one such tool is Whisper.

What is Whisper?

Whisper is, at its core, a speech recognition model. It is a multi-task model capable of speech recognition in many languages, speech translation, and language detection. Due to its intensive training on vast amounts of multilingual and multitask supervised data, Whisper can distinguish and understand a wide range of accents, dialects, and speech patterns. Thanks to this extensive training, Whisper can deliver accurate and contextually relevant transcriptions even in challenging acoustic environments. Its versatility makes it suitable for a wide range of uses, such as converting audio recordings into text, enabling real-time transcription during live events, and fostering seamless communication between speakers of different languages.

Whisper not only has a lot of potential to increase efficiency and accessibility, but it also contributes to bridging the communication gap between various industries. Experts in fields like journalism, customer service, research, and education can benefit from its versatility and accuracy as a tool since it helps them streamline their procedures, gather important data, and promote effective communication.

How to use OpenAI API for Whisper in Python?

Step 1: Install the OpenAI library in your Python environment

!pip install -q openai

Step 2: Import the OpenAI library and set your API key

Import the openai library and set your generated API key by replacing “YOUR_API_KEY” in the code below.

Python3
import openai
# add your API key here
openai.api_key = "YOUR_API_KEY"
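
Note that the snippets in this article use the pre-1.0 interface of the openai Python package. Rather than hard-coding the key, you can also read it from an environment variable; a minimal sketch, assuming the key has been exported as OPENAI_API_KEY:

Python3

import os
import openai

# read the key from the environment instead of hard-coding it
# (assumes OPENAI_API_KEY has been set in your shell)
openai.api_key = os.environ["OPENAI_API_KEY"]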


Step 3: Open your audio file and pass it to the desired module

There are two modules available for the Whisper model:

1. Transcribe: This module transcribes your audio file in the language of the input audio. The model parameters for this module are:

  • file [required]: The audio file to transcribe, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.
  • model [required]: ID of the model to use. Only whisper-1 is currently available.
  • prompt [optional]: An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
  • response_format [optional]: The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
  • temperature [optional]: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
  • language [optional]: The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
# opening the audio file in binary read mode
audio_file = open("FILE LOCATION", "rb")
# calling the module, passing the model name and the opened file object
# there is only one model available for speech-to-text conversion
transcript = openai.Audio.transcribe(file=audio_file, model="whisper-1")
transcript
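
The optional parameters can be combined in a single call. A sketch below, where the file name and parameter values are illustrative rather than taken from the original example:

Python3

# hypothetical call combining the optional parameters
audio_file = open("meeting_recording.mp3", "rb")  # illustrative file name
transcript = openai.Audio.transcribe(
    file=audio_file,
    model="whisper-1",
    response_format="srt",  # subtitle output instead of the default json
    language="en",          # ISO-639-1 code of the spoken language
    temperature=0.2,        # lower values give more focused output
)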

2. Translate: This module translates your audio file into English. The model parameters for this module are:

  • file [required]: The audio file to translate, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.
  • model [required]: ID of the model to use. Only whisper-1 is currently available.
  • prompt [optional]: An optional text to guide the model’s style or continue a previous audio segment. The prompt should be in English.
  • response_format [optional]: The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
  • temperature [optional]: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
# opening the audio file in binary read mode
audio_file = open("FILE LOCATION", "rb")
# calling the module, passing the model name and the opened file object
# there is only one model available for speech-to-text conversion
transcript = openai.Audio.translate(file=audio_file, model="whisper-1")
transcript
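
The translate module accepts the same optional parameters; the sketch below uses an English prompt to guide the output style (the file name and prompt text are illustrative):

Python3

# hypothetical call with an English prompt to guide the output style
audio_file = open("hindi_clip.mp3", "rb")  # illustrative file name
translation = openai.Audio.translate(
    file=audio_file,
    model="whisper-1",
    prompt="A tech talk about prompt engineering.",  # prompt must be in English
    response_format="text",
)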

Note: The audio file must not be larger than 25 MB. If your file is larger than 25 MB, break it into smaller chunks, as sketched below.
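
One simple way to split a long recording is with the pydub library. A minimal sketch, assuming pydub and ffmpeg are installed (file names and chunk length are illustrative):

Python3

from pydub import AudioSegment

# load the long recording (pydub measures audio in milliseconds)
audio = AudioSegment.from_file("long_recording.mp3")  # illustrative file name
chunk_ms = 10 * 60 * 1000  # ten-minute chunks, typically well under 25 MB

# slice the audio and export each chunk as a separate mp3 file
for i, start in enumerate(range(0, len(audio), chunk_ms)):
    audio[start:start + chunk_ms].export(f"chunk_{i}.mp3", format="mp3")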

Example Implementation of Whisper using OpenAI in Python

Audio data link: WhisperAI

1. Implementing Transcribe module

We will use the audio file linked above to try out the Transcribe module.

We will execute the following code to see the results:

Python3
# transcript using openai module
audio_file = open("/content/gfg_offline_classes_en.mp3", "rb")
transcript = openai.Audio.transcribe("whisper-1", audio_file)
transcript['text']


Output:

Do you miss the interactive environment of a classroom and face-to-face interaction with an expert or a mentor? If you do, then I have great news for you. Lazyroar is starting a classroom program in Noida and I am here to invite you for the same. We are going to begin our classroom program on full stack development, where we are going to focus on skills that are required to make you employable and personalized learning to help you achieve your goals. We encourage you to sign up and be a part of this new exciting journey. So see you at the classes.

2. Implementing Translate module

We will use the audio file linked above to try out the Translate module.

We will execute the following code to see the results:

Python3
# translate using openai module
audio_file = open("/content/q-qkQfAMHGw_128.mp3", "rb")
transcript = openai.Audio.translate("whisper-1", audio_file)
transcript['text']


Output:

Prompt engineering is a word that you must have heard somewhere. But do you know what is its exact use? And where is it used exactly in the software industry? If not, then let's know. Number 1, Rapid innovation. So any company wants to develop and deploy its new product as soon as possible. And give new services to its customers as soon as possible. So that it remains competitive in its entire tech market. So here prompt engineering comes in a lot of use. Number 2 is cost saving. So prompt engineering allows any company to save its total time and cost. Apart from this, the entire development process streamlines it. Due to which the time to develop the product is reduced and its cost is reduced. Number 3 is demand for automation. So whatever you see in your environment today, everyone wants their entire process to be automated. And prompt engineering allows this. It allows to make such systems that totally automate the process that is going on in your company. So now you know the importance of prompt engineering. If you know more important things than this, then quickly comment below.

Frequently Asked Questions (FAQs)

1. What is Whisper AI used for?

Whisper AI is a multi-task model that is capable of speech recognition in many languages, voice translation, and language detection.

2. Is Whisper AI free to use?

Unlike GPT and DALL-E, the Whisper model itself is open source and free to run locally; using it through the OpenAI API, however, is a paid service billed per minute of audio.

3. What is the Whisper model?

Whisper is an automatic speech recognition model trained on 680,000 hours of multilingual data collected from the web.

4. Does Whisper accept .mp4 files?

Yes, you can use Whisper on audio files with extension: mp3, mp4, mpeg, mpga, m4a, wav, or webm.

5. Where can I find the documentation for Whisper model?

You can find the README in the official GitHub repository: https://github.com/openai/whisper.
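
If you prefer running the open-source model locally instead of calling the API, the README in that repository documents a minimal usage pattern along these lines (a sketch, assuming the openai-whisper package and ffmpeg are installed):

Python3

import whisper

# download and load one of the open-source checkpoints
model = whisper.load_model("base")

# transcribe a local audio file (file name is illustrative)
result = model.transcribe("audio.mp3")
print(result["text"])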

6. Is Whisper model different from OpenAI Whisper?

No. The OpenAI Whisper API is a hosted version of the same underlying Whisper model, so both provide the same functionality.

Conclusion

In this article, we discussed Whisper AI and how it can be used to transform audio data into textual data. This textual data can be used to gain insights and apply machine learning or deep learning algorithms. Whisper AI promises to open up new opportunities for voice technology as its capabilities develop, making voice-driven applications more effective, inclusive, and user-friendly. Whisper AI raises the bar for speech recognition and transcription by utilising AI, enabling people and organisations to interact more effectively in a quickly changing digital environment.
