
Enhancing Podcast Accessibility: A Guide to LLM Text Highlighting

Introduction

Imagine loving a podcast and wishing to remember the best bits, but it’s all sound, no text. What do you do? That’s where tools like LLMs and audio-to-text transcribers step in. They turn spoken words into written notes, letting you easily pick out the gems and create handy bullet points. So, your favorite podcast moments are just a transcription away! Since ChatGPT’s debut in November 2022, LLMs have been all the rage. LLMs can be used for various tasks, and text summarization is an essential application. Summarization can also extend beyond text to other modes such as audio and video. We can use an LLM to enhance podcast accessibility and generate bulleted highlights for ease of use or for taking notes for future reference.

PaLM (Pathways Language Model) is a key LLM introduced by Google AI in April 2022. In May 2023, Google released PaLM 2, an improved second version intended to have superior multilingual, coding, and reasoning abilities. One advantage of the PaLM 2 API over other LLM APIs, such as OpenAI’s ChatGPT API, is that it is freely available, and Google reports improved reasoning abilities compared to its predecessor.

In this article, we will learn how to use these tools, namely the PaLM 2 API and Maker Suite, to create a simple podcast text highlighter, and how to tune the LLM’s settings to generate better bulleted summaries. We will also look at the features of these tools and the different use cases where they can be applied. So let’s get started!

Learning Objectives

  • Understand the PaLM Model and features.
  • Learn about the model settings of PaLM.
  • Implement a Python project that generates a bulleted summary of podcast audio.

This article was published as a part of the Data Science Blogathon.

Overview of PaLM 2

The original PaLM is a massive neural network with 540 billion parameters, scaled using the Pathways system to achieve breakthrough performance. PaLM 540B outperforms the prior state of the art on a variety of multi-step reasoning tasks and exceeds average human performance on the BIG-bench benchmark. It learns the relationships between words and phrases and can apply this knowledge to different tasks.

Pathways AI Architecture

Pathways is a new way of thinking about AI architecture that addresses many of the weaknesses of existing systems. Machine learning models tend to overspecialize in single tasks when they could excel at many. Below are the underlying concepts of this architecture:

  • Multiple Tasks: The basic idea is that instead of training thousands of different models from scratch for different tasks, we use the same model and extend its capabilities to perform new tasks, similar to how humans approach a new task.
  • Multimodal: Pathways could enable multimodal models that simultaneously encompass vision, auditory, and language understanding. So whether the model is processing the word “leopard,” the sound of someone saying “leopard,” or a video of a leopard running, the same representation is activated internally: the concept of a leopard. The result is a model that is more insightful and less prone to mistakes and biases.
  • Sparse and Efficient: We can create a single model that is “sparsely” activated, which means that only a few pathways through the network are activated as needed. The model dynamically learns which parts of the network are good at particular jobs; it learns how to route tasks through the most relevant sections of the model. Because the complete network is not engaged for every task, this type of architecture not only has a greater capacity to learn a range of tasks, but it is also faster and much more energy efficient.

PaLM 2 Features

PaLM 2 has been trained on over 100 languages and can pass language proficiency exams at the expert level. It is reportedly the second largest model by parameter count, after GPT-4 with an estimated 1 trillion parameters. PaLM was trained highly efficiently on 6,144 TPU v4 chips across 2 pods. PaLM uses a standard Transformer architecture in a decoder-only setup.

SwiGLU Activations

SwiGLU is used in the intermediate MLP layers, where it delivers better quality than ReLU, GeLU, or Swish. SwiGLU uses a gating mechanism, which allows it to activate neurons selectively based on the input it receives. This can help to reduce overfitting, improve generalization, and stabilize training of large models. SwiGLU combines the Swish activation with a Gated Linear Unit (GLU) and is defined as follows:

SwiGLU(x) = Swish(xW) ⊗ (xV),  where Swish(x) = x · σ(βx)

where x is the input, W and V are learned weight matrices, ⊗ denotes element-wise multiplication, σ is the sigmoid function, and β is a constant (often set to 1).

SwiGLU is designed to address some of the limitations of ReLU, which can produce “dead” neurons that stop contributing to the output of a neural network. The smooth Swish component and the learned gate help prevent this problem and improve the performance of the network.
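To make the gating concrete, below is a minimal NumPy sketch of a SwiGLU feed-forward block. The weight shapes and dimensions are illustrative only, not the ones used in PaLM.

import numpy as np

def swish(x, beta=1.0):
    # Swish activation: x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))

def swiglu_ffn(x, W, V, W_out):
    # SwiGLU feed-forward block: Swish(xW) gated element-wise by xV,
    # then projected back to the model dimension.
    # x:     (seq_len, d_model) input activations
    # W, V:  (d_model, d_ff) learned projection matrices
    # W_out: (d_ff, d_model) output projection
    return (swish(x @ W) * (x @ V)) @ W_out

# Toy dimensions purely for illustration
rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 4
x = rng.normal(size=(seq_len, d_model))
W, V = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_model, d_ff))
W_out = rng.normal(size=(d_ff, d_model))
print(swiglu_ffn(x, W, V, W_out).shape)  # (4, 8)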

Difference between ReLU & SwiGLU

Parallel Formulation

A parallel formulation is used in every transformer block instead of the serialized formulation used in the standard Transformer. In the standard block, the input first passes through the attention sub-layer, and the attention output then feeds the MLP sub-layer. In the parallel formulation, the attention and MLP sub-layers both read the same layer-normalized input, and their outputs are added to the residual stream together. Because the two sub-layers no longer depend on each other’s output, their input matrix multiplications can be fused, which enables roughly 15% faster training at larger scales with negligible quality degradation at PaLM’s size. A sketch of the two formulations is shown below.
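Here is a minimal sketch contrasting the two formulations. The layer_norm, attention, and mlp functions are stand-ins for the real sub-layers; this is only meant to show where the parallelism comes from, not to reproduce PaLM’s implementation.

import numpy as np

def layer_norm(x, eps=1e-5):
    # Simplified LayerNorm without learned scale/shift, for illustration only
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def serialized_block(x, attention, mlp):
    # Standard formulation: attention first, then the MLP consumes its output
    h = x + attention(layer_norm(x))
    return h + mlp(layer_norm(h))

def parallel_block(x, attention, mlp):
    # Parallel formulation: attention and MLP both read the same normalized
    # input, so their input matrix multiplications can be fused
    normed = layer_norm(x)
    return x + attention(normed) + mlp(normed)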

Multi-Query Attention

A single key/value projection is shared across all attention heads instead of one per head, which results in cost savings at autoregressive decoding time. In standard multi-head attention, the key, value, and query projections are all replicated h times, whereas in multi-query attention each “head” of the query Q has the same K and V transformation applied to it. The amount of computation performed by incremental MQA is similar to that of incremental multi-head attention; the critical difference is the reduced amount of data read from and written to memory with MQA.
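The sketch below shows the idea with illustrative shapes: every query head attends over the same single set of keys and values. The dimensions are made up for the example and are not the ones used in PaLM.

import numpy as np

def multi_query_attention(x, W_q, W_k, W_v, n_heads):
    # Multi-query attention: n_heads query projections, but one shared
    # key projection and one shared value projection.
    # x:   (seq_len, d_model)
    # W_q: (d_model, n_heads * d_head)
    # W_k: (d_model, d_head)  -- single K projection shared by every head
    # W_v: (d_model, d_head)  -- single V projection shared by every head
    seq_len, _ = x.shape
    d_head = W_k.shape[1]
    q = (x @ W_q).reshape(seq_len, n_heads, d_head)   # per-head queries
    k = x @ W_k                                       # shared keys   (seq_len, d_head)
    v = x @ W_v                                       # shared values (seq_len, d_head)
    scores = np.einsum("qhd,kd->hqk", q, k) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over key positions
    out = np.einsum("hqk,kd->qhd", weights, v)        # (seq_len, n_heads, d_head)
    return out.reshape(seq_len, n_heads * d_head)

During incremental decoding, only k and v need to be cached, and they are h times smaller than in multi-head attention, which is where the memory savings come from.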

RoPE Embeddings

Rotary Positional Embedding (RoPE) is a type of positional embedding that unifies the absolute and relative approaches and gives superior results. It incorporates the “relative” positions of two tokens rather than their absolute positions when calculating self-attention. Transformers employ self-attention or cross-attention mechanisms that are agnostic to the order of tokens, so the model perceives the input tokens as a set rather than a sequence and loses crucial information about the relationships between tokens based on their positions. To mitigate this, positional encodings embed information about token positions directly into the model.

This type of position embedding uses a rotation matrix to include explicit relative-position dependency in the self-attention formulation. Rotary embeddings matter for natural language processing because they help models better understand the context in which words are used. When a model has a better sense of the positions of the input tokens, it can produce more accurate predictions. For example, a language model that uses RoPE can better capture how “I love pizza” and “Pizza is what I love” differ because of word order, and it can make more nuanced predictions with a better understanding of relative positioning.
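Below is a small sketch of rotary embeddings applied to a single query or key vector, using the common 10000^(-2i/d) frequency schedule. It illustrates the rotation idea rather than PaLM’s exact implementation.

import numpy as np

def apply_rope(x, position, base=10000.0):
    # Rotate consecutive pairs of dimensions of `x` by an angle proportional
    # to `position`. The dot product between two rotated vectors then depends
    # only on their relative positions.
    d = x.shape[0]                                  # must be even
    freqs = base ** (-np.arange(d // 2) * 2.0 / d)  # per-pair rotation frequencies
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                       # split into (even, odd) pairs
    rotated = np.empty_like(x)
    rotated[0::2] = x1 * cos - x2 * sin
    rotated[1::2] = x1 * sin + x2 * cos
    return rotated

q = np.ones(8)
print(apply_rope(q, position=3))  # the same vector, rotated according to position 3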

No Biases

No biases were used in the dense layers or layer norms, which increases training stability for large models. This improves training efficiency and stability and removes redundant parameters, helping the model scale.

Model Variations

PaLM 2 comes in several variants of different sizes, named after animals in order of size:

  • Gecko is the smallest and fastest model and can run on edge devices such as mobile phones, even offline.
  • Otter is larger than Gecko and can perform more complex tasks.
  • Bison is larger than Otter, is the largest model currently exposed through the PaLM API, and is widely used for text and chat.
  • Unicorn is the largest PaLM 2 model.

Model Parameter Settings

The model parameters let us shape the responses generated for our prompt. Let us go through them one by one:

Temperature

Temperature influences the randomness of the model’s responses. A high temperature, closer to 1, produces more diverse and creative output, while a low temperature produces drier, more deterministic output. Suppose we want the meaning of a particular word and its usage; in this case, we do not need a creative response but a dictionary definition, so we keep the temperature closer to 0 (deterministic responses). If we want to write a creative article or story, we can keep the temperature closer to 1.

Example of temperature at value 0.5
Example of temperature at value 1

Token Limit

A token refers to a chunk of text, and the token limit determines how much text a model can process at once. A larger token limit lets the model take in a broader scope of information at a time, while a smaller limit restricts the number of tokens it can handle. For example, PaLM 2 can take 8,000 tokens as input at once.
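As a rough sanity check before prompting, we can estimate whether a transcript fits in the input limit using the common rule of thumb of roughly 4 characters per token for English text. The exact count depends on the model’s tokenizer, so treat this only as a heuristic.

def fits_in_context(text: str, token_limit: int = 8000, chars_per_token: float = 4.0) -> bool:
    # Rough estimate: token count ~ character count / 4 for English text.
    # The real count comes from the model's own tokenizer.
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= token_limit

# Example: decide whether a transcript needs to be chunked before prompting.
transcript = "This is the catch-up on three things for the Indian Express. " * 200
print(fits_in_context(transcript))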

Comparison of token limits for different LLM models

Top – K

When generating text, the model considers many possible words to follow the current one. Top-k sampling restricts the next-word choices to the k most likely words. A lower k value makes the output more predictable, while a higher value makes it more diverse.

Example showing how the top-k and temperature parameters of an LLM work (Source: https://michaelehab.medium.com/the-secrets-of-large-language-models-parameters-how-they-affect-the-quality-diversity-and-32eb8643e631)

Top-p

Top-p is the probability threshold for considering words and controls the diversity of the output. The model keeps considering next-word candidates, in order of probability, until their total probability reaches the top-p value. This means that rather than focusing only on the few most likely words, the model may also accept less likely words as long as they fall within the top-p probability mass, resulting in more diversified output. A higher top-p value results in more diverse output; a sketch of how temperature, top-k, and top-p interact is shown below.
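To make temperature, top-k, and top-p concrete, here is a toy next-word sampler over a hand-made probability distribution. The words and probabilities are invented for illustration; real models work over token logits, but the filtering logic is the same idea.

import numpy as np

def sample_next_word(probs, temperature=1.0, top_k=40, top_p=0.95, rng=None):
    # Toy sampler: apply temperature, then top-k, then top-p (nucleus) filtering.
    rng = rng or np.random.default_rng()
    words, p = zip(*probs.items())
    p = np.array(p, dtype=float)

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    logits = np.log(p) / max(temperature, 1e-6)
    p = np.exp(logits - logits.max())
    p /= p.sum()

    # Top-k: keep only the k most likely words.
    order = np.argsort(p)[::-1][:top_k]

    # Top-p: within the top-k, keep the smallest set whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += p[i]
        if cumulative >= top_p:
            break

    kept_p = p[kept] / p[kept].sum()
    return words[rng.choice(kept, p=kept_p)]

# Invented distribution over next words after "The podcast was ..."
next_word_probs = {"great": 0.5, "good": 0.2, "long": 0.15, "boring": 0.1, "purple": 0.05}
print(sample_next_word(next_word_probs, temperature=0.7, top_k=3, top_p=0.9))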

Maximum Outputs

This denotes the number of responses generated for a particular input; that is, we can ask for more than one model response and then choose which one to keep. In the image below, we get 2 responses for the same input when we set Max Outputs to 2.

Example of LLM responses when maximum outputs is set to 2
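In the PaLM API, the Maker Suite “Max outputs” setting corresponds to the candidate_count parameter (as in the generated code later in this article). Below is a minimal sketch, assuming the google.generativeai response exposes the alternatives through a candidates list.

import google.generativeai as palm

palm.configure(api_key="API_KEY")  # paste your own key here

# Ask for two candidate completions of the same prompt.
response = palm.generate_text(
    model="models/text-bison-001",
    prompt="Summarize today's headlines in one sentence.",
    candidate_count=2,
)

# `response.result` holds the first candidate; `response.candidates` should
# list every generated alternative so we can pick the one we prefer.
for i, candidate in enumerate(response.candidates, start=1):
    print(f"Candidate {i}: {candidate['output']}")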

Python Implementation of Podcast Text Highlighter

Flowchart of the Python implementation

1: Download Podcast Audio

We can download any podcast audio using this link by pasting our podcast URL. Here, we use the Indian Express podcast URL; a minimal way to fetch the MP3 directly is also sketched below.
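If you already have a direct MP3 link for the episode, a minimal way to fetch it with requests is sketched below. The URL here is a placeholder, not the real Indian Express feed; replace it with the link you obtained from the downloader.

import requests

# Placeholder URL -- replace with the direct MP3 link for the episode.
episode_url = "https://example.com/path/to/CATCH-UP-2023-10th-October-v1.mp3"

response = requests.get(episode_url, timeout=60)
response.raise_for_status()  # fail loudly if the download did not succeed

with open("CATCH-UP-2023-10th-October-v1.mp3", "wb") as f:
    f.write(response.content)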

2: Load and Install Libraries

# Install OpenAI's Whisper speech-to-text library (run once)
!pip install openai-whisper
import whisper

3: Transcribe Audio to Text

Initially, we used the “tiny” model variant, and then the “base” variant, which is larger and gives better results for the spelling of words and grammar. We transcribe two audio podcasts.

Note: After downloading the podcast MP3 from the link mentioned above, upload it to your Colab environment files and paste the path of the audio file into the transcribe function as shown.

# Load whisper model
whisper_model = whisper.load_model("base")

# Transcribe audio
def transcribe(file_path: str) -> str:
    # `fp16` defaults to `True`, which tells the model to attempt to run on GPU.
    # we'll run this on the CPU for local demonstration purposes by setting it to `False`.
    transcription = whisper_model.transcribe(file_path, fp16=False)
    return transcription['text']
  
transcript = transcribe('/content/CATCH-UP-2023-10th-October-v1.mp3')
print(transcript)

Output

#OUTPUT
This is the catch-up on 3 things for the Indian Express, and I am Flora Swine. 
It's the 10th of October, and here are the headlines. Four days after the Hamas attack, the 
Israeli Army said today that they have regained control of the Gaza border. 
It warned the population to flee to neighboring Egypt in a grim 
reminder of the expected retaliation. The Israeli Army also 
reported the discovery of the bodies of 1500 Hamas militants within Israeli territory
. The ongoing conflict has claimed approximately 1,600 lives, with 900 casualties in 
Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to 
extradite and said that he spoke with Israeli Prime Minister Benjamin Netanyahu, 
assuring him that India stands firmly with Israel and is difficult to guard. He also 
said that India strongly and unequivocally condemned terrorism in all its forms and 
manifestations. Chief Justice of India, D.Y. Chandrachud, said today that the 
The Supreme Court's role is not to micromanage issues that arise across the country. He 
stressed that local matters are best left to the jurisdiction of the respective High 
Court. He was presiding over a three-judge bench. The CGI Maynthese remarks while 
hearing a matter related to captive elephants and said, Court, we have to  
have a broader functional understanding as a court. What is the role of the Supreme Court in the 
nation? Not to deal with micromanagement of issues that arise all over the country. 
Two militants linked to the terror outfit Lashkare Thaibarvak were killed in an encounter 
with security forces in the Soviet district of Jaman Kashmir today. The encounter broke 
out when the security forces launched an anti-militancy operation in the Al-Sipura area, 
acting on intelligence regarding the presence of militants. The disease militants 
have been identified as Morifat Magbul and Jazim Farok. Chintanubhadhai was sentenced 
to life imprisonment today for his involvement in abetting and conspiring to murder 
his estranged wife, Hema Obadhai, in 2015. The Sessions Court also imposed life 
imprisonment sentences on three co-accused, namely Vijay Rajvahar, Pradeep Rajvahar, 
and Shivkuma Rajvahar. On Saturday, the prosecution sought the death penalty for 
all four individuals. The ICC Men's World Cup 2023 has two matches slated for today. 
Pakistanis facing Shilankain Hagradwadwal Bangladesh is taking on England in Haramshalla. 
In other World Cup news, New Zealand beats the Dutch to win their second game in a row at
the competition. The previously triumphed over defending Champions England in the 
tournament opener, placing them at the top of the points table. This was a catch-up on
three things by the Indian Express.

4: Generate a Summary using Maker Suite

Now, we use one podcast transcript as a few-shot example: we prepare a sample bulleted summary for it by hand and use the other transcript as the test input. We go to Maker Suite and generate a bulleted summary.

Maker Suite used to generate a bulleted text summary

We adjust the model parameter settings to generate summaries.

5: Generate Code using Maker Suite

We generate the code from Maker Suite, which exports the prompt as Python calling the PaLM API, and plug in our own API key, which we have generated from the site.

"""
At the command line, only need to run once to install the package via pip:

$ pip install google-generativeai
"""

import google.generativeai as palm
palm.configure(api_key="API_KEY")

defaults = {
  'model': 'models/text-bison-001',
  'temperature': 1,
  'candidate_count': 1,
  'top_k': 40,
  'top_p': 0.95,
  'max_output_tokens': 1024,
  'stop_sequences': [],
  'safety_settings': [{"category":"HARM_CATEGORY_DEROGATORY","threshold":4},{"category":"HARM_CATEGORY_TOXICITY","threshold":4},{"category":"HARM_CATEGORY_VIOLENCE","threshold":4},{"category":"HARM_CATEGORY_SEXUAL","threshold":4},{"category":"HARM_CATEGORY_MEDICAL","threshold":4},{"category":"HARM_CATEGORY_DANGEROUS","threshold":4}],
}
Sentence = "This is the catch up on three things for the Indian Express and I am Flora Swain. It\'s the 10th of October and here are the headlines. Four days after the Hamas attacked the Israeli army said today that they have regained control of the Gaza border. It warned the population there to flee to neighboring Egypt while they can in a grim reminder of the retaliation that is expected to follow. The Israeli army also reported the discovery of the bodies of 1500 Hamas militants within Israeli territory. The ongoing conflict has claimed approximately 1600 lives with 900 casualties in Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to X today and said that he spoke with Israeli Prime Minister Benjamin Netanyahu assuring him that India stands firmly with Israel and this difficult art. He also said that India strongly and unequivocally condemns terrorism in all its forms and manifestations. Chief Justice of India D.Y. Chandrachud said today that the Supreme Court's role is not to micromanage issues that arise across the country. He stressed that local matters are best left to the jurisdiction of the respective high courts. Prziding over a three-judge bench the CGI made these remarks while hearing a matter related to captive elephants and said, quote, we have to as a court have broader functional understanding. What is the role of the Supreme Court in the nation? Not to deal with micromanagement of issues which arise all over the country. Unquote. Two militants linked to the terror outfit Lashkaretayabah were killed in an encounter with security forces in the Soapian District of Jammun Kashmir today. The encounter broke out after security forces launched an anti-militancy operation in the Alsepura area acting on intelligence regarding the presence of militants. The deceased militants have been identified as Mureffat Maghbul and Jasm Farukh. Chintanubhadi Haya was sentenced to life imprisonment today for his involvement in a betting and conspiring to murder his estranged wife, Hema Upadhyay in 2015. The Sessions Court also imposed life imprisonment sentences on three co-accused, namely Vijay Rajpur, Pradeep Rajpur and Shivkumar Rajpur. On Saturday the prosecution have sought the death penalty for all four individuals. The ICC men's World Cup 2023 has two matches slated for today. Pakistan is facing Sri Lanka in Hyderabad while Bangladesh is taking on England in Haramshalla. In other World Cup news New Zealand beat the Dutch to win their second game in a row at the competition. They previously triumphed over defending champions England in the tournament opener, placing them at the top of the points table. This was the Catchup on Three Things by the Indian Express."
prompt = f"""Transform a sentence into a bulleted list.
Sentence:  This is the catch up on three things for the Indian Express and I'm Flora Swain. It's the 11th of October and here are the headlines. Days after the Hamas attack, the Israeli military said that it is carrying out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring nation at one of its posts near the blue line. As for reports, there was a massive buildup of troops along the Israel Gaza border as the country prepared for a ground invasion in the coming days. More than 2,000 people have lost their lives so far in the war which started on Saturday. The Supreme Court today took a serious exception to AIM's authorities seeking clarification of its order from the 9th of October which allowed the abortion of a 26-week-old fetus. The AIM's court cited some fresh concerns and asked why the concerns were not conveyed to the court earlier when it had sought a medical opinion on the women's request seeking permission for medical termination of pregnancy. A special bench of justices, B.V. Nagaratma and Hema Kohli also pulled up the center for approaching Chief Justice of India D.Y. Chandrachud's bench on Tuesday against its order. Samajwadi party president Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R. on his birth anniversary. Officials cited security reasons for not allowing the former UPCM into the center today. After he was denied permission, Akhilesh reads the building and jumped the center's boundary wall along with other SP leaders and workers. K.H.N.R. ensued on the spot while the police tried to stop them from entering the premises. The poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101 at her house in Alapurha today. In 2018, she made headlines by becoming the top scorer in the state literacy mission's flagship adult literacy program Akshana Laksham. At the age of 96, K.R.Y.H.A. scored 98 out of 100 marks in the exam that tested writing and mathematical skills. CM Pinery Vijayan in his condolence message said K.R.Y.A. was Kerala's pride and a model the individual. Indian Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today. India added to your favourites for this match having convincingly won their opening match against Australia. On the other hand, Afghanistan lost their opening match to Bangladesh but they will be looking to perform better against India. This was the Catch Up on Three Things by the Indian Express.
Bulleted: * Israeli military carried out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring nation.
 * SC took a serious exception to AIIMS authorities seeking clarification of its order on abortion of a 26-week-old fetus.
* Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R. * Poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101.
* India Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today.
Sentence: {Sentence}
Bulleted:"""

response = palm.generate_text(
  **defaults,
  prompt=prompt
)
print(response.result)

Final Output

Below is the resulting output for our podcast. Most of the content is accurate, except for the spelling of some proper nouns, such as Dharamshala and Lashkar-e-Taiba.

LLM output of the bulleted list
  • The Israeli army regained control of the Gaza border and warned the population to flee to neighboring Egypt.
  • PM Narendra Modi spoke with Israeli PM Benjamin Netanyahu. India strongly condemns terrorism.
  • CJI DY Chandrachud said SC’s role is not to micromanage issues. Local matters are best left to HC.
  • 2 militants linked to Lashkarteayabah were killed in an encounter with security forces in J&K.
  • Chintanubhadi Haya was sentenced to life imprisonment for his involvement in betting and conspiring to murder his estranged wife, Hema Upadhyay in 2015.
  • ICC men’s World Cup 2023 has two matches slated for today.
  • Pakistan faces Sri Lanka in Hyderabad, while Bangladesh is taking on England in Hharamshalla.
  • New Zealand beat the Dutch to win their second game in the competition.

Conclusion

LLMs are powerful tools that can be combined with other tools to build quick prototypes, enabling us to test and experiment with various LLM use cases. Since LLMs are a very new technology, finding their best uses and implementations requires a lot of back-and-forth experimentation. This is where tools like Maker Suite empower data science and analytics professionals to quickly turn their ideas into code with minimal time and effort, letting them focus on fine-tuning and improving the data and other essential elements.

Key Takeaways

  • We learned about the basic concepts of the PaLM 2 API and its features.
  • We also understood the various model parameter settings and how to optimize them for a desired prompt output.
  • We saw the different aspects of the Google Maker Suite tool and utilized it to generate our LLM code.
  • We used the Whisper and PaLM APIs to generate relevant and accurate bulleted summaries of podcasts.

Frequently Asked Questions

Q1.  Is PaLM API free to use?

A. Yes, the PaLM API is open to the public for free use, but production use isn’t free.

Q2. What are the different models available in Maker Suite?

A. For now, Maker Suite only allows one model, Text-Bison.

Q3. Which is better, GPT-4 or PaLM API 2?

A. GPT-4 reportedly has around 1 trillion parameters, compared to PaLM’s 540 billion. GPT-4 also supports multimodal features such as images as input. So GPT-4 offers more features and services.

Q4. Can we get responses in other languages?

A. PaLM supports responses in other languages, but only in one model, which is not open for public use and is a paid service.

Q5. What are the Safety Settings in PaLM API?

A. The safety settings in the PaLM API filter out violent, derogatory, medical, or sexual content in the model responses. In our podcast summary, the default settings blocked the violence-related news content, but once we relax the filter thresholds, we get the full output.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Ritika Gupta

12 Dec 2023
