How to extract YouTube data in Python?

27 July 2024

Statistics for a YouTube channel, such as viewCount, subscriberCount, and videoCount, can be retrieved with Python code and used for analysis. This article discusses two ways to do this.

Method 1: Using YouTube API

First, generate an API key. You need a Google Account to access the Google API Console, request an API key, and register your application. All of this can be done from the Google APIs page.
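
Once the key exists, it is usually safer to keep it out of the source file. The following is a minimal sketch, assuming the key has been stored in an environment variable; the name YOUTUBE_API_KEY and the helper load_api_key are hypothetical choices for this example, not part of the original code.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    YOUTUBE_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    # allow a plain dict to be passed in, which makes the helper easy to test
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to your API key".format(env_var))
    return key
```

The returned value could then be passed to YTstats in place of a hard-coded string.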

To extract data, we need the channel ID of the YouTube channel whose statistics we want to view. To get the channel ID, visit that channel and, when its URL has the form youtube.com/channel/<id>, copy the last part of the URL. (The examples below use the channel ID of the GeeksForGeeks channel.)
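
Copying the ID by hand can also be done in code. This is a small sketch, assuming the URL has the /channel/<id> form; the helper name channel_id_from_url is hypothetical. Handle-style URLs (youtube.com/@name) do not contain the channel ID, so the function returns None for them.

```python
from urllib.parse import urlparse


def channel_id_from_url(url):
    """Return the ID from a /channel/<id> URL, or None for other URL forms."""
    # split the path into its non-empty segments, e.g. ['channel', 'UC...']
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    return None


print(channel_id_from_url(
    "https://www.youtube.com/channel/UC0RhatS1pyxInC00YKjjBqQ"))
```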

Approach

First, create youtube_statistics.py.
In this file, extract the data using the YTstats class and generate a JSON file with all the extracted data.
Now create main.py.
In main.py, import the YTstats class from youtube_statistics.py.
Add the API key and channel ID.
Running main.py then uses the first file to retrieve the statistics for the given key and channel ID and save them to a JSON file.

Example:

Code for main.py file:

Python3

from youtube_statistics import YTstats
 
# paste the API key generated by you here (never publish a real key)
API_KEY = "YOUR_API_KEY"
 
# paste the channel id here
channel_id = "UC0RhatS1pyxInC00YKjjBqQ"
 
yt = YTstats(API_KEY, channel_id)
yt.get_channel_statistics()
yt.dump()

Code for youtube_statistics.py file:

Python3

import requests
import json
 
 
class YTstats:
 
    def __init__(self, api_key, channel_id):
        self.api_key = api_key
        self.channel_id = channel_id
        self.channel_statistics = None
 
    def get_channel_statistics(self):
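        # query the channels endpoint of the YouTube Data API v3
        url = 'https://www.googleapis.com/youtube/v3/channels'
        params = {'part': 'statistics',
                  'id': self.channel_id, 'key': self.api_key}
 
        response = requests.get(url, params=params, timeout=10)
        data = response.json()
 
        try:
            data = data["items"][0]["statistics"]
        except (KeyError, IndexError):
            # no items in the response, e.g. for an invalid key or channel id
            data = None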
 
        self.channel_statistics = data
        return data
 
    def dump(self):
        if self.channel_statistics is None:
            return
 
        channel_title = "GeeksForGeeks"
        channel_title = channel_title.replace(" ", "_").lower()
 
        # generate a json file with all the statistics data of the youtube channel
        file_name = channel_title + '.json'
        with open(file_name, 'w') as f:
            json.dump(self.channel_statistics, f, indent=4)
        print('file dumped')

Output: the script prints "file dumped" and writes the channel statistics to geeksforgeeks.json.
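
The YouTube Data API returns the counts in the statistics object as JSON strings, so they need converting before any arithmetic. The following is a minimal sketch of that step; the helper name parse_statistics is hypothetical and the sample values are made up for illustration, not real channel data.

```python
import json


def parse_statistics(raw):
    """Convert digit-only string values to int and leave other values as-is."""
    parsed = {}
    for key, value in raw.items():
        if isinstance(value, str) and value.isdigit():
            parsed[key] = int(value)
        else:
            parsed[key] = value
    return parsed


# illustrative values only; in practice load the dumped file with json.load
sample = json.loads(
    '{"viewCount": "1000", "subscriberCount": "50", '
    '"hiddenSubscriberCount": false, "videoCount": "7"}')
stats = parse_statistics(sample)

# average views per video, now that the counts are integers
print(stats["viewCount"] / stats["videoCount"])
```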

Method 2: Using BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. In this approach, we use BeautifulSoup together with Selenium to scrape data from a YouTube channel page. The program collects the title, view count, time since posting, and URL of each video and prints them using Python's string formatting.

Approach

Import the modules.
Provide the URL of the channel whose data is to be fetched.
Extract the data.
Display the fetched data.

Example:

Python3

# import required packages
from selenium import webdriver
from bs4 import BeautifulSoup
 
# provide the url of the channel whose data you want to fetch
urls = [
    'https://www.youtube.com/channel/UC0RhatS1pyxInC00YKjjBqQ'
]
 
 
def main():
    driver = webdriver.Chrome()
    try:
        for url in urls:
            driver.get('{}/videos?view=0&sort=p&flow=grid'.format(url))
            content = driver.page_source.encode('utf-8').strip()
            soup = BeautifulSoup(content, 'lxml')
            # each video link carries both the title text and the href
            videos = soup.find_all('a', id='video-title')
            # spans are assumed to alternate: views, then time since posted
            meta = soup.find_all(
                'span', class_='style-scope ytd-grid-video-renderer')
            print('Channel: {}'.format(url))
            for n, video in enumerate(videos[:10]):
                if 2 * n + 1 >= len(meta):
                    break  # fewer metadata spans than expected
                print('\n{}\t{}\t{}\thttps://www.youtube.com{}'.format(
                    video.text, meta[2 * n].text, meta[2 * n + 1].text,
                    video.get('href')))
    finally:
        # close the browser even if scraping fails
        driver.quit()
 
 
if __name__ == '__main__':
    main()

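Output: for each of the first ten videos found on the channel page, the script prints the title, view count, time since posting, and URL, separated by tabs.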
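
The scraped view counts are display strings rather than numbers. The sketch below converts them to integers for analysis, assuming the page shows English-language strings such as "1.2K views" or "1,234 views"; that format and the helper name parse_view_count are assumptions for this example, and other locales or layouts would need different handling.

```python
import re

# suffixes assumed to appear in abbreviated view counts
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

_VIEW_PATTERN = re.compile(
    r"^\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB])?\s+views?", re.IGNORECASE)


def parse_view_count(text):
    """Convert a string like '1.2K views' to an int, or return None."""
    match = _VIEW_PATTERN.match(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").upper()
    return int(round(number * _MULTIPLIERS[suffix]))


for label in ["1.2K views", "3M views", "1,234 views", "2 weeks ago"]:
    print(label, "->", parse_view_count(label))
```

Strings that do not look like a view count, such as the "time since posted" text, yield None, so they can be filtered out before any numeric analysis.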
