How to get the Daily News using Python

26 July 2024

In this article, we will see how to get the daily news using Python. We will use Beautiful Soup and the requests module to scrape the data.

Modules needed

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal.

pip install beautifulsoup4

requests: Requests allows you to send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the command below in the terminal.

pip install requests

Stepwise Implementation:

Step 1: First of all, make sure to import these libraries.

Python3

import requests 
from bs4 import BeautifulSoup

Step 2: Then, to get the HTML contents of https://www.bbc.com/news, add these two lines of code:

Python3

url='https://www.bbc.com/news'

response = requests.get(url)
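A network request can fail or return an error page, so it may help to wrap the call. The following is a minimal sketch, not part of the original script: fetch_html is a hypothetical helper name, and the 10-second timeout is an assumed value.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML as text, or None if the request fails.

    fetch_html is a hypothetical helper, not part of the requests library.
    """
    try:
        # timeout is an assumed value; without one, requests can wait indefinitely.
        response = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx status codes into an exception instead of parsing an error page.
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        print(f"Could not fetch {url}: {error}")
        return None
    return response.text
```

With this helper, the rest of the script would parse the returned string only when it is not None.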

Step 3: Get specific HTML tag

In order to find the HTML tags within which news headlines are contained, head over to https://www.bbc.com/news and inspect a news headline by right-clicking it and clicking “inspect”:

At the time of writing, the headlines on this page are contained within “<h3>” tags; if the markup has changed, use whichever tag the inspector shows. To scrape all “<h3>” tags within this webpage, add these lines of code to your script:

First, we define “soup” as the parsed HTML content of the BBC News webpage. Next, we define “headlines” as a list of all “<h3>” tags found within the page body. Finally, the script loops through the “headlines” list and prints each element's text, using “text” to drop the surrounding markup and “strip()” to trim leading and trailing whitespace.

Python3

soup = BeautifulSoup(response.text, 'html.parser') 
headlines = soup.find('body').find_all('h3') 
for x in headlines: 
    print(x.text.strip())
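To see what these calls return without touching the network, the same steps can be tried on a small inline snippet. This is only an illustrative sketch: sample_html is made-up markup, not real BBC HTML.

```python
from bs4 import BeautifulSoup

# A made-up HTML snippet standing in for a downloaded page (not real BBC markup).
sample_html = """
<html><body>
  <h1>Site title</h1>
  <h3>  First headline </h3>
  <p>Some teaser text.</p>
  <h3><a href="/story-2">Second headline</a></h3>
</body></html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# find_all returns every matching tag; .text drops the markup,
# and .strip() trims the surrounding whitespace.
titles = [tag.text.strip() for tag in soup.find('body').find_all('h3')]
print(titles)  # ['First headline', 'Second headline']
```

Note that the text of a nested link is included, because “text” collects all text inside the tag.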

Below is the implementation:

Python3

import requests 
from bs4 import BeautifulSoup 
  
url = 'https://www.bbc.com/news'
response = requests.get(url) 
  
soup = BeautifulSoup(response.text, 'html.parser') 
headlines = soup.find('body').find_all('h3') 
for x in headlines: 
    print(x.text.strip()) 

Output:

Cleaning the data

You might have noticed that your output contains duplicate news headlines and text contents that aren’t news headlines.

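Create a list of all the text elements you want to get rid of:

unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']

Then print a text element only if it is not in this list by guarding the print call with a condition:

if x.text.strip() not in unwanted:
    print(x.text.strip())

Duplicates can be dropped with dict.fromkeys(), which keeps only the first occurrence of each item while preserving order.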
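The cleaning logic can also be sketched on plain strings, which makes it easy to check in isolation. The function name clean_headlines and the sample data are hypothetical and chosen only for illustration.

```python
def clean_headlines(texts, unwanted):
    """Drop duplicates (keeping first occurrences), empty strings and unwanted entries."""
    unwanted = set(unwanted)
    # dict.fromkeys keeps one key per distinct string, in insertion order.
    unique = dict.fromkeys(text.strip() for text in texts)
    return [text for text in unique if text and text not in unwanted]


# Made-up sample data: one duplicate, one unwanted entry, one empty string.
sample = ['Headline A', 'Mobile app', 'Headline A ', 'Headline B', '']
print(clean_headlines(sample, ['Mobile app', 'Get in touch']))
# ['Headline A', 'Headline B']
```

Working on the stripped text rather than on the tags means two headlines count as duplicates whenever their visible text matches, even if their markup differs.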

Below is the implementation:

Python3

import requests 
from bs4 import BeautifulSoup 
  
url = 'https://www.bbc.com/news'
response = requests.get(url) 
  
soup = BeautifulSoup(response.text, 'html.parser') 
headlines = soup.find('body').find_all('h3') 
unwanted = ['BBC World News TV', 'BBC World Service Radio', 
            'News daily newsletter', 'Mobile app', 'Get in touch'] 
  
# Deduplicate on the stripped text (order preserved), then skip unwanted entries
texts = dict.fromkeys(x.text.strip() for x in headlines)
for text in texts:
    if text not in unwanted:
        print(text)

Output:
