Implementing News Parser using Template Method Design Pattern in Python

23 July 2024

When defining algorithms, programmers often overlook the value of grouping the steps that different algorithms share. Typically, each algorithm is written from start to end, and the shared steps are repeated in every one. This leads to code duplication and makes maintenance harder: even a small logic change has to be applied in several places.

A common example is authentication with social network accounts. The process is broadly the same for every network but varies slightly at the implementation level. If you write the algorithm for each account type from start to end without factoring out the common steps, you duplicate code and make it harder to maintain.

The Template Method pattern is a behavioral design pattern that prevents this kind of code duplication. The shared steps are implemented once in an abstract class, and the algorithms derived from this abstract class reuse them. The abstract class also has a template method that calls the steps in a fixed order for every derived algorithm. Let’s look into the benefits of the template method pattern.

It allows a class to control and expose its parts
It provides great extensibility
It avoids code duplication
It eases code maintenance
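To make the idea concrete, here is a minimal sketch of the pattern applied to the social login example mentioned earlier. Everything in it is an assumption made for illustration: the class names SocialLogin, ProviderA and ProviderB, the method names, and the fake tokens do not come from any real authentication API.

```python
from abc import ABC, abstractmethod


class SocialLogin(ABC):
    """Hypothetical base class; all names here are illustrative only."""

    def login(self, username):
        """Template method: fixes the order of the steps."""
        token = self.request_token(username)   # varies per provider
        profile = self.fetch_profile(token)    # varies per provider
        return self.create_session(profile)    # shared by every provider

    @abstractmethod
    def request_token(self, username):
        ...

    @abstractmethod
    def fetch_profile(self, token):
        ...

    def create_session(self, profile):
        # Common step, written once in the abstract class
        return {'user': profile['name'], 'provider': profile['provider']}


class ProviderA(SocialLogin):
    PREFIX = 'a-token-'

    def request_token(self, username):
        return self.PREFIX + username

    def fetch_profile(self, token):
        return {'name': token[len(self.PREFIX):], 'provider': 'A'}


class ProviderB(SocialLogin):
    def request_token(self, username):
        return 'b:' + username

    def fetch_profile(self, token):
        return {'name': token.split(':', 1)[1], 'provider': 'B'}


print(ProviderA().login('alice'))
print(ProviderB().login('bob'))
```

Only the two provider-specific steps are written twice. A change to create_session() is made in one place and reaches every provider.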

News Parser Implementation

Let’s implement a news parser that fetches the latest news from different sites. Here, we use an RSS feed and an Atom feed. Both are XML-based formats, with a few differences in XML structure. You can check the XML structure of RSS and Atom.
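The structural differences that matter for a parser can be seen with two tiny feeds. The XML below is hand-written for illustration and is not real data from any site; text_of() is a hypothetical helper. The sketch assumes the usual layouts: an RSS story is an item whose link is element text, and an Atom story is an entry whose link is an href attribute.

```python
from xml.dom.minidom import parseString

# Hand-written minimal feeds (illustrative only)
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example</title>
  <item>
    <title>RSS headline</title>
    <link>https://example.com/rss-story</link>
    <guid>rss-1</guid>
    <pubDate>Tue, 23 Jul 2024 10:00:00 GMT</pubDate>
    <description>RSS summary</description>
  </item>
</channel></rss>"""

ATOM_SAMPLE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry>
    <title>Atom headline</title>
    <link href="https://example.com/atom-story"/>
    <id>atom-1</id>
    <updated>2024-07-23T10:00:00Z</updated>
    <content>Atom summary</content>
  </entry>
</feed>"""


def text_of(node, tag):
    """Return the text of the first <tag> under node, or None."""
    found = node.getElementsByTagName(tag)
    if found and found[0].firstChild is not None:
        return found[0].firstChild.nodeValue
    return None


rss_item = parseString(RSS_SAMPLE).getElementsByTagName('item')[0]
atom_entry = parseString(ATOM_SAMPLE).getElementsByTagName('entry')[0]

# RSS: the link is the text inside <link>
rss_link = text_of(rss_item, 'link')
# Atom: <link> is empty, the URL is in its href attribute
atom_link = atom_entry.getElementsByTagName('link')[0].getAttribute('href')

print(rss_link, text_of(rss_item, 'pubDate'))
print(atom_link, text_of(atom_entry, 'updated'))
```

The same fields exist in both formats, but under different tag names (guid and id, pubDate and updated, description and content), which is why the parsing step has to be written per feed.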

Here, our template design pattern consists of two concrete classes, YahooNewsParser and GoogleNewsParser, both derived from an abstract class called AbstractNewsParser. This abstract class contains the template method, print_latest_news(), which calls the primitive operation methods. The primitive operations include both steps that are common to every feed and steps that differ between feeds: the common ones are defined in the abstract class itself, and the differing ones are redefined in the respective concrete classes.

[Figure: NewsParser class diagram]

From the above diagram, it is clear that the get_url() and parse_content() primitive operation methods are redefined in the respective concrete classes. This is because the URL and the XML structure differ from feed to feed, so these methods must be redefined to achieve the required functionality. The other primitive methods, get_raw_content() and content_crop(), are common and are defined in the abstract class itself. The template method, print_latest_news(), is responsible for calling these primitive methods. Let’s get into the code implementation.

Python3

import abc 
import urllib.request 
from xml.dom.minidom import parseString 
  
  
class AbstractNewsParser(object, metaclass=abc.ABCMeta): 
    def __init__(self): 
        
        # Restrict creating abstract class instance 
        if self.__class__ is AbstractNewsParser: 
            raise TypeError('Abstract class cannot be instantiated') 
  
    def print_latest_news(self): 
        """Template method: prints the 3 latest news items
        of a news website."""
        url = self.get_url() 
        raw_content = self.get_raw_content(url) 
        content = self.parse_content(raw_content) 
        cropped = self.content_crop(content) 
  
        for item in cropped: 
            print('Title: ', item['title']) 
            print('Content: ', item['content']) 
            print('Link: ', item['link']) 
            print('Published ', item['published']) 
            print('Id: ', item['id']) 
  
    @abc.abstractmethod 
    def get_url(self): 
        pass
  
    def get_raw_content(self, url): 
        return urllib.request.urlopen(url).read() 
  
    @abc.abstractmethod 
    def parse_content(self, content): 
        pass
  
    def content_crop(self, parsed_content, max_items=3): 
        return parsed_content[:max_items] 
  
  
class YahooNewsParser(AbstractNewsParser): 
    def get_url(self): 
        return 'https://news.yahoo.com/rss/'
  
    def parse_content(self, raw_content):
        yahoo_parsed_content = []
        dom = parseString(raw_content)

        # Map each output key to the RSS tag that carries it
        tags = {'title': 'title', 'content': 'description',
                'link': 'link', 'id': 'guid', 'published': 'pubDate'}

        for node in dom.getElementsByTagName('item'):
            yahoo_parsed_item = {}
            for key, tag in tags.items():
                try:
                    element = node.getElementsByTagName(tag)[0]
                    yahoo_parsed_item[key] = element.childNodes[0].nodeValue
                except IndexError:
                    # Tag is missing or has no text
                    yahoo_parsed_item[key] = None

            yahoo_parsed_content.append(yahoo_parsed_item)

        return yahoo_parsed_content
  
  
class GoogleNewsParser(AbstractNewsParser): 
    def get_url(self): 
        return 'https://news.google.com/atom'
  
    def parse_content(self, raw_content):
        google_parsed_content = []
        dom = parseString(raw_content)

        # Map each output key to the Atom tag that carries it.
        # Atom entries carry their timestamp in <updated>.
        tags = {'title': 'title', 'content': 'content',
                'id': 'id', 'published': 'updated'}

        for node in dom.getElementsByTagName('entry'):
            google_parsed_item = {}
            for key, tag in tags.items():
                try:
                    element = node.getElementsByTagName(tag)[0]
                    google_parsed_item[key] = element.childNodes[0].nodeValue
                except IndexError:
                    # Tag is missing or has no text
                    google_parsed_item[key] = None

            # In Atom the URL is the href attribute of <link>,
            # not a separate <href> element
            links = node.getElementsByTagName('link')
            google_parsed_item['link'] = (
                links[0].getAttribute('href') or None) if links else None

            google_parsed_content.append(google_parsed_item)

        return google_parsed_content
  
  
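class NewsParser(object):
    def get_latest_news(self):
        # print_latest_news() prints its own output and returns None,
        # so call it directly instead of passing it to print()
        yahoo = YahooNewsParser()
        print('Yahoo:')
        yahoo.print_latest_news()
        print()
        print()
        google = GoogleNewsParser()
        print('Google:')
        google.print_latest_news()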
  
  
if __name__ == '__main__': 
    newsParser = NewsParser() 
    newsParser.get_latest_news() 

Output

[Screenshot: Yahoo News Parser output]

[Screenshot: Google News Parser output]

The template design pattern is a good fit when several algorithms share the same overall behavior but differ in how individual steps are implemented. It lets you fix the standard structure of an algorithm so that derived classes can redefine the steps without changing that structure.
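As a rough illustration of that extensibility, the sketch below adds one more parser without touching the template method. To keep it runnable on its own, it uses a trimmed copy of the base class. LocalFeedParser, its SAMPLE string and the local://sample URL are hypothetical; the class overrides get_raw_content() so that no network access is needed.

```python
import abc
from xml.dom.minidom import parseString


class AbstractNewsParser(abc.ABC):
    """Trimmed copy of the base class so this sketch runs alone."""

    def print_latest_news(self):
        # Template method: same fixed sequence of steps as before
        url = self.get_url()
        raw_content = self.get_raw_content(url)
        content = self.parse_content(raw_content)
        for item in self.content_crop(content):
            print('Title: ', item['title'])
            print('Link: ', item['link'])

    @abc.abstractmethod
    def get_url(self):
        ...

    def get_raw_content(self, url):
        raise NotImplementedError('network access left out of this sketch')

    @abc.abstractmethod
    def parse_content(self, content):
        ...

    def content_crop(self, parsed_content, max_items=3):
        return parsed_content[:max_items]


class LocalFeedParser(AbstractNewsParser):
    """Hypothetical parser that reads a canned RSS string."""

    # Four hand-made items, so the crop to three is visible
    SAMPLE = '<rss><channel>' + ''.join(
        '<item><title>Story %d</title>'
        '<link>https://example.com/%d</link></item>' % (i, i)
        for i in range(1, 5)) + '</channel></rss>'

    def get_url(self):
        return 'local://sample'  # placeholder, never fetched

    def get_raw_content(self, url):
        return self.SAMPLE

    def parse_content(self, raw_content):
        items = []
        for node in parseString(raw_content).getElementsByTagName('item'):
            title = node.getElementsByTagName('title')[0]
            link = node.getElementsByTagName('link')[0]
            items.append({'title': title.firstChild.nodeValue,
                          'link': link.firstChild.nodeValue})
        return items


LocalFeedParser().print_latest_news()
```

The new class supplies only the steps that differ. The order of the steps and the three-item crop still come from the abstract class, so the same approach can be used to test a parser offline.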
