Thursday, December 26, 2024

Python | Sorting URL on basis of Top Level Domain

Given a list of URLs, the task is to sort the URLs in the list by their top-level domain.
A top-level domain (TLD) is one of the domains at the highest level in the hierarchical Domain Name System of the Internet. Examples: org, com, edu.
This is useful when scraping pages and ordering the collected URLs by top-level domain, and it makes a handy snippet to reuse across projects.

Input :
url = ["https://www.isb.edu", "www.google.com", 
"http://cyware.com", "https://www.gst.in", 
"https://www.coursera.org", "https://www.create.net", 
"https://www.ontariocolleges.ca"]

Output :
['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu', 
'https://www.gst.in', 'https://www.create.net',
 'https://www.coursera.org']

Explanation:
The TLDs of the URLs in the output list are in sorted order:
['.ca','.com','.com','.edu','.in','.net','.org']
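
As a quick check of the explanation, the following sketch pulls the TLD out of each URL in the expected output and confirms the sequence is in ascending order. The names `output` and `tlds` are illustrative only, and the TLD is assumed to be whatever follows the last dot.

```python
# URLs in the expected output order from the example above.
output = [
    "https://www.ontariocolleges.ca", "www.google.com",
    "http://cyware.com", "https://www.isb.edu",
    "https://www.gst.in", "https://www.create.net",
    "https://www.coursera.org",
]

# Treat the text after the last dot of each URL as its TLD.
tlds = ["." + url.rsplit(".", 1)[-1] for url in output]

print(tlds)
# If the list is already in ascending order, sorting leaves it unchanged.
print(tlds == sorted(tlds))
```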

Below are some ways to do the above task.

Method 1: Using sorted
Split each URL on dots, take the last piece as its TLD, and pass a function that does this as the key to sorted().




# Python code to sort the URLs in a list by top-level domain

# URL list initialization
Input = ["https://www.isb.edu", "www.google.com", "http://cyware.com",
         "https://www.gst.in", "https://www.coursera.org",
         "https://www.create.net", "https://www.ontariocolleges.ca"]

# Key function: return the text after the last dot (the TLD)
def tld(url):
    return url.split('.')[-1]

# Sort using the key function
Output = sorted(Input, key=tld)

# Printing output
print("Initial list is :")
print(Input)
print("Sorted list according to TLD is :")
print(Output)


Initial list is :

['https://www.isb.edu', 'www.google.com', 'http://cyware.com',
 'https://www.gst.in', 'https://www.coursera.org', 
'https://www.create.net', 'https://www.ontariocolleges.ca']

Sorted list according to TLD is :

['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu',
 'https://www.gst.in', 'https://www.create.net', 'https://www.coursera.org']
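
The `split('.')[-1]` key works for the bare hosts in this example, but a URL with a path or port would give a misleading result (for instance, a URL ending in `/index.html` would yield `html`). A minimal sketch of a more defensive key, assuming the standard-library `urllib.parse.urlsplit` is acceptable here, is shown below. The helper name `tld_key` and the sample URLs with paths and ports are hypothetical and not part of the original example.

```python
from urllib.parse import urlsplit

def tld_key(url):
    """Return the last label of the URL's hostname (hypothetical helper)."""
    # urlsplit only recognises the host when a scheme or a leading "//"
    # is present, so prefix "//" onto bare entries like "www.google.com".
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]

# Hypothetical sample data with a path and a port added.
urls = [
    "https://www.isb.edu/programs/index.html",
    "www.google.com",
    "http://cyware.com:8080/feed",
    "https://www.gst.in",
]

result = sorted(urls, key=tld_key)
print(result)
```

Because the key looks only at the hostname, the path and port no longer affect the ordering.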

Method 2: Using Lambda
A more concise way to sort the URLs by top-level domain is to pass the key inline as a lambda instead of defining a named function.




# Python code to sort the URLs in a list by top-level domain

# URL list initialization
Input = ["https://www.isb.edu", "www.google.com", "http://cyware.com",
         "https://www.gst.in", "https://www.coursera.org",
         "https://www.create.net", "https://www.ontariocolleges.ca"]

# Using sorted with a lambda that returns the text after the last dot
Output = sorted(Input, key=lambda x: x.split('.')[-1])

# Printing output
print("Initial list is :")
print(Input)
print("Sorted list according to TLD is :")
print(Output)


Initial list is :

['https://www.isb.edu', 'www.google.com', 'http://cyware.com',
 'https://www.gst.in', 'https://www.coursera.org', 
'https://www.create.net', 'https://www.ontariocolleges.ca']

Sorted list according to TLD is :

['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu',
 'https://www.gst.in', 'https://www.create.net', 'https://www.coursera.org']

Method 3: Using reversed
Split each URL on dots, reverse the resulting list, and sort on that list. The TLD is compared first, and URLs with the same TLD are then ordered by the pieces that precede it.




# Python code to sort the URLs in a list by top-level domain

# URL list initialization
Input = ["https://www.isb.edu", "www.google.com", "http://cyware.com",
         "https://www.gst.in", "https://www.coursera.org",
         "https://www.create.net", "https://www.ontariocolleges.ca"]

# Key function: dot-separated pieces of the URL, last piece (TLD) first
def internal(string):
    return list(reversed(string.split('.')))

# Using sorted with the reversed pieces as the key
Output = sorted(Input, key=internal)

# Printing output
print("Initial list is :")
print(Input)
print("Sorted list according to TLD is :")
print(Output)


Initial list is :

['https://www.isb.edu', 'www.google.com', 'http://cyware.com',
 'https://www.gst.in', 'https://www.coursera.org', 
'https://www.create.net', 'https://www.ontariocolleges.ca']

Sorted list according to TLD is :

['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu',
 'https://www.gst.in', 'https://www.create.net', 'https://www.coursera.org']
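
On the example list, Method 3 happens to give the same result as Methods 1 and 2. The difference shows up only when several URLs share a TLD: the TLD-only key leaves them in their original order (Python's sort is stable), while the reversed key also orders them by domain name. The short sketch below illustrates this with hypothetical URLs that are not part of the original example.

```python
# Hypothetical URLs: two share the "com" TLD and are not in name order.
urls = ["https://www.zeta.com", "https://www.alpha.com", "https://www.beta.org"]

# Methods 1 and 2: key is the TLD only, so ties keep their input order.
by_tld_only = sorted(urls, key=lambda u: u.split(".")[-1])

# Method 3: key is the reversed list of pieces, so ties on the TLD
# are broken by the next piece (the domain name).
by_reversed = sorted(urls, key=lambda u: list(reversed(u.split("."))))

print(by_tld_only)
print(by_reversed)
```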
