NCBI provides an online search system named Entrez. It gives access to a wide range of molecular biology databases and offers an integrated global query system that supports boolean operators and field-restricted searches. A global query returns results from every database, including the number of hits in each one and links to the originating database.
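As a rough illustration of boolean operators and field searches, the sketch below builds a query string that could be passed as the search term. The helper names field() and combine() are hypothetical (they are not part of Biopython), and the field tags [Organism] and [Title] are used only as examples.

```python
def field(term, tag):
    """Return a term restricted to one Entrez field, e.g. genome[Title]."""
    # Quote multi-word terms so they are searched as a phrase.
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{tag}]"


def combine(operator, *clauses):
    """Join clauses with an uppercase boolean operator (AND, OR, NOT)."""
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(clauses) + ")"


# Hypothetical helpers in use: organism AND title restriction.
query = combine("AND", field("Homo sapiens", "Organism"), field("genome", "Title"))
print(query)  # ("Homo sapiens"[Organism] AND genome[Title])
```

The resulting string is ordinary text, so it can be supplied wherever a search term is expected.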
Functions used
Biopython's Bio.Entrez module provides two functions for searching these databases:
- esearch() searches a single Entrez database. It takes two parameters: db, the database to search, and term, the search query. An invalid database name raises an error.
Syntax:
Bio.Entrez.esearch(db, term)
- egquery() searches a term across all Entrez databases. It works like Entrez.esearch() but takes only the term parameter, with no database parameter.
Syntax:
Bio.Entrez.egquery(term)
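The parsed result of a global query contains a list of per-database rows with "DbName" and "Count" entries, as used in Example 2. A minimal sketch of ranking those rows by hit count follows. The function name rank_databases() is hypothetical, the sample rows are made up for illustration, and the "Error" value is only an assumed placeholder for a non-numeric count, which the function skips defensively.

```python
def rank_databases(rows, limit=5):
    """Return up to `limit` (database, hits) pairs, highest hit count first."""
    counted = []
    for row in rows:
        count = str(row["Count"])  # counts arrive as strings
        if count.isdigit():        # skip anything that is not a plain number
            counted.append((row["DbName"], int(count)))
    counted.sort(key=lambda item: item[1], reverse=True)
    return counted[:limit]


# Made-up rows for illustration; real rows come from record["eGQueryResult"].
sample_rows = [
    {"DbName": "pubmed", "Count": "120"},
    {"DbName": "nuccore", "Count": "4500"},
    {"DbName": "books", "Count": "Error"},
]
print(rank_databases(sample_rows))  # [('nuccore', 4500), ('pubmed', 120)]
```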
Approach
- Import the required modules.
- Set your email so that NCBI can identify who is accessing the database.
- Optionally set the Entrez tool parameter; it defaults to "biopython".
- Call one of the methods described above with the appropriate parameters.
- The data is returned in XML format; use Entrez.read() to parse it into a Python object.
- Read the information provided.
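For the last two steps, the object parsed from an esearch() response behaves like a dictionary, with the total hit count under "Count" and the matching record IDs under "IdList". The sketch below shows one way to pull those out. The function name summarize_search() is hypothetical and the sample record is made up so the snippet runs without a network connection.

```python
def summarize_search(record):
    """Return (total_hits, ids) from a parsed esearch record."""
    total_hits = int(record["Count"])  # Count arrives as a string
    ids = list(record["IdList"])       # IDs on the returned page only
    return total_hits, ids


# Made-up record for illustration; a real one comes from Entrez.read().
sample = {
    "Count": "3",
    "RetMax": "3",
    "RetStart": "0",
    "IdList": ["111", "222", "333"],
}
total, ids = summarize_search(sample)
print(total, ids)  # 3 ['111', '222', '333']
```

Note that "Count" can be larger than the length of "IdList", because esearch() returns only the first 20 IDs unless a larger retmax is requested.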
Implementations using both methods are given below:
Example 1: Using esearch()
Python3
# Import libraries
from Bio import Entrez

# Set the email so NCBI can identify the caller
Entrez.email = 'jeetesh1@yopmail.com'

# Set the Entrez tool parameter
Entrez.tool = 'Demoscript'

# Search the nucleotide database for the term "genome"
info = Entrez.esearch(db="nucleotide", term="genome")

# Parse the XML response into a Python object, then close the handle
record = Entrez.read(info)
info.close()

# Show the parsed record
print(record)
Output:
Example 2: Using egquery()
Python3
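# Import libraries
from Bio import Entrez

# Set the email so NCBI can identify the caller
Entrez.email = 'jeetesh1@yopmail.com'

# Set the Entrez tool parameter
Entrez.tool = 'Demoscript'

# Search for the term "genome" across all Entrez databases
info = Entrez.egquery(term="genome")

# Parse the XML response into a Python object, then close the handle
record = Entrez.read(info)
info.close()

# Print each database name with its hit count
for row in record["eGQueryResult"]:
    print(row["DbName"], row["Count"])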
Output: