This article was published as a part of the Data Science Blogathon
In this article, we will learn to connect the Snowflake database with Python using an external browser authentication and key pair authentication. Also, we will learn how to install the snowflake python connector and snowflake SQLAlchemy package to perform the snowflake data read and write operations.
Introduction to Snowflake
If you are reading this post then you might be aware of the snowflake database. It is a cloud computing-based data warehouse that supports SQL for its operations. It is getting extremely popular due to its scalability, multi-language support, ability to handle massive data and the list goes on. It is compatible with Amazon web services, Microsoft Azure, and Google Cloud Platform.
Image is taken from https://www.snowflake.com/
In this article, our main focus is not on the snowflake DB here but the connection of snowflake with python and few errors which we encountered while connecting snowflake to python.
Different ways of connecting Python to Snowflake DB
Here I have given a quick overview of how we can connect snowflake to python using the snowflake connector, SQLAlchemy engine, and private key. I have also provided the references(from snowflake documentation) at the end of the article if you wish to explore more on this topic. Let us check the different ways of connections in detail.
Python to Snowflake DB connection Using Snowflake Connector with external browser authentication
First of all, we need to install pandas supported python connector using the below command via Jupyter notebook
!pip install snowflake-connector-python[pandas]
If you are working on your client’s environment then you need to provide the below details in order to connect your python to snowflake DB. An important point to note here is that password is not required in the case of SSO login when we provide authenticator as ‘externalbrowser‘. Google Chrome is the most compatible browser for this operation.
import snowflake.connector import pandas as pd import numpy as np ctx = snowflake.connector.connect( account = '<account name>', user = '<username>', schema = '<schema name>', warehouse='<warehouse name>', role = '<As specified by Snowflake Admin>', authenticator='externalbrowser', ) cur = ctx.cursor()
Now we can perform multiple operations like read or write data from snowflake to python and vice versa with the help of this cursor object.
Python to Snowflake DB connection Using Snowflake Sqlalchemy with external browser authentication
Since we have installed the python connector earlier now we will install the Snowflake SQLAlchemy package using the below command via Jupyter notebook
!pip install --upgrade snowflake-sqlalchemy
Assuming you are working in the client’s environment and using SSO external browser authentication below query is handy for connection.
import pandas as pd import numpy as np from sqlalchemy import create_engine from snowflake.sqlalchemy import URL url = URL( account = '<account name>', user = '<username>', database = '<database name> , schema = '<schema name>', warehouse= '<warehouse name>', role = '<As specified by Snowflake Admin>', authenticator='externalbrowser', ) engine = create_engine(url) connection = engine.connect()
- Data Read and Write operation
Now we can fetch the data from the snowflake DB table in python data frame via the below simple commands
query = '''select * from <snowflake DB tablename>''' data = pd.read_sql(query, connection)
Similarly, we can also write python data to snowflake tables as below. This operation will truncate and load the snowflake table. We can use ‘append’ in place of ‘replace’ if we want delta load.
data.to_sql('<snowflake table>', engine, if_exists='replace', index=False, index_label=None)
- Snapshot of connection using ‘externalbrowser‘ and data load
In the below snapshot, you can see how to connect python to snowflake using external browser authentication. Connection details are hidden as it was client-specific data.
Python to Snowflake DB connection Using Key Pair Authentication and reuse the existing connection to create an engine for SQLAlchemy
In addition to regular details (e.g. account name, user, database, etc) we need two other important details. One of them is of the path where we have stored the private key in snowflake and second the password to access the key.
Below is the python code which is required when we need to connect snowflake using a private key generated earlier in the snowflake environment. We will use this connection to create an SQLAlchemy engine as well due to which it would be easy to read or write data using SQL query.
from cryptography.hazmat.backends import default_backend from cryptography.hazmat.primitives.asymmetric import rsa from cryptography.hazmat.primitives.asymmetric import dsa from cryptography.hazmat.primitives import serialization with open(path, "rb") as key: p_key= serialization.load_pem_private_key(key.read(),password='<password>'.encode(),backend=default_backend()) pkb = p_key.private_bytes(encoding=serialization.Encoding.DER, format=serialization.PrivateFormat.PKCS8,encryption_algorithm=serialization.NoEncryption()) ctx = snowflake.connector.connect(user=user,account=account,private_key=pkb,warehouse=warehouse,database=database) if ctx: eng = sqlalchemy.create_engine(url,poolclass=sqlalchemy.pool.StaticPool,creator = lambda:ctx) else: eng=sqlalchemy.create_engine(url,creator=get_connect) con = eng.connect() url = URL(account = '<accountname>',user = '<username>',database = '<databasename>',warehouse= '<warehouse>',role = '<as specified by snowflake admin>')
Error during key pair connection
In case you encounter the below error message after executing the above code
“Could not deserialize key data“.
Without fail, you need to check the path you have provided where your RSA key resides. Sometimes it happens that your key is corrupted or not accessible due to any reason then also you can get the above error. In that case, you may notify your snowflake DBA to generate a new RSA key.
We can fetch data from snowflake tables and store them as pandas data frames using the below code.
cur = ctx.cursor() query = '''select * from <snowflake DB tablename>''' cur.execute(query) data = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description]) cur.close()
Closing Notes
Always remember to close the cursor when you have executed all the commands related to that cursor and you no longer needed the connection.
In case we need to access two different schema tables then we can also create two different cursors to execute different queries related to those two tables.
Hope you found this article valuable in your quest of connecting Python and the Snowflake database.
References
https://docs.snowflake.com/en/user-guide/sqlalchemy.html
https://docs.snowflake.com/en/user-guide/python-connector-pandas.html
https://docs.snowflake.com/en/user-guide/python-connector-example.html
The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.