In this article, we will learn how to convert a Pandas DataFrame to a PySpark DataFrame. Sometimes we receive data in CSV, XLSX, or similar formats and need it in a PySpark DataFrame; one way is to load the data into Pandas first and then convert it. For the conversion, we pass the Pandas DataFrame to the createDataFrame() method.
Syntax: spark.createDataFrame(data, schema)
Parameter:
- data – the values from which the DataFrame is created.
- schema – the structure of the dataset, or a list of column names.
where spark is the SparkSession object.
Example 1: Create a DataFrame and convert it using the spark.createDataFrame() method
Python3
# import the pandas library
import pandas as pd

# import SparkSession from the pyspark library
from pyspark.sql import SparkSession

# build the SparkSession and name it 'pandas to spark'
spark = SparkSession.builder.appName("pandas to spark").getOrCreate()

# create the Pandas DataFrame with pd.DataFrame()
data = pd.DataFrame({
    'State': ['Alaska', 'California', 'Florida', 'Washington'],
    'city': ["Anchorage", "Los Angeles", "Miami", "Bellevue"]
})

# convert to a PySpark DataFrame
df_spark = spark.createDataFrame(data)
df_spark.show()
Output:
Example 2: Convert using the spark.createDataFrame() method with Apache Arrow enabled
In this method, we use Apache Arrow to speed up the conversion of a Pandas DataFrame to a PySpark DataFrame.
Python3
# import the pandas library
import pandas as pd

# import SparkSession from the pyspark library
from pyspark.sql import SparkSession

# build the SparkSession and name it 'pandas to spark'
spark = SparkSession.builder.appName("pandas to spark").getOrCreate()

# create the Pandas DataFrame with pd.DataFrame()
data = pd.DataFrame({
    'State': ['Alaska', 'California', 'Florida', 'Washington'],
    'city': ["Anchorage", "Los Angeles", "Miami", "Bellevue"]
})

# enable Apache Arrow for the Pandas-to-PySpark conversion
# (in Spark 3.x the key is spark.sql.execution.arrow.pyspark.enabled)
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

# create the PySpark DataFrame
spark_arrow = spark.createDataFrame(data)

# show the DataFrame
spark_arrow.show()
Output:
Example 3: Load a DataFrame from CSV and then Convert
In this method, we read the CSV file into a Pandas DataFrame as well as into a PySpark DataFrame. The dataset used here is heart.csv.
Python3
# import the pandas library
import pandas as pd

# read the dataset into a Pandas DataFrame
df_pd = pd.read_csv('heart.csv')

# show the dataset; head() returns the top 5 rows
df_pd.head()
Output:
Python3
# read the csv file into a PySpark DataFrame
df_spark2 = spark.read.option('header', 'true').csv("heart.csv")

# show the data in table form, top 5 rows only
df_spark2.show(5)
Output:
We can also convert a PySpark DataFrame back to a Pandas DataFrame. For this, we use the DataFrame.toPandas() method.
Syntax: DataFrame.toPandas()
Returns the contents of this DataFrame as a pandas.DataFrame.
Python3
# convert the PySpark DataFrame to a Pandas
# DataFrame with toPandas(); head() shows only
# the top 5 rows of the dataset
df_spark2.toPandas().head()
Output: