In this article, we will get the value of a particular cell in a PySpark dataframe.
For this, we will use the collect() function to get all the rows of the dataframe as a list. We can then index into that list by row position and column position to reach a single cell.
Creating dataframe for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data with 5 row values
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 2"],
        ["3", "bobby", "company 3"],
        ["4", "rohith", "company 2"],
        ["5", "gnanesh", "company 1"]]

# specify column names
columns = ['Employee ID', 'Employee NAME', 'Company Name']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display dataframe
dataframe.show()
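Output:
+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          2|       ojaswi|   company 2|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+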
collect(): This returns all rows of the dataframe to the driver program as a Python list of Row objects.
Syntax: dataframe.collect()
Example 1: Python program that demonstrates the collect() function
Python3
# display dataframe using collect()
dataframe.collect()
Output:
[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'),
Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2'),
Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3'),
Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),
Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]
Example 2: Get a particular row
To get a particular row, we can index into the list returned by collect(). As with any Python list, indexing starts from 0.
Syntax: dataframe.collect()[index_number]
Python3
# display the first row using collect()
print("First row :", dataframe.collect()[0])

# display the third row using collect()
print("Third row :", dataframe.collect()[2])
Output:
First row : Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')
Third row : Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')
Example 3: Get a particular cell
To get a single cell, we index the list returned by collect() first by row and then by column.
Syntax: dataframe.collect()[row_index][column_index]
where row_index is the zero-based position of the row and column_index is the zero-based position of the column
Here we access values from cells in the dataframe.
Python3
# first row - second column
print("first row - second column :", dataframe.collect()[0][1])

# third row - second column
print("Third row - second column :", dataframe.collect()[2][1])

# third row - third column
print("Third row - Third column :", dataframe.collect()[2][2])
Output:
first row - second column : sravan
Third row - second column : bobby
Third row - Third column : company 3