Data I/O

Reading & Writing Data

The following code illustrates how to read and write files using DataFrames:

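import pyspark
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Reuse the running session if there is one, otherwise create it.
spark = SparkSession.builder.appName("Spark SQL example").getOrCreate()
print(spark.sparkContext.getConf().getAll())

# Read the JSON log into a DataFrame.
path = "data/sparkify_log_small.json"
user_log = spark.read.json(path)

# printSchema() and show() print their output themselves and return None,
# so they are not wrapped in print().
user_log.printSchema()
user_log.describe().show()
user_log.show(n=1)
print(user_log.take(5))

# Write the DataFrame as CSV. Spark creates a directory of part files at this path.
out_path = "data/sparkify_log_small.csv"
user_log.write.save(out_path, format="csv", header=True)

# Read the CSV back. Without a schema, every column is read as a string.
user_log_2 = spark.read.csv(out_path, header=True)

user_log_2.printSchema()
print(user_log_2.take(2))
user_log_2.select("userID").show()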

The code block above does the following:

  • First, SparkConf and SparkSession are imported. Only SparkSession is used in this example.

  • Since Spark is being used locally, both a SparkContext and a SparkSession may already be running, and getOrCreate() reuses them instead of starting new ones. Parameters can still be updated, such as the application's name.

  • Next, the configuration of the underlying SparkContext is printed.

  • After that, data is read from a JSON file into a Spark DataFrame, written to a CSV file, and read back from that CSV file into a second DataFrame.

Imperative vs Declarative Programming

There are two different ways to manipulate data in Spark. The first is imperative programming, which uses DataFrames and Python. The second is declarative programming, which uses SQL.

Imperative programming is concerned with the "how", while declarative programming cares about the "what". In most cases, a declarative system is an abstraction layer over an imperative system, and that layer takes care of figuring out the steps needed to achieve the result.

