Introduction
Welcome to Mobilize.Net SnowConvert for Apache Spark (Scala). Let us be your guide on the road to a successful migration.

What is SnowConvert for Apache Spark (Scala)?

SnowConvert is software that understands Scala, identifies usages of the Apache Spark API, and transforms them to their Snowpark functional equivalents.

SnowConvert Terminology

Here are a few terms/definitions, so you know what we mean when we start dropping them all over the documentation:
  • SnowConvert: Software that securely and automatically converts your Apache Spark (Scala) project into Snowflake's Snowpark.
  • Conversion/transformation rule: A rule that allows SnowConvert to convert a portion of source code into the expected target code (see the sketch after this list).
  • Parse: An initial pass in which SnowConvert reads the source code and builds an internal data structure, on which the conversion rules are then applied.
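For instance, a single conversion rule typically maps one element of the Spark API to its Snowpark counterpart. The before/after pair below is a minimal sketch of one such rule, using the collect_list-to-array_agg mapping that also appears in the full example later on (df stands for any DataFrame):

// Spark (source code matched by the rule)
df.select(collect_list("Salary")).show()

// Snowpark (target code produced by the rule)
df.select(array_agg("Salary")).show()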
In the following few pages, you'll learn more about the conversion capabilities of SnowConvert for Apache Spark (Scala). If you're ready to start, visit the Getting Started page in this documentation. For more information about SnowConvert in general, visit our SnowConvert for Apache Spark (Scala) information page.

Code Conversions

Spark to Snowpark

SnowConvert for Spark takes in Scala source code and converts uses of the Spark API to their corresponding Snowpark equivalents.

Example of Spark to Snowpark

Apache Spark Code
Here's an example of the conversion of a simple Spark application that reads data, filters, joins, calculates an average, and shows the results:
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def avgJobSalary(session: SparkSession, dept: String) {
    val employees = session.read.csv("path/data/employees.csv")
    val jobs = session.read.csv("path/data/jobs.csv")

    val jobsAvgSalary = employees.
      filter($"Department" === dept).
      join(jobs).
      groupBy("JobName").
      avg("Salary")

    // Salaries in department
    jobsAvgSalary.select(collect_list("Salary")).show()

    // avg Salary
    jobsAvgSalary.show()
  }
}
The Converted Snowflake Code:
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
import com.snowflake.snowpark.Session

object SimpleApp {
  def avgJobSalary(session: Session, dept: String) {
    val employees = session.read.csv("path/data/employees.csv")
    val jobs = session.read.csv("path/data/jobs.csv")

    val jobsAvgSalary = employees.
      filter($"Department" === dept).
      join(jobs).
      groupBy("JobName").
      avg("Salary")

    // Salaries in department
    jobsAvgSalary.select(array_agg("Salary")).show()

    // avg Salary
    jobsAvgSalary.show()
  }
}
As you can see, most of the structure is the same, but there are some cases where a more significant change was needed, for example:
  • the import of the com.snowflake.snowpark package instead of org.apache.spark.sql
  • the change from SparkSession to Session (see the sketch after this list)
  • the change from collect_list to array_agg
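Because the converted code expects a Snowpark Session rather than a SparkSession, the application's entry point also changes. The following is a minimal sketch of how the converted avgJobSalary method might be invoked; the file name profile.properties and the department name "Sales" are hypothetical placeholders, not output of SnowConvert:

import com.snowflake.snowpark.Session

object RunSimpleApp {
  def main(args: Array[String]): Unit = {
    // Build a Snowpark Session from a connection properties file.
    // "profile.properties" is a hypothetical file holding entries
    // such as URL, USER, PASSWORD, ROLE, WAREHOUSE, DB, and SCHEMA.
    val session = Session.builder.configFile("profile.properties").create

    // Run the converted method against a hypothetical department name.
    SimpleApp.avgJobSalary(session, "Sales")
  }
}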