spark-app-build

Building Spark applications thru Build Tools

Building jars with build tools such as sbt and Mill

1. Create a Scala project in IntelliJ
File > New > Project > Scala

Project Name: Spark-Eg
Build System: sbt
SBT Version: 1.10.1
Scala: 2.12.13

2. This will generate a Scala project structure

spark-eg
|---src
|   |---main
|   |   |---scala
|   |---test
|       |---scala
|---target
|---build.sbt
|---external libs

3. Create a package (folder structure) in src/main/scala, for example com.test.sparkeg
4. Create a Scala object named SparkEg1 in the com.test.sparkeg package.

package com.test.sparkeg

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataTypes, IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.Row

object SparkEg1 extends App {
  println("Starting Spark Session.")

  // Local session that uses all available cores
  val spark = SparkSession
    .builder()
    .master("local[*]")
    .appName("spark_demo")
    .getOrCreate()

  println(spark.version)

  val cols = Seq("language", "count")
  val data = Seq(("Java", 3434), ("Python", 5656))

  val rdd1 = spark.sparkContext.parallelize(data)
  // Name the columns; without arguments toDF() keeps the default names _1 and _2
  val df1 = spark.createDataFrame(rdd1).toDF(cols: _*)
  df1.show()
  spark.close()
}
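
The object above imports StructType, StructField and Row without using them. As a minimal sketch of what those imports are for, the same two rows can be loaded with an explicit schema instead of inferred tuple columns. The object name SparkEg1Schema and the helper build are hypothetical names chosen for this illustration; they are not part of the original project.

```scala
package com.test.sparkeg

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical companion example: not part of the original project.
object SparkEg1Schema {

  // Column names and types are declared explicitly instead of being inferred.
  val schema: StructType = StructType(Seq(
    StructField("language", StringType, nullable = false),
    StructField("count", IntegerType, nullable = false)
  ))

  // Builds the DataFrame from an RDD[Row] plus the schema above.
  def build(spark: SparkSession): DataFrame = {
    val rows = spark.sparkContext.parallelize(Seq(Row("Java", 3434), Row("Python", 5656)))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("spark_demo_schema")
      .getOrCreate()
    try {
      val df = build(spark)
      df.printSchema()
      df.show()
    } finally {
      spark.stop() // release the local Spark context even if show() fails
    }
  }
}
```

An explicit schema is mainly useful when the column types matter downstream, since nothing is left to inference.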

5. Edit build.sbt config

// Build-wide version; it becomes the suffix of the generated jar name
ThisBuild / version := "0.1.0-SNAPSHOT"

ThisBuild / organization := "ExamplesWorld"

scalaVersion := "2.12.13"

// Overrides the build-wide version for the root project only
version := "0.1.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8"
// https://mvnrepository.com/artifact/org.fusesource.jansi/jansi
libraryDependencies += "org.fusesource.jansi" % "jansi" % "1.12"

// sbt plugins are declared in project/plugins.sbt, not in build.sbt:
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

lazy val proj = project.in(file("spark-eg"))
  .enablePlugins(/* any plugins */)
  .settings(
    Compile / mainClass := Some("com.test.sparkeg.SparkEg1"), // main class name, added to the manifest file
    name := "sparkeg1" // jar file name
  )
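
sbt reads plugin declarations from project/plugins.sbt rather than from build.sbt. The following is a minimal sketch of how sbt-assembly could be wired in to build a single fat jar. It assumes the jar is meant for spark-submit, which is why Spark is marked "provided"; the jar name sparkeg1-assembly.jar is a hypothetical choice, and neither setting appears in the original build.

```scala
// --- project/plugins.sbt ---
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// --- build.sbt (fragment) ---
// "provided": compile against Spark but leave it out of the fat jar,
// on the assumption that the cluster supplies Spark at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.4.8" % "provided"
)

// Entry point written to the manifest of the assembled jar
assembly / mainClass := Some("com.test.sparkeg.SparkEg1")

// Hypothetical output file name for the fat jar
assembly / assemblyJarName := "sparkeg1-assembly.jar"
```

With the "provided" scope, running the application directly from the sbt shell or IntelliJ no longer has Spark on the runtime classpath, so this variant suits cluster submission better than local runs.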

6. Right-click the project spark-eg > Module Settings > Platform Settings > Global Libraries
Add the Scala library 2.12.13 if it is not already listed.
7. More sbt settings are available under
File > Settings > Build, Execution, Deployment > Build Tools > sbt
8. Build the project. The root project's jar is written to target > scala-2.12 > spark-eg_2.12-0.1.0.jar
The module's snapshot jar is written to the module-specific folder in the project structure: spark-eg > target > scala-2.12 > sparkeg1_2.12-0.1.0-SNAPSHOT.jar
9. Alternatively, open the sbt shell in IntelliJ to compile, package and run the application from the command line.
10. sbt shell commands:
clean - deletes the generated files in the target directory
compile - compiles the source files
package - builds the jar file
run - runs the project's main class
11. To add more modules, right-click the project root > New > Module. Each module can have its own build file (build.sbt).
12. Project structure after building app from sbt.

spark-eg(project root)
|---project
|---spark-eg
|   |---target
|       |---scala-2.12
|           |---sparkeg1_2.12-0.1.0-SNAPSHOT.jar(generated jar)
|---src
|   |---main
|   |   |---scala
|   |       |---com.test.sparkeg
|   |           |---SparkEg1
|   |---test
|       |---scala
|---target
|   |---scala-2.12
|       |---spark-eg_2.12-0.1.0.jar(generated jar)
|---build.sbt
|---external libs
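
The layout above has a root project and a spark-eg module, each with its own target folder and jar. As a rough sketch, a root build.sbt along the following lines would produce that layout. The value names root, sparkEg and sparkDeps are hypothetical, and sharing the Spark dependencies between both projects is an assumption rather than something the original build does.

```scala
// build.sbt at the project root (sketch)
ThisBuild / scalaVersion := "2.12.13"
ThisBuild / organization := "ExamplesWorld"
ThisBuild / version      := "0.1.0-SNAPSHOT"

// Shared dependency list, reused by both projects below (assumption)
lazy val sparkDeps = Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",
  "org.apache.spark" %% "spark-sql"  % "2.4.8"
)

// Root project: sources in ./src, jar written to ./target/scala-2.12
lazy val root = project
  .in(file("."))
  .aggregate(sparkEg) // tasks run on root (compile, package, ...) also run on the module
  .settings(
    name := "spark-eg",
    version := "0.1.0", // overrides the build-wide SNAPSHOT version for root only
    libraryDependencies ++= sparkDeps
  )

// Module: sources in ./spark-eg/src, jar written to ./spark-eg/target/scala-2.12
lazy val sparkEg = project
  .in(file("spark-eg"))
  .settings(
    name := "sparkeg1",
    libraryDependencies ++= sparkDeps
  )
```

sbt names each jar name_scalaBinaryVersion-version.jar by default, which is why the root project yields spark-eg_2.12-0.1.0.jar while the module, which inherits the build-wide version, yields sparkeg1_2.12-0.1.0-SNAPSHOT.jar.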

Compiled on WEDNESDAY, 21-AUGUST-2024, 05:24:07 AM IST
