Create a Java Maven Application for Apache Spark in Eclipse
In this blog post, you will learn how to create an Apache Spark application written in Java, using Apache Maven as the build system and the Eclipse IDE.
Creating the Java Spark application in Eclipse involves the following steps:
- Use Maven as the build system.
- Update the Project Object Model (POM) file to include the Spark dependencies.
- Write your application in Java.
- Generate a JAR file that can be submitted to a Spark cluster.
- Submit the application using spark-submit.
Prerequisites:
- Apache Spark installed on your machine.
- A Java Development Kit. This article uses OpenJDK version 1.8.0_275.
- A Java IDE. This article uses the Eclipse IDE.
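Before you start, you can confirm the prerequisites from a terminal with these standard commands (the versions printed on your machine may of course differ from the ones used here):

```
java -version
spark-submit --version
```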
Use Eclipse to create a new application:
1. Start Eclipse and select New Project.
2. Select Maven Project from the list and click Next.
3. Select the Create a simple project (skip archetype selection) checkbox and click Next.
4. Provide relevant values for Group Id and Artifact Id. The following values are used in this tutorial:
   - Group Id: com.spark.example
   - Artifact Id: SparkSimpleApp
5. Click Finish.
6. Once the project is created, expand it in the left pane and open the pom.xml file.
7. Now update the pom.xml file to define the dependencies for the Spark Java application.
8. Within <project>\<properties>, add the following segment (Spark 2.4 runs on Java 8, so the compiler source and target are set to 1.8):

    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
9. Within <project>\<dependencies>, add the following segments (the _2.12 suffix in each artifactId is the Scala version the Spark binaries were built against, and should match your Spark installation):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>
10. Save the changes to pom.xml; Maven will automatically download the dependencies. (Note that this may take some time.)
11. Once the dependencies are installed, a Maven Dependencies node appears in the left pane; expand it to view all the installed dependencies.
12. From the left pane, navigate to src/main/java, right-click, and select New > Class. Provide an appropriate class name and click Finish.
13. Replace the existing sample code with the following code and save the changes. The code reads data from an employees.json file (line-delimited JSON records such as {"name":"Michael","salary":3000}), prints the schema and the actual data, and writes the data to a new JSON file. (A short extension of this example appears after these steps.)

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class Main {
        public static void main(String[] args) {
            // Create a SparkSession, the entry point to the Dataset/DataFrame API
            SparkSession spark = SparkSession.builder()
                    .appName("json-demo")
                    .getOrCreate();

            // Read the JSON file into a Dataset of Rows
            Dataset<Row> employees = spark.read()
                    .json("/home/rahul/Desktop/eclipse-ee/employees.json");

            // Print the inferred schema and the data
            employees.printSchema();
            employees.show();

            // Write the data out as a new JSON file
            employees.write()
                    .json("file:///home/rahul/Desktop/eclipse-ee/employees_spark.json");

            spark.stop();
        }
    }
14. Once you save the file, right-click the pom.xml file, select Run As > Maven build..., enter package in the Goals field, and run. This creates the JAR file in the target folder.
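As promised above, here is a minimal sketch of how the example from step 13 could be extended with DataFrame queries. It is hypothetical, not part of the tutorial's required code: the class name EmployeeStats, the file path, and the assumption that each record has name and salary fields (as in the sample shown earlier) are all illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;

public class EmployeeStats {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("employee-stats")
                .getOrCreate();

        // Assumed path and schema: records like {"name":"Michael","salary":3000}
        Dataset<Row> employees = spark.read()
                .json("/home/rahul/Desktop/eclipse-ee/employees.json");

        // Show only employees earning more than 3500
        employees.filter(col("salary").gt(3500))
                .select("name", "salary")
                .show();

        // Compute the average salary across all employees
        employees.agg(avg(col("salary"))).show();

        spark.stop();
    }
}
```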
Run the application on the Apache Spark cluster
- Copy the application JAR anywhere on your system and note the path; you will need it when submitting the job.
- Run the following command from a terminal:

spark-submit --class "Main" --master local[2] "<path-to-your-jar-file>"

Here, --class names the application's main class and --master local[2] runs Spark locally with two worker threads; to run on an actual cluster, replace local[2] with your cluster's master URL (for example, a spark:// or yarn master).
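For example, if the project above produced target/SparkSimpleApp-0.0.1-SNAPSHOT.jar (the exact file name depends on your artifactId and version, so this one is only an assumed default), the call would look like:

```
spark-submit --class "Main" --master local[2] target/SparkSimpleApp-0.0.1-SNAPSHOT.jar
```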
That’s it for this blog post. In this article, we covered how to create an Apache Spark Java application with Maven in Eclipse and run it with spark-submit.