How to Install and Run Apache Spark on macOS Catalina
(Note: this tutorial is for macOS users. Windows users should see "Membuat dan Menjalankan Aplikasi Apache Spark dengan Intellij IDEA pada OS Windows" (Creating and Running an Apache Spark Application with IntelliJ IDEA on Windows) instead.)
#1 xcode-select
Use the following commands to install the Xcode Command Line Tools with xcode-select, then check the installed version:
Wayans-MacBook-Pro:~ wmwijaya$ xcode-select --install
Wayans-MacBook-Pro:~ wmwijaya$ xcode-select -v
xcode-select version 2373.
#2 Java (follow the Java installation procedure at https://www.oracle.com/technetwork/java/javase/using-jdk-jre-macos-catalina-5781620.html ).
Check which Java version is installed with the following command:
Wayans-MacBook-Pro:~ wmwijaya$ java -version
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)
If more than one Java version is installed on your Mac and you want to switch to a specific one (for example, Java 11 LTS), follow the steps at http://www.teknologi-bigdata.com/2020/02/switch-versi-java-di-macos-catalina.html
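When several JDKs are installed, it can help to confirm which one your programs actually run on. The following is a minimal sketch, not part of the original setup: the class name JavaVersionCheck and the helper majorVersion are hypothetical. It reads the standard java.version system property and warns when the runtime is not Java 8, which is the version used for the Spark 2.4.5 run in this tutorial.

```java
public class JavaVersionCheck {

    // Extracts the major version from a java.version string.
    // Java 8 and earlier report strings like "1.8.0_241"; Java 9+ report "11.0.6".
    public static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        int major = majorVersion(version);
        System.out.println("Running on Java " + major + " (" + version + ")");
        if (major != 8) {
            // Assumption: you want to match the Java 8 runtime used in this tutorial.
            System.out.println("Note: this tutorial ran Spark 2.4.5 on Java 8.");
        }
    }
}
```

Compile and run it with javac JavaVersionCheck.java followed by java JavaVersionCheck, using the same terminal session in which you plan to start Spark.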
#3 Homebrew (follow the Homebrew installation procedure at https://brew.sh/ )
Homebrew is required because we will install Apache Spark with the Homebrew package manager.
#4 Scala
Use the following command to install Scala with Homebrew, then check the installed version:
Wayans-MacBook-Pro:~ wmwijaya$ brew install scala
Updating Homebrew...
==> Auto-updated Homebrew!
...
==> Downloading https://downloads.lightbend.com/scala/2.13.2/scala-2.13.2.tgz
######################################################################## 100.0%
==> Caveats
To use with IntelliJ, set the Scala home to:
/usr/local/opt/scala/idea
==> Summary
🍺 /usr/local/Cellar/scala/2.13.2: 41 files, 22.5MB, built in 24 seconds
Wayans-MacBook-Pro:~ wmwijaya$ scala -version
Scala code runner version 2.13.2 -- Copyright 2002-2020, LAMP/EPFL and Lightbend, Inc.
Installing Apache Spark with Homebrew
Run the following command to install Apache Spark with Homebrew:
Wayans-MacBook-Pro:~ wmwijaya$ brew install apache-spark
Updating Homebrew...
==> Downloading https://www.apache.org/dyn/closer.lua?path=spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
==> Downloading from https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
######################################################################## 100.0%
🍺 /usr/local/Cellar/apache-spark/2.4.5: 1,059 files, 250.9MB, built in 9 minutes 10 seconds
Then verify the installation with the following command:
Wayans-MacBook-Pro:~ wmwijaya$ spark-shell
20/05/13 23:49:25 WARN Utils: Your hostname, Wayans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.24 instead (on interface en0)
20/05/13 23:49:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/05/13 23:49:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.1.24:4040
Spark context available as 'sc' (master = local[*], app id = local-1589384979970).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
At this point Apache Spark has been installed successfully on macOS Catalina, in the directory:
/usr/local/Cellar/apache-spark/
Creating and Running a Wordcount Application with Apache Spark
Now that Apache Spark is installed correctly and without errors, let's create and run a Wordcount application with it.
For this we need a Java IDE.
Here we use IntelliJ IDEA (feel free to use whichever IDE you prefer).
The steps for creating and running the Wordcount program with Apache Spark are as follows:
#1 Create a new project > choose Maven project > give it a name (here we use "spark-local-mapred").
#2 Edit the pom.xml file as follows (the _2.11 suffix on the Spark artifacts matches the Scala version 2.11.12 that spark-shell reports for Spark 2.4.5, not the Scala 2.13.2 installed with Homebrew):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>spark-local-mapred</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.compiler.source>1.8</maven.compiler.source>
    </properties>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.5</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.5</version>
        </dependency>
    </dependencies>
</project>
#3 Create the file WordCount.java. It reads a text file from the "input" directory, performs a word count (counts how many times each word occurs in the file), prints the result, and saves it to the "output" directory.
package com.teknologibigdata.spark;

// WordCount.java
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WordCount {
    private static final Pattern SPACE = Pattern.compile(" ");
    public List<String> enStopwords = new ArrayList<>();
    public final SparkSession spark;

    public WordCount() throws IOException {
        spark = SparkSession
                .builder()
                .appName("WordCount")
                .master("local[1]") // local execution with a single worker thread
                .getOrCreate();
        readStopwords();
    }

    // Load the stopword list (one word per line) from the classpath.
    private void readStopwords() throws IOException {
        try (BufferedReader bfr = new BufferedReader(
                new InputStreamReader(
                        WordCount.class.getResourceAsStream("/en_stopwords.txt"),
                        StandardCharsets.UTF_8))) {
            String line;
            while ((line = bfr.readLine()) != null) {
                enStopwords.add(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: WordCount <inputFile> <outputDir>");
            System.exit(1);
        }
        WordCount wc = new WordCount();
        List<String> stopwords = wc.enStopwords;
        Dataset<String> textDf = wc.spark.read().textFile(args[0]);
        textDf.show(10);
        // The explicit casts pick the Java-friendly overloads of flatMap/filter/map.
        Dataset<Row> wordCount = textDf
                // replace non-word characters with spaces, then split on spaces
                .flatMap((FlatMapFunction<String, String>) line ->
                        Arrays.asList(SPACE.split(line.replaceAll("\\W", " "))).iterator(), Encoders.STRING())
                .filter((FilterFunction<String>) str -> !str.isEmpty())
                // compare in lower case so capitalized stopwords ("The") are dropped too
                // (assumes the stopword list itself is lower case)
                .filter((FilterFunction<String>) str -> !stopwords.contains(str.toLowerCase()))
                .map((MapFunction<String, Tuple2<String, Long>>) word -> new Tuple2<>(word.toLowerCase(), 1L),
                        Encoders.tuple(Encoders.STRING(), Encoders.LONG()))
                .toDF("word", "one")
                .groupBy("word")
                .sum("one").orderBy(new Column("sum(one)").desc())
                .withColumnRenamed("sum(one)", "count");
        wordCount.show(10);
        // save() fails if the output directory already exists
        wordCount.write().format("csv").save(args[1]);
        wc.spark.stop();
    }
}
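To make the logic of the Spark job easier to follow, here is a rough plain-Java equivalent that uses only the standard library. It is an illustrative sketch, not part of the project: the class name PlainWordCount, the method count, and the sample sentences are all made up for this example. It mirrors the same steps: split lines into words, drop empty strings and stopwords, lower-case, count, and sort by count in descending order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PlainWordCount {

    // Counts words across the given lines, skipping stopwords,
    // and returns the counts ordered from most to least frequent.
    public static Map<String, Long> count(List<String> lines, Set<String> stopwords) {
        Map<String, Long> counts = lines.stream()
                // same tokenization idea as the Spark job: non-word characters become spaces
                .flatMap(line -> Arrays.stream(line.replaceAll("\\W", " ").split(" ")))
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .filter(word -> !stopwords.contains(word))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // sort by count, descending; LinkedHashMap preserves the sorted order
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Hypothetical sample input, standing in for input/silicon-valley.txt
        List<String> lines = Arrays.asList(
                "Silicon Valley is a region in California.",
                "The valley is home to many startups.");
        Set<String> stopwords = new HashSet<>(Arrays.asList("is", "a", "in", "the", "to"));
        count(lines, stopwords).forEach((word, n) -> System.out.println(word + "," + n));
    }
}
```

The Spark version expresses the same pipeline with Dataset operations so that it can scale beyond a single JVM; for a small text file the two should produce the same counts.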
#4 Run the program with these Program Arguments: input/silicon-valley.txt output (silicon-valley.txt is the name of the text file to process).
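One thing to watch for when running the program a second time: Spark's default save mode raises an error if the output path already exists. A small helper like the one below can clear the directory before each run. This is a sketch under assumptions: the class name OutputDirCleaner is hypothetical, and it permanently deletes whatever is in the directory you pass it, so point it only at the output directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OutputDirCleaner {

    // Recursively deletes the directory if it exists; returns true if something was deleted.
    public static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(dir)) {
            // reverse order so files and subdirectories come before their parents
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to "output", the directory name used in the program arguments above.
        Path output = Paths.get(args.length > 0 ? args[0] : "output");
        System.out.println(deleteIfExists(output)
                ? "Removed " + output
                : "Nothing to remove at " + output);
    }
}
```

An alternative is to leave the old results in place and pass a different output directory name on each run.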
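One more thing: don't forget to copy the en_stopwords.txt file from OpenNLP into the project's classpath resources (src/main/resources in a standard Maven layout), because the program loads it with getResourceAsStream("/en_stopwords.txt").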
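If the stopword file is missing from the classpath, getResourceAsStream returns null and the program fails with a NullPointerException before Spark does any work. The sketch below shows one way to check for the file up front; the class name StopwordsCheck and the method readResource are hypothetical names for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StopwordsCheck {

    // Reads a classpath resource line by line; returns null when the resource is missing.
    public static List<String> readResource(String name) throws IOException {
        InputStream in = StopwordsCheck.class.getResourceAsStream(name);
        if (in == null) {
            return null;
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> stopwords = readResource("/en_stopwords.txt");
        if (stopwords == null) {
            System.err.println("en_stopwords.txt not found on the classpath");
        } else {
            System.out.println("Loaded " + stopwords.size() + " stopwords");
        }
    }
}
```

Running this class from the same project is a quick way to confirm the file landed in the right directory before launching the Spark job.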
Below is a screenshot of the IntelliJ IDEA console while the program is running:
[Screenshot: IntelliJ IDEA console output]
Oh, and the project structure looks like this:
[Screenshot: Project Structure]
That's it. Happy experimenting!