Apache Spark: A Unified Analytics Engine for Big Data

August 31, 2018

Apache Spark is a fast, unified analytics engine for large-scale data processing, covering both Big Data and machine learning workloads. More specifically, Spark is an engine that processes large-scale data in memory and comes with elegant, expressive development APIs, so that data workers can efficiently run jobs that need fast, repeated access to the data being processed, such as streaming, machine learning, and SQL.

Apache Spark consists of Spark Core and a set of libraries. The core is a distributed execution engine, and its Java, Scala, and Python APIs serve as a platform for developing distributed ETL (Extract, Transform, Load) applications. Additional libraries built on top of the core support workloads involving streaming, SQL, and machine learning.

Apache Spark components (hortonworks.com)

Spark was designed with data science in mind and provides abstractions that make data science easier. Data scientists often use machine learning, a set of techniques and algorithms that learn from the data they are given. Many of these algorithms are iterative (they repeat a calculation over the same data), so Spark's ability to cache the data being processed in memory contributes greatly to speeding them up. This makes Spark well suited to implementing machine learning algorithms. Spark also ships with MLlib, a library that implements machine learning algorithms for common data science techniques such as classification, regression, collaborative filtering, clustering, and dimensionality reduction.
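
To make the point about iterative algorithms concrete, here is a minimal sketch in plain Python. It does not use Spark's API. The `CountingSource` class and the `fit_slope` function are hypothetical names invented for this illustration, and the "slow storage" is only simulated by a read counter. The sketch shows why keeping a dataset in memory helps when an algorithm scans the same data on every iteration.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


class CountingSource:
    """Hypothetical stand-in for slow storage; counts full reads of the data."""

    def __init__(self, points: List[Point]) -> None:
        self._points = points
        self.reads = 0

    def load(self) -> List[Point]:
        self.reads += 1
        return list(self._points)


def fit_slope(get_data: Callable[[], List[Point]],
              iterations: int = 50, lr: float = 0.01) -> float:
    """Fit y = w * x by gradient descent; every iteration scans all the data."""
    w = 0.0
    for _ in range(iterations):
        data = get_data()
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Toy dataset generated from y = 3x.
points = [(float(x), 3.0 * x) for x in range(1, 6)]

# Without caching: the source is read again on every iteration.
uncached = CountingSource(points)
w_uncached = fit_slope(uncached.load)

# With caching: read once, then reuse the in-memory copy.
cached = CountingSource(points)
in_memory = cached.load()
w_cached = fit_slope(lambda: in_memory)

print(uncached.reads, cached.reads)
```

Both runs arrive at the same slope, but the cached run touches the source only once. In Spark the same idea applies to a distributed dataset that is kept in cluster memory between iterations.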

As software for large-scale data processing, Apache Spark has a number of advantages, including:

Speed. According to the project, Apache Spark can run some workloads up to 100 times faster than Hadoop MapReduce. Thanks to a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine, Spark achieves high performance for both batch and streaming data processing.
Ease of use. Applications can be written in Java, Scala, Python, R, and SQL. Spark provides more than 80 high-level operators that make it easier to build parallel applications, and it can be used interactively from the Scala, Python, R, and SQL shells.
Generality. Spark combines SQL, streaming, and complex analytics. It powers a stack of libraries that includes SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, and developers can combine these libraries seamlessly in the same application.
Runs everywhere. Spark runs in its own standalone cluster mode, on Hadoop YARN, Apache Mesos, or Kubernetes, or on cloud platforms such as Amazon EC2. It can access a wide range of data sources, including HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of others.
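
The sketch below gives a feel for what chaining high-level operators looks like. It is a single-machine toy written in plain Python, not Spark itself. `ToyDataset` is a hypothetical class made up for this illustration. Its method names are only modeled on the style of Spark operators such as `map`, `filter`, and `reduceByKey`, and nothing here runs in parallel.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Iterable, List


class ToyDataset:
    """Hypothetical single-machine stand-in for a distributed dataset."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = list(items)

    def flat_map(self, fn: Callable[[Any], Iterable[Any]]) -> "ToyDataset":
        # Apply fn to each item and flatten the results into one dataset.
        return ToyDataset(out for item in self._items for out in fn(item))

    def map(self, fn: Callable[[Any], Any]) -> "ToyDataset":
        return ToyDataset(fn(item) for item in self._items)

    def filter(self, pred: Callable[[Any], bool]) -> "ToyDataset":
        return ToyDataset(item for item in self._items if pred(item))

    def reduce_by_key(self, fn: Callable[[Any, Any], Any]) -> "ToyDataset":
        # Group (key, value) pairs by key, then fold each group's values with fn.
        grouped = defaultdict(list)
        for key, value in self._items:
            grouped[key].append(value)
        return ToyDataset((key, reduce(fn, values))
                          for key, values in grouped.items())

    def collect(self) -> List[Any]:
        return list(self._items)


lines = ToyDataset(["spark is fast", "spark is general", "hadoop is batch"])

# Word count as one chain of operators, skipping the word "is".
word_counts = dict(
    lines.flat_map(str.split)
         .filter(lambda word: word != "is")
         .map(lambda word: (word, 1))
         .reduce_by_key(lambda a, b: a + b)
         .collect()
)
print(word_counts)
```

The whole job reads as a short pipeline of named steps. In Spark, the engine takes care of distributing each step across the cluster.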

Since its release, Apache Spark has been rapidly adopted by companies across a wide range of industries. Internet giants such as Netflix, Yahoo!, and eBay run Spark at very large scale, collectively processing petabytes of data on clusters of 8,000 nodes. Spark has quickly grown into the largest open source community in Big Data, with more than 1,000 contributors from over 250 organizations.

Interested in running an Apache Spark application? Try the following tutorials:

1. Standalone mode: "Creating and Running an Apache Spark Application with IntelliJ IDEA on Windows"

2. Fully distributed mode: "Amazon Elastic MapReduce (EMR): Running Apache Spark in Fully Distributed Mode for Less than Rp 1,500"

Both are kept simple and straightforward.

Data sources that Apache Spark can access (databricks.com)

References:
1. Hortonworks, "What Apache Spark Does?," https://hortonworks.com/apache/spark/. [Accessed 29 July 2018].

2. Apache, "Apache Spark," https://spark.apache.org/. [Accessed 29 July 2018].

3. Databricks, "What is Apache Spark?," https://databricks.com/spark/about. [Accessed 29 July 2018].

