Apache Spark: A Unified Analytics Engine for Big Data

Apache Spark is a very fast, unified analytics engine for processing data at large scale, covering both Big Data and machine learning workloads. More precisely, Apache Spark can be defined as an engine for large-scale, in-memory data processing, equipped with an elegant and expressive development API that lets data workers efficiently run jobs requiring fast, repeated access to the data being processed, such as streaming, machine learning, and SQL.

Apache Spark consists of Spark Core and a set of software libraries. Spark Core is a distributed execution engine, and its Java, Scala, and Python APIs provide a platform for developing distributed ETL (Extract, Transform, Load) applications. Additional libraries built on top of the core support workloads involving streaming, SQL, and machine learning.
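To make the idea of a distributed ETL job more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API: the input data, the function names, and the use of a thread pool as a stand-in for cluster workers are all assumptions made for illustration. It only shows the general pattern of splitting the input into partitions, transforming each partition independently, and merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw input: CSV-like lines of "city,temperature_celsius".
RAW_LINES = [
    "Jakarta,31.5",
    "Bandung,24.0",
    "Jakarta,30.5",
    "Surabaya,33.0",
    "Bandung,bad-value",  # malformed record, dropped during transform
    "Surabaya,32.0",
]


def split_into_partitions(records, num_partitions):
    """Extract step: divide the input into roughly equal chunks."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def transform_partition(lines):
    """Transform step: parse each line and sum temperatures per city.

    Returns {city: (total, count)} for this partition only.
    """
    partial = {}
    for line in lines:
        city, _, raw_value = line.partition(",")
        try:
            value = float(raw_value)
        except ValueError:
            continue  # skip malformed rows
        total, count = partial.get(city, (0.0, 0))
        partial[city] = (total + value, count + 1)
    return partial


def merge_partials(partials):
    """Combine per-partition results into one {city: average} table."""
    merged = {}
    for partial in partials:
        for city, (total, count) in partial.items():
            old_total, old_count = merged.get(city, (0.0, 0))
            merged[city] = (old_total + total, old_count + count)
    return {city: total / count for city, (total, count) in merged.items()}


def run_etl(lines, num_partitions=3):
    partitions = split_into_partitions(lines, num_partitions)
    # Each partition is processed independently, like a task on a worker.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(transform_partition, partitions))
    # Load step: here the "target" is just an in-memory dictionary.
    return merge_partials(partials)


if __name__ == "__main__":
    print(run_etl(RAW_LINES))
```

Because each partition only produces a partial (total, count) pair per city, the final averages do not depend on how the input was split. That property is what allows an engine to spread the work across many machines.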



Figure: Apache Spark components (source: hortonworks.com)

Spark was designed for data science and provides abstractions that make data science easier. Data scientists often use machine learning, a set of techniques and algorithms that learn from the data they are given. Many of these algorithms are iterative (they repeat a calculation many times), so Spark's ability to cache the data being processed in memory greatly speeds up this kind of processing. This capability makes Spark an ideal engine for implementing machine learning algorithms. Accordingly, Spark also ships with MLlib, a library that provides implementations of machine learning algorithms for common data science techniques such as classification, regression, collaborative filtering, clustering, and dimensionality reduction.
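The benefit of in-memory caching for iterative algorithms can be sketched in plain Python. The example below is not Spark code: `ToyDataset`, `load_points`, and `fit_slope` are hypothetical names invented for this illustration, and the "expensive load" is only simulated with a counter. In Spark itself, the comparable step is calling `cache()` or `persist()` on an RDD or DataFrame before iterating over it.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that is expensive to load."""

    def __init__(self, loader):
        self._loader = loader
        self._use_cache = False
        self._cached = None
        self.load_count = 0  # how many times the loader actually ran

    def cache(self):
        """Ask for the data to be kept in memory after the first load."""
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached  # served from memory, no reload
        self.load_count += 1
        data = self._loader()
        if self._use_cache:
            self._cached = data
        return data


def load_points():
    """Pretend this reads from disk or over the network."""
    return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x


def fit_slope(dataset, iterations=20, learning_rate=0.05):
    """Gradient descent for y = w * x; every iteration rescans the data."""
    w = 0.0
    for _ in range(iterations):
        points = dataset.collect()
        gradient = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= learning_rate * gradient
    return w


if __name__ == "__main__":
    uncached = ToyDataset(load_points)
    cached = ToyDataset(load_points).cache()
    print(fit_slope(uncached), "loads:", uncached.load_count)
    print(fit_slope(cached), "loads:", cached.load_count)
```

Both runs arrive at the same slope, but the uncached dataset is reloaded on every iteration while the cached one is loaded only once. With real data sets that live on disk or across a network, that difference dominates the running time of an iterative job.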

As software for processing data at large scale, Apache Spark offers several advantages, including:
  1. Speed. Apache Spark can run workloads up to 100 times faster than Hadoop MapReduce. Thanks to its state-of-the-art DAG scheduler, query optimizer, and physical execution engine, Apache Spark achieves high performance for both batch and streaming data processing.
  2. Ease of use. Applications that use Apache Spark can be written in Java, Scala, Python, R, and SQL. Spark offers more than 80 high-level operators that make it easier to build parallel applications, and it can be used interactively from the Scala, Python, R, and SQL shells.
  3. Generality. Apache Spark combines SQL, streaming, and complex analytics. It provides a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. Developers can combine all of these libraries seamlessly in a single application.
  4. Runs everywhere. Apache Spark can run in its standalone cluster mode, on Hadoop YARN, Apache Mesos, or Kubernetes, and on cloud platforms such as EC2. It can access a wide variety of data sources, including HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of others.
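The "high-level operators" mentioned in point 2 are operations in the style of map, filter, and reduce-by-key that are chained into a pipeline. The sketch below imitates that style in plain Python with a small word count. It is not Spark code: `flat_map`, `reduce_by_key`, and `word_count` are helper functions written only for this example, loosely modeled on the RDD operators `flatMap`, `filter`, `map`, and `reduceByKey`, and the input lines are made up.

```python
from itertools import chain


def flat_map(func, records):
    """Apply func to every record and flatten the resulting sequences."""
    return chain.from_iterable(func(record) for record in records)


def reduce_by_key(func, pairs):
    """Merge the values of all pairs that share a key using func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def word_count(lines, min_length=3):
    """Count words of at least min_length characters across all lines."""
    words = flat_map(str.split, lines)                       # line -> words
    kept = filter(lambda word: len(word) >= min_length, words)
    pairs = map(lambda word: (word, 1), kept)                # word -> (word, 1)
    return reduce_by_key(lambda a, b: a + b, pairs)          # sum per word


# Hypothetical input data for the example.
LINES = [
    "spark makes big data simple",
    "big data needs fast engines",
    "spark is fast",
]

if __name__ == "__main__":
    for word, count in sorted(word_count(LINES).items()):
        print(word, count)
```

Each step describes what should happen to the data rather than how to loop over it. That separation is what lets an engine like Spark decide on its own how to split the work across a cluster.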
Since its release, Apache Spark has been rapidly adopted by companies across a wide range of industries. Internet giants such as Netflix, Yahoo!, and eBay have deployed Spark at massive scale, collectively processing petabytes of data on clusters of 8,000 nodes (machines). Spark has quickly grown into the largest open source community in Big Data, with more than 1,000 contributors from 250+ organizations.

Interested in trying to run an Apache Spark-based application? Follow the tutorials below:


Both are presented in a simple and straightforward way.

Figure: Data sources that Apache Spark can access (source: databricks.com)

References:
1. Hortonworks, "What Apache Spark Does?," https://hortonworks.com/apache/spark/. [Accessed 29 July 2018].
2. Apache, "Apache Spark," https://spark.apache.org/. [Accessed 29 July 2018].
3. Databricks, "What is Apache Spark?," https://databricks.com/spark/about. [Accessed 29 July 2018].

Comments

Unknown said…
Thank you for the overview of big data in this blog post; it is very easy to understand.
WM Wijaya said…
You're welcome,
thank you for reading.
I hope it is useful!
Codex said…
I would like to learn a lot more. Could I have your contact email?
WM Wijaya said…
Sure: wijaya1414{at_mark}gmail{dot}com
"...equipped with an API..." May I ask what an API is? Is it a kind of program?
