Learner Reviews & Feedback for Distributed Computing with Spark SQL by University of California, Davis

4.5 stars (659 ratings)

About the Course

This course is all about big data. It’s for students with SQL experience who want to take the next step on their data journey by learning distributed computing using Apache Spark. Students will gain a thorough understanding of this open-source standard for working with large datasets and of the fundamentals of data analysis using SQL on Spark, setting the foundation for combining data with advanced analytics at scale and in production environments. The four modules build on one another, and by the end of the course you will understand the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines. The first module introduces Spark and the Databricks environment, including how Spark distributes computation, and Spark SQL. Module 2 covers the core concepts of Spark such as storage vs. compute, caching, partitions, and troubleshooting performance issues via the Spark UI. It also covers new features in Apache Spark 3.x such as Adaptive Query Execution. The third module focuses on engineering data pipelines, including connecting to databases, schemas and data types, file formats, and writing reliable data. The final module covers data lakes, data warehouses, and lakehouses. Students build production-grade data pipelines by combining Spark with the open-source project Delta Lake. By the end of this course, students will have honed their SQL and distributed computing skills to become more adept at advanced analysis and to set the stage for transitioning to more advanced analytics as Data Scientists...

Top reviews

GT

Jun 9, 2020

I highly recommend this course for anyone in the BI and Data space interested in learning Spark. The course gives an easy-to-understand introduction to the framework and applicable hands-on examples.

KS

May 13, 2020

Amazing course that really cuts through the fundamentals of using distributed computing power to analyze and manipulate data. Well organised structure on fundamentals

126 - 150 of 162 Reviews for Distributed Computing with Spark SQL

By Виктор И

Jun 15, 2022

A decent introduction to distributed computing with Spark. The explanations are sometimes too superficial.

By Praneeth P

May 22, 2020

A good way to get started with Spark SQL. You might need some knowledge of SQL to get started.

By Ahmed R

Sep 11, 2022

Great course to get started with Spark as well as understanding the basics of data architecture

By Yuhao W

Jan 16, 2023

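This Coursera course is pretty good: you get hands-on practice with Spark SQL (on the Databricks platform) and a quick run through the concepts behind data warehouses and data lakes. After all, among free open courses, there really aren't many MOOCs left that let you practice in an actual programming environment.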

By Prajnita S

Jan 28, 2021

There should have been more assignments for practicing. Overall, the course was well structured!

By samuel k

May 30, 2022

Databricks is a great resource but in this course we only brush the top of SQL.

By Nick C

Oct 4, 2020

Pretty good. The exercises were good. I would have liked more of them.

By Wai K C

Jun 28, 2021

It would be better if more practice and examples of Spark SQL were given.

By Snorri B A

Jul 15, 2021

Very informative course that increased my knowledge of SQL

By Truong T T H

Jul 6, 2020

It is a good course. I have learned a lot about Spark.

By Michael S

Apr 25, 2021

Good introduction but assignments are too guided.

By Gustavo M L A

Nov 16, 2020

Liked the practical part and the instructors.

By AVIJIT J

Apr 11, 2022

The notebook for the last week is missing!

By Đàm T T

Nov 3, 2020

Good for spark-ml and big data beginners.

By Anggi F S

Oct 3, 2022

The speakers' audio volume is low.

By Leonardo S

Jan 7, 2022

I found it very interesting.

By Dr. H K A

Apr 13, 2020

Happy to attend this course.

By Ghirardi N

Dec 28, 2022

Quite introductory

By Anthony J V H

Sep 9, 2021

Very good course

By Gabriel J N P

Feb 6, 2022

Great!!

By Thomas M M

Apr 19, 2023

Decent learning material, but it can be significantly improved. From my point of view, this course felt like a typical walkthrough and/or advertisement for Databricks, not a course on how to 'specifically' use Spark SQL in the context of Data Science.

As someone who is solely interested in SQL (as this specialization specifies), I was confused by the inclusion of Python in the code, especially since that code was too alien to me.

To be honest, the total coding activity I've done here felt like just below one-tenth of the coding activities of the 2 previous courses (and this does not even include those activities not available in the free/community versions of the SQL platforms used). Learner involvement in the notebooks should be higher, not just simply running pre-written code.

By Allyson D d L

Dec 5, 2021

I was disappointed because I still don't know how to install Spark on my PC. I tried to install it, but it doesn't work. So I think it would be better if we could learn how to use Spark without Databricks. And the content is too superficial.

By Luis A E E M

Nov 14, 2021

It is a very interesting course, but being in English makes it considerably harder. They should offer SQL courses in Spanish so that we can concentrate solely on learning the queries instead of also trying to translate and understand what is being asked in English.

By Alejandro C V

Feb 22, 2024

I believe the course is more focused on the use of a specific application. Some of the videos were lengthy and lacked sufficient illustrative tools. This course is focused on data engineering rather than data science.

By Minh D

Sep 26, 2020

plus points:

detailed notebook & easy to use environment

acceptable slides & course videos

cons:

almost no distributed stuff at all

not diving deep into spark internals