Exam Associate-Developer-Apache-Spark-3.5 Topic 1 Question 23 Discussion

Actual exam question for Databricks's Associate-Developer-Apache-Spark-3.5 exam
Question #: 23
Topic #: 1
Given the code:

df = spark.read.csv("large_dataset.csv")
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp")," ").getItem(0).alias("date"), lit(1).alias("count")) reduced_df = mapped_df.groupBy("date").sum("count") reduced_df.count() reduced_df.show() At which point will Spark actually begin processing the data?

Suggested Answer: B Vote an answer

Spark uses lazy evaluation. Transformations like filter, select, and groupBy only define the DAG (Directed Acyclic Graph). No execution occurs until an action is triggered.
The first action in the code is:reduced_df.count()
So Spark starts processing data at this line.
Reference:Apache Spark Programming Guide - Lazy Evaluation

by Hermosa at Jul 18, 2025, 03:14 PM

Comments

Chosen Answer:
This is a voting comment (?) , you can switch to a simple comment.
Switch to a voting comment New
Nick name: Submit Cancel
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

0
0
0
10