Exam Associate-Developer-Apache-Spark Topic 1 Question 169 Discussion

Actual exam question for Databricks's Associate-Developer-Apache-Spark exam
Question #: 169
Topic #: 1
Which of the following code blocks returns approximately 1000 rows, some of them potentially being duplicates, from the 2000-row DataFrame transactionsDf that only has unique rows?

Suggested Answer: A

Explanation
To solve this question, you need to know that DataFrame.sample() is not guaranteed to return exactly the specified fraction of rows. That is why the question asks for approximately 1000 rows, which corresponds to a fraction of 0.5 of the 2000-row DataFrame. Furthermore, since duplicates may be returned, the method's withReplacement argument must be set to True. DataFrame.sample() has no force= argument.
While the take() method returns an exact number of rows, it simply takes the first specified number of rows (1000 in this question) from the DataFrame. Since the DataFrame does not include duplicate rows, none of the rows returned by take() can be duplicates, so the correct answer cannot involve take().
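The difference between sample() and take() described above can be illustrated with a minimal sketch. The answer options themselves are not shown on this page, so this is not the text of option A. It assumes a local PySpark installation, uses spark.range(2000) as a hypothetical stand-in for transactionsDf, and picks an arbitrary seed; the helper name compare_sample_and_take is also made up for illustration.

```python
def compare_sample_and_take(num_rows=2000, fraction=0.5, seed=42):
    """Contrast DataFrame.sample() with DataFrame.take() on unique rows."""
    # Imported here so the function can be defined without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("sample-vs-take").getOrCreate()

    # Hypothetical stand-in for transactionsDf: 2000 rows, all unique.
    transactions_df = spark.range(num_rows)

    # Sampling with replacement: the row count is only approximately
    # fraction * num_rows, and the same row may appear more than once.
    sampled = transactions_df.sample(withReplacement=True, fraction=fraction, seed=seed)
    sampled_rows = sampled.count()
    sampled_distinct = sampled.distinct().count()

    # take(n) returns exactly the first n rows as a list of Row objects,
    # so it cannot introduce duplicates when the source rows are unique.
    first_rows = transactions_df.take(1000)

    spark.stop()
    return {
        "sampled_rows": sampled_rows,
        "sampled_distinct": sampled_distinct,
        "take_rows": len(first_rows),
        "take_distinct": len(set(first_rows)),
    }
```

Calling compare_sample_and_take() should report a sampled row count near, but usually not exactly, 1000, with fewer distinct rows than sampled rows whenever duplicates were drawn. The take() counts should both be exactly 1000.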
More info: pyspark.sql.DataFrame.sample - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2

by Olive at Jun 26, 2026, 11:22 AM
