Exam Databricks-Certified-Data-Engineer-Professional Topic 1 Question 75 Discussion

Actual exam question for Databricks's Databricks-Certified-Data-Engineer-Professional exam
Question #: 75
Topic #: 1

A data engineer is configuring a pipeline that will potentially see late-arriving, duplicate records.
In addition to de-duplicating records within the batch, which of the following approaches allows the data engineer to deduplicate data against previously processed records as it is inserted into a Delta table?

A. Set the configuration delta.deduplicate = true.
B. VACUUM the Delta table after each batch completes.
C. Perform an insert-only merge with a matching condition on a unique key.
D. Perform a full outer join on a unique key and overwrite existing data.
E. Rely on Delta Lake schema enforcement to prevent duplicate records.

Suggested Answer: C

To deduplicate data against previously processed records as it is inserted into a Delta table, use an insert-only merge: a MERGE statement whose only clause is WHEN NOT MATCHED THEN INSERT. Source records whose unique key already exists in the target are matched and ignored, and only records with a new key are inserted. For example:

    MERGE INTO target_table
    USING source_table
    ON target_table.unique_key = source_table.unique_key
    WHEN NOT MATCHED THEN INSERT *

This inserts only the source records whose unique key is not present in the target table and skips the records that have a matching key, so records that were already processed are not inserted again. The merge compares source records with the target, not with each other, so duplicates inside the batch itself would all be inserted. That is why the question assumes the batch is de-duplicated first.
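
The behavior described above can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not Delta Lake code: the functions dedupe_batch and insert_only_merge and the field name unique_key are hypothetical names chosen for illustration, and a list of dictionaries stands in for the table. On Delta Lake the two steps would roughly correspond to de-duplicating the batch (for example with DataFrame.dropDuplicates) and then running the MERGE statement.

```python
from typing import Dict, List

Record = Dict[str, object]


def dedupe_batch(batch: List[Record], key: str = "unique_key") -> List[Record]:
    """De-duplicate within the batch, keeping the first record seen per key."""
    first_seen: Dict[object, Record] = {}
    for record in batch:
        first_seen.setdefault(record[key], record)
    return list(first_seen.values())


def insert_only_merge(target: List[Record], source: List[Record],
                      key: str = "unique_key") -> int:
    """Model of MERGE ... WHEN NOT MATCHED THEN INSERT *.

    Source records are compared only with the keys that were in the target
    before the merge started, not with each other, so duplicates inside
    `source` are all appended. Returns the number of records inserted.
    """
    existing_keys = {record[key] for record in target}
    to_insert = [record for record in source if record[key] not in existing_keys]
    target.extend(to_insert)
    return len(to_insert)


if __name__ == "__main__":
    table = [{"unique_key": 1, "value": "a"}]
    batch = [
        {"unique_key": 1, "value": "late duplicate"},  # already in the table
        {"unique_key": 2, "value": "b"},
        {"unique_key": 2, "value": "b again"},         # duplicate within batch
    ]
    inserted = insert_only_merge(table, dedupe_batch(batch))
    print(inserted, table)
```

Skipping dedupe_batch in this model lets both records with key 2 through, which mirrors why the batch has to be de-duplicated before the merge.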

by Gladys at Apr 25, 2026, 10:43 AM
