Exam Databricks-Certified-Professional-Data-Engineer Topic 4 Question 74 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Engineer exam
Question #: 74
Topic #: 4

When working with Auto Loader, you notice that most of the columns inferred during loading are string data types, including columns that were supposed to be integers. How can you fix this?

A. Provide the schema of the source table in the cloudFiles.schemaLocation
B. Provide the schema of the target table in the cloudFiles.schemaLocation
C. Provide schema hints
D. Update the checkpoint location
E. Correct the incoming data by explicitly casting the data types

Suggested Answer: C

Explanation
The answer is C: provide schema hints.
# Read CSV files with Auto Loader, pinning the types of known columns
# through cloudFiles.schemaHints, then write the stream to a Delta location.
spark.readStream \
    .format("cloudFiles") \
    .option("cloudFiles.format", "csv") \
    .option("header", "true") \
    .option("cloudFiles.schemaLocation", schema_location) \
    .option("cloudFiles.schemaHints", "id int, description string") \
    .load(raw_data_location) \
    .writeStream \
    .option("checkpointLocation", checkpoint_location) \
    .start(target_delta_table_location)
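The cloudFiles.schemaHints option here declares that the id column is an int and the description column is a string. For formats that do not encode data types, such as CSV and JSON, Auto Loader infers every column as a string by default, which is why integer columns show up as strings. The cloudFiles.schemaLocation path only stores the output of schema inference during the load process; it is not where you supply a schema. With schema hints you can enforce data types for known columns ahead of time, while the remaining columns are still inferred.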
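When many columns need hints, the hint string can be assembled from a mapping rather than typed by hand. The following is a minimal sketch, not part of the original answer: the helper names build_schema_hints and auto_loader_options are hypothetical, and only the option keys (cloudFiles.format, cloudFiles.schemaLocation, cloudFiles.schemaHints) come from the example above.

```python
from typing import Dict


def build_schema_hints(column_types: Dict[str, str]) -> str:
    """Join a {column: type} mapping into a 'name type, name type' hint string.

    Hypothetical helper for illustration; it only formats the string that is
    passed to the cloudFiles.schemaHints option.
    """
    return ", ".join(f"{name} {dtype}" for name, dtype in column_types.items())


def auto_loader_options(
    file_format: str, schema_location: str, column_types: Dict[str, str]
) -> Dict[str, str]:
    """Collect the Auto Loader reader options used in the example above."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaHints": build_schema_hints(column_types),
    }


# Placeholder path, assumed for illustration only.
options = auto_loader_options(
    "csv",
    "/tmp/example/schema",
    {"id": "int", "description": "string"},
)
print(options["cloudFiles.schemaHints"])
```

On a Databricks cluster, the resulting dictionary could then be passed to the reader with spark.readStream.format("cloudFiles").options(**options).load(raw_data_location), assuming spark and raw_data_location are defined as in the example above.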

by Baron at May 13, 2025, 12:23 AM
