Exam Databricks-Certified-Professional-Data-Engineer Topic 4 Question 74 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Engineer exam
Question #: 74
Topic #: 4
When working with AUTO LOADER you noticed that most of the columns that were inferred as part of loading are string data types including columns that were supposed to be integers, how can we fix this?

Suggested Answer: C Vote an answer

Explanation
The answer is, Provide schema hints.
1.spark.readStream \
2.format("cloudFiles") \
3.option("cloudFiles.format", "csv") \
4.option("header", "true") \
5.option("cloudFiles.schemaLocation", schema_location) \
6.option("cloudFiles.schemaHints", "id int, description string")
7.load(raw_data_location)
8.writeStream \
9.option("checkpointLocation", checkpoint_location) \
10.start(target_delta_table_location)option("cloudFiles.schemaHints", "id int, description string")
# Here we are providing a hint that id column is int and the description is a string When cloudfiles.schemalocation is used to store the output of the schema inference during the load process, with schema hints you can enforce data types for known columns ahead of time.

by Baron at May 13, 2025, 12:23 AM

Comments

Chosen Answer:
This is a voting comment (?) , you can switch to a simple comment.
Switch to a voting comment New
Nick name: Submit Cancel
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

0
0
0
10