Exam Databricks-Machine-Learning-Associate Topic 1 Question 25 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam
Question #: 25
Topic #: 1

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?

A. One-hot encoding is not supported by most machine learning libraries.
B. One-hot encoding is dependent on the target variable's values which differ for each application.
C. One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.
D. One-hot encoding is not a common strategy for representing categorical feature variables numerically.
E. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.

Suggested Answer: E

One-hot encoding converts each categorical variable into a set of binary indicator columns, one per category, so that algorithms requiring numeric input can use it. However, when applied up front and to every categorical feature within a feature repository, it can be problematic:
Dimensionality Increase: One-hot encoding significantly increases the feature space, especially with high cardinality features, which can lead to high memory consumption and slower computation.
Model Specificity: Some models handle categorical variables natively (for example, some decision tree and boosting implementations), so one-hot encoding in advance can be inefficient and can discard information such as ordinal relationships.
Sparse Matrix Issue: One-hot encoding often produces a sparse matrix in which most values are zero, which can be inefficient in both storage and computation for some algorithms.
Generalization vs. Specificity: Encoding should ideally be tailored to specific models and use cases rather than applied generally in a feature repository.
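The dimensionality and sparsity points above can be illustrated with a minimal sketch in plain Python. The column name `raw_city`, its 50 synthetic categories, and the helpers `one_hot_encode` and `integer_encode` are assumptions made up for illustration; they are not part of any Databricks or feature store API.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has one 0/1 slot per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one slot is set per row
        rows.append(row)
    return categories, rows


def integer_encode(values):
    """Alternative a consumer might choose: one integer code per category."""
    index = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [index[v] for v in values]


# Hypothetical raw categorical column, stored unencoded in the repository.
raw_city = [f"city_{i % 50}" for i in range(200)]

# One consumer (e.g. a linear model) decides to one-hot encode at training time.
categories, encoded = one_hot_encode(raw_city)
n_rows, n_cols = len(encoded), len(categories)
nonzero = sum(sum(row) for row in encoded)
sparsity = 1 - nonzero / (n_rows * n_cols)

# Another consumer keeps a single column of integer codes instead.
codes = integer_encode(raw_city)

print(f"one-hot: {n_rows} rows x {n_cols} columns, sparsity = {sparsity:.2f}")
print(f"integer codes: {len(codes)} rows x 1 column")
```

In this sketch a single raw column becomes 50 mostly-zero columns once encoded, while the raw values stay usable by any consumer. Keeping the raw category in the repository leaves the choice of encoding to each model's own training pipeline.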
Reference
"Feature Engineering and Selection: A Practical Approach for Predictive Models" by Max Kuhn and Kjell Johnson (CRC Press, 2019).

by Mandel at Mar 07, 2025, 07:02 AM
