Databricks Certified Machine Learning Professional - Databricks-Machine-Learning-Professional FREE EXAM DUMPS QUESTIONS & ANSWERS

A Data Scientist is tasked with developing models to forecast product demand. The company offers 5000 different product types, and the Data Scientist must generate weekly forecasts for each type. They have access to two years of historical purchase data and are given ample project budget.
For their next project, they want to build 5000 separate Random Forest models, one for each product type. They aim to train all the models as quickly as possible with minimal setup.
Which approach meets these requirements?
Correct Answer: B Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A Data Scientist is building a machine learning pipeline to classify raw text using a Logistic Regression model in Spark using Spark MLlib's Pipelines. This pipeline has three stages: the Tokenizer (to split the raw text in tokens), a HashingTF (to transform tokens into hashes) and the Logistic Regression itself (to perform the classification of texts). The Spark DataFrame with the training data is called trainingDF and the one with the test data is called testDF.
In order to do this, they use the following incomplete piece of code:

Which option correctly states:
(i) The complete command to run model training;
(ii) The complete command to execute the prediction on test data;
(iii) The object type of the model object returned by the model
training command.
Correct Answer: D Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A machine learning engineer is manually refreshing a model in an existing machine learning pipeline. The pipeline uses the MLflow Model Registry model "project". The machine learning engineer would like to add a new version of the model to "project". Which MLflow operation can the machine learning engineer use to accomplish this task?
Correct Answer: C Vote an answer
A machine learning engineer would like to compute predictions on inference data as it becomes available through the pipeline in microbatches. The predictions should be stored in a table for query later. Which deployment strategy can the engineer use?
Correct Answer: B Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A data scientist is building a model to predict which communication channel (Phone, SMS, Email, or Post) is most likely to be effective for a given customer. Which model type is suited to this task?
Correct Answer: C Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A Machine Learning Engineer has deployed a fraud detection model in Databricks Model Serving to detect fraudulent transactions. The engineer wants to compare the model's predictions with the actual fraud classifications from the Fraud Ops team to monitor model performance. The Fraud Ops team uses a unique transaction_id to investigate fraudulent activity and persist their findings to a fraud_findings table. The engineer enabled inference tables on the endpoint, but they are not sure how to map the models' predictions to the Fraud Ops team's classifications. How can the engineer uniquely join the models' prediction to the fraud_findings table with the fewest code changes?
Correct Answer: D Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A Machine Learning Engineer is training a large-scale gradient boosting model using SparkML on a cluster of machines. The training job fails due to memory overflow on a single executor node after processing several iterations. The cluster resources are limited to executor nodes with 16 CPU cores and 64 GB RAM each. The engineer wants to continue training the model without changing hyperparameters or reducing the dataset size. They know Spark's architecture well and want to take advantage of its benefits. Which approach will allow the Machine Learning Engineer to solve this issue?
Correct Answer: B Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
Which MLflow command logs a trained model?
Correct Answer: D Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A data scientist wants to remove the star_rating column from the Delta table at the location path.
To do this, they need to load in data and drop the star_rating column. Which of the following code blocks accomplishes this task?
Correct Answer: D Vote an answer
A Machine Learning Engineer wants to monitor the quality and stability of their machine learning model's predictions over time. They have a Delta table, retail_inference_log, which records each model prediction along with input features, a timestamp, and (when available) the true label. They need to detect data drift and monitor model performance trends using Databricks Lakehouse Monitoring, ensuring that alerts are triggered if the distribution of predictions or input features changes significantly. Which approach will set up monitoring for this use case?
Correct Answer: D Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A Data Scientist needs to analyze drift detection results from Databricks Lakehouse Monitoring.
The system has generated both profile metrics and drift metrics tables. The scientist needs to identify baseline drift in numerical features by comparing current data against a baseline from 6 months ago. Which combination of table columns and values indicates baseline drift in a numerical feature?
Correct Answer: B Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
Why is Delta Lake time travel useful in ML pipelines?
Correct Answer: A Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A machine learning engineer is in the process of implementing a concept drift monitoring solution.
They are planning to use the following steps:
1. Deploy a model to production and compute predicted values
2. Obtain the observed (actual) label values
3. _____
4. Run a statistical test to determine if there are changes over time
Which of the following should be completed as Step #3?
Correct Answer: B Vote an answer
A data scientist would like to switch from manually using MLflow logging to MLflow Autologging for all machine learning libraries used in a notebook.
They begin by adding mlflow.autolog()to the top of the below code block:

The data scientist is now trying to determine which line of code will kick off the MLflow Autologging process.
Which line of code within the above code block will start the MLflow Autologging process?
Correct Answer: A Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
A machine learning engineer has deployed a model recommender using MLflow Model Serving.
They now want to query the version of that model that is in the Production stage of the MLflow Model Registry. Which of the following model URls can be used to query the described model version?
Correct Answer: B Vote an answer
Explanation: Only visible for FreeCram members. You can sign-up / login (it's free).
0
0
0
10