Exam MLA-C01 Topic 3 Question 32 Discussion

Actual exam question for Amazon's MLA-C01 exam
Question #: 32
Topic #: 3

An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing.
The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day.
Which solution will meet this requirement?

A. Reduce the SageMaker AI auto scaling cooldown period to the minimum supported value. Add an auto scaling lifecycle hook to scale the SageMaker AI instances.
B. Change the target metric to CPU utilization.
C. Modify the scaling policy target value to one.
D. Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day.

Suggested Answer: D

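This issue occurs because the endpoint is allowed to scale in to zero overnight, and a target tracking policy on invocations per model cannot bring it back on its own: with nothing running, there is no per-model invocation metric for the policy to react to. Scale-out therefore starts only after traffic has already arrived, so the first requests of the business day find no capacity and are delayed while it is provisioned.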
SageMaker AI endpoint scaling is handled by Application Auto Scaling, which supports step scaling and scheduled scaling in addition to target tracking. Because business hours are a predictable traffic pattern, the better practice is to add capacity proactively before traffic arrives instead of waiting for a reactive policy to catch up.
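As an illustration of scaling ahead of a predictable pattern, the sketch below builds the parameters for an Application Auto Scaling scheduled action that raises the minimum capacity before the workday. This is a minimal sketch, not the exam's option D mechanism: the component name, the 08:00 weekday schedule, the time zone, and the capacity limits are all assumptions chosen for the example, and the resource ID and scalable dimension are the ones I understand Application Auto Scaling to use for inference components.

```python
def morning_warmup_action(component_name, min_copies=1, max_copies=4):
    """Build kwargs for application-autoscaling put_scheduled_action.

    Every value here is illustrative; adjust the schedule, time zone,
    and limits to the real workload.
    """
    return {
        "ServiceNamespace": "sagemaker",
        # Hypothetical action name derived from the component name.
        "ScheduledActionName": f"{component_name}-business-day-warmup",
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        # Assumed schedule: 08:00 on weekdays, shortly before traffic starts.
        "Schedule": "cron(0 8 ? * MON-FRI *)",
        "Timezone": "America/New_York",
        # Raising MinCapacity forces at least this many copies to run.
        "ScalableTargetAction": {
            "MinCapacity": min_copies,
            "MaxCapacity": max_copies,
        },
    }


def apply_warmup_action(component_name):
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("application-autoscaling")
    return client.put_scheduled_action(**morning_warmup_action(component_name))
```

A matching evening action that sets MinCapacity back to 0 would let the endpoint scale in again after hours.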
Option D pairs a step scaling policy with an Amazon CloudWatch alarm, and adds a second alarm and scaling policy that raises the minimum instance count from zero to one at the start of each business day. With at least one instance already running, the first requests are served without waiting for capacity to be provisioned.
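The alarm-plus-step-policy pairing in option D can be sketched with boto3 parameters as follows. This is a hedged sketch under stated assumptions: the component name and policy/alarm names are hypothetical, and the choice of the NoCapacityInvocationFailures metric as the alarm trigger reflects my understanding of how SageMaker AI reports requests that arrive while no capacity is running, not something stated in the question.

```python
def step_policy_kwargs(component_name):
    """Build kwargs for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": f"{component_name}-scale-out-from-zero",  # hypothetical
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Maximum",
            "Cooldown": 60,
            # Any breach of the alarm threshold adds one copy.
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}
            ],
        },
    }


def alarm_kwargs(component_name, policy_arn):
    """Build kwargs for cloudwatch put_metric_alarm that fires the policy."""
    return {
        "AlarmName": f"{component_name}-no-capacity",  # hypothetical
        "AlarmActions": [policy_arn],
        # Assumed metric: invocations that failed because nothing was running.
        "Namespace": "AWS/SageMaker",
        "MetricName": "NoCapacityInvocationFailures",
        "Dimensions": [
            {"Name": "InferenceComponentName", "Value": component_name}
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "missing",
    }


def apply_step_scaling(component_name):
    """Create the policy, then the alarm. Requires boto3 and credentials."""
    import boto3  # imported lazily so the builders work without boto3

    aas = boto3.client("application-autoscaling")
    cw = boto3.client("cloudwatch")
    policy = aas.put_scaling_policy(**step_policy_kwargs(component_name))
    cw.put_metric_alarm(**alarm_kwargs(component_name, policy["PolicyARN"]))
```

The policy is created first because the alarm needs the policy's ARN as its action.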
Options A, B, and C do not guarantee that an instance is available before traffic begins. Shortening the cooldown, switching the metric to CPU utilization, or lowering the target value still leaves a policy that reacts only after load is detected.
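Therefore, option D, step scaling driven by CloudWatch alarms with a second alarm and policy that restores a minimum of one instance at the start of each business day, is the correct solution.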

by Zona at Apr 03, 2026, 03:55 AM
