You can now configure Auto Scaling for your Amazon SageMaker endpoints from the Amazon SageMaker console, the AWS Auto Scaling API, and the AWS SDK, which makes capacity management easier. With Amazon SageMaker, you specify the number and type of instances behind each endpoint to provide the scale you need for inference. If your inference volume changes, you can change the number of instances, the instance type, or both to accommodate it. With Auto Scaling, that inference capacity adjusts automatically to maintain predictable performance at a low cost. In the Amazon SageMaker console, you set the minimum and maximum number of instances for your endpoint and then select the target throughput per instance. Amazon SageMaker then monitors your deployed models and adjusts the instance count in response to changes in application traffic, keeping throughput within the desired levels. This makes it easier to manage models in production and can help reduce the cost of deployed models.
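
For the SDK path, the same three settings (minimum instances, maximum instances, and throughput per instance) might be expressed as follows. This is a minimal sketch, not the definitive procedure. It assumes the AWS SDK for Python (boto3) and its `application-autoscaling` client, with a target-tracking policy on the predefined `SageMakerVariantInvocationsPerInstance` metric. The function names, the policy name, and the endpoint and variant names in the usage note are hypothetical placeholders.

```python
def build_scaling_requests(endpoint_name, variant_name,
                           min_instances, max_instances,
                           invocations_per_instance):
    """Build keyword arguments for the two Application Auto Scaling calls.

    Pure function: it only assembles dictionaries, so it can be inspected
    or tested without AWS credentials.
    """
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")

    # Scaling is configured per production variant of an endpoint.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }

    # Bounds on the instance count (the console's minimum and maximum).
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)

    # Target-tracking policy (the console's throughput per instance).
    policy = dict(
        common,
        PolicyName="invocations-per-instance-target",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    )
    return target, policy


def apply_scaling(target, policy, client=None):
    """Send both requests; creates a boto3 client if none is supplied."""
    if client is None:
        import boto3  # imported lazily so the builder works without boto3
        client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

A call such as `apply_scaling(*build_scaling_requests("my-endpoint", "AllTraffic", 1, 4, 100))` would register the variant with a range of 1 to 4 instances and ask Auto Scaling to hold roughly 100 invocations per instance. Separating the request builder from the API call keeps the configuration easy to review before anything is sent to AWS.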