
AWS Machine Learning Specialty Sample Questions & MLS-C01 Dumps

  • CertiMaan
  • Oct 18, 2025
  • 26 min read

Updated: Mar 3

Prepare confidently for the AWS Machine Learning Specialty exam with a curated set of real-world exam questions and MLS-C01 practice dumps. This comprehensive guide includes scenario-based AWS certified machine learning specialty sample questions and domain-focused practice exams covering data engineering, modeling, algorithm selection, and ML implementation on AWS. Whether you're a data scientist, ML engineer, or aspiring cloud professional, these AWS Machine Learning Specialty exam questions will help validate your skills. With updated MLS-C01 dumps and mock exams, assess your readiness and enhance your chances of success in one of AWS’s most challenging certifications. Begin your journey to earning the AWS ML Certification with trusted preparation material designed for real exam environments.


AWS Machine Learning Specialty Sample Questions List:


1. A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII). The dataset must be accessible from a VPC only and must not traverse the public internet. How can these requirements be satisfied?

  1. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.

  2. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance.

  3. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance.

  4. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance.
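The VPC-endpoint pattern in the options above can be sketched as a bucket policy that denies any request not arriving through the endpoint. This is a minimal sketch: the bucket name and endpoint ID are placeholders, and `aws:SourceVpce` is the documented S3 condition key for pinning access to a gateway endpoint.

```python
# Sketch of an S3 bucket policy that denies all access unless the request
# arrives through a specific VPC endpoint. Bucket name and endpoint ID are
# placeholders, not values from the question.
import json

BUCKET = "example-training-bucket"   # hypothetical bucket name
VPCE_ID = "vpce-1a2b3c4d"            # hypothetical VPC endpoint ID

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessExceptFromVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Requests that do not come through the given endpoint are denied,
            # so the data never traverses the public internet.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": VPCE_ID}},
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Attaching this policy to the bucket, combined with the endpoint itself, satisfies both the VPC-only and no-public-internet requirements.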

2. A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The company wants to group its customers into categories based on which customers will and will not churn within the next 6 months. The company has labeled the data available to the Specialist. Which machine learning model type should the Specialist use to accomplish this task?

  1. Linear regression

  2. Classification

  3. Clustering

  4. Reinforcement learning

3. A Machine Learning Specialist built an image classification deep learning model. However, the Specialist ran into an overfitting problem in which the training and testing accuracies were 99% and 75%, respectively. How should the Specialist address this issue and what is the reason behind it?

  1. The learning rate should be increased because the optimization process was trapped at a local minimum.

  2. The dropout rate at the flatten layer should be increased because the model is not generalized enough.

  3. The dimensionality of the dense layer next to the flatten layer should be increased because the model is not complex enough.

  4. The epoch number should be increased because the optimization process was terminated before it reached the global minimum.

4. An insurance company is developing a new device for vehicles that uses a camera to observe drivers' behavior and alert them when they appear distracted. The company created approximately 10,000 training images in a controlled environment that a Machine Learning Specialist will use to train and evaluate machine learning models. During the model evaluation, the Specialist notices that the training error rate diminishes faster as the number of epochs increases and the model is not accurately inferring on the unseen test images. Which of the following should be used to resolve this issue? (Select TWO)

  1. Add vanishing gradient to the model

  2. Perform data augmentation on the training data

  3. Make the neural network architecture complex.

  4. Use gradient checking in the model
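Data augmentation (option 2) expands a small training set with label-preserving transforms so the model generalizes beyond the controlled environment. A toy sketch using horizontal flips on a list-of-rows "image":

```python
# Minimal data-augmentation sketch: horizontal flips of a tiny "image"
# (a list of pixel rows) double the effective training set.
def hflip(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]

def augment(images):
    """Return the originals plus their horizontal mirrors."""
    return images + [hflip(img) for img in images]

images = [[[1, 2, 3],
           [4, 5, 6]]]          # one 2x3 toy image
augmented = augment(images)
print(len(augmented))
```

Real pipelines would also apply random crops, rotations, and brightness shifts, but the principle is the same: more varied views of the same labels.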

5. Example Corp has an annual sale event from October to December. The company has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year's upcoming event. Which method should Example Corp use to split the data into a training dataset and evaluation dataset?

  1. Pre-split the data before uploading to Amazon S3

  2. Have Amazon ML split the data randomly.

  3. Have Amazon ML split the data sequentially.

  4. Perform custom cross-validation on the data
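For sequential data like the sales history above, a chronological split keeps the evaluation window strictly after the training window, which is what "split the data sequentially" means. A minimal sketch in plain Python:

```python
# Sequential (chronological) split for time-series data: the evaluation set
# is the most recent slice, never a random sample, so the model is always
# validated on data that comes after its training period.
def sequential_split(records, train_fraction=0.7):
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]

years = list(range(2010, 2025))          # 15 years of hypothetical sales data
train, evaluation = sequential_split(years)
print(train[-1], evaluation[0])          # training ends before evaluation begins
```

A random split would leak future seasonality into training, which is why it is the wrong choice for forecasting problems.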


6. Which of the following metrics should a Machine Learning Specialist generally use to compare/evaluate machine learning classification models against each other?

  1. Recall

  2. Misclassification rate

  3. Mean absolute percentage error (MAPE)

  4. Area Under the ROC Curve (AUC)

7. A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning Specialist needs to forecast the air quality in parts per million of contaminants for the next 2 days in the city. As this is a prototype, only daily data from the last year is available. Which model is MOST likely to provide the best results in Amazon SageMaker?

  1. Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.

  2. Use Amazon SageMaker Random Cut Forest (RCF) on the single time series consisting of the full year of data.

  3. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.

  4. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of classifier.

8. A company is running a machine learning prediction service that generates 100 TB of predictions every day. A Machine Learning Specialist must generate a visualization of the daily precision-recall curve from the predictions, and forward a read-only version to the Business team. Which solution requires the LEAST coding effort?

  1. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Give the Business team read-only access to S3.

  2. Generate daily precision-recall data in Amazon QuickSight, and publish the results in a dashboard shared with the Business team

  3. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Visualize the arrays in Amazon QuickSight, and publish them in a dashboard shared with the Business team.

  4. Generate daily precision-recall data in Amazon ES, and publish the results in a dashboard shared with the Business team.

9. A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist has transformed the data into a numpy.array, which appears to be negatively affecting the speed of the training. What should the Specialist do to optimize the data for training on SageMaker?

  1. Use the SageMaker batch transform feature to transform the training data into a DataFrame

  2. Use AWS Glue to compress the data into the Apache Parquet format

  3. Transform the dataset into the RecordIO protobuf format

  4. Use the SageMaker hyperparameter optimization feature to automatically optimize the data

10. A company is setting up an Amazon SageMaker environment. The corporate data security policy does not allow communication over the internet. How can the company enable the Amazon SageMaker service without enabling direct internet access to Amazon SageMaker notebook instances?

  1. Create a NAT gateway within the corporate VPC.

  2. Route Amazon SageMaker traffic through an on-premises network.

  3. Create Amazon SageMaker VPC interface endpoints within the corporate VPC.

  4. Create VPC peering with Amazon VPC hosting Amazon SageMaker.

11. A Machine Learning Specialist deployed a model that provides product recommendations on a company's website. Initially, the model was performing very well and resulted in customers buying more products on average. However, within the past few months the Specialist has noticed that the effect of product recommendations has diminished and customers are starting to return to their original habits of spending less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago. Which method should the Specialist try to improve model performance?

  1. The model needs to be completely re-engineered because it is unable to handle product inventory changes

  2. The model's hyperparameters should be periodically updated to prevent drift

  3. The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes

  4. The model should be periodically retrained using the original training data plus new data as product inventory changes

12. A Machine Learning Specialist is building a logistic regression model that will predict whether or not a person will order a pizza. The Specialist is trying to build the optimal model with an ideal classification threshold. What model evaluation technique should the Specialist use to understand how different classification thresholds will impact the model's performance?

  1. Receiver operating characteristic (ROC) curve

  2. Misclassification rate

  3. Root Mean Square Error (RMSE)

  4. L1 norm
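A ROC curve is built by sweeping the classification threshold and recording the true-positive and false-positive rates at each setting, which is exactly what the question asks for. A small self-contained sketch with toy scores:

```python
# Sketch of how a ROC curve is constructed: at each candidate threshold,
# compute the true-positive rate (TPR) and false-positive rate (FPR).
def tpr_fpr(scores, labels, threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    p = sum(labels)
    n = len(labels) - p
    return tp / p, fp / n

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]   # toy model scores
labels = [1,   1,   0,   1,   0,   0]      # ground truth
curve = [tpr_fpr(scores, labels, t) for t in (0.0, 0.5, 1.0)]
print(curve)
```

Plotting TPR against FPR across all thresholds yields the ROC curve; the threshold nearest the top-left corner is often a reasonable operating point.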

13. A Machine Learning Specialist was given a dataset consisting of unlabeled data. The Specialist must create a model that can help the team classify the data into different buckets. What model should be used to complete this work?

  1. K-means clustering

  2. Random Cut Forest (RCF)

  3. XGBoost

  4. BlazingText
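For unlabeled data, a clustering algorithm such as k-means groups points into buckets without any labels. A deliberately tiny 1-D sketch of the assign-then-update loop (production work would use the SageMaker k-means algorithm; this only illustrates the idea):

```python
# Minimal 1-D k-means sketch: alternate between assigning each point to its
# nearest center and moving each center to the mean of its cluster.
def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Update step: centers move to the mean of their cluster.
        centers = [sum(v) / len(v) if v else c for c, v in clusters.items()]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]     # two obvious groups
print(kmeans_1d(data, centers=[0.0, 5.0]))
```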

14. A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable. What should be done to reduce the impact of having such a large number of features?

  1. Perform one-hot encoding on highly correlated features

  2. Use matrix multiplication on highly correlated features.

  3. Create a new feature space using principal component analysis (PCA)

  4. Apply the Pearson correlation coefficient
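PCA replaces correlated features with a smaller set of orthogonal components. A hand-rolled sketch for two correlated features, using the closed-form leading eigenvector of a 2x2 covariance matrix (toy data; real work would use scikit-learn or the SageMaker PCA algorithm):

```python
# PCA sketch for two highly correlated features: project the centered data
# onto the leading eigenvector of the covariance matrix, leaving a single
# uncorrelated feature.
import math

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.9]          # roughly 2 * x1: highly correlated

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

a, b, c = cov(x1, x1), cov(x1, x2), cov(x2, x2)
# Leading eigenvalue/eigenvector of the symmetric 2x2 covariance matrix.
lam = (a + c + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2
w = (b, lam - a)
norm = math.hypot(*w)
w = (w[0] / norm, w[1] / norm)

# Project the centered data onto the first principal component.
pc1 = [(u - mean(x1)) * w[0] + (v - mean(x2)) * w[1] for u, v in zip(x1, x2)]
explained = lam / (a + c)               # share of total variance retained
print(round(explained, 4))
```

Nearly all the variance survives in one component, which is why PCA is the standard answer for unstable, correlated feature spaces.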

15. A Machine Learning Specialist is developing a recommendation engine for a photography blog. Given a picture, the recommendation engine should show a picture that captures similar objects. The Specialist would like to create a numerical representation feature to perform nearest-neighbor searches. What actions would allow the Specialist to get relevant numerical representations?

  1. Reduce image resolution and use reduced resolution pixel values as features

  2. Use Amazon Mechanical Turk to label image content and create a one-hot representation indicating the presence of specific labels

  3. Run images through a neural network pre-trained on ImageNet, and collect the feature vectors from the penultimate layer

  4. Average colors by channel to obtain three-dimensional representations of images.

16. A Machine Learning Specialist is working with multiple data sources containing billions of records that need to be joined. What feature engineering and model development approach should the Specialist take with a dataset this large?

  1. Use an Amazon SageMaker notebook for both feature engineering and model development

  2. Use an Amazon SageMaker notebook for feature engineering and Amazon ML for model development

  3. Use Amazon EMR for feature engineering and Amazon SageMaker SDK for model development

  4. Use Amazon ML for both feature engineering and model development.

17. A Machine Learning Specialist is packaging a custom ResNet model into a Docker container so the company can leverage Amazon SageMaker for training. The Specialist is using Amazon EC2 P3 instances to train the model and needs to properly configure the Docker container to leverage the NVIDIA GPUs. What does the Specialist need to do?

  1. Bundle the NVIDIA drivers with the Docker image.

  2. Build the Docker container to be NVIDIA-Docker compatible.

  3. Organize the Docker container's file structure to execute on GPU instances.

  4. Set the GPU flag in the Amazon SageMaker CreateTrainingJob request body

18. A Data Scientist wants to gain real-time insights into a data stream of GZIP files. Which solution would allow the use of SQL to query the stream with the LEAST latency?

  1. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.

  2. AWS Glue with a custom ETL script to transform the data.

  3. An Amazon Kinesis Client Library to transform the data and save it to an Amazon ES cluster.

  4. Amazon Kinesis Data Firehose to transform the data and put it into an Amazon S3 bucket.

19. An agency collects census information within a country to determine healthcare and social program needs by province and city. The census form collects responses for approximately 500 questions from each citizen. Which combination of algorithms would provide the appropriate insights? (Select TWO)

  1. The factorization machines (FM) algorithm

  2. The Latent Dirichlet Allocation (LDA) algorithm

  3. The principal component analysis (PCA) algorithm

  4. The k-means algorithm

20. A manufacturing company asks its Machine Learning Specialist to develop a model that classifies defective parts into one of eight defect types. The company has provided roughly 100,000 images per defect type for training. During the initial training of the image classification model, the Specialist notices that the validation accuracy is 80%, while the training accuracy is 90%. It is known that human-level performance for this type of image classification is around 90%. What should the Specialist consider to fix this issue?

  1. A longer training time

  2. Making the network larger

  3. Using a different optimizer

  4. Using some form of regularization

21. A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions. Here is an example from the dataset: "The quck BROWN FOX jumps over the lazy dog." Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Select THREE)

  1. Perform part-of-speech tagging and keep the action verb and the nouns only

  2. Normalize all words by making the sentence lowercase

  3. Remove stop words using an English stopword dictionary.

  4. Correct the typography on "quck" to "quick."
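Lowercasing and stop-word removal (options 2 and 3) can be made repeatable with a small deterministic function. A sketch with a toy stop-word list standing in for a full English stopword dictionary:

```python
# Repeatable text sanitization before Word2Vec: lowercase the sentence and
# drop stop words. The stop-word set here is a tiny stand-in for a real
# English stopword dictionary.
STOPWORDS = {"the", "over", "a"}     # toy subset, an assumption

def sanitize(sentence):
    tokens = sentence.lower().split()
    return [t for t in tokens if t not in STOPWORDS]

print(sanitize("The quck BROWN FOX jumps over the lazy dog"))
```

Because the function is pure, rerunning it on the full 1-million-sentence corpus always yields identical tokens, which is what "repeatable" demands.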

22. A company is observing low accuracy while training on the default built-in image classification algorithm in Amazon SageMaker. The Data Science team wants to use an Inception neural network architecture instead of a ResNet architecture. Which of the following will accomplish this? (Select TWO.)

  1. Customize the built-in image classification algorithm to use Inception and use this for model training.

  2. Create a support case with the SageMaker team to change the default image classification algorithm to Inception.

  3. Bundle a Docker container with TensorFlow Estimator loaded with an Inception network and use this for model training.

  4. Use custom code in Amazon SageMaker with TensorFlow Estimator to load the model with an Inception network and use this for model training.

23. While working on a neural network project, a Machine Learning Specialist discovers that some features in the data have very high magnitude, resulting in this data being weighted more in the cost function. What should the Specialist do to ensure better convergence during backpropagation?

  1. Dimensionality reduction

  2. Data normalization

  3. Model regularization

  4. Data augmentation for the minority class
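Standardizing each feature to zero mean and unit variance keeps high-magnitude features from dominating the cost function. A minimal sketch:

```python
# Feature standardization: rescale a feature to zero mean and unit
# variance so no single feature dominates the loss during backpropagation.
def standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / std for v in values]

feature = [1000.0, 2000.0, 3000.0]      # high-magnitude toy feature
scaled = standardize(feature)
print(scaled)
```

After scaling, gradient steps affect all features on a comparable scale, which is why normalization improves convergence.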

24. A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet files for exploration and analysis. Which of the following services would both ingest and store this data in the correct format?

  1. AWS DMS

  2. Amazon Kinesis Data Streams

  3. Amazon Kinesis Data Firehose

  4. Amazon Kinesis Data Analytics

25. A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours. With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s). Which visualization will accomplish this?

  1. A histogram showing whether the most important input feature is Gaussian.

  2. A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension.

  3. A scatter plot showing the performance of the objective metric over each training iteration

  4. A scatter plot showing the correlation between maximum tree depth and the objective metric.

26. A Machine Learning Specialist trained a regression model, but the first iteration needs optimizing. The Specialist needs to understand whether the model is more frequently overestimating or underestimating the target. What option can the Specialist use to determine whether it is overestimating or underestimating the target value?

  1. Root Mean Square Error (RMSE)

  2. Residual plots

  3. Area under the curve

  4. Confusion matrix
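Residual plots answer the over/underestimation question directly: residual = actual - predicted, so a residual distribution centered below zero indicates overestimation and one centered above zero indicates underestimation. A toy sketch:

```python
# Residual analysis sketch: the sign of the average residual tells you
# whether the regression model tends to overestimate or underestimate.
def residuals(actual, predicted):
    return [a - p for a, p in zip(actual, predicted)]

actual    = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 13.0, 15.0, 15.0]     # toy model predictions
res = residuals(actual, predicted)
mean_residual = sum(res) / len(res)
print("overestimating" if mean_residual < 0 else "underestimating")
```

RMSE, by contrast, squares the residuals and so throws away the sign, which is exactly the information the Specialist needs here.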


    AWS Machine Learning Specialty Exam Questions for Certification

27. A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII). The dataset must be accessible from a VPC only and must not traverse the public internet. How can these requirements be satisfied?

  1. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.

  2. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance.

  3. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance.

  4. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance.

28. A Marketing Manager at a pet insurance company plans to launch a targeted marketing campaign on social media to acquire new customers. Currently, the company has the following data in Amazon Aurora:

  • Profiles for all past and existing customers
  • Profiles for all past and existing insured pets
  • Policy-level information
  • Premiums received
  • Claims paid

What steps should be taken to implement a machine learning model to identify potential new customers on social media?

  1. Use regression on customer profile data to understand key characteristics of consumer segments Find similar profiles on social media.

  2. Use clustering on customer profile data to understand key characteristics of consumer segments Find similar profiles on social media.

  3. Use a recommendation engine on customer profile data to understand key characteristics of consumer segments.

  4. Find similar profiles on social media

29. A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create a security vulnerability where malicious code running on the instances could compromise data privacy. The company mandates that all instances stay within a secured VPC with no internet access, and data communication traffic must stay within the AWS network. How should the Data Science team configure the notebook instance placement to meet these requirements?

  1. Associate the Amazon SageMaker notebook with a private subnet in a VPC.

  2. Place the Amazon SageMaker endpoint and S3 buckets within the same VPC.

  3. Associate the Amazon SageMaker notebook with a private subnet in a VPC.

  4. Use 1AM policies to grant access to Amazon S3 and Amazon SageMaker.

30. A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics, past visits, and locality information. The Specialist must develop a machine learning approach to identify the customer shopping patterns, preferences and trends to enhance the website for better service and smart recommendations. Which solution should the Specialist recommend?

  1. Latent Dirichlet Allocation (LDA) for the given collection of discrete data to identify patterns in the customer database.

  2. A neural network with a minimum of three layers and random initial weights to identify patterns in the customer database

  3. Collaborative filtering based on user interactions and correlations to identify patterns in the customer database

  4. Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database

31. An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist is asked to provide word features for the downstream nearest neighbor model powering the widget. What should the Specialist do to meet these requirements?

  1. Create one-hot word encoding vectors.

  2. Produce a set of synonyms for every word using Amazon Mechanical Turk.

  3. Create word embedding factors that store edit distance with every other word.

  4. Download word embeddings pre-trained on a large corpus.
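A nearest-neighbor widget over word embeddings comes down to cosine similarity between vectors. A minimal sketch, with made-up 3-dimensional toy vectors standing in for embeddings pre-trained on a large corpus:

```python
# Nearest-neighbor lookup over word embeddings via cosine similarity.
# The 3-D vectors below are toy values, not real pre-trained embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

embeddings = {
    "king":  [0.9, 0.1, 0.4],
    "queen": [0.85, 0.15, 0.45],
    "car":   [0.1, 0.9, 0.2],
}

def nearest(word):
    # Return the other vocabulary word with the highest cosine similarity.
    return max((w for w in embeddings if w != word),
               key=lambda w: cosine(embeddings[word], embeddings[w]))

print(nearest("king"))
```

Words used in similar contexts end up with nearby vectors, so the downstream nearest-neighbor model only needs a similarity search like this one.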

32. During mini-batch training of a neural network for a classification problem, a Data Scientist notices that training accuracy oscillates. What is the MOST likely cause of this issue?

  1. The class distribution in the dataset is imbalanced

  2. Dataset shuffling is disabled

  3. The batch size is too big

  4. The learning rate is very high
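A learning rate that is too high makes the optimizer overshoot the minimum and bounce from side to side, which shows up as oscillating accuracy. A sketch using gradient descent on f(x) = x², comparing a small and a large step size:

```python
# Why a very high learning rate causes oscillation: gradient descent on
# f(x) = x^2 overshoots the minimum back and forth when the step is too
# large, and approaches it smoothly when the step is small.
def descend(lr, steps=10, x=1.0):
    path = [x]
    for _ in range(steps):
        x = x - lr * 2 * x        # the gradient of x^2 is 2x
        path.append(x)
    return path

stable = descend(lr=0.1)          # smooth, one-sided convergence
oscillating = descend(lr=0.9)     # sign flips on every step
print(stable[-1], oscillating[-1])
```

With lr=0.9 each update multiplies x by a negative factor, so the iterate jumps across the minimum on every step, the 1-D analogue of oscillating training accuracy.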

33. A Machine Learning Specialist needs to move and transform data in preparation for training. Some of the data needs to be processed in near-real time and other data can be moved hourly. There are existing Amazon EMR MapReduce jobs to clean the data and perform feature engineering. Which of the following services can feed data to the MapReduce jobs? (Select TWO)

  1. AWS DMS

  2. Amazon Kinesis

  3. AWS Data Pipeline

  4. Amazon Athena

34. A Data Scientist needs to create a serverless ingestion and analytics solution for high-velocity, real-time streaming data. The ingestion process must buffer and convert incoming records from JSON to a query-optimized, columnar format without data loss. The output datastore must be highly available, and Analysts must be able to run SQL queries against the data and connect to existing business intelligence dashboards. Which solution should the Data Scientist build to satisfy the requirements?

  1. Create a schema in the AWS Glue Data Catalog of the incoming data format.

  2. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to Bl tools using the Athena Java Database Connectivity (JDBC) connector.

  3. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and writes the data to a processed data location in Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to Bl tools using the Athena Java Database Connectivity (JDBC) connector.

  4. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and inserts it into an Amazon RDS PostgreSQL database.

35. Example Corp has an annual sale event from October to December. The company has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year's upcoming event. Which method should Example Corp use to split the data into a training dataset and evaluation dataset?

  1. Pre-split the data before uploading to Amazon S3

  2. Have Amazon ML split the data randomly.

  3. Have Amazon ML split the data sequentially.

  4. Perform custom cross-validation on the data

36. A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC. Why is the ML Specialist not seeing the instance visible in the VPC?

  1. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs.

  2. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts.

  3. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.

  4. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts.

37. A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL. Which storage scheme is MOST adapted to this scenario?

  1. Store datasets as files in Amazon S3.

  2. Store datasets as files in an Amazon EBS volume attached to an Amazon EC2 instance.

  3. Store datasets as tables in a multi-node Amazon Redshift cluster.

  4. Store datasets as global tables in Amazon DynamoDB.

38. A bank's Machine Learning team is developing an approach for credit card fraud detection. The company has a large dataset of historical data labeled as fraudulent. The goal is to build a model to take the information from new transactions and predict whether each transaction is fraudulent or not. Which built-in Amazon SageMaker machine learning algorithm should be used for modeling this problem?

  1. Seq2seq

  2. XGBoost

  3. K-means

  4. Random Cut Forest (RCF)

39. A Machine Learning Specialist deployed a model that provides product recommendations on a company's website. Initially, the model was performing very well and resulted in customers buying more products on average. However, within the past few months the Specialist has noticed that the effect of product recommendations has diminished and customers are starting to return to their original habits of spending less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago. Which method should the Specialist try to improve model performance?

  1. The model needs to be completely re-engineered because it is unable to handle product inventory changes

  2. The model's hyperparameters should be periodically updated to prevent drift

  3. The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes

  4. The model should be periodically retrained using the original training data plus new data as product inventory changes

40. IT leadership wants to transition a company's existing machine learning data storage environment to AWS as a temporary ad hoc solution. The company currently uses a custom software process that heavily leverages SQL as a query language and exclusively stores generated CSV documents for machine learning. The ideal state for the company would be a solution that allows it to continue to use the current workforce of SQL experts. The solution must also support the storage of CSV and JSON files, and be able to query over semi-structured data. The following are high priorities for the company:

  • Solution simplicity
  • Fast development time
  • Low cost
  • High flexibility

What technologies meet the company's requirements?

  1. Amazon S3 and Amazon Athena

  2. Amazon Redshift and AWS Glue

  3. Amazon DynamoDB and DynamoDB Accelerator (DAX)

  4. Amazon RDS and Amazon ES

41. A Machine Learning Specialist observes several performance problems with the training portion of a machine learning solution on Amazon SageMaker. The solution uses a large training dataset, 2 TB in size, and is using the SageMaker k-means algorithm. The observed issues include the unacceptable length of time it takes before the training job launches and poor I/O throughput while training the model. What should the Specialist do to address the performance issues with the current solution?

  1. Use the SageMaker batch transform feature

  2. Compress the training data into Apache Parquet format.

  3. Ensure that the input mode for the training job is set to Pipe.

  4. Copy the training dataset to an Amazon EFS volume mounted on the SageMaker instance.

42. A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter. Which machine learning approach should be used to solve this problem?

  1. Logistic regression

  2. Random Cut Forest (RCF)

  3. Principal component analysis (PCA)

  4. Linear regression

43. A Machine Learning Specialist is working for a credit card processing company and receives an unbalanced dataset containing credit card transactions. It contains 99,000 valid transactions and 1,000 fraudulent transactions. The Specialist is asked to score a model that was run against the dataset. The Specialist has been advised that identifying valid transactions is equally as important as identifying fraudulent transactions. What metric is BEST suited to score the model?

  1. Precision

  2. Recall

  3. Area Under the ROC Curve (AUC)

  4. Root Mean Square Error (RMSE)
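AUC can be read as the probability that a randomly chosen positive example scores above a randomly chosen negative one, which is why it stays informative on a 99:1 imbalanced dataset where plain accuracy would not. A rank-counting sketch with toy scores:

```python
# Rank-based AUC sketch: count how often a positive example outscores a
# negative one (ties count as half), then divide by the number of pairs.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.95, 0.9, 0.6, 0.4, 0.2]   # toy model scores
labels = [1,    1,   0,   1,   0]      # ground truth
print(auc(scores, labels))
```

Because the calculation weighs positive-negative pairs rather than raw counts, the 99% majority class cannot inflate the score, treating both classes as equally important.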

44. A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?

  1. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.

  2. Use AWS Glue to catalogue the data and Amazon Athena to run queries

  3. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries

  4. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries

45. The Chief Editor for a product catalog wants the Research and Development team to build a machine learning system that can be used to detect whether or not individuals in a collection of images are wearing the company's retail brand. The team has a set of training data. Which machine learning algorithm should the researchers use that BEST meets their requirements?

  1. Latent Dirichlet Allocation (LDA)

  2. Recurrent neural network (RNN)

  3. K-means

  4. Convolutional neural network (CNN)

46. A Data Engineer needs to build a model using a dataset containing customer credit card information. How can the Data Engineer ensure the data remains encrypted and the credit card information is secure?

  1. Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC.

  2. Use the SageMaker DeepAR algorithm to randomize the credit card numbers.

  3. Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically discard credit card numbers and insert fake credit card numbers.

  4. Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker instance in a VPC.

  5. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length of the credit card numbers.

  6. Use AWS KMS to encrypt the data on Amazon S3

47. An agency collects census information within a country to determine healthcare and social program needs by province and city. The census form collects responses for approximately 500 questions from each citizen. Which combination of algorithms would provide the appropriate insights? (Select TWO.)

  1. The factorization machines (FM) algorithm

  2. The Latent Dirichlet Allocation (LDA) algorithm

  3. The principal component analysis (PCA) algorithm

  4. The k-means algorithm

48. A Machine Learning Specialist is assigned a TensorFlow project using Amazon SageMaker for training, and needs to continue working for an extended period with no Wi-Fi access. Which approach should the Specialist use to continue working?

  1. Install Python 3 and boto3 on their laptop and continue the code development using that environment.

  2. Download the TensorFlow Docker container used in Amazon SageMaker from GitHub to their local environment, and use the Amazon SageMaker Python SDK to test the code.

  3. Download TensorFlow from tensorflow.org to emulate the TensorFlow kernel in the SageMaker environment.

  4. Download the SageMaker notebook to their local environment then install Jupyter Notebooks on their laptop and continue the development in a local notebook.

49. A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprising 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions. Here is an example from the dataset: "The quck BROWN FOX jumps over the lazy dog." Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Select THREE.)

  1. Perform part-of-speech tagging and keep the action verb and the nouns only

  2. Normalize all words by making the sentence lowercase

  3. Remove stop words using an English stopword dictionary.

  4. Correct the typography on "quck" to "quick."
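
The repeatable sanitization steps (lowercasing and stop-word removal) can be sketched in a few lines of pure Python. The stop-word set here is a tiny illustrative subset, not a full English dictionary such as NLTK's:

```python
# Minimal sketch of repeatable text sanitization before Word2Vec training.
# STOPWORDS is a small hypothetical subset for demonstration only.

STOPWORDS = {"the", "over", "a", "an", "of"}

def sanitize(sentence: str) -> list[str]:
    """Lowercase the sentence, tokenize on whitespace, and drop stop words."""
    tokens = sentence.lower().split()
    return [t for t in tokens if t not in STOPWORDS]

print(sanitize("The quck BROWN FOX jumps over the lazy dog"))
# → ['quck', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

Note that the misspelling "quck" survives untouched: per-instance typo correction is manual and not repeatable, which is exactly why it is the odd option out.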

50. While working on a neural network project, a Machine Learning Specialist discovers that some features in the data have very high magnitude, resulting in this data being weighted more heavily in the cost function. What should the Specialist do to ensure better convergence during backpropagation?

  1. Dimensionality reduction

  2. Data normalization

  3. Model regularization

  4. Data augmentation for the minority class
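
Data normalization can be sketched as z-score standardization, which rescales a large-magnitude feature to zero mean and unit variance. The income values below are hypothetical:

```python
# Sketch of z-score standardization, a common fix when features with
# large magnitudes dominate the cost function during backpropagation.

def standardize(values: list[float]) -> list[float]:
    """Scale a feature column to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    return [(v - mean) / std for v in values]

# A feature measured in large units (e.g. yearly income) ...
income = [30_000.0, 60_000.0, 90_000.0]
scaled = standardize(income)
print(scaled)  # values are now centered at 0 with comparable magnitude
```

After standardization, gradient updates for this feature are on the same scale as the rest, which is what improves convergence.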


51. Which AWS service is best suited for real-time inference with low latency in a machine learning deployment?

  1. Amazon SageMaker

  2. AWS Lambda

  3. Amazon EMR

  4. Amazon Redshift

52. Which of the following is NOT a valid use case for Amazon Rekognition?

  1. Face recognition

  2. Object and scene detection

  3. Text-to-speech conversion

  4. Moderation of explicit content

53. What is the primary purpose of using Amazon SageMaker Ground Truth?

  1. To train deep learning models

  2. To manage version control of ML models

  3. To automate data labeling

  4. To optimize hyperparameters

54. Which tool within Amazon SageMaker allows you to perform distributed hyperparameter tuning?

  1. SageMaker Processing Jobs

  2. SageMaker HyperParameter Tuning Jobs

  3. SageMaker Inference Pipelines

  4. SageMaker Feature Store

55. Which algorithm in Amazon SageMaker is best suited for anomaly detection?

  1. Linear Learner

  2. Random Cut Forest (RCF)

  3. Factorization Machines

  4. K-Nearest Neighbors

56. To ensure reproducibility of training jobs in Amazon SageMaker, what should you do?

  1. Delete old endpoints regularly

  2. Use consistent S3 bucket names

  3. Use SageMaker Experiments and capture input parameters and artifacts

  4. Manually document each job

57. When deploying a model on Amazon SageMaker, what is the benefit of enabling auto-scaling?

  1. It reduces training time

  2. It automatically adjusts the number of instances based on traffic

  3. It improves model accuracy

  4. It lowers storage costs

58. Which AWS service enables speech-to-text functionality for building conversational interfaces?

  1. Amazon Lex

  2. Amazon Polly

  3. Amazon Transcribe

  4. Amazon Comprehend

59. Which file format is NOT supported by Amazon SageMaker for training input?

  1. CSV

  2. JSON

  3. XML

  4. RecordIO

60. Which of the following is a fully managed service for building, training, and deploying machine learning models at scale?

  1. AWS DeepComposer

  2. Amazon SageMaker

  3. AWS Glue

  4. Amazon Athena

61. For which type of problem would you use the XGBoost algorithm in Amazon SageMaker?

  1. Image classification

  2. Time series forecasting

  3. Supervised learning classification or regression

  4. Unsupervised clustering

62. What is the role of an estimator in Amazon SageMaker?

  1. It defines how to deploy a model

  2. It defines how to train a model

  3. It evaluates model performance

  4. It preprocesses training data

63. Which AWS service helps you monitor and detect deviations in model predictions after deployment?

  1. Amazon SageMaker Model Monitor

  2. Amazon CloudWatch Logs

  3. AWS Config

  4. Amazon QuickSight

64. Which SageMaker feature allows you to test multiple versions of a model behind a single endpoint?

  1. Multi-model endpoints

  2. Endpoint variants

  3. AutoScaling groups

  4. Real-time inference pipelines

65. How can you secure communication between an Amazon SageMaker notebook instance and other AWS services?

  1. Use IAM roles

  2. Enable VPC isolation

  3. Encrypt EBS volumes

  4. All of the above

66. Which SageMaker SDK method is used to start a training job?

  1. fit()

  2. train()

  3. start_training()

  4. run()

67. Which feature in Amazon SageMaker allows you to run training jobs without managing EC2 instances?

  1. SageMaker Studio

  2. SageMaker Processing Jobs

  3. SageMaker Training Jobs

  4. SageMaker Pipelines

68. Which of the following is a common cause of cold starts in serverless inference?

  1. Too many concurrent requests

  2. Long-running inference jobs

  3. First request to an idle endpoint

  4. Incorrect IAM permissions

69. In Amazon SageMaker, what is the purpose of a model package?

  1. To store training logs

  2. To version and share trained models across accounts

  3. To visualize model metrics

  4. To compress model artifacts

70. What is the primary use of Amazon SageMaker Processing Jobs?

  1. To deploy models for inference

  2. To run preprocessing or postprocessing scripts

  3. To host notebooks for development

  4. To store model artifacts securely

71. Which AWS service integrates directly with Amazon SageMaker to help track experiments and compare model performance?

  1. AWS CodeBuild

  2. AWS Step Functions

  3. Amazon SageMaker Experiments

  4. Amazon CloudTrail

72. Which AWS service would be most appropriate for performing sentiment analysis on large volumes of unstructured text?

  1. Amazon Translate

  2. Amazon Rekognition

  3. Amazon Comprehend

  4. Amazon Forecast

73. What is the main advantage of using Amazon SageMaker Debugger?

  1. It visualizes model architecture

  2. It captures model training metrics and identifies issues

  3. It optimizes model inference speed

  4. It encrypts model artifacts

74. Which Amazon SageMaker built-in algorithm is suitable for recommendation systems?

  1. Factorization Machines

  2. Linear Learner

  3. K-Means

  4. Random Cut Forest

75. What is the purpose of using Amazon SageMaker Feature Store?

  1. To store raw training data

  2. To store and retrieve features for training and inference

  3. To compress model files

  4. To schedule training jobs

76. Which AWS service is designed for forecasting time-series data using machine learning?

  1. Amazon Forecast

  2. Amazon Personalize

  3. Amazon Lex

  4. Amazon Polly

77. Which of the following is NOT a valid inference option in Amazon SageMaker?

  1. Real-time inference

  2. Batch transform

  3. Serverless inference

  4. Streaming inference

78. Which of the following is NOT a valid source for training data in Amazon SageMaker?

  1. Amazon S3

  2. Amazon RDS

  3. Amazon DynamoDB

  4. Local disk of the notebook instance

79. Which AWS service is best for translating text between languages?

  1. Amazon Lex

  2. Amazon Translate

  3. Amazon Transcribe

  4. Amazon Comprehend

80. What is the key benefit of using Amazon SageMaker Pipelines?

  1. It simplifies model visualization

  2. It enables end-to-end ML workflow automation

  3. It improves model inference speed

  4. It encrypts model artifacts

81. Which of the following best describes Amazon SageMaker Neo?

  1. A service for model monitoring

  2. A service for compiling and optimizing models for edge devices

  3. A service for automatic hyperparameter tuning

  4. A service for data labeling

82. Which AWS service is best suited for building custom NLP models?

  1. Amazon Comprehend

  2. Amazon Lex

  3. Amazon SageMaker

  4. Amazon Translate

83. Which of the following is a benefit of using Amazon SageMaker Ground Truth Plus?

  1. Fully automated model deployment

  2. Managed workforce for data labeling

  3. Automatic hyperparameter tuning

  4. Real-time inference acceleration

84. Which of the following is NOT a capability of Amazon SageMaker Autopilot?

  1. Automatic feature engineering

  2. Model selection

  3. Hyperparameter optimization

  4. Data encryption at rest

85. How can you reduce training costs when using Amazon SageMaker?

  1. Use spot instances

  2. Increase instance size

  3. Enable encryption

  4. Disable logging

86. What is the main purpose of a SageMaker model?

  1. To define the training script

  2. To specify the container and model artifacts for inference

  3. To evaluate training metrics

  4. To create a training dataset

87. What type of machine learning task is best suited for Amazon Forecast?

  1. Image classification

  2. Text summarization

  3. Time series forecasting

  4. Anomaly detection

88. Which Amazon SageMaker capability enables developers to write custom algorithms using Docker containers?

  1. Built-in algorithms

  2. Bring Your Own Algorithm (BYOA)

  3. SageMaker Processing

  4. SageMaker Inference

89. Which feature of Amazon SageMaker allows developers to build and share reusable Jupyter notebook templates?

  1. SageMaker Studio

  2. SageMaker Lifecycle Policies

  3. SageMaker Notebooks

  4. SageMaker Domains

90. What is the primary use of Amazon SageMaker Clarify?

  1. To improve model training speed

  2. To detect bias and explain predictions

  3. To compress model files

  4. To automate data labeling

91. Which AWS service would you use to extract insights from unstructured audio files?

  1. Amazon Transcribe

  2. Amazon Translate

  3. Amazon Comprehend

  4. Both Amazon Transcribe and Amazon Comprehend

92. Which AWS service would you use to personalize product recommendations for users?

  1. Amazon Forecast

  2. Amazon Personalize

  3. Amazon Lex

  4. Amazon Rekognition

93. Which file format is most suitable for sparse datasets in Amazon SageMaker?

  1. CSV

  2. JSON

  3. Parquet

  4. RecordIO Protobuf

94. Which of the following is true about Amazon SageMaker hosting instances?

  1. They are always GPU-based

  2. They support only TensorFlow models

  3. They can autoscale based on traffic

  4. They cannot be encrypted

95. Which of the following is true about Amazon SageMaker Autopilot?

  1. It only supports regression problems

  2. It automatically selects the best algorithm and features

  3. It requires manual model selection

  4. It cannot handle categorical data

96. Which of the following is a valid way to reduce overfitting in a deep learning model trained in SageMaker?

  1. Increase the number of layers

  2. Add dropout layers

  3. Use more complex activation functions

  4. Train for more epochs
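
Dropout (option 2) can be sketched in pure Python as inverted dropout: zero each activation with probability p during training and rescale the survivors so the expected sum is unchanged. The activation values and seed below are illustrative:

```python
# Illustrative sketch of inverted dropout, a standard regularization
# technique against overfitting in deep networks.
import random

def dropout(activations: list[float], p: float, rng: random.Random) -> list[float]:
    """Zero each activation with probability p; scale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the demo is repeatable
out = dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng)
print(out)  # some activations are zeroed, the survivors are scaled up
```

Because each forward pass samples a different mask, no single unit can be relied on, which discourages co-adaptation and reduces overfitting.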

97. Which Amazon SageMaker feature enables you to run inference on edge devices?

  1. SageMaker Neo

  2. SageMaker Processing

  3. SageMaker Inference Pipelines

  4. SageMaker Studio

98. Which of the following is a correct statement about SageMaker endpoints?

  1. They can host multiple models simultaneously

  2. They require manual scaling at all times

  3. They support only one variant per endpoint

  4. They are immutable once deployed

99. Which of the following is NOT a valid input source for Amazon SageMaker Batch Transform?

  1. Amazon S3

  2. Amazon DynamoDB

  3. Amazon Redshift

  4. Local disk of the notebook instance

100. Which AWS service is designed to help interpret what features influenced a particular prediction made by an ML model?

  1. SageMaker Debugger

  2. SageMaker Clarify

  3. CloudWatch Metrics

  4. SageMaker Experiments


101. Which AWS service should be used to create chatbots for customer support?

  1. Amazon Lex

  2. Amazon Polly

  3. Amazon Translate

  4. Amazon Rekognition


FAQs


1. What is the AWS Certified Machine Learning Specialty MLS-C01 certification?

It is a specialty-level AWS certification that validates expertise in building, training, tuning, and deploying machine learning (ML) models on AWS.

2. How do I become AWS Certified Machine Learning Specialty certified?

You need to study ML and AWS AI/ML services, register for the MLS-C01 exam on the AWS Certification Portal, and pass it.

3. What are the prerequisites for the AWS Certified Machine Learning Specialty exam?

There are no mandatory prerequisites, but AWS recommends at least 1–2 years of experience in ML or deep learning and knowledge of AWS cloud services.

4. How much does the AWS MLS-C01 certification exam cost?

The exam fee is $300 USD.

5. How many questions are on the AWS Certified Machine Learning Specialty exam?

The exam has 65 multiple-choice and multiple-response questions.

6. What is the passing score for the AWS Machine Learning Specialty MLS-C01 exam?

You need a scaled score of 750 out of 1000 to pass.

7. How long is the AWS Machine Learning Specialty certification exam?

The exam duration is 180 minutes.

8. What topics are covered in the AWS Certified Machine Learning Specialty exam?

It covers data engineering, exploratory data analysis, modeling, ML implementation, and ML operations on AWS.

9. How difficult is the AWS MLS-C01 certification exam?

It is considered challenging, requiring both ML expertise and AWS service knowledge.

10. How long does it take to prepare for the AWS Certified Machine Learning Specialty exam?

Most candidates prepare in 8–12 weeks, depending on prior ML and AWS experience.

11. Are there any AWS Certified Machine Learning Specialty sample questions or practice tests available?

Yes, AWS provides sample questions, and CertiMaan offers dumps and practice tests.

12. What is the validity period of the AWS Certified Machine Learning Specialty certification?

The certification is valid for 3 years.

13. Can I retake the AWS MLS-C01 exam if I fail?

Yes, you can retake it after 14 days by paying the exam fee again.

14. What jobs can I get with an AWS Certified Machine Learning Specialty certification?

You can work as a Machine Learning Engineer, Data Scientist, AI Specialist, or Cloud ML Engineer.

15. How much salary can I earn with the AWS Certified Machine Learning Specialty MLS-C01 certification?

Certified professionals typically earn between $120,000–$160,000 annually, depending on experience and location.

