
DP-100 — Designing and Implementing a Data Science Solution on Azure Study Guide

477 practice questions · Updated 2026-02-14 · $19 (70% off) · HTML + PDF formats

DP-100 Exam Overview

Prepare for the Microsoft DP-100 certification exam with our comprehensive study guide. This study material contains 477 practice questions sourced from real exams and expert-verified for accuracy. Each question includes the correct answer and a detailed explanation to help you understand the material thoroughly.

The DP-100 exam — Designing and Implementing a Data Science Solution on Azure — is offered by Microsoft. Passing this exam earns you the Microsoft Certified: Azure Data Scientist Associate credential, an industry-recognized certification that validates your expertise. Our study materials were last updated on 2026-02-14 to reflect the most recent exam objectives and content.

What You Get

477 Practice Questions

Complete question bank covering all exam domains and objectives.

HTML + PDF Formats

Interactive HTML file (recommended) for screen study and a print-ready PDF.

Instant Download

Access your study materials immediately after purchase.

Email with Permanent Download Links

You will receive a confirmation email with permanent download links so you can re-download the files at any time.

Why Choose CheapestExamDumps?

Lowest Price Available

Only $19 per exam — competitors charge $50-$300 for similar content.

Updated Monthly

Study materials refreshed within 30 days of any exam content changes.

Free Preview

Try 15 real practice questions before you buy — no signup required.

Instant Access

Download HTML + PDF immediately after payment. No waiting, no account needed.

About the Microsoft Certified: Azure Data Scientist Associate

The Microsoft Certified: Azure Data Scientist Associate is awarded by Microsoft to professionals who demonstrate competence in the skills measured by the DP-100 exam. According to the official Microsoft certification page, this certification validates your ability to work with the technologies covered in the exam objectives.

According to the Global Knowledge IT Skills and Salary Report, certified IT professionals earn 15-25% more than their non-certified peers. Certifications from Microsoft are among the most recognized credentials in the IT industry, with strong demand across enterprise organizations worldwide.

$19 (regular price $63)

One-time payment · HTML + PDF · Instant download · 477 questions

Free Sample — 15 Practice Questions

Preview 15 of 477 questions from the DP-100 exam. Try before you buy — purchase the full study guide for all 477 questions with answers and explanations.

Question 387

You are creating a classification model for a banking company to identify possible instances of credit card fraud. You plan to create the model in Azure Machine Learning by using automated machine learning. The training dataset that you are using is highly unbalanced. You need to evaluate the classification model. Which primary metric should you use?

A. normalized_mean_absolute_error
B. AUC_weighted
C. accuracy
D. normalized_root_mean_squared_error
E. spearman_correlation
Correct Answer: B
Explanation:
For highly imbalanced classification problems like credit card fraud, accuracy is misleading because it is dominated by the majority class. AUC_weighted evaluates the model’s ability to discriminate between classes while weighting each class by its sample proportion, making it more robust and appropriate for imbalanced datasets in Azure AutoML. The other options are regression or correlation metrics and are not suitable for classification.
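The pitfall the explanation describes is easy to demonstrate with plain Python and synthetic labels (no Azure dependency; the 990:10 split is invented for illustration): a model that always predicts "not fraud" scores near-perfect accuracy while catching zero fraud.

```python
# Synthetic, highly imbalanced labels: 990 legitimate (0), 10 fraudulent (1).
labels = [0] * 990 + [1] * 10

# A useless model that always predicts the majority class.
predictions = [0] * 1000

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)
print(accuracy)  # 0.99 -- looks excellent, yet every fraud case is missed

# Recall on the fraud class exposes the failure.
fraud_hits = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
fraud_recall = fraud_hits / 10
print(fraud_recall)  # 0.0
```

A class-weighted, ranking-based metric such as AUC_weighted is not fooled by this majority-class shortcut, which is why it is the better primary metric here.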

Question 164

You manage an Azure Machine Learning workspace by using the Azure CLI ml extension v2. You need to define a YAML schema to create a compute cluster. Which schema should you use?

A. https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
B. https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
C. https://azuremlschemas.azureedge.net/latest/vmCompute.schema.json
D. https://azuremlschemas.azureedge.net/latest/kubernetesCompute.schema.json
Correct Answer: B
Explanation:
In Azure Machine Learning CLI v2, a compute *cluster* is defined using the **amlCompute** resource type. The corresponding YAML schema for creating and managing scalable AML compute clusters is `amlCompute.schema.json`. The other schemas apply to different compute types: computeInstance (single-user VM), vmCompute (attached VM), and kubernetesCompute (AKS/Arc Kubernetes).
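A minimal compute-cluster YAML against that schema might look like the sketch below; the cluster name, VM size, and scaling values are placeholders, not from the source:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-cluster          # placeholder cluster name
type: amlcompute
size: STANDARD_DS3_v2      # placeholder VM size
min_instances: 0           # scale to zero when idle
max_instances: 4
idle_time_before_scale_down: 120
```

Such a file would typically be submitted with `az ml compute create --file <file>.yml` against the workspace.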

Question 37

HOTSPOT - You design a data processing strategy for a machine learning project. The data that must be processed includes unstructured flat files that must be processed in real time. The data transformation must be executed on serverless compute and optimized for big data analytical workloads. You need to select the Azure services for the data science team. Which storage and data processing services should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Illustration for DP-100 question 37
Correct Answer:
Data storage for model training workloads: Azure Data Lake Storage Gen2
Data processing solution: Azure Databricks
Explanation:
Azure Data Lake Storage Gen2 is optimized for big data analytics and unstructured data used in model training. Azure Databricks provides scalable, serverless Spark-based processing suitable for real-time transformation and large analytical workloads.

Question 144

HOTSPOT - You are implementing hyperparameter tuning for model training from a notebook. The notebook is in an Azure Machine Learning workspace. You add code that imports all relevant Python libraries. You must configure Bayesian sampling over the search space for the num_hidden_layers and batch_size hyperparameters. You need to complete the following Python code to configure Bayesian sampling. Which code segments should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Illustration for DP-100 question 144
Correct Answer: choice, range
Explanation:
For Bayesian sampling in Azure ML, discrete hyperparameters like batch_size with specific step values are defined using a choice over a generated range (e.g., range(16, 128, 16)). The learning rate already uses a supported continuous distribution.
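The choice-over-a-range pattern from the explanation expands to a small discrete candidate set; plain Python shows the batch sizes such a search space would contain (the 16..128 step-16 values mirror the example in the explanation):

```python
# batch_size candidates produced by a choice over range(16, 128, 16):
batch_size_candidates = list(range(16, 128, 16))
print(batch_size_candidates)  # [16, 32, 48, 64, 80, 96, 112]
```

In the SDK, these discrete values are wrapped in a `choice(...)` expression, while continuous hyperparameters such as learning rate use distributions like `uniform`.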

Question 507

You are performing feature engineering on a dataset. You must add a feature named CityName and populate the column value with the text London. You need to add the new feature to the dataset. Which Azure Machine Learning Studio module should you use?

A. Extract N-Gram Features from Text
B. Edit Metadata
C. Preprocess Text
D. Apply SQL Transformation
Correct Answer: D
Explanation:
To add a new feature (column) and populate it with a constant value like 'London' for all rows, you need a data transformation that can create columns and assign values. The Apply SQL Transformation module supports SQL statements (e.g., SELECT *, 'London' AS CityName) to add and populate a new column. Edit Metadata only modifies properties of existing columns and cannot create new ones.
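The Apply SQL Transformation module runs SQLite-flavored SQL against the input dataset, exposed as t1, so the effect of the query in the explanation can be reproduced with Python's built-in sqlite3 module (the sample table and rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (CustomerId INTEGER, Amount REAL)")  # made-up columns
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 9.5), (2, 12.0)])

# Same idea as Apply SQL Transformation: add a constant-valued column.
rows = conn.execute("SELECT *, 'London' AS CityName FROM t1").fetchall()
print(rows)  # [(1, 9.5, 'London'), (2, 12.0, 'London')]
```

Every row comes back with the new CityName column populated with 'London', which Edit Metadata cannot do because it only changes properties of existing columns.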

Question 449

You register a model that you plan to use in a batch inference pipeline. The batch inference pipeline must use a ParallelRunStep step to process files in a file dataset. The script that the ParallelRunStep step runs must process six input files each time the inferencing function is called. You need to configure the pipeline. Which configuration setting should you specify in the ParallelRunConfig object for the ParallelRunStep step?

A. process_count_per_node= "6"
B. node_count= "6"
C. mini_batch_size= "6"
D. error_threshold= "6"
Correct Answer: C
Explanation:
In a ParallelRunStep using a FileDataset, the mini_batch_size setting controls how many input files are passed to the run() (inferencing) function in each invocation. Since the requirement is to process six input files each time the inferencing function is called, mini_batch_size must be set to 6. The other options control compute scaling or error handling, not per-call file grouping.
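Conceptually, mini_batch_size for a FileDataset controls how many file paths land in each run() call. A plain-Python sketch of that batching, with hypothetical file names and no Azure dependency:

```python
def iter_mini_batches(file_paths, mini_batch_size):
    """Yield successive groups of file paths, the way ParallelRunStep hands them to run()."""
    for i in range(0, len(file_paths), mini_batch_size):
        yield file_paths[i:i + mini_batch_size]

def run(mini_batch):
    # Stand-in for the entry script's run(mini_batch) function.
    return [f"scored:{path}" for path in mini_batch]

files = [f"input_{n}.csv" for n in range(14)]  # 14 hypothetical input files
batches = list(iter_mini_batches(files, mini_batch_size=6))
print([len(b) for b in batches])  # [6, 6, 2]
```

With mini_batch_size=6, run() receives six files per invocation (and whatever remains in the final batch), which is exactly the behavior the question asks for.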

Question 369

HOTSPOT - A coworker registers a datastore in a Machine Learning services workspace by using the following code: You need to write code to access the datastore from a notebook. How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:

Illustration for DP-100 question 369
Correct Answer: datastore = Datastore.get(ws, 'demo_datastore')
Explanation:
Use the Datastore class to retrieve a registered datastore from the workspace. The get() method takes the workspace object and the datastore name used during registration.

Question 1

You are implementing hyperparameter tuning by using Bayesian sampling for Azure ML Python SDK v2-based model training from a notebook. The notebook is in an Azure Machine Learning workspace. The notebook uses a training script that runs on a compute cluster with 20 nodes. The code implements a Bandit termination policy with slack_factor set to 0.2 and a sweep job with max_concurrent_trials set to 10. You must increase the effectiveness of the tuning process by improving sampling convergence. What should you do?

A. Set the value of max_concurrent_trials to 20.
B. Set the value of slack_factor of early_termination policy to 0.1.
C. Set the value of slack_factor of early_termination policy to 0.9.
D. Set the value of max_concurrent_trials to 4.
Correct Answer: D
Explanation:
Bayesian sampling in Azure ML is sequential and relies on results from completed trials to guide the next hyperparameter choices. With a high max_concurrent_trials value, many runs start simultaneously and cannot benefit from each other’s outcomes, reducing convergence and behaving more like random sampling. Lowering max_concurrent_trials (for example, to 4) allows more completed results to inform subsequent trials, improving Bayesian sampling convergence. Changing the slack_factor affects early termination aggressiveness, not sampling convergence.
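The convergence argument can be made concrete with a toy upper bound. Assuming trials of roughly equal duration (a simplification introduced here, not stated in the source), the n-th trial can condition on at most n minus max_concurrent_trials completed results, because that many slots are still occupied when it starts:

```python
def observed_before_start(trial_number, max_concurrent_trials):
    """Toy upper bound on completed trials the n-th trial (1-based) can learn from,
    assuming all trials take about the same time to finish."""
    return max(0, trial_number - max_concurrent_trials)

# With max_concurrent_trials=10, the 10th trial starts blind (like random sampling).
print(observed_before_start(10, 10))  # 0

# With max_concurrent_trials=4, the 10th trial can condition on up to 6 results.
print(observed_before_start(10, 4))   # 6
```

Lower concurrency therefore feeds more completed outcomes into the Bayesian model before each new trial, at the cost of wall-clock time.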

Question 315

You write a Python script that processes data in a comma-separated values (CSV) file. You plan to run this script as an Azure Machine Learning experiment. The script loads the data and determines the number of rows it contains using the following code: You need to record the row count as a metric named row_count that can be returned using the get_metrics method of the Run object after the experiment run completes. Which code should you use?

A. run.upload_file('row_count', './data.csv')
B. run.log('row_count', rows)
C. run.tag('row_count', rows)
D. run.log_table('row_count', rows)
E. run.log_row('row_count', rows)
Correct Answer: B
Explanation:
To record a numeric metric that can be retrieved later with Run.get_metrics(), you must log it as a metric on the run. The run.log(name, value) method is designed for logging scalar metrics like a row count. Tags are metadata, uploads are files, and log_table/log_row are for tabular data, not a single numeric metric.
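The row-count computation itself needs nothing Azure-specific; a sketch with the standard csv module follows (the CSV content is invented, and the run.log call is shown as a comment because it requires an active Azure ML run context):

```python
import csv
import io

# Stand-in for the experiment's data.csv file (contents invented).
data_csv = io.StringIO("id,value\n1,10\n2,20\n3,30\n")

reader = csv.reader(data_csv)
next(reader)                 # skip the header row
rows = sum(1 for _ in reader)
print(rows)  # 3

# Inside the experiment script, the metric would then be logged with:
# from azureml.core import Run
# run = Run.get_context()
# run.log('row_count', rows)   # retrievable later via run.get_metrics()
```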

Question 73

HOTSPOT - You create multiple machine learning models by using automated machine learning. You need to configure a primary metric for each use case. Which metrics should you configure? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Illustration for DP-100 question 73
Correct Answer:
Bug resolution time (regression): r2_score
Sentiment analysis (classification): accuracy
Explanation:
Regression tasks use r2_score to measure how well predicted values explain variance in a continuous target. Classification tasks like sentiment analysis commonly use accuracy to measure the proportion of correctly classified labels.

Question 420

DRAG DROP - You create a training pipeline using the Azure Machine Learning designer. You upload a CSV file that contains the data from which you want to train your model. You need to use the designer to create a pipeline that includes steps to perform the following tasks: ✑ Select the training features using the pandas filter method. ✑ Train a model based on the naive_bayes.GaussianNB algorithm. ✑ Return only the Scored Labels column by using the query SELECT [Scored Labels] FROM t1; Which modules should you use? To answer, drag the appropriate modules to the appropriate locations. Each module name may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place:

Illustration for DP-100 question 420
Correct Answer:
Execute Python Script
Create Python Model
Apply SQL Transformation
Explanation:
Execute Python Script is used to apply the pandas filter method on the dataset. Create Python Model allows defining and training a custom model using naive_bayes.GaussianNB, which is not available as a built-in designer algorithm. Apply SQL Transformation is used after scoring to return only the Scored Labels column using the specified SELECT query.

Question 508

HOTSPOT - You are creating a machine learning model in Python. The provided dataset contains several numerical columns and one text column. The text column represents a product's category. The product category will always be one of the following: ✑ Bikes ✑ Cars ✑ Vans ✑ Boats You are building a regression model using the scikit-learn Python package. You need to transform the text data to be compatible with the scikit-learn Python package. How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:

Illustration for DP-100 question 508
Correct Answer:
pandas as df
map(ProductCategoryMapping)
Explanation:
pandas is required to load CSV data into a DataFrame. The pandas Series.map() method converts categorical text values into numeric values using the provided dictionary, making the column compatible with scikit-learn regression models.
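The mapping step mirrors what pandas `Series.map` does with a dictionary; the same transformation in plain Python, with sample column values invented for illustration:

```python
# Category-to-number mapping, producing a scikit-learn-compatible numeric column.
ProductCategoryMapping = {"Bikes": 0, "Cars": 1, "Vans": 2, "Boats": 3}

product_category = ["Cars", "Bikes", "Boats", "Cars"]  # invented sample column

# Equivalent of: df["ProductCategory"].map(ProductCategoryMapping)
encoded = [ProductCategoryMapping[c] for c in product_category]
print(encoded)  # [1, 0, 3, 1]
```

After this step, the column contains plain integers, which scikit-learn regression estimators accept directly.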

Question 9

DRAG DROP - You are designing an Azure Machine Learning solution by using the Python SDK v2. You must train and deploy the solution by using a compute target. The compute target must meet the following requirements: • Enable the use of on-premises compute resources. • Support autoscaling. You need to configure a compute target for training and inference. Which compute targets should you configure? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Illustration for DP-100 question 9
Correct Answer:
Training: Azure Machine Learning Kubernetes
Inference: Azure Machine Learning Kubernetes
Explanation:
Azure Machine Learning Kubernetes (AKS or Arc-enabled Kubernetes) supports autoscaling and can be connected to on‑premises resources via Azure Arc. Local compute does not support autoscaling, and Apache Spark pools are cloud-only and not suitable for on‑premises inference.

Question 222

You create an Azure Machine Learning workspace. The workspace contains a dataset named sample_dataset, a compute instance, and a compute cluster. You must create a two-stage pipeline that will prepare data in the dataset and then train and register a model based on the prepared data. The first stage of the pipeline contains the following code: You need to identify the location containing the output of the first stage of the script that you can use as input for the second stage. Which storage location should you use?

A. workspaceblobstore datastore
B. workspacefilestore datastore
C. compute instance
D. compute_cluster
Correct Answer: A
Explanation:
In Azure Machine Learning pipelines, outputs from a pipeline step (for example, via OutputFileDatasetConfig) are written to the workspace’s default datastore unless otherwise specified. The default datastore is an Azure Blob Storage container named workspaceblobstore, which is designed to store experiment artifacts, intermediate data, and pipeline outputs. Compute instances or clusters are transient execution environments, and workspacefilestore is intended mainly for user files such as notebooks, not pipeline stage outputs.

Question 151

HOTSPOT - You plan to use a curated environment to run Azure Machine Learning training experiments in a workspace. You need to display all curated environments and their respective packages in the workspace. How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Illustration for DP-100 question 151
Correct Answer:
AzureML
python
Explanation:
Curated Azure ML environments are identified by names starting with "AzureML". Package details are accessed through the environment’s Python section, where conda_dependencies can be serialized to list all installed packages.
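The filtering logic the explanation relies on is just a name-prefix check. With the v1 SDK this typically means listing environments via Environment.list(workspace=ws) and filtering the names, as in the sketch below (the environment names here are hypothetical):

```python
# Hypothetical names, as would be returned by Environment.list(workspace=ws).keys()
env_names = ["AzureML-sklearn-1.0", "AzureML-pytorch-1.13", "my-custom-env"]

# Curated environments are identified by the "AzureML" name prefix.
curated = [name for name in env_names if name.startswith("AzureML")]
print(curated)  # ['AzureML-sklearn-1.0', 'AzureML-pytorch-1.13']
```

For each curated environment, the package list can then be read from its python/conda dependencies section, as the explanation notes.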

$19 (regular price $63)

Get all 477 questions with detailed answers and explanations

DP-100 — Frequently Asked Questions

What is the Microsoft DP-100 exam?

The Microsoft DP-100 exam — Designing and Implementing a Data Science Solution on Azure — is a professional IT certification exam offered by Microsoft. Passing this exam earns you the Microsoft Certified: Azure Data Scientist Associate certification, a widely recognized credential in the IT industry.

How many practice questions are included?

This study guide contains 477 practice questions, each with an expert-verified correct answer and a detailed explanation. Questions cover all exam domains and objectives.

Is there a free sample available?

Yes! We provide a free sample of 15 practice questions from the DP-100 exam right on this page. Scroll up to preview them and evaluate the quality of our materials before purchasing.

When was this DP-100 study guide last updated?

This study guide was last updated on 2026-02-14. We regularly refresh our materials to reflect the latest exam content and objectives so you're always studying current material.

What file formats do I receive?

After purchase you receive two files: an interactive HTML file with show/hide answer toggles (ideal for studying on screen) and a PDF file (ideal for printing or offline study). Both work on any device — desktop, tablet, or phone.

How much does the DP-100 study guide cost?

The Microsoft DP-100 study guide costs $19 (discounted from $63). This is a one-time payment with no subscriptions or hidden fees.

How do I get my files after payment?

After successful payment via Stripe, you are immediately redirected to a download page with links to your HTML and PDF files. We also send the download links to your email address as a backup, so you'll always have access.

Why choose CheapestExamDumps over other providers?

CheapestExamDumps offers the lowest price at $19 per exam — competitors charge $50-$300 for similar content. All study materials are expert-verified, updated monthly, and include a free 15-question preview with no signup required. You get instant access to both HTML and PDF formats after payment.