Amazon

DEA-C01 — AWS Certified Data Engineer - Associate Study Guide

237 practice questions · Updated 2026-02-16 · $19 (70% off) · HTML + PDF formats

DEA-C01 Exam Overview

Prepare for the Amazon DEA-C01 certification exam with our comprehensive study guide. This study material contains 237 practice questions sourced from real exams and expert-verified for accuracy. Each question includes the correct answer and a detailed explanation to help you understand the material thoroughly.

The DEA-C01 exam — AWS Certified Data Engineer - Associate — is offered by Amazon. Our study materials were last updated on 2026-02-16 to reflect the most recent exam objectives and content.

What You Get

237 Practice Questions

Complete question bank covering all exam domains and objectives.

HTML + PDF Formats

Interactive HTML file (recommended) for screen study and a print-ready PDF.

Instant Download

Access your study materials immediately after purchase.

Email with Permanent Download Links

You will receive a confirmation email with permanent download links, so you can re-download the files at any time.

Why Choose CheapestExamDumps?

Lowest Price Available

Only $19 per exam — competitors charge $50-$300 for similar content.

Updated Monthly

Study materials refreshed within 30 days of any exam content changes.

Free Preview

Try 15 real practice questions before you buy — no signup required.

Instant Access

Download HTML + PDF immediately after payment. No waiting, no account needed.

$19 (was $63)

One-time payment · HTML + PDF · Instant download · 237 questions

Free Sample — 15 Practice Questions

Preview 15 of 237 questions from the DEA-C01 exam. Try before you buy — purchase the full study guide for all 237 questions with answers and explanations.

Question 105

A company implements a data mesh that has a central governance account. The company needs to catalog all data in the governance account. The governance account uses AWS Lake Formation to centrally share data and grant access permissions. The company has created a new data product that includes a group of Amazon Redshift Serverless tables. A data engineer needs to share the data product with a marketing team. The marketing team must have access to only a subset of columns. The data engineer needs to share the same data product with a compliance team. The compliance team must have access to a different subset of columns than the marketing team needs access to. Which combination of steps should the data engineer take to meet these requirements? (Choose two.)

A. Create views of the tables that need to be shared. Include only the required columns.
B. Create an Amazon Redshift data share that includes the tables that need to be shared.
C. Create an Amazon Redshift managed VPC endpoint in the marketing team’s account. Grant the marketing team access to the views.
D. Share the Amazon Redshift data share to the Lake Formation catalog in the governance account.
E. Share the Amazon Redshift data share to the Amazon Redshift Serverless workgroup in the marketing team's account.
Correct Answer: A, D
Explanation:
Amazon Redshift data sharing does not support column-level permissions. To provide different subsets of columns to different teams, the data engineer must create views that expose only the required columns. To meet the requirement for central governance and cataloging, the Redshift data share must be shared with the AWS Lake Formation catalog in the governance account, where Lake Formation manages access permissions.
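
The view-per-team approach the explanation describes can be sketched as follows. The table, view, and column names below are illustrative placeholders, not taken from the question:

```python
# Sketch: build per-team CREATE VIEW statements that expose only the
# columns each team is allowed to see. Names are hypothetical examples.
def column_subset_view_sql(view_name: str, table: str, columns: list) -> str:
    """Return a CREATE VIEW statement limited to the given columns."""
    cols = ", ".join(columns)
    return f"CREATE VIEW {view_name} AS SELECT {cols} FROM {table};"

marketing_view = column_subset_view_sql(
    "sales.v_product_marketing", "sales.product_events",
    ["event_id", "campaign", "channel"])

compliance_view = column_subset_view_sql(
    "sales.v_product_compliance", "sales.product_events",
    ["event_id", "customer_region", "retention_flag"])
```

Each view is then included in the data share, and Lake Formation in the governance account grants each team access to only its own view.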

Question 222

A company uses Amazon Athena to run SQL queries for extract, transform, and load (ETL) tasks by using Create Table As Select (CTAS). The company must use Apache Spark instead of SQL to generate analytics. Which solution will give the company the ability to use Spark to access Athena?

A. Athena query settings
B. Athena workgroup
C. Athena data source
D. Athena query editor
Correct Answer: B
Explanation:
Amazon Athena supports Apache Spark through Athena for Apache Spark, which requires creating and using a Spark-enabled Athena workgroup. The workgroup determines the query engine (SQL or Spark) and is the prerequisite to run Spark notebooks and Spark-based ETL/analytics in Athena. Other options (query settings, data source, query editor) do not enable the Spark engine itself.
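
A minimal sketch of the CreateWorkGroup request that selects the Spark engine. The workgroup name, role ARN, and bucket are placeholders, and the engine-version string follows the "PySpark engine version 3" label Athena uses for Spark workgroups:

```python
# Request payload for the Athena CreateWorkGroup API with the Spark engine
# selected. All names and ARNs are hypothetical examples.
spark_workgroup_request = {
    "Name": "spark-analytics-wg",
    "Configuration": {
        "EngineVersion": {"SelectedEngineVersion": "PySpark engine version 3"},
        "ExecutionRole": "arn:aws:iam::123456789012:role/AthenaSparkRole",
        "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/"},
    },
    "Description": "Workgroup for Spark notebooks instead of SQL CTAS ETL",
}
# boto3.client("athena").create_work_group(**spark_workgroup_request)
```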

Question 155

An insurance company stores transaction data that the company compressed with gzip. The company needs to query the transaction data for occasional audits. Which solution will meet this requirement in the MOST cost-effective way?

A. Store the data in Amazon S3 Glacier Flexible Retrieval. Use Amazon S3 Glacier Select to query the data.
B. Store the data in Amazon S3. Use Amazon S3 Select to query the data.
C. Store the data in Amazon S3. Use Amazon Athena to query the data.
D. Store the data in Amazon S3 Glacier Instant Retrieval. Use Amazon Athena to query the data.
Correct Answer: A
Explanation:
The data is already compressed with gzip and is accessed only for occasional audits, indicating infrequent access and low performance requirements. Amazon S3 Glacier Flexible Retrieval offers the lowest storage cost for cold data, and S3 Glacier Select can directly query compressed data in place without restoring entire objects. Although retrieval has per-query costs and latency, this is acceptable for occasional audits and is more cost-effective overall than keeping the data in standard S3 or using Athena, which would require higher storage costs or additional query scanning charges.

Question 41

A company wants to ingest streaming data into an Amazon Redshift data warehouse from an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster. A data engineer needs to develop a solution that provides low data access time and that optimizes storage costs. Which solution will meet these requirements with the LEAST operational overhead?

A. Create an external schema that maps to the MSK cluster. Create a materialized view that references the external schema to consume the streaming data from the MSK topic.
B. Develop an AWS Glue streaming extract, transform, and load (ETL) job to process the incoming data from Amazon MSK. Load the data into Amazon S3. Use Amazon Redshift Spectrum to read the data from Amazon S3.
C. Create an external schema that maps to the streaming data source. Create a new Amazon Redshift table that references the external schema.
D. Create an Amazon S3 bucket. Ingest the data from Amazon MSK. Create an event-driven AWS Lambda function to load the data from the S3 bucket to a new Amazon Redshift table.
Correct Answer: A
Explanation:
Amazon Redshift Streaming Ingestion supports directly consuming data from Amazon MSK by creating an external schema and a materialized view. This provides near-real-time access with low latency, avoids intermediate storage layers like Amazon S3 (optimizing storage costs), and minimizes operational overhead because no custom ETL jobs, Lambda functions, or orchestration are required. The materialized view manages ingestion and refresh automatically.
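
The two statements the answer describes look roughly like this. The role ARN, cluster ARN, topic name, and view name are placeholders, and the payload-parsing expression is one common pattern for JSON topics:

```python
# Sketch of Redshift streaming ingestion from MSK: an external schema
# mapped to the cluster, plus an auto-refreshing materialized view over
# a topic. All identifiers are hypothetical examples.
external_schema_sql = """
CREATE EXTERNAL SCHEMA msk_stream
FROM MSK
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMskRole'
AUTHENTICATION iam
CLUSTER_ARN 'arn:aws:kafka:us-east-1:123456789012:cluster/example/abc';
"""

materialized_view_sql = """
CREATE MATERIALIZED VIEW clickstream_mv
AUTO REFRESH YES
AS SELECT kafka_partition, kafka_offset,
          JSON_PARSE(kafka_value) AS payload
   FROM msk_stream."clickstream-topic";
"""
```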

Question 215

A company stores data from an application in an Amazon DynamoDB table that operates in provisioned capacity mode. The workloads of the application have predictable throughput load on a regular schedule. Every Monday, there is an immediate increase in activity early in the morning. The application has very low usage during weekends. The company must ensure that the application performs consistently during peak usage times. Which solution will meet these requirements in the MOST cost-effective way?

A. Increase the provisioned capacity to the maximum capacity that is currently present during peak load times.
B. Divide the table into two tables. Provision each table with half of the provisioned capacity of the original table. Spread queries evenly across both tables.
C. Use AWS Application Auto Scaling to schedule higher provisioned capacity for peak usage times. Schedule lower capacity during off-peak times.
D. Change the capacity mode from provisioned to on-demand. Configure the table to scale up and scale down based on the load on the table.
Correct Answer: C
Explanation:
The workload is predictable with regular peak and off-peak periods. Using DynamoDB in provisioned capacity mode with AWS Application Auto Scaling and scheduled scaling allows the company to increase capacity before known peak times (such as Monday mornings) and reduce it during low-usage periods like weekends. This ensures consistent performance during peaks while minimizing costs compared to overprovisioning all the time or switching to on-demand, which is typically more expensive for predictable workloads.
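
A sketch of the scheduled-scaling parameters for the Application Auto Scaling PutScheduledAction API. The table name, capacity numbers, and cron expressions are illustrative, and a scalable target would need to be registered first:

```python
# Scheduled scaling actions for a DynamoDB table in provisioned mode:
# scale read capacity up before the Monday-morning peak, down for the
# weekend. All names and numbers are hypothetical examples.
def scheduled_action(name: str, schedule: str, min_cap: int, max_cap: int) -> dict:
    return {
        "ServiceNamespace": "dynamodb",
        "ResourceId": "table/orders",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
        "ScheduledActionName": name,
        "Schedule": schedule,
        "ScalableTargetAction": {"MinCapacity": min_cap, "MaxCapacity": max_cap},
    }

monday_peak = scheduled_action("monday-scale-up", "cron(0 5 ? * MON *)", 500, 1000)
weekend_low = scheduled_action("weekend-scale-down", "cron(0 0 ? * SAT *)", 5, 25)
# client = boto3.client("application-autoscaling")
# client.put_scheduled_action(**monday_peak)
```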

Question 141

A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy. Which solution will meet these requirements with the LEAST management overhead?

A. Amazon Kinesis Data Streams
B. Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned cluster
C. Amazon Kinesis Data Firehose
D. Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless
Correct Answer: D
Explanation:
The requirement is to replatform an existing Apache Kafka–based architecture with the least management overhead. Amazon MSK Serverless provides a fully managed, Kafka-compatible service with automatic scaling, capacity management, and no need to manage brokers or clusters. This preserves Kafka semantics (unlike Kinesis services, which would be a refactor) while minimizing operational effort compared with an MSK provisioned cluster. Therefore, Amazon MSK Serverless best meets the requirements.

Question 135

A data engineer uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run data pipelines in an AWS account. A workflow recently failed to run. The data engineer needs to use Apache Airflow logs to diagnose the failure of the workflow. Which log type should the data engineer use to diagnose the cause of the failure?

A. YourEnvironmentName-WebServer
B. YourEnvironmentName-Scheduler
C. YourEnvironmentName-DAGProcessing
D. YourEnvironmentName-Task
Correct Answer: D
Explanation:
In Amazon MWAA, the most direct way to diagnose a workflow (DAG) failure is to review the task logs. YourEnvironmentName-Task logs capture the execution details of individual tasks, including error messages, stack traces, and runtime failures, which are typically the root cause when a workflow fails. Scheduler and DAGProcessing logs are more focused on scheduling and DAG parsing issues, while WebServer logs relate to the UI.

Question 5

A retail company stores point-of-sale transaction data in an Amazon RDS for MySQL database. The company maintains historical sales analytics in Amazon Redshift. The company needs to create daily reports that combine the current day's transactions with historical sales patterns for trend analysis. The company requires a solution that provides near real-time insights while minimizing data transfer costs and maintenance overhead. Which solution will meet these requirements?

A. Configure AWS Database Migration Service (AWS DMS) to continuously replicate data from RDS for MySQL to Amazon Redshift. Use Redshift queries to create consolidated reports.
B. Implement Amazon Redshift federated queries to directly access RDS for MySQL data and join it with existing Redshift tables in a single query.
C. Use AWS Glue to create an extract, transform, and load (ETL) pipeline that runs every hour to copy incremental data from RDS for MySQL to Amazon Redshift. Generate reports.
D. Export RDS for MySQL data to an Amazon S3 bucket on a regular schedule. Use the COPY command to load the data into Amazon Redshift staging tables. Join the data with historical data.
Correct Answer: B
Explanation:
Amazon Redshift federated queries allow Amazon Redshift to query Amazon RDS for MySQL data in place and join it with existing Redshift tables. This provides near real-time access to the current day’s transactions without continuously copying data, minimizing data transfer costs and reducing operational and maintenance overhead compared with replication or ETL-based approaches.
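
The federated-query setup can be sketched as below. The external schema statement registers the RDS for MySQL database with Redshift; the report query then joins live MySQL rows with a Redshift table in one statement. All hostnames, ARNs, and table names are placeholders:

```python
# Sketch: Redshift federated query to RDS for MySQL. Identifiers are
# hypothetical examples.
federated_schema_sql = """
CREATE EXTERNAL SCHEMA mysql_pos
FROM MYSQL
DATABASE 'pos' URI 'pos-db.example.us-east-1.rds.amazonaws.com'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:pos-db-creds';
"""

daily_report_sql = """
SELECT h.sku, h.avg_daily_sales, t.todays_sales
FROM historical_sales h
JOIN (SELECT sku, SUM(amount) AS todays_sales
      FROM mysql_pos.transactions
      WHERE txn_date = CURRENT_DATE
      GROUP BY sku) t USING (sku);
"""
```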

Question 164

A company plans to provision a log delivery stream within a VPC. The company configured the VPC flow logs to publish to Amazon CloudWatch Logs. The company needs to send the flow logs to Splunk in near real time for further analysis. Which solution will meet these requirements with the LEAST operational overhead?

A. Configure an Amazon Kinesis Data Streams data stream to use Splunk as the destination. Create a CloudWatch Logs subscription filter to send log events to the data stream.
B. Create an Amazon Kinesis Data Firehose delivery stream to use Splunk as the destination. Create a CloudWatch Logs subscription filter to send log events to the delivery stream.
C. Create an Amazon Kinesis Data Firehose delivery stream to use Splunk as the destination. Create an AWS Lambda function to send the flow logs from CloudWatch Logs to the delivery stream.
D. Configure an Amazon Kinesis Data Streams data stream to use Splunk as the destination. Create an AWS Lambda function to send the flow logs from CloudWatch Logs to the data stream.
Correct Answer: B
Explanation:
Amazon Kinesis Data Firehose has native integration with Splunk and can receive data directly from CloudWatch Logs via a subscription filter. This provides near real-time delivery with minimal setup and no need to manage shards, scaling, or custom Lambda code. Using Kinesis Data Streams or Lambda would add unnecessary operational overhead compared to Firehose.
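
A sketch of the subscription-filter call that wires CloudWatch Logs to the Firehose delivery stream. The log group, stream ARN, and role ARN are placeholders; the role must allow CloudWatch Logs to write to Firehose:

```python
# Parameters for the CloudWatch Logs PutSubscriptionFilter API. All
# names and ARNs are hypothetical examples.
subscription_filter = {
    "logGroupName": "/vpc/flow-logs",
    "filterName": "flow-logs-to-splunk",
    "filterPattern": "",  # empty pattern forwards every log event
    "destinationArn": "arn:aws:firehose:us-east-1:123456789012:deliverystream/splunk-stream",
    "roleArn": "arn:aws:iam::123456789012:role/CWLtoFirehoseRole",
}
# boto3.client("logs").put_subscription_filter(**subscription_filter)
```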

Question 44

A company has an on-premises PostgreSQL database that contains customer data. The company wants to migrate the customer data to an Amazon Redshift data warehouse. The company has established a VPN connection between the on-premises database and AWS. The on-premises database is continuously updated. The company must ensure that the data in Amazon Redshift is updated as quickly as possible. Which solution will meet these requirements?

A. Use the pg_dump utility to generate a backup of the PostgreSQL database. Use the AWS Schema Conversion Tool (AWS SCT) to upload the backup to Amazon Redshift. Set up a cron job to perform a backup. Upload the backup to Amazon Redshift every night.
B. Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to use the change data capture (CDC) feature.
C. Use the pg_dump utility to generate a backup of the PostgreSQL database. Upload the backup to an Amazon S3 bucket. Use the COPY command to import the data into Amazon Redshift.
D. Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to perform a full load of the database to Amazon Redshift every night.
Correct Answer: B
Explanation:
The database is continuously updated and the data in Amazon Redshift must be kept as current as possible. AWS Database Migration Service (AWS DMS) supports an initial full load followed by ongoing change data capture (CDC), which replicates inserts, updates, and deletes from the on‑premises PostgreSQL database to Amazon Redshift in near real time over the existing VPN connection. The other options rely on periodic backups and batch loads, which cannot keep Redshift updated quickly enough.
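
A sketch of the DMS task parameters for a full load followed by ongoing CDC. Every ARN and identifier below is a placeholder:

```python
import json

# Parameters for the DMS CreateReplicationTask API: initial full copy of
# the PostgreSQL tables, then continuous change replication to Redshift.
# All ARNs and names are hypothetical examples.
dms_task = {
    "ReplicationTaskIdentifier": "postgres-to-redshift",
    "SourceEndpointArn": "arn:aws:dms:us-east-1:123456789012:endpoint:pg-source",
    "TargetEndpointArn": "arn:aws:dms:us-east-1:123456789012:endpoint:redshift-target",
    "ReplicationInstanceArn": "arn:aws:dms:us-east-1:123456789012:rep:repl-instance",
    "MigrationType": "full-load-and-cdc",  # full load, then ongoing CDC
    "TableMappings": json.dumps({
        "rules": [{
            "rule-type": "selection", "rule-id": "1", "rule-name": "include-all",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
}
# boto3.client("dms").create_replication_task(**dms_task)
```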

Question 192

A company uses an Amazon QuickSight dashboard to monitor usage of one of the company's applications. The company uses AWS Glue jobs to process data for the dashboard. The company stores the data in a single Amazon S3 bucket. The company adds new data every day. A data engineer discovers that dashboard queries are becoming slower over time. The data engineer determines that the root cause of the slowing queries is long-running AWS Glue jobs. Which actions should the data engineer take to improve the performance of the AWS Glue jobs? (Choose two.)

A. Partition the data that is in the S3 bucket. Organize the data by year, month, and day.
B. Increase the AWS Glue instance size by scaling up the worker type.
C. Convert the AWS Glue schema to the DynamicFrame schema class.
D. Adjust AWS Glue job scheduling frequency so the jobs run half as many times each day.
E. Modify the IAM role that grants access to AWS Glue to grant access to all S3 features.
Correct Answer: A, B
Explanation:
As daily data accumulates in a single S3 bucket, AWS Glue jobs must scan increasing volumes, causing longer runtimes. Partitioning the S3 data by year/month/day allows Glue to read only relevant partitions, reducing I/O and processing time. Additionally, scaling up the AWS Glue worker type (increasing DPUs) provides more compute and memory, enabling the jobs to process larger datasets faster. The other options do not directly address Glue job performance bottlenecks.
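
The year/month/day layout the answer recommends is just a Hive-style key convention, sketched below with a hypothetical prefix and filename:

```python
from datetime import date

def partitioned_key(prefix: str, day: date, filename: str) -> str:
    """Hive-style year/month/day partition path under the given prefix."""
    return (f"{prefix}/year={day.year}/month={day.month:02d}/"
            f"day={day.day:02d}/{filename}")

key = partitioned_key("app-usage", date(2026, 2, 16), "events-0001.parquet")
# -> 'app-usage/year=2026/month=02/day=16/events-0001.parquet'
```

With keys laid out this way, a Glue job that uses partition pushdown reads only the days it needs instead of scanning the whole bucket.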

Question 67

A retail company is using an Amazon Redshift cluster to support real-time inventory management. The company has deployed an ML model on a real-time endpoint in Amazon SageMaker. The company wants to make real-time inventory recommendations. The company also wants to make predictions about future inventory needs. Which solutions will meet these requirements? (Choose two.)

A. Use Amazon Redshift ML to generate inventory recommendations.
B. Use SQL to invoke a remote SageMaker endpoint for prediction.
C. Use Amazon Redshift ML to schedule regular data exports for offline model training.
D. Use SageMaker Autopilot to create inventory management dashboards in Amazon Redshift.
E. Use Amazon Redshift as a file storage system to archive old inventory management reports.
Correct Answer: A, B
Explanation:
Amazon Redshift ML allows Redshift to directly integrate with SageMaker models to generate predictions such as inventory recommendations using SQL, satisfying the real-time recommendation requirement. Additionally, Redshift can invoke a remote SageMaker real-time endpoint via SQL to perform predictions about future inventory needs. The other options do not address real-time prediction or are incorrect uses of the services.
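
A sketch of the Redshift ML statements for invoking an existing SageMaker real-time endpoint from SQL. The model, function, endpoint, and column names are all placeholders:

```python
# Redshift ML "bring your own model" remote inference: register the
# SageMaker endpoint as a SQL function, then call it in queries.
# All identifiers are hypothetical examples.
create_model_sql = """
CREATE MODEL inventory_model
FUNCTION predict_demand(INT, FLOAT)
RETURNS FLOAT
SAGEMAKER 'inventory-endpoint'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole';
"""

invoke_sql = """
SELECT sku, predict_demand(stock_level, weekly_sales) AS forecast
FROM inventory;
"""
```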

Question 97

A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers. A data engineer notices that one of the fields in the source data includes values that are in JSON format. How should the data engineer load the JSON data into the data warehouse with the LEAST effort?

A. Use the SUPER data type to store the data in the Amazon Redshift table.
B. Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table.
C. Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data.
D. Use an AWS Lambda function to flatten the JSON data. Store the data in Amazon S3.
Correct Answer: A
Explanation:
Amazon Redshift supports the SUPER data type, which can store semi-structured data such as JSON directly. Using SUPER allows the data engineer to load the JSON field without flattening or preprocessing, minimizing development and operational effort. Other options introduce unnecessary transformation or querying outside Redshift.
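
The SUPER approach can be sketched as below: declare the JSON field as SUPER, load with COPY, and query nested values with PartiQL dot notation. Table, bucket, and column names are placeholders:

```python
# Sketch: load a JSON field into a SUPER column without flattening.
# All identifiers are hypothetical examples.
create_table_sql = """
CREATE TABLE customers (
    customer_id BIGINT,
    attributes  SUPER  -- raw JSON lands here, no preprocessing needed
);
"""

copy_sql = """
COPY customers
FROM 's3://example-bucket/third-party/customers/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT JSON 'auto';
"""

query_sql = "SELECT customer_id, attributes.city FROM customers;"
```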

Question 168

A company has a business intelligence platform on AWS. The company uses an AWS Storage Gateway Amazon S3 File Gateway to transfer files from the company's on-premises environment to an Amazon S3 bucket. A data engineer needs to set up a process that will automatically launch an AWS Glue workflow to run a series of AWS Glue jobs when each file transfer finishes successfully. Which solution will meet these requirements with the LEAST operational overhead?

A. Determine when the file transfers usually finish based on previous successful file transfers. Set up an Amazon EventBridge scheduled event to initiate the AWS Glue jobs at that time of day.
B. Set up an Amazon EventBridge event that initiates the AWS Glue workflow after every successful S3 File Gateway file transfer event.
C. Set up an on-demand AWS Glue workflow so that the data engineer can start the AWS Glue workflow when each file transfer is complete.
D. Set up an AWS Lambda function that will invoke the AWS Glue Workflow. Set up an event for the creation of an S3 object as a trigger for the Lambda function.
Correct Answer: B
Explanation:
The requirement is to automatically start an AWS Glue workflow immediately after each successful file transfer from an S3 File Gateway, with the least operational overhead. Amazon S3 can emit object creation events directly to Amazon EventBridge, and AWS Glue workflows can be started natively from EventBridge without custom code. This creates a fully managed, event-driven solution. Option D adds unnecessary operational overhead by introducing and maintaining a Lambda function. Options A and C are manual or time-based and do not reliably align with actual file transfer completion.
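
A sketch of the EventBridge rule pattern that matches object-creation events for the File Gateway's bucket. The bucket and rule names are placeholders; the bucket must have EventBridge notifications enabled, and the Glue workflow is attached on the Glue side (for example via an event-type Glue trigger):

```python
import json

# EventBridge rule pattern for S3 "Object Created" events from the
# landing bucket. Names are hypothetical examples.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["file-gateway-landing-bucket"]}},
}
# boto3.client("events").put_rule(Name="start-glue-on-upload",
#                                 EventPattern=json.dumps(event_pattern))
```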

Question 203

A company receives call logs as Amazon S3 objects that contain sensitive customer information. The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access. Which solution will meet these requirements with the LEAST effort?

A. Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.
B. Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.
C. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
D. Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.
Correct Answer: C
Explanation:
SSE-KMS encrypts S3 objects using AWS KMS keys while allowing fine-grained access control through IAM and KMS key policies. This lets only specific employees use the encryption keys without managing custom encryption code or infrastructure. It meets the security requirement with the least operational effort compared to CloudHSM or customer-managed key handling.
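
A sketch of an SSE-KMS upload. The bucket, key, and KMS key ARN are placeholders; access to the KMS key (and therefore to the plaintext objects) is then restricted through the key policy and IAM:

```python
# PutObject parameters requesting server-side encryption with a specific
# KMS key. All names and ARNs are hypothetical examples.
put_object_params = {
    "Bucket": "call-logs-bucket",
    "Key": "2026/02/16/call-log-0001.json",
    "Body": b'{"caller": "redacted"}',
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "arn:aws:kms:us-east-1:123456789012:key/1111-2222-3333",
}
# boto3.client("s3").put_object(**put_object_params)
```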

$19 (was $63)

Get all 237 questions with detailed answers and explanations

DEA-C01 — Frequently Asked Questions

What is the Amazon DEA-C01 exam?

The Amazon DEA-C01 exam — AWS Certified Data Engineer - Associate — is an IT certification exam offered by Amazon Web Services (AWS).

How many practice questions are included?

This study guide contains 237 practice questions, each with an expert-verified correct answer and a detailed explanation. Questions cover all exam domains and objectives.

Is there a free sample available?

Yes! We provide a free sample of 15 practice questions from the DEA-C01 exam right on this page. Scroll up to preview them and evaluate the quality of our materials before purchasing.

When was this DEA-C01 study guide last updated?

This study guide was last updated on 2026-02-16. We regularly refresh our materials to reflect the latest exam content and objectives so you're always studying current material.

What file formats do I receive?

After purchase you receive two files: an interactive HTML file with show/hide answer toggles (ideal for studying on screen) and a PDF file (ideal for printing or offline study). Both work on any device — desktop, tablet, or phone.

How much does the DEA-C01 study guide cost?

The Amazon DEA-C01 study guide costs $19 (discounted from $63). This is a one-time payment with no subscriptions or hidden fees.

How do I get my files after payment?

After successful payment via Stripe, you are immediately redirected to a download page with links to your HTML and PDF files. We also send the download links to your email address as a backup, so you'll always have access.

Why choose CheapestExamDumps over other providers?

CheapestExamDumps offers the lowest price at $19 per exam — competitors charge $50-$300 for similar content. All study materials are expert-verified, updated monthly, and include a free 15-question preview with no signup required. You get instant access to both HTML and PDF formats after payment.