Pre-Summer Special Sale - 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: best70

Page: 1 / 9
Total 86 questions
Exam Code: Data-Engineer-Associate                Update: Apr 19, 2026
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)

Amazon Web Services AWS Certified Data Engineer - Associate (DEA-C01) Data-Engineer-Associate Exam Dumps: Updated Questions & Answers (April 2026)

Question # 1

A company has an on-premises PostgreSQL database that contains customer data. The company wants to migrate the customer data to an Amazon Redshift data warehouse. The company has established a VPN connection between the on-premises database and AWS.

The on-premises database is continuously updated. The company must ensure that the data in Amazon Redshift is updated as quickly as possible.

Which solution will meet these requirements?

A.

Use the pg_dump utility to generate a backup of the PostgreSQL database. Use the AWS Schema Conversion Tool (AWS SCT) to upload the backup to Amazon Redshift. Set up a cron job to perform a backup. Upload the backup to Amazon Redshift every night.

B.

Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to use the change data capture (CDC) feature.

C.

Use the pg_dump utility to generate a backup of the PostgreSQL database. Upload the backup to an Amazon S3 bucket. Use the COPY command to import the data into Amazon Redshift.

D.

Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to perform a full load of the database to Amazon Redshift every night.

Question # 2

A data engineer develops an AWS Glue Apache Spark ETL job to perform transformations on a dataset. When the data engineer runs the job, the job returns an error that reads, " No space left on device. "

The data engineer needs to identify the source of the error and provide a solution.

Which combinations of steps will meet this requirement MOST cost-effectively? (Select TWO.)

A.

Scale out the workers vertically to address data skewness.

B.

Use the Spark UI and AWS Glue metrics to monitor data skew in the Spark executors.

C.

Scale out the number of workers horizontally to address data skewness.

D.

Enable the --write-shuffle-files-to-s3 job parameter. Use the salting technique.

E.

Use error logs in Amazon CloudWatch to monitor data skew.

Question # 3

A company analyzes data in a data lake every quarter to perform inventory assessments. A data engineer uses AWS Glue DataBrew to detect any personally identifiable information (PII) about customers within the data. The company ' s privacy policy considers some custom categories of information to be PII. However, the categories are not included in standard DataBrew data quality rules.

The data engineer needs to modify the current process to scan for the custom PII categories across multiple datasets within the data lake.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Manually review the data for custom PII categories.

B.

Implement custom data quality rules in Data Brew. Apply the custom rules across datasets.

C.

Develop custom Python scripts to detect the custom PII categories. Call the scripts from DataBrew.

D.

Implement regex patterns to extract PII information from fields during extract transform, and load (ETL) operations into the data lake.

Question # 4

A media company uses software as a service (SaaS) applications to gather data by using third-party tools. The company needs to store the data in an Amazon S3 bucket. The company will use Amazon Redshift to perform analytics based on the data.

Which AWS service or feature will meet these requirements with the LEAST operational overhead?

A.

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

B.

Amazon AppFlow

C.

AWS Glue Data Catalog

D.

Amazon Kinesis

Question # 5

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently.

The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtown for the applications that access the database.

Which AWS service should the company use to meet these requirements?

A.

AWS Lambda

B.

AWS Database Migration Service (AWS DMS)

C.

AWS Direct Connect

D.

AWS DataSync

Question # 6

A data engineer at a large company needs to create centralized datasets that are optimized for Amazon Redshift performance. The company has multiple downstream teams that use their own AWS accounts and dedicated Amazon Redshift clusters with RA3 nodes. All downstream teams need access to the centralized datasets.

Which solution will provide immediate access to the datasets and maintain the current Amazon Redshift performance?

A.

Copy the datasets to an Amazon S3 bucket by using the UNLOAD command. Register the table definitions in a dedicated AWS Glue Data Catalog schema. Share the schema with the other AWS accounts by using AWS Lake Formation. Use Amazon Redshift Spectrum to access the data.

B.

Create a daily extract, transform, and load (ETL) job to unload the data to an Amazon S3 staging area. Instruct the teams to copy the data into their Amazon Redshift clusters.

C.

Set up Amazon Redshift data sharing between the Amazon Redshift producer clusters and the consumer clusters to provide access to the centralized datasets.

D.

Set up an AWS DataSync job that automatically syncs the data between the Amazon Redshift producer clusters and the consumer clusters.

Question # 7

A company is building a data lake for a new analytics team. The company is using Amazon S3 for storage and Amazon Athena for query analysis. All data that is in Amazon S3 is in Apache Parquet format.

The company is running a new Oracle database as a source system in the company ' s data center. The company has 70 tables in the Oracle database. All the tables have primary keys. Data can occasionally change in the source system. The company wants to ingest the tables every day into the data lake.

Which solution will meet this requirement with the LEAST effort?

A.

Create an Apache Sqoop job in Amazon EMR to read the data from the Oracle database. Configure the Sqoop job to write the data to Amazon S3 in Parquet format.

B.

Create an AWS Glue connection to the Oracle database. Create an AWS Glue bookmark job to ingest the data incrementally and to write the data to Amazon S3 in Parquet format.

C.

Create an AWS Database Migration Service (AWS DMS) task for ongoing replication. Set the Oracle database as the source. Set Amazon S3 as the target. Configure the task to write the data in Parquet format.

D.

Create an Oracle database in Amazon RDS. Use AWS Database Migration Service (AWS DMS) to migrate the on-premises Oracle database to Amazon RDS. Configure triggers on the tables to invoke AWS Lambda functions to write changed records to Amazon S3 in Parquet format.

Question # 8

A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data engineer writes a query to retrieve sales amounts for 2023 for several products from a table named sales_data. However, the query does not return results for all of the products that are in the sales_data table. The data engineer needs to troubleshoot the query to resolve the issue.

The data engineer ' s original query is as follows:

SELECT product_name, sum(sales_amount)

FROM sales_data

WHERE year = 2023

GROUP BY product_name

How should the data engineer modify the Athena query to meet these requirements?

A.

Replace sum(sales amount) with count(*J for the aggregation.

B.

Change WHERE year = 2023 to WHERE extractlyear FROM sales data) = 2023.

C.

Add HAVING sumfsales amount) > 0 after the GROUP BY clause.

D.

Remove the GROUP BY clause

Question # 9

A data engineer needs to deploy a complex pipeline. The stages of the pipeline must run scripts, but only fully managed and serverless services can be used.

A.

Deploy AWS Glue jobs and workflows. Use AWS Glue to run the jobs and workflows on a schedule.

B.

Use Amazon MWAA to build and schedule the pipeline.

C.

Deploy the script to EC2. Use EventBridge to schedule it.

D.

Use AWS Glue DataBrew and EventBridge to run on a schedule.

Question # 10

A data engineer needs to maintain a central metadata repository that users access through Amazon EMR and Amazon Athena queries. The repository needs to provide the schema and properties of many tables. Some of the metadata is stored in Apache Hive. The data engineer needs to import the metadata from Hive into the central metadata repository.

Which solution will meet these requirements with the LEAST development effort?

A.

Use Amazon EMR and Apache Ranger.

B.

Use a Hive metastore on an EMR cluster.

C.

Use the AWS Glue Data Catalog.

D.

Use a metastore on an Amazon RDS for MySQL DB instance.

Page: 1 / 9
Total 86 questions

Most Popular Certification Exams

Payment

       

Contact us

Site Secure

mcafee secure

TESTED 19 Apr 2026