
Exam Code: Databricks-Certified-Professional-Data-Engineer                Update: May 6, 2026
Exam Name: Databricks Certified Data Engineer Professional Exam

Databricks Certified Data Engineer Professional Exam (Databricks-Certified-Professional-Data-Engineer) Exam Dumps: Updated Questions & Answers (May 2026)

Question # 1

A junior data engineer has manually configured a series of jobs using the Databricks Jobs UI. Upon reviewing their work, the engineer realizes that they are listed as the "Owner" for each job. They attempt to transfer "Owner" privileges to the "DevOps" group, but cannot successfully accomplish this task.

Which statement explains what is preventing this privilege transfer?

A.

Databricks jobs must have exactly one owner; "Owner" privileges cannot be assigned to a group.

B.

The creator of a Databricks job will always have "Owner" privileges; this configuration cannot be changed.

C.

Other than the default "admins" group, only individual users can be granted privileges on jobs.

D.

A user can only transfer job ownership to a group if they are also a member of that group.

E.

Only workspace administrators can grant "Owner" privileges to a group.

Question # 2

A data engineer is implementing Unity Catalog governance for a multi-team environment. Data scientists need interactive clusters for basic data exploration tasks, while automated ETL jobs require dedicated processing.

How should the data engineer configure cluster isolation policies to enforce least privilege and ensure Unity Catalog compliance?

A.

Use only DEDICATED access mode for both interactive workloads and automated jobs to maximize security isolation.

B.

Allow all users to create any cluster type and rely on manual configuration to enable Unity Catalog access modes.

C.

Configure all clusters with NO ISOLATION_SHARED access mode since Unity Catalog works with any cluster configuration.

D.

Create compute policies with STANDARD access mode for interactive workloads and DEDICATED access mode for automated jobs.

Question # 3

An analytics team wants to run a short-term experiment in Databricks SQL on the customer transactions Delta table (about 20 billion records) created by the data engineering team. Which strategy should the data engineering team use to ensure minimal downtime and no impact on the ongoing ETL processes?

A.

Create a new table for the analytics team using a CTAS statement.

B.

Deep clone the table for the analytics team.

C.

Give the analytics team direct access to the production table.

D.

Shallow clone the table for the analytics team.
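
A shallow clone copies only the Delta table's metadata, so it is created almost instantly, does not duplicate the underlying data files, and writes against the clone never touch the production table. A minimal sketch of the approach, assuming hypothetical catalog and schema names (sandbox.experiments and prod.sales are illustrative, not from the question):

# Zero-copy clone for the analytics team's short-term experiment.
# Catalog/schema/table names below are assumptions for illustration only.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sandbox.experiments.customer_transactions_clone
    SHALLOW CLONE prod.sales.customer_transactions
""")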

Question # 4

A task orchestrator has been configured to run two hourly tasks. First, an outside system writes Parquet data to a directory mounted at /mnt/raw_orders/. After this data is written, a Databricks job containing the following code is executed:

(spark.readStream
    .format("parquet")
    .load("/mnt/raw_orders/")
    .withWatermark("time", "2 hours")
    .dropDuplicates(["customer_id", "order_id"])
    .writeStream
    .trigger(once=True)
    .table("orders")
)

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order, and that the time field indicates when the record was queued in the source system. If the upstream system is known to occasionally enqueue duplicate entries for a single order hours apart, which statement is correct?

A.

The orders table will not contain duplicates, but records arriving more than 2 hours late will be ignored and missing from the table.

B.

The orders table will contain only the most recent 2 hours of records and no duplicates will be present.

C.

All records will be held in the state store for 2 hours before being deduplicated and committed to the orders table.

D.

Duplicate records enqueued more than 2 hours apart may be retained and the orders table may contain duplicate records with the same customer_id and order_id.
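
For context, dropDuplicates keeps per-key state in the state store, and withWatermark bounds how long that state is retained: once the watermark advances more than 2 hours past a key's event time, its state can be evicted, and a later duplicate of the same customer_id/order_id is no longer recognised. The same pipeline is sketched below with the relevant steps annotated (behaviour unchanged; comments added for illustration):

# Watermark of 2 hours: dedup state for keys whose event time falls more than
# 2 hours behind the observed maximum "time" value may be evicted.
deduped = (spark.readStream
    .format("parquet")
    .load("/mnt/raw_orders/")
    .withWatermark("time", "2 hours")
    # State is keyed on (customer_id, order_id); after eviction, a duplicate
    # that arrives hours later is treated as a brand-new order.
    .dropDuplicates(["customer_id", "order_id"]))

(deduped.writeStream
    .trigger(once=True)
    .table("orders"))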

Question # 5

Given the following PySpark code snippet in a Databricks notebook:

filtered_df = spark.read.format("delta").load("/mnt/data/large_table") \
    .filter("event_date > '2024-01-01'")

filtered_df.count()

The data engineer notices from the Query Profiler that the scan operator for filtered_df is reading almost all files, despite the filter being applied.

What is the probable reason for poor data skipping?

A.

The Delta table lacks optimization that enables dynamic file pruning.

B.

The filter is executed only after the full data scan, preventing data skipping.

C.

The event_date column is outside the table’s partitioning and Z-ordering scheme.

D.

The filter condition involves a data type excluded from data skipping support.
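
Data skipping works off per-file min/max statistics; if event_date is neither a partition column nor part of the Z-ordering, those statistics overlap across files and the scan cannot prune much. One common remedy is sketched below, assuming the table can be reorganised and that statistics are collected for event_date:

# Co-locate rows by event_date so file-level min/max ranges become selective.
spark.sql("OPTIMIZE delta.`/mnt/data/large_table` ZORDER BY (event_date)")

# After the rewrite, the same filtered read should skip files whose
# event_date range ends before 2024-01-01.
filtered_df = spark.read.format("delta").load("/mnt/data/large_table") \
    .filter("event_date > '2024-01-01'")
filtered_df.count()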

Question # 6

A data engineer is masking a column containing email addresses. The goal is to produce output strings of identical length for all rows, while generating different outputs for different email values.

Which SQL function should be used to achieve this?

A.

mask(email, '?')

B.

hash(email)

C.

sha1(email)

D.

sha2(email, 0)
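
For reference, sha1 always returns a 40-character hex string, and sha2(expr, 0) returns a 64-character hex string (a bit length of 0 is treated as SHA-256), so both give identical-length output for every row; hash returns a 32-bit integer whose printed length varies, and mask preserves the length of its input. A small comparison sketch on an illustrative value:

# Illustrative comparison of the candidate functions on a sample value.
spark.sql("""
    SELECT
      mask('Ada@Example.com', '?')  AS masked,    -- same length as the input string
      hash('Ada@Example.com')       AS hashed,    -- integer, digit count varies by value
      sha1('Ada@Example.com')       AS sha1_hex,  -- always 40 hex characters
      sha2('Ada@Example.com', 0)    AS sha2_hex   -- 0 means SHA-256: always 64 hex characters
""").show(truncate=False)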

Question # 7

The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data is 10 minutes.

Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

A.

Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster.

B.

Schedule a Structured Streaming job with a trigger interval of 60 minutes.

C.

Schedule a job to execute the pipeline once an hour on a new job cluster.

D.

Configure a job that executes every time new data lands in a given directory.

Question # 8

A data engineer is performing a join operation to combine values from a static userLookup table with a streaming DataFrame, streamingDF.

Which code block attempts to perform an invalid stream-static join?

A.

userLookup.join(streamingDF, ["userid"], how="inner")

B.

streamingDF.join(userLookup, ["user_id"], how="outer")

C.

streamingDF.join(userLookup, ["user_id"], how="left")

D.

streamingDF.join(userLookup, ["userid"], how="inner")

E.

userLookup.join(streamingDF, ["user_id"], how="right")
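
For context, stream-static joins support inner joins with either side streaming, left outer joins only when the streaming DataFrame is on the left, and right outer joins only when the streaming DataFrame is on the right; full outer joins between a stream and a static table are not supported. A minimal sketch of a supported stream-static join, with illustrative paths and a user_id join key:

# Static lookup table and a streaming fact source (paths are illustrative).
userLookup = spark.read.format("delta").load("/mnt/lookup/users/")
streamingDF = spark.readStream.format("delta").load("/mnt/events/orders/")

# Inner stream-static join: each micro-batch of streamingDF is enriched
# with matching rows from the static userLookup table.
enriched = streamingDF.join(userLookup, ["user_id"], how="inner")

(enriched.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/enriched_orders/")
    .trigger(availableNow=True)
    .table("enriched_orders"))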

Question # 9

A data organization has adopted Delta Sharing to securely distribute curated datasets from a Unity Catalog-enabled workspace. The data engineering team shares large Delta tables internally via Databricks-to-Databricks and externally via Open Sharing for aggregated reports. While testing, they encounter challenges related to access control, data update visibility, and shareable object types.

What is a limitation of the Delta Sharing protocol or implementation when used with Databricks-to-Databricks or Open Sharing?

A.

With Open Sharing, recipients cannot access Volumes, Models, or notebooks — only static Delta tables are supported.

B.

Delta Sharing does not support Unity Catalog–enabled tables; only legacy Hive Metastore tables are shareable.

C.

With Databricks-to-Databricks sharing, Unity Catalog recipients must re-ingest data manually using COPY INTO or REST APIs.

D.

Delta Sharing (both Databricks-to-Databricks and Open Sharing) allows recipients to modify the source data if they have select privileges.

Question # 10

An upstream system is emitting change data capture (CDC) logs that are being written to a cloud object storage directory. Each record in the log indicates the change type (insert, update, or delete) and the values for each field after the change. The source table has a primary key identified by the field pk_id .

For auditing purposes, the data governance team wishes to maintain a full record of all values that have ever been valid in the source system. For analytical purposes, only the most recent value for each record needs to be recorded. The Databricks job to ingest these records occurs once per hour, but each individual record may have changed multiple times over the course of an hour.

Which solution meets these requirements?

A.

Create a separate history table for each pk_id; resolve the current state of the table by running a union all and filtering the history tables for the most recent state.

B.

Use merge into to insert, update, or delete the most recent entry for each pk_id into a bronze table, then propagate all changes throughout the system.

C.

Iterate through an ordered set of changes to the table, applying each in turn; rely on Delta Lake's versioning ability to create an audit log.

D.

Use Delta Lake's change data feed to automatically process CDC data from an external system, propagating all changes to all dependent tables in the Lakehouse.

E.

Ingest all log information into a bronze table; use merge into to insert, update, or delete the most recent entry for each pk_id into a silver table to recreate the current table state.
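
A common pattern for the bronze-to-silver step is to land every CDC record unmodified in bronze for auditing, then keep only the latest change per pk_id from the hourly batch and MERGE it into silver. A minimal sketch under assumed names (bronze_cdc, silver_orders, change_type, change_time, and the payload columns are hypothetical):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Bronze keeps the full CDC history for the governance team.
bronze = spark.table("bronze_cdc")

# Each record may have changed several times within the hour, so keep only
# the most recent change per primary key.
latest = (bronze
    .withColumn("rn", F.row_number().over(
        Window.partitionBy("pk_id").orderBy(F.col("change_time").desc())))
    .filter("rn = 1")
    .drop("rn"))
latest.createOrReplaceTempView("latest_changes")

# Upsert the latest state into silver; deletes remove the row entirely.
spark.sql("""
    MERGE INTO silver_orders AS t
    USING latest_changes AS s
    ON t.pk_id = s.pk_id
    WHEN MATCHED AND s.change_type = 'delete' THEN DELETE
    WHEN MATCHED THEN UPDATE SET t.customer_id = s.customer_id, t.amount = s.amount
    WHEN NOT MATCHED AND s.change_type != 'delete'
      THEN INSERT (pk_id, customer_id, amount) VALUES (s.pk_id, s.customer_id, s.amount)
""")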
