
Exam Code: Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0                Update: May 15, 2025
Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam


Question # 1

The code block shown below should display information about the data type of column storeId in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

A.

1. select

2. "storeId"

3. print_schema()

B.

1. limit

2. 1

3. columns

C.

1. select

2. "storeId"

3. printSchema()

D.

1. limit

2. "storeId"

3. printSchema()

E.

1. select

2. storeId

3. dtypes
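
For reference, a minimal PySpark sketch (assuming an active SparkSession named spark; the sample data below is invented and only stands in for the real transactionsDf) showing two common ways to inspect a column's data type:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical sample data standing in for the real transactionsDf
transactionsDf = spark.createDataFrame([(1, 23), (2, 42)], ["transactionId", "storeId"])

# printSchema() prints the schema of the single-column projection, including its type
transactionsDf.select("storeId").printSchema()
# dtypes returns a list of (columnName, typeName) tuples for all columns
print(transactionsDf.dtypes)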

Question # 2

The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.

Code block:

transactionsDf.agg("storeId").avg("value")

A.

Instead of avg("value"), avg(col("value")) should be used.

B.

The avg("value") should be specified as a second argument to agg() instead of being appended to it.

C.

All column names should be wrapped in col() operators.

D.

agg should be replaced by groupBy.

E.

"storeId" and "value" should be swapped.

Question # 3

Which is the highest level in Spark's execution hierarchy?

A.

Task

B.

Executor

C.

Slot

D.

Job

E.

Stage

Question # 4

Which of the following code blocks returns a single-column DataFrame of all entries in Python list throughputRates, which contains only float-type values?

A.

spark.createDataFrame((throughputRates), FloatType)

B.

spark.createDataFrame(throughputRates, FloatType)

C.

spark.DataFrame(throughputRates, FloatType)

D.

spark.createDataFrame(throughputRates)

E.

spark.createDataFrame(throughputRates, FloatType())
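
For reference, a minimal sketch of building a single-column DataFrame from a Python list of floats (the list contents are invented for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.types import FloatType

spark = SparkSession.builder.getOrCreate()
throughputRates = [3.2, 1.5, 4.8]  # hypothetical values

# An instantiated data type (FloatType()) gives Spark the schema for a single-column DataFrame
df = spark.createDataFrame(throughputRates, FloatType())
df.printSchema()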

Question # 5

The code block shown below should write DataFrame transactionsDf to disk at path csvPath as a single CSV file, using tabs (\t characters) as separators between columns, expressing missing values as the string n/a, and omitting a header row with column names. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__.write.__2__(__3__, "\t").__4__.__5__(csvPath)

A.

1. coalesce(1)

2. option

3. "sep"

4. option("header", True)

5. path

B.

1. coalesce(1)

2. option

3. "colsep"

4. option("nullValue", "n/a")

5. path

C.

1. repartition(1)

2. option

3. "sep"

4. option("nullValue", "n/a")

5. csv

(Correct)

D.

1. csv

2. option

3. "sep"

4. option("emptyValue", "n/a")

5. path

E.

1. repartition(1)

2. mode

3. "sep"

4. mode("nullValue", "n/a")

5. csv
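
For reference, a minimal sketch of writing a DataFrame as a single tab-separated CSV file without a header (the sample data and output path are invented and stand in for transactionsDf and csvPath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical sample data and output path
transactionsDf = spark.createDataFrame([(1, "a"), (2, None)], ["transactionId", "note"])
csvPath = "/tmp/transactions-csv"

# repartition(1) yields a single partition and therefore a single output part file;
# the header option defaults to false, so no header row is written
transactionsDf.repartition(1).write.option("sep", "\t").option("nullValue", "n/a").csv(csvPath)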

Question # 6

Which of the following describes Spark's way of managing memory?

A.

Spark uses a subset of the reserved system memory.

B.

Storage memory is used for caching partitions derived from DataFrames.

C.

As a general rule for garbage collection, Spark performs better on many small objects than few big objects.

D.

Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

E.

Spark's memory usage can be divided into three categories: Execution, transaction, and storage.
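
For context, a minimal sketch of the configuration settings governing Spark's unified (execution plus storage) memory region; the values shown are the documented defaults:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.memory.fraction", "0.6")         # share of heap (minus reserved memory) used for execution and storage
         .config("spark.memory.storageFraction", "0.5")  # portion of that region protected for cached (storage) data
         .getOrCreate())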

Question # 7

Which of the following statements about storage levels is incorrect?

A.

The cache operator on DataFrames is evaluated like a transformation.

B.

In client mode, DataFrames cached with the MEMORY_ONLY_2 level will not be stored in the edge node's memory.

C.

Caching can be undone using the DataFrame.unpersist() operator.

D.

MEMORY_AND_DISK replicates cached DataFrames both on memory and disk.

E.

DISK_ONLY will not use the worker node's memory.
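
For reference, a minimal sketch of caching with an explicit storage level and undoing it (the data is invented for illustration):

from pyspark.sql import SparkSession
from pyspark.storagelevel import StorageLevel

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# persist() is evaluated lazily, like a transformation; caching happens once an action runs
df.persist(StorageLevel.MEMORY_ONLY_2)
df.count()

# unpersist() removes the cached data again
df.unpersist()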

Question # 8

Which of the following statements about reducing out-of-memory errors is incorrect?

A.

Concatenating multiple string columns into a single column may guard against out-of-memory errors.

B.

Reducing partition size can help against out-of-memory errors.

C.

Limiting the amount of data being automatically broadcast in joins can help against out-of-memory errors.

D.

Setting a limit on the maximum size of serialized data returned to the driver may help prevent out-of-memory errors.

E.

Decreasing the number of cores available to each executor can help against out-of-memory errors.
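
For context, a minimal sketch of two configuration settings commonly used to limit memory pressure (the concrete values are arbitrary examples, not recommendations):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # cap the size (in bytes) of tables Spark automatically broadcasts in joins
         .config("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)
         # cap the total size of serialized results returned to the driver
         .config("spark.driver.maxResultSize", "1g")
         .getOrCreate())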

Question # 9

Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?

A.

transactionsDf.count("productId").distinct()

B.

transactionsDf.groupBy("productId").agg(col("value").count())

C.

transactionsDf.count("productId")

D.

transactionsDf.groupBy("productId").count()

E.

transactionsDf.groupBy("productId").select(count("value"))

Question # 10

Which of the following code blocks reads all CSV files in directory filePath into a single DataFrame, with column names defined in the CSV file headers?

Content of directory filePath:

_SUCCESS

_committed_2754546451699747124

_started_2754546451699747124

part-00000-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-298-1-c000.csv.gz

part-00001-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-299-1-c000.csv.gz

part-00002-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-300-1-c000.csv.gz

part-00003-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-301-1-c000.csv.gz

A.

spark.option("header",True).csv(filePath)

B.

spark.read.format("csv").option("header",True).option("compression","zip").load(filePath)

C.

spark.read().option("header",True).load(filePath)

D.

spark.read.format("csv").option("header",True).load(filePath)

E.

spark.read.load(filePath)
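
For reference, a minimal sketch of reading every CSV file in a directory into one DataFrame, taking column names from the file headers (the path is a placeholder for the directory shown above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
filePath = "/data/transactions"  # placeholder directory containing the CSV part files

# The csv source reads every part file in the directory; .csv.gz files are decompressed
# automatically and files whose names start with "_" (e.g. _SUCCESS) are skipped
df = spark.read.format("csv").option("header", True).load(filePath)
df.printSchema()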

