Which Spark configuration controls the number of tasks that can run in parallel on the executor?
40 of 55.
A developer wants to refactor older Spark code to take advantage of built-in functions introduced in Spark 3.5.
The original code:
from pyspark.sql import functions as F
min_price = 110.50
result_df = prices_df.filter(F.col("price") > min_price).agg(F.count("*"))
Which code block should the developer use to refactor the code?
49 of 55.
In the code block below, aggDF contains aggregations on a streaming DataFrame:
aggDF.writeStream \
    .format("console") \
    .outputMode("???") \
    .start()
Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?
54 of 55.
What is the benefit of Adaptive Query Execution (AQE)?
55 of 55.
An application architect has been investigating Spark Connect as a way to modernize existing Spark applications running in their organization.
Which requirement blocks the adoption of Spark Connect in this organization?
What is a feature of Spark Connect?
A data engineer is running a Spark job to process a dataset of 1 TB stored in distributed storage. The cluster has 10 nodes, each with 16 CPUs. Spark UI shows:
Low number of Active Tasks
Many tasks complete in milliseconds
Fewer tasks than available CPUs
Which approach should be used to adjust the partitioning for optimal resource allocation?
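The arithmetic behind this scenario can be sketched in a few lines. This is a rough sizing heuristic, not a rule from the question: it assumes all 10 x 16 cores are available to executors, treats 1 TB as 1 TiB, and aims for about 128 MiB per partition. The helper `suggest_partitions` is a hypothetical name introduced for illustration.

```python
import math

# Cluster and dataset figures from the question.
NODES = 10
CPUS_PER_NODE = 16
DATASET_BYTES = 1 * 1024**4  # 1 TB, taken here as 1 TiB (assumption)

# Assumption: aim for roughly 128 MiB of data per partition.
TARGET_PARTITION_BYTES = 128 * 1024**2


def suggest_partitions(dataset_bytes, target_bytes, total_cores):
    """Hypothetical helper: partition count that is a whole multiple of total_cores."""
    # Partitions needed to keep each one near the target size.
    by_size = math.ceil(dataset_bytes / target_bytes)
    # Round up to full "waves" so every core has work in each wave.
    waves = max(1, math.ceil(by_size / total_cores))
    return waves * total_cores


total_cores = NODES * CPUS_PER_NODE
num_partitions = suggest_partitions(DATASET_BYTES, TARGET_PARTITION_BYTES, total_cores)
print(total_cores, num_partitions)
```

A value computed this way could then be passed to `df.repartition(num_partitions)`. The point of rounding to a multiple of the core count is that no cores sit idle while the last few tasks finish.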
45 of 55.
Which feature of Spark Connect should be considered when designing an application that plans to enable remote interaction with a Spark cluster?
5 of 55.
What is the relationship between jobs, stages, and tasks during execution in Apache Spark?
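As a study aid, the hierarchy can be modelled with plain Python data classes. This is only a toy model, not Spark's internal scheduler API: an action triggers a job, the job is split into stages at shuffle boundaries, and each stage runs one task per partition. The class names and the `build_job` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    partition_id: int  # one task processes one partition


@dataclass
class Stage:
    stage_id: int
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Job:
    job_id: int
    stages: List[Stage] = field(default_factory=list)


def build_job(job_id, partitions_per_stage):
    """Hypothetical helper: one stage per list entry, one task per partition."""
    stages = [
        Stage(stage_id=i, tasks=[Task(partition_id=p) for p in range(n)])
        for i, n in enumerate(partitions_per_stage)
    ]
    return Job(job_id=job_id, stages=stages)


# Example: an aggregation with one shuffle gives two stages.
# 8 input partitions and 200 shuffle partitions are illustrative numbers.
job = build_job(job_id=0, partitions_per_stage=[8, 200])
task_counts = [len(stage.tasks) for stage in job.stages]
print(task_counts)
```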
17 of 55.
A data engineer has noticed that upgrading the Spark version in their applications from Spark 3.0 to Spark 3.5 has improved the runtime of some scheduled Spark applications.
Looking further, the data engineer realizes that Adaptive Query Execution (AQE) is now enabled.
Which operation is AQE performing to automatically improve the performance of the Spark application?
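For reference, AQE re-optimizes a query plan at runtime using shuffle statistics. Its documented features include coalescing small shuffle partitions, switching a sort-merge join to a broadcast join, and splitting skewed join partitions. The configuration keys below are real Spark SQL settings. The `apply_settings` helper is hypothetical and assumes an existing `SparkSession` is passed in as `spark`.

```python
# Spark SQL configuration keys related to AQE; the values are illustrative.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_settings(spark, settings):
    """Hypothetical helper: set each key on a session's runtime configuration."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

With a live session, `apply_settings(spark, AQE_SETTINGS)` would set these keys explicitly.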