
DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Online Practice Questions and Answers

Question 4

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only the columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

A. The column names should be listed directly as arguments to the operator and not as a list.

B. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

C. The select operator should be replaced by a drop operator.

D. The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

E. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
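
For reference, a minimal PySpark sketch of selecting a fixed set of columns by name; the toy rows below only mirror the sample above and are assumed for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy rows modeled on the sample above (values assumed for illustration)
transactionsDf = spark.createDataFrame(
    [(1, 3, 4, 25, 1, None), (2, 6, 7, 2, 2, None), (3, 3, None, 25, 3, None)],
    schema="transactionId INT, predError INT, value INT, storeId INT, productId INT, f INT",
)

# select() takes column names directly as arguments; plain strings work fine
transactionsDf.select("transactionId", "predError", "value", "storeId").show()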

Question 5

Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?

A. transactionsDf.groupBy(col(storeId).avg())

B. transactionsDf.groupBy("storeId").avg(col("value"))

C. transactionsDf.groupBy("storeId").agg(avg("value"))

D. transactionsDf.groupBy("storeId").agg(average("value"))

E. transactionsDf.groupBy("value").average()
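
For reference, a minimal sketch of computing a grouped mean, reusing the toy transactionsDf and the SparkSession from the sketch under Question 4:

from pyspark.sql.functions import avg

# Mean of column "value" for each storeId; avg() is imported from pyspark.sql.functions
transactionsDf.groupBy("storeId").agg(avg("value")).show()

# GroupedData.avg also accepts the column name as a plain string
transactionsDf.groupBy("storeId").avg("value").show()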

Question 6

Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

A. itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B. itemsDf.withColumnRenamed("attributes", "feature0")
   itemsDf.withColumnRenamed("supplier", "feature1")

C. itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D. itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E. itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")
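
For reference, a minimal sketch of chained column renames; the toy itemsDf below is assumed for illustration and reuses the SparkSession from the sketch under Question 4:

# Toy rows assumed for illustration
itemsDf = spark.createDataFrame(
    [(1, ["blue", "winter", "cozy"], "Sports Company Inc."),
     (2, ["red", "summer", "fresh"], "YetiX")],
    ["itemId", "attributes", "supplier"],
)

# withColumnRenamed(existing, new) takes plain strings and returns a new DataFrame,
# so both renames have to be chained on the same expression
itemsDf.withColumnRenamed("attributes", "feature0") \
       .withColumnRenamed("supplier", "feature1") \
       .printSchema()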

Question 7

Which of the following code blocks stores a part of the data in DataFrame itemsDf on executors?

A. itemsDf.cache().count()

B. itemsDf.cache(eager=True)

C. cache(itemsDf)

D. itemsDf.cache().filter()

E. itemsDf.rdd.storeCopy()
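
For reference, a minimal sketch of caching, reusing the toy itemsDf from the sketch under Question 6; cache() itself is lazy, so an action is needed before any partitions are actually stored on the executors:

# cache() is lazy and takes no arguments; the count() action materializes the
# cached partitions on the executors
itemsDf.cache().count()

# Release the cached copies when they are no longer needed
itemsDf.unpersist()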

Question 8

Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

A. DataFrame.repartition(12)

B. DataFrame.coalesce(6).shuffle()

C. DataFrame.coalesce(6)

D. DataFrame.coalesce(6, shuffle=True)

E. DataFrame.repartition(6)
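
For reference, a minimal sketch contrasting repartition and coalesce, reusing the toy transactionsDf from the sketch under Question 4:

# repartition(n) always performs a full shuffle and can increase or decrease
# the number of partitions
df12 = transactionsDf.repartition(12)
print(df12.repartition(6).rdd.getNumPartitions())   # 6, via a full shuffle

# coalesce(n) only merges existing partitions and avoids a full shuffle
print(df12.coalesce(6).rdd.getNumPartitions())      # 6, without a full shuffle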

Question 9

Which of the following is one of the big performance advantages that Spark has over Hadoop?

A. Spark achieves great performance by storing data in the DAG format, whereas Hadoop can only use parquet files.

B. Spark achieves higher resiliency for queries since, different from Hadoop, it can be deployed on Kubernetes.

C. Spark achieves great performance by storing data and performing computation in memory, whereas large jobs in Hadoop require a large amount of relatively slow disk I/O operations.

D. Spark achieves great performance by storing data in the HDFS format, whereas Hadoop can only use parquet files.

E. Spark achieves performance gains for developers by extending Hadoop's DataFrames with a user-friendly API.

Question 10

Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

A. transactionsDf.sort(asc_nulls_last("predError"))

B. transactionsDf.orderBy("predError").desc_nulls_last()

C. transactionsDf.sort("predError", ascending=False)

D. transactionsDf.desc_nulls_last("predError")

E. transactionsDf.orderBy("predError").asc_nulls_last()
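
For reference, a minimal sketch of a descending sort that places missing values last, reusing the toy transactionsDf from the sketch under Question 4:

from pyspark.sql.functions import desc_nulls_last

# Descending sort on predError with missing values placed last
transactionsDf.sort(desc_nulls_last("predError")).show()

# The same ordering expressed through the Column API
transactionsDf.orderBy(transactionsDf.predError.desc_nulls_last()).show()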

Question 11

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for those rows of DataFrame itemsDf in which column attributes contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:

itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

A. 1. filter 2. array_contains("cozy") 3. select 4. "itemId" 5. explode 6. "attributes"

B. 1. where 2. "array_contains(attributes, 'cozy')" 3. select 4. itemId 5. explode 6. attributes

C. 1. filter 2. "array_contains(attributes, 'cozy')" 3. select 4. "itemId" 5. map 6. "attributes"

D. 1. filter 2. "array_contains(attributes, cozy)" 3. select 4. "itemId" 5. explode 6. "attributes"

E. 1. filter 2. "array_contains(attributes, 'cozy')" 3. select 4. "itemId" 5. explode 6. "attributes"

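For reference, a minimal sketch combining a SQL-expression filter with explode, reusing the toy itemsDf from the sketch under Question 6:

from pyspark.sql.functions import explode

# filter() accepts a SQL expression string; explode() emits one row per array
# element and names the resulting column "col" by default
itemsDf.filter("array_contains(attributes, 'cozy')") \
       .select("itemId", explode("attributes")) \
       .show()
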
Question 12

Which of the following statements about data skew is incorrect?

A. Spark will not automatically optimize skew joins by default.

B. Broadcast joins are a viable way to increase join performance for skewed data over sort-merge joins.

C. In skewed DataFrames, the largest and the smallest partition consume very different amounts of memory.

D. To mitigate skew, Spark automatically disregards null values in keys when joining.

E. Salting can resolve data skew.
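
For reference, a minimal sketch of a broadcast join, one common way to sidestep the shuffle that makes sort-merge joins sensitive to skewed keys; the small lookup DataFrame below is assumed for illustration, and transactionsDf is reused from the sketch under Question 4:

from pyspark.sql.functions import broadcast

# Small lookup table assumed for illustration
storesDf = spark.createDataFrame([(25, "Downtown"), (2, "Airport")],
                                 ["storeId", "storeName"])

# Broadcasting the small side ships it to every executor, so the large (and
# possibly skewed) side does not have to be shuffled for the join
transactionsDf.join(broadcast(storesDf), "storeId").show()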

Question 13

The code block shown below should return a DataFrame with all columns of DataFrame transactionsDf, but only a maximum of 2 rows in which column productId has a value of at least 2. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__).__3__

A. 1. where 2. "productId" > 2 3. max(2)

B. 1. where 2. transactionsDf[productId] >= 2 3. limit(2)

C. 1. filter 2. productId > 2 3. max(2)

D. 1. filter 2. col("productId") >= 2 3. limit(2)

E. 1. where 2. productId >= 2 3. limit(2)
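
For reference, a minimal sketch of filtering with a Column condition and capping the row count, reusing the toy transactionsDf from the sketch under Question 4:

from pyspark.sql.functions import col

# col("productId") >= 2 builds a Column condition; limit(2) caps the result at two rows
transactionsDf.filter(col("productId") >= 2).limit(2).show()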

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0
Last Update: Apr 19, 2024
Questions: 180