E20-007 Online Practice Questions and Answers

Question 4

You have been assigned to fit a separate logistic regression model for each of 100 countries. All data is currently stored in a PostgreSQL database.

Which tool/library should be used to produce these models with the least effort?

A. MADlib

B. Mahout

C. RStudio

D. HBase

Question 5

Refer to the exhibit.

You have run a linear regression model against your data, and have plotted true outcome versus predicted outcome. The R-squared of your model is 0.75. What is your assessment of the model?

A. The R-squared may be biased upwards by the extreme-valued outcomes. Remove them and refit to get a better idea of the model's quality over typical data.

B. The R-squared is good. The model should perform well.

C. The extreme-valued outliers may negatively affect the model's performance. Remove them to see if the R-squared improves over typical data.

D. The observations seem to come from two different populations, but this model fits them both equally well.
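The concern that a few extreme-valued outcomes can inflate R-squared can be illustrated with a small sketch. The data here is synthetic and is an assumption made for illustration only; it is not the data from the exhibit. The helper name `r_squared` is likewise hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """Fit y = a*x + b by least squares and return the R-squared of the fit."""
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Synthetic "typical" data: weak signal relative to the noise.
x_typical = rng.uniform(0.0, 1.0, size=50)
y_typical = x_typical + rng.normal(0.0, 1.0, size=50)

# A handful of synthetic extreme-valued observations far from the bulk.
x_extreme = np.array([20.0, 21.0, 22.0])
y_extreme = x_extreme + rng.normal(0.0, 1.0, size=3)

x_all = np.concatenate([x_typical, x_extreme])
y_all = np.concatenate([y_typical, y_extreme])

r2_all = r_squared(x_all, y_all)              # fit on everything
r2_typical = r_squared(x_typical, y_typical)  # refit on the typical range only

print(f"R-squared with extreme points:    {r2_all:.3f}")
print(f"R-squared on typical points only: {r2_typical:.3f}")
```

In this sketch the extreme points dominate the total variance, so the fit over all points reports a much higher R-squared than the refit over the typical range.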

Question 6

Refer to the exhibit.

Consider the training data set shown in the exhibit. What are the classification (Y = 0 or 1) and the probability of the classification for the tuple X(0, 0, 1) using a Naive Bayesian classifier?

A. Classification Y = 1, Probability = 4/54

B. Classification Y = 0, Probability = 1/54

C. Classification Y = 1, Probability = 1/54

D. Classification Y = 0, Probability = 4/54

Question 7

What is the primary bottleneck in text classification?

A. The availability of tagged training data.

B. The ability to parse unstructured text data.

C. The high dimensionality of text data.

D. The fact that text corpora are dynamic.

Question 8

Which ROC curve represents a perfect model fit?

(Exhibits A through D, each showing an ROC curve, are not reproduced here.)

A. Exhibit A

B. Exhibit B

C. Exhibit C

D. Exhibit D
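As a rough illustration of what an ROC curve looks like when a model separates the classes perfectly, the sketch below builds the curve by hand. The labels and scores are made up for illustration, the function names `roc_points` and `auc` are hypothetical, and the code assumes all scores are distinct (ties are not handled).

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by lowering the threshold one score at a time."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y_sorted = y_true[order]
    # Cumulative counts of true and false positives, starting from (0, 0).
    tps = np.concatenate(([0], np.cumsum(y_sorted == 1)))
    fps = np.concatenate(([0], np.cumsum(y_sorted == 0)))
    return fps / fps[-1], tps / tps[-1]

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up example: every positive scores higher than every negative.
y = [0, 0, 1, 1]
perfect_scores = [0.1, 0.2, 0.8, 0.9]

fpr, tpr = roc_points(y, perfect_scores)
print(list(zip(fpr.tolist(), tpr.tolist())))
print("AUC:", auc(fpr, tpr))
```

With perfectly separated scores the curve rises to a true positive rate of 1 while the false positive rate is still 0, so it passes through the top-left corner and the area under it is 1.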

Question 9

You have fit a decision tree classifier using 12 input variables. The resulting tree used 7 of the 12 variables, and is 5 levels deep. Some of the nodes contain only 3 data points. The AUC of the model is 0.85. What is your evaluation of this model?

A. The tree is probably overfit. Try fitting shallower trees and using an ensemble method.

B. The AUC is high, and the small nodes are all very pure. This is an accurate model.

C. The tree did not split on all the input variables. You need a larger data set to get a more accurate model.

D. The AUC is high, so the overall model is accurate. It is not well-calibrated, because the small nodes will give poor estimates of probability.

Question 10

You are analyzing a time series and want to determine its stationarity. You also want to determine the order of autoregressive models.

How are the autocorrelation functions used?

A. ACF as an indication of stationarity, and PACF for the correlation between Xt and Xt-k not explained by their mutual correlation with X1 through Xk-1.

B. PACF as an indication of stationarity, and ACF for the correlation between Xt and Xt-k not explained by their mutual correlation with X1 through Xk-1.

C. ACF as an indication of stationarity, and PACF to determine the correlation of X1 through Xk-1.

D. PACF as an indication of stationarity, and ACF to determine the correlation of X1 through Xk-1.
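For readers who want to see the two functions side by side, here is a minimal sketch that computes the sample ACF and the sample PACF (via the Durbin-Levinson recursion) for a simulated AR(1) series. The series, the coefficient 0.7, and the function names `acf` and `pacf` are assumptions chosen for illustration; they do not come from the exam.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    lags = [np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)

def pacf(x, max_lag):
    """Sample partial autocorrelations for lags 1 .. max_lag (Durbin-Levinson)."""
    r = acf(x, max_lag)
    phi_prev = np.array([r[1]])  # coefficients of the order-1 fit
    out = [r[1]]
    for k in range(2, max_lag + 1):
        num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, r[1:k])
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        out.append(phi_kk)
    return np.array(out)

# Simulate a stationary AR(1) process: x[t] = 0.7 * x[t-1] + noise.
rng = np.random.default_rng(0)
n = 5000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + eps[t]

autocorr = acf(x, 5)
partial = pacf(x, 5)
print("ACF :", np.round(autocorr, 3))
print("PACF:", np.round(partial, 3))
```

In this sketch the ACF decays gradually with the lag, which is the behavior expected of a stationary series, while the PACF is large at lag 1 and close to zero afterwards, which points to an autoregressive order of 1.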

Question 11

Consider a database with 4 transactions:

Transaction 1: {cheese, bread, milk}

Transaction 2: {soda, bread, milk}

Transaction 3: {cheese, bread}

Transaction 4: {cheese, soda, juice}

The minimum support is 25%. Which rule has a confidence equal to 50%?

A. {bread, milk} => {cheese}

B. {bread} => {milk}

C. {juice} => {soda}

D. {bread} => {cheese}
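Support and confidence for each candidate rule can be checked mechanically. The sketch below uses the four transactions from the question; the helper names `support` and `confidence` are hypothetical, and the definitions assumed are the standard ones: support is the fraction of transactions containing an itemset, and the confidence of X => Y is support(X and Y together) divided by support(X).

```python
from fractions import Fraction

# The four transactions given in the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(lhs, rhs):
    """Confidence of the rule lhs => rhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)

# The four candidate rules, keyed by their option letter.
rules = {
    "A": ({"bread", "milk"}, {"cheese"}),
    "B": ({"bread"}, {"milk"}),
    "C": ({"juice"}, {"soda"}),
    "D": ({"bread"}, {"cheese"}),
}

for label, (lhs, rhs) in rules.items():
    print(label, "support =", support(lhs | rhs), "confidence =", confidence(lhs, rhs))
```

Running this prints the support of each rule's full itemset next to its confidence, so both the 25% minimum support and the 50% confidence condition can be checked at once.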

Question 12

Which data type value is used for the observed response variable in a logistic regression model?

A. Any positive real number

B. Any integer

C. A binary value

D. Any real number

Question 13

When would you prefer a Naive Bayes model to a logistic regression model for classification?

A. When you are using several categorical input variables with over 1000 possible values each.

B. When you need to estimate the probability of an outcome, not just which class it is in.

C. When all the input variables are numerical.

D. When some of the input variables might be correlated.

Exam Code: E20-007
Exam Name: Data Science and Big Data Analytics
Last Update: Apr 18, 2024
Questions: 198