You are working as a data science consultant for a gaming company. You have a three-member team, and all other stakeholders are from the company itself, such as project managers, project sponsors, and the data team. During a discussion, the project manager asks you, "When can you tell me that the model you are using is robust enough?" After which step can you answer this question?
A. Data Preparation
B. Discovery
C. Operationalize
D. Model planning
E. Model building
RMSE is a useful metric for evaluating which types of models?
A. Logistic regression
B. Naive Bayes classifier
C. Linear regression
D. All of the above
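As a reminder of what the metric measures, here is a minimal sketch of RMSE for numeric predictions. The function name `rmse` and the sample values are illustrative assumptions, not part of the question.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical continuous targets and predictions
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
error = rmse(actual, predicted)
print(error)
```

Because the errors are squared differences between numbers, the metric is most natural when the model outputs a continuous value.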
Suppose you are given two random variables X and Y whose joint distribution is known. The marginal distribution of X is simply the probability distribution of X averaged over information about Y; it is the probability distribution of X when the value of Y is not known. How do you calculate the marginal distribution of X?
A. This is typically calculated by summing the joint probability distribution over Y.
B. This is typically calculated by integrating the joint probability distribution over Y.
C. This is typically calculated by summing (in the case of a discrete variable) the joint probability distribution over Y.
D. This is typically calculated by integrating (in the case of a continuous variable) the joint probability distribution over Y.
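For the discrete case, marginalizing can be sketched as below. The joint table, its labels, and the helper name `marginal_x` are made up for illustration; for continuous variables the sum would be replaced by an integral over Y.

```python
from collections import defaultdict

def marginal_x(joint):
    """Sum a discrete joint distribution {(x, y): p} over y to get P(X = x)."""
    totals = defaultdict(float)
    for (x, _y), p in joint.items():
        totals[x] += p
    return dict(totals)

# Hypothetical joint distribution of weather (X) and temperature (Y)
joint = {
    ("sun", "hot"): 0.4,
    ("sun", "cold"): 0.2,
    ("rain", "hot"): 0.1,
    ("rain", "cold"): 0.3,
}
p_x = marginal_x(joint)
print(p_x)
```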
Which of the following could be features?
A. Words in the document
B. Symptoms of a disease
C. Characteristics of an unidentified object
D. Only A and B
E. All of A, B, and C are possible
Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?
A. The data is unformatted.
B. There is not enough data to create a test set.
C. There are missing values in the data.
D. There are categorical variables in the model.
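A minimal sketch of N-fold cross-validation for a one-variable regression follows. It assumes contiguous, unshuffled folds and a simple least-squares line; the helper names and the data are hypothetical. Real projects usually shuffle the rows first.

```python
import math

def k_fold_indices(n_samples, n_folds):
    """Return (train, test) index lists; each sample is tested exactly once."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

def fit_line(xs, ys):
    """Least-squares slope and intercept (assumes xs are not all identical)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cross_validated_rmse(xs, ys, n_folds):
    """Average held-out RMSE over all folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), n_folds):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return sum(scores) / len(scores)

# Hypothetical small data set: too few rows to set aside a fixed test set
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
score = cross_validated_rmse(xs, ys, n_folds=3)
print(score)
```

Every row serves as test data once and as training data in the other folds, which is why the technique suits small data sets.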
Which of the following best describes principal component analysis?
A. Dimensionality reduction
B. Collaborative filtering
C. Classification
D. Regression
E. Clustering
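To illustrate what principal component analysis does to data, here is a rough sketch that finds the first principal component by power iteration and projects 2-D points onto it. The function names, the iteration count, and the sample points are assumptions for illustration. The sketch also assumes the data has non-zero variance and that the starting vector is not orthogonal to the leading component.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector of the leading covariance eigenvector)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """Map each row to its coordinate along the given component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]

# Hypothetical 2-D points lying close to the line y = x
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
means, component = first_principal_component(rows)
reduced = project(rows, means, component)
print(component, reduced)
```

Each 2-D point is replaced by a single coordinate along the direction of greatest variance.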
Assume some output variable "y" is a linear combination of some independent input variables "X" plus some independent noise "e". The way the independent variables are combined is defined by a parameter vector B: y = XB + e, where X is an m x n matrix, B is a vector of n unknowns, and y is a vector of m values. Assuming that m is not equal to n and the columns of X are linearly independent, which expression correctly solves for B?
A. Option A
B. Option B
C. Option C
D. Option D
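The option formulas are not reproduced here. For reference, the standard least-squares derivation is sketched below, writing X for the m x n input matrix and y for the vector of m observed values. Linearly independent columns imply m > n and make X^T X invertible.

```latex
\min_{B} \; \lVert y - XB \rVert_2^2
\quad\Longrightarrow\quad
X^{\top} X B = X^{\top} y
\quad\Longrightarrow\quad
\hat{B} = (X^{\top} X)^{-1} X^{\top} y
```

The middle step is the set of normal equations, obtained by setting the gradient of the squared error with respect to B to zero.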
Select the correct statement that applies to supervised learning.
A. We ask the machine to learn from our data when we specify a target variable.
B. It reduces the machine's task to only divining some pattern from the input data to get the target variable.
C. Instead of telling the machine "Predict Y for our data X", we're asking "What can you tell me about X?"
A problem statement is given below:
Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?
Which of the following models will you use to solve it?
A. Binomial
B. Poisson
C. Normal
D. Any of the above
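The problem counts successes across a fixed number of independent trials, each with the same success probability, so the arithmetic can be sketched with a binomial probability mass function. The helper name `binomial_pmf` is an assumption, and "success" here means a patient recovers (probability 1 - 0.75).

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_recover = 1 - 0.75  # 75% die, so 25% recover
prob = binomial_pmf(4, 6, p_recover)
print(prob)
```

This evaluates C(6, 4) * 0.25^4 * 0.75^2 = 135/4096, roughly 0.033.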
Feature hashing is described as follows: "SGD-based classifiers avoid the need to predetermine vector size by simply picking a reasonable size and shoehorning the training data into vectors of that size." With large vectors, or with multiple locations per feature, which of the following is true of feature hashing?
A. It is a problem for accuracy
B. It is hard to understand what the classifier is doing
C. It is easy to understand what the classifier is doing
D. It is a problem for accuracy, and it is hard to understand what the classifier is doing
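The hashing trick mentioned in the question can be sketched as follows. This is a minimal illustration, not any particular library's implementation: the function names, the use of MD5, the vector size of 16, and the two probes per feature are all assumptions.

```python
import hashlib

def hash_index(feature, probe, size):
    """Deterministically map a (feature, probe) pair to a slot in [0, size)."""
    digest = hashlib.md5(f"{probe}:{feature}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size

def hashed_vector(features, size=16, probes=2):
    """Encode features into a fixed-size vector, using several slots per feature."""
    vector = [0.0] * size
    for feature in features:
        for probe in range(probes):
            vector[hash_index(feature, probe, size)] += 1.0
    return vector

# Hypothetical document tokens
vector = hashed_vector(["free", "offer", "free"], size=16, probes=2)
print(vector)
```

Because different features can land in the same slot, and one feature is spread across several slots, a weight learned for a slot cannot be traced back to a single feature.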