Which statement is TRUE concerning optimizing the load performance?
A. You can improve the performance by increasing the number of map tasks assigned to the load
B. When loading large files, the number of files that you load does not impact the performance of the LOAD HADOOP statement
C. You can improve the performance by decreasing the number of map tasks that are assigned to the load and adjusting the heap size
D. It is advantageous to run the LOAD HADOOP statement directly pointing to large files located in the host file system as opposed to copying the files to the DFS prior to load
Which of the following statements are TRUE regarding the use of Data Click to load data into BigInsights? (Choose two.)
A. Big SQL cannot be used to access the data moved in by Data Click because the data is in Hive
B. You must import metadata for all sources and targets that you want to make available for Data Click activities
C. Connections from the relational database source to HDFS are discovered automatically from within Data Click
D. Hive tables are automatically created every time you run an activity that moves data from a relational database into HDFS
E. HBase tables are automatically created every time you run an activity that moves data from a relational database into HDFS
Which of the following statements regarding importing streaming data from InfoSphere Streams into Hadoop is TRUE?
A. InfoSphere Streams can both read from and write data to HDFS
B. The Streams Big Data toolkit operators that interface with HDFS use Apache Flume to integrate with Hadoop
C. Streams applications never need to be concerned with making the data schemas consistent with those on Hadoop
D. Big SQL can be used to preprocess the data as it flows through InfoSphere Streams before the data lands in HDFS
Which of the following techniques is NOT employed by Big SQL to improve performance?
A. Query Optimization
B. Predicate pushdown
C. Compression efficiency
D. Load data into DB2 and return the data
Which format would be best for holding semi-structured data?
A. Text
B. Avro
C. CSV
D. JSON
Which of the following statements is TRUE regarding BigSheets?
A. You can create any type of sheet from a parent workbook
B. You must create a child workbook in order to create a chart
C. You can delete a parent workbook without deleting the child workbooks
D. You must run the workbook on the data to get the full results of the analysis
Which of the following statements regarding importing streaming data from InfoSphere Streams into Hadoop is TRUE?
A. InfoSphere Streams can only write to HDFS, not read from HDFS
B. InfoSphere Streams can only write directly to BigInsights, not other Hadoop distributions like Hortonworks or Cloudera
C. A Streams developer needs to account for the fact that BigInsights may not be able to absorb the incoming streams at the rate InfoSphere Streams is sending them
D. Adding a Big Data toolkit operator (for writing to Hadoop) to an InfoSphere Streams Processing Language (SPL) application requires that the SPL application be recompiled
Which is a benefit of row-oriented table design?
A. When writing a new row, if all of the row data is supplied at the same time, the entire row can be written with a single disk seek
B. When columns of a single row are required at the same time, the entire row can be retrieved with a single disk seek regardless of row size
C. When new values of a column are supplied for all rows at once, that column data can be written efficiently and replace old column data without touching any other columns for the rows
D. When an aggregate needs to be computed over many rows but only a notably smaller subset of all columns of data, reading that smaller subset of data can be faster than reading all data
Which of the following must happen before the Big SQL EXPLAIN command can execute?
A. Run the ANALYZE command
B. Set the COMPATIBILITY_MODE global variable
C. Execute the SET HADOOP PROPERTY command
D. Call the SYSPROC.SYSINSTALLOBJECTS procedure
Which component of BigInsights is able to mask data items so as to restrict viewing of sensitive data?
A. Flume
B. HDFS
C. Oozie
D. Big SQL