Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
A. Yes.
B. Yes, but only if one of the tables fits into memory
C. Yes, so long as both tables fit into memory.
D. No, MapReduce cannot perform relational operations.
E. No, but it can be done with either Pig or Hive.
You have just executed a MapReduce job. Where is intermediate data written to after being emitted from the Mapper's map method?
A. Intermediate data in streamed across the network from Mapper to the Reduce and is never written to disk.
B. Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.
C. Into in-memory buffers that spill over to the local file system of the TaskTracker node running the Mapper.
D. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer
E. Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.
MapReduce v2 (MRv2/YARN) is designed to address which two issues?
A. Single point of failure in the NameNode.
B. Resource pressure on the JobTracker.
C. HDFS latency.
D. Ability to run frameworks other than MapReduce, such as MPI.
E. Reduce complexity of the MapReduce APIs.
F. Standardize on a single MapReduce API.
What data does a Reducer reduce method process?
A. All the data in a single input file.
B. All data produced by a single mapper.
C. All data for a given key, regardless of which mapper(s) produced it.
D. All data for a given value, regardless of which mapper(s) produced it.
In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A. Increase the parameter that controls minimum split size in the job configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom FileInputFormat and override the method isSplitable to always return false.
You have written a Mapper which invokes the following five calls to the OutputColletor.collect method:
output.collect (new Text ("Apple"), new Text ("Red") ) ;
output.collect (new Text ("Banana"), new Text ("Yellow") ) ; output.collect (new Text ("Apple"), new Text
("Yellow") ) ; output.collect (new Text ("Cherry"), new Text ("Red") ) ;
output.collect (new Text ("Apple"), new Text ("Green") ) ;
How many times will the Reducer's reduce method be invoked?
A. 6
B. 3
C. 1
D. 0
E. 5
Which project gives you a distributed, Scalable, data store that allows you random, realtime read/write access to hundreds of terabytes of data?
A. HBase
B. Hue
C. Pig
D. Hive
E. Oozie
F. Flume
G. Sqoop
Identify the tool best suited to import a portion of a relational database every day as files into HDFS, and generate Java classes to interact with that imported data?
A. Oozie
B. Flume
C. Pig
D. Hue
E. Hive
F. Sqoop
G. fuse-dfs
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths () command when it's given a path object representing this directory?
A. Four, all files will be processed
B. Three, the pound sign is an invalid character for HDFS file names
C. Two, file names with a leading period or underscore are ignored
D. None, the directory cannot be named jobdata
E. One, no special characters can prefix the name of an input file
When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?
A. When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value and when the reduce operation is both communicative and associative.
B. When the signature of the reduce method matches the signature of the combine method.
C. Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
D. Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
E. Never. Combiners and reducers must be implemented separately because they serve different purposes.