You've written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?
A. Partitioner
B. OutputFormat
C. WritableComparable
D. Writable
E. InputFormat
F. Combiner
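To make the intermediate-data bottleneck in this question concrete, here is a minimal pure-Python sketch of map-side local aggregation, the idea behind a combiner. It is a simulation, not Hadoop code: the function names `map_words` and `combine` and the sample split are assumptions made up for illustration.

```python
from collections import Counter

def map_words(line):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Sum counts per key locally, before anything leaves the map task."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return list(totals.items())

# Hypothetical input split seen by one map task; the key "a" is heavily skewed.
split = ["a a a b", "a a c", "a b a"]

map_output = [pair for line in split for pair in map_words(line)]
combined_output = combine(map_output)

# Number of key-value pairs that would cross the network in each case.
pairs_without_combiner = len(map_output)
pairs_with_combiner = len(combined_output)
print(pairs_without_combiner, pairs_with_combiner)
```

In Hadoop itself a combiner is supplied as a Reducer class (for example through `Job.setCombinerClass`), and the framework may run it zero or more times, so the operation it performs should be associative and commutative.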
Which HDFS command copies an HDFS file named foo to the local filesystem as localFoo?
A. hadoop fs -get foo localFoo
B. hadoop -cp foo localFoo
C. hadoop fs -ls foo
D. hadoop fs -put foo localFoo
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() method when it's given a Path object representing this directory?
A. Four, all files will be processed
B. Three, the pound sign is an invalid character for HDFS file names
C. Two, file names with a leading period or underscore are ignored
D. None, the directory cannot be named jobdata
E. One, no special characters can prefix the name of an input file
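The file-name rule this question turns on can be sketched in a few lines of plain Python. This is only an imitation of the default hidden-file filter that FileInputFormat applies when listing a directory; the helper name `is_hidden` is an assumption introduced here, not a Hadoop API.

```python
def is_hidden(name):
    """Imitate the default filter: names starting with "_" or "." are skipped."""
    return name.startswith(("_", "."))

# The four files from the jobdata directory in the question.
jobdata = ["_first.txt", "second.txt", ".third.txt", "#data.txt"]

# Files that would still be listed as job input after filtering.
visible = [name for name in jobdata if not is_hidden(name)]
print(visible)
```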
Which YARN component is responsible for monitoring the success or failure of a Container?
A. ResourceManager
B. ApplicationMaster
C. NodeManager
D. JobTracker
Which one of the following statements describes the relationship between the ResourceManager and the ApplicationMaster?
A. The ApplicationMaster requests resources from the ResourceManager
B. The ApplicationMaster starts a single instance of the ResourceManager
C. The ResourceManager monitors and restarts any failed Containers of the ApplicationMaster
D. The ApplicationMaster starts an instance of the ResourceManager within each Container
You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache and read it in your Mapper before any records are processed.
Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array.
A. combine
B. map
C. init
D. configure
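The map-side join pattern described in this question can be sketched as a small pure-Python simulation: a one-time initialization hook loads the lookup table, then every record is joined against it in memory. The class `JoinMapper`, the driver `run_task`, and the sample data are all hypothetical. The hook is named `configure` here to mirror the old `org.apache.hadoop.mapred` API; the newer `org.apache.hadoop.mapreduce.Mapper` offers `setup` for the same purpose.

```python
class JoinMapper:
    """Toy mapper that joins records against an in-memory lookup table."""

    def configure(self, lookup_lines):
        # Runs once per task, before any record is mapped.
        # Each lookup line is assumed to be "key<TAB>value".
        self.lookup = dict(line.split("\t", 1) for line in lookup_lines)

    def map(self, offset, record):
        # Each record is assumed to be "user_id<TAB>amount".
        user_id, amount = record.split("\t", 1)
        if user_id in self.lookup:
            return [(user_id, (self.lookup[user_id], amount))]
        return []  # inner join: drop records with no match


def run_task(mapper, lookup_lines, records):
    """Imitate the framework: initialize once, then call map per record."""
    mapper.configure(lookup_lines)
    output = []
    for offset, record in records:
        output.extend(mapper.map(offset, record))
    return output


# Hypothetical contents of the cached text file and of the input split.
lookup_lines = ["u1\tAlice", "u2\tBob"]
records = [(0, "u1\t30"), (1, "u3\t5"), (2, "u2\t12")]

joined = run_task(JoinMapper(), lookup_lines, records)
print(joined)
```

Loading the table in the initialization hook rather than in `map` means the file is parsed once per task instead of once per record.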
Assuming the following Hive query executes successfully:
Which one of the following statements describes the result set?
A. A bigram of the top 80 sentences that contain the substring "you are" in the lines column of the inputdata table.
B. An 80-value ngram of sentences that contain the words "you" or "are" in the lines column of the inputdata table.
C. A trigram of the top 80 sentences that contain "you are" followed by a null space in the lines column of the inputdata table.
D. A frequency distribution of the top 80 words that follow the subsequence "you are" in the lines column of the inputdata table.
Identify which of the following best defines a SequenceFile.
A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects
B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects
C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
Examine the following Pig commands: Which one of the following statements is true?
A. The SAMPLE command generates an "unexpected symbol" error
B. Each MapReduce task will terminate after executing for 0.2 minutes
C. The reducers will only output the first 20% of the data passed from the mappers
D. A random sample of approximately 20% of the data will be output
How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
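As a rough illustration of what a standard sort and shuffle delivers to a single reducer, the following pure-Python sketch groups map output by key and sorts only the keys. It is a simulation under stated assumptions, not Hadoop code: `shuffle_and_sort` and the sample pairs are made up, and values are simply left in arrival order to stand in for "no ordering guarantee".

```python
from collections import defaultdict

def shuffle_and_sort(map_output):
    """Group values by key and return the groups in sorted key order."""
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)  # values are appended as they arrive, never sorted
    return [(key, grouped[key]) for key in sorted(grouped)]

# Hypothetical map output destined for one reducer.
map_output = [("pear", 3), ("apple", 9), ("pear", 1), ("apple", 2), ("fig", 7)]

reducer_input = shuffle_and_sort(map_output)
print(reducer_input)
```

In a real job the order of values within a key is not guaranteed at all; getting sorted values typically requires a secondary-sort setup with a composite key.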