Problem Scenario 17 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish below assignment.
1.
Create a table in hive as below: create table departments_hive01(department_id int,
department_name string, avg_salary int);
2.
Create another table in mysql using below statement CREATE TABLE IF NOT EXISTS
departments_hive01(id int, department_name varchar(45), avg_salary int);
3.
Copy all the data from departments table to departments_hive01 using insert into
departments_hive01 select a.*, null from departments a;
Also insert the following records as below:
insert into departments_hive01 values(777, "Not known",1000);
insert into departments_hive01 values(8888, null,1000);
insert into departments_hive01 values(666, null,1100);
4.
Now import data from the mysql table departments_hive01 to this hive table. Please make
sure that the data is visible using the below hive command. Also, while importing, if a null
value is found for the department_name column replace it with "" (empty string), and for the id
column replace it with -999.
select * from departments_hive01;
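One possible Sqoop invocation for step 4 is sketched below; --null-string and --null-non-string perform the required null replacements (they apply to string and non-string columns respectively, so department_name nulls become "" and id nulls become -999), and the single mapper (-m 1) is an assumption to keep the example small.

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --table departments_hive01 \
  --hive-import \
  --hive-table departments_hive01 \
  --hive-overwrite \
  --null-string "" \
  --null-non-string -999 \
  -m 1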
Problem Scenario 45 : You have been given 2 files, with the content as given below: (spark12/technology.txt) and (spark12/salary.txt).
(spark12/technology.txt)
first,last,technology
Amit,Jain,java
Lokesh,kumar,unix
Mithun,kale,spark
Rajni,vekat,hadoop
Rahul,Yadav,scala
(spark12/salary.txt)
first,last,salary
Amit,Jain,100000
Lokesh,kumar,95000
Mithun,kale,150000
Rajni,vekat,154000
Rahul,Yadav,120000
Write a Spark program which will join the data based on first and last name and save the joined results in the following format: first,last,technology,salary
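A minimal RDD-based sketch in Spark/Scala, assuming the header lines are dropped by filtering on the literal word "first" and that the output directory spark12/joined is chosen freely:

val tech = sc.textFile("spark12/technology.txt")
  .filter(line => !line.startsWith("first"))      // drop the header row
  .map(_.split(","))
  .map(a => ((a(0), a(1)), a(2)))                 // key by (first, last)
val salary = sc.textFile("spark12/salary.txt")
  .filter(line => !line.startsWith("first"))
  .map(_.split(","))
  .map(a => ((a(0), a(1)), a(2)))
// join on the (first, last) key, then flatten to first,last,technology,salary
val joined = tech.join(salary)
  .map { case ((first, last), (technology, sal)) => s"$first,$last,$technology,$sal" }
joined.saveAsTextFile("spark12/joined")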
Problem Scenario 8 : You have been given following mysql database details as well as
other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish following.
1.
Import joined result of orders and order_items table join on orders.order_id = order_items.order_item_order_id.
2.
Also make sure the imported data is split across 2 files, e.g. part-00000 and part-00001.
3.
Also make sure you use the order_id column for Sqoop to use for its boundary conditions (a sample command is sketched below).
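A sketch of one way to do this with a Sqoop free-form query; the target directory p8_order_join is an assumption, --split-by order_id supplies the boundary column, and --num-mappers 2 yields two part files:

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --query 'select o.*, oi.* from orders o join order_items oi on o.order_id = oi.order_item_order_id where $CONDITIONS' \
  --target-dir p8_order_join \
  --split-by order_id \
  --num-mappers 2

The query is single-quoted so that the shell does not expand $CONDITIONS, which Sqoop replaces with its split predicates.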
Problem Scenario 40 : You have been given sample data as below in a file called spark15/file1.txt
3070811,1963,1096,,"US","CA",,1,
3022811,1963,1096,,"US","CA",,1,56
3033811,1963,1096,,"US","CA",,1,23
Below is the code snippet to process this file.
val field = sc.textFile("spark15/file1.txt")
val mapper = field.map(x => A)
mapper.map(x => x.map(x => {B})).collect
Please fill in A and B so it can generate below final output
Array(Array(3070811,1963,1096, 0, "US", "CA", 0,1, 0)
,Array(3022811,1963,1096, 0, "US", "CA", 0,1, 56)
,Array(3033811,1963,1096, 0, "US", "CA", 0,1, 23)
)
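One possible fill-in, assuming empty fields should become the string "0": A splits each line on commas while keeping trailing empty fields, and B replaces each empty field.

val field = sc.textFile("spark15/file1.txt")
// A: split on commas; the -1 limit keeps trailing empty fields
val mapper = field.map(x => x.split(",", -1))
// B: replace each empty field with "0"
mapper.map(x => x.map(x => if (x.isEmpty) "0" else x)).collect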
Problem Scenario 96 : Your Spark application requires extra Java options as below.
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Please replace the XXX values correctly.
./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false --conf XXX hadoopexam.jar
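XXX stands for the spark.executor.extraJavaOptions setting (a standard Spark configuration property); quoting keeps the two GC flags together as one value:

./bin/spark-submit --name "My app" --master local[4] \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  hadoopexam.jar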
Problem Scenario 27 : You need to implement a near-real-time solution for collecting information as it is submitted in files, with the below information.
Data
echo "IBM,100,20160104" >> /tmp/spooldir/bb/.bb.txt echo "IBM,103,20160105" >> /tmp/spooldir/bb/.bb.txt mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt After few mins echo "IBM,100.2,20160104" >> /tmp/spooldir/dr/.dr.txt echo "IBM,103.1,20160105" >> /tmp/spooldir/dr/.dr.txt mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt
Requirements:
You have been given the below directory location (if not available then create it): /tmp/spooldir .
You have a financial subscription for getting stock prices from Bloomberg as well as
Reuters, and using ftp you download new files every hour from their respective ftp sites into
the directories /tmp/spooldir/bb and /tmp/spooldir/dr respectively.
As soon as a file is committed in these directories, it needs to be available in hdfs in
the /tmp/flume/finance location, in a single directory.
Write a flume configuration file named flume7.conf and use it to load data into hdfs with
the following additional properties (a sample configuration is sketched after the list).
1.
Spool /tmp/spooldir/bb and /tmp/spooldir/dr
2.
File prefix in hdfs should be events
3.
File suffix should be .log
4.
If a file is not committed and is in use, then it should have _ as its prefix.
5.
Data should be written as text to hdfs
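A sketch of flume7.conf satisfying the five properties above; the agent name agent1 and the file channel are assumptions:

# flume7.conf: two spooldir sources, one hdfs sink, one shared channel
agent1.sources = source1 source2
agent1.sinks = sink1
agent1.channels = channel1

# 1. spool both drop directories
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir/bb
agent1.sources.source2.type = spooldir
agent1.sources.source2.spoolDir = /tmp/spooldir/dr

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume/finance
# 2. file prefix, 3. file suffix, 4. in-use prefix, 5. plain text output
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream

agent1.channels.channel1.type = file

agent1.sources.source1.channels = channel1
agent1.sources.source2.channels = channel1
agent1.sinks.sink1.channel = channel1

It could then be started with something like: flume-ng agent --conf /etc/flume-ng/conf --conf-file flume7.conf --name agent1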
Problem Scenario 66 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("ant", "falcon", "squid"), 2)
val d = c.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, String)] = Array((4,lion))
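subtractByKey keeps only the pairs in b whose key is absent from d; the key lengths in d are 3, 6 and 5, which eliminates everything except (4,lion):

b.subtractByKey(d).collect
// Array[(Int, String)] = Array((4,lion))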
Problem Scenario 74 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id,
order_item_product_id,
order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish following activities.
1.
Copy "retaildb.orders" and "retaildb.orderjtems" table to hdfs in respective directory p89_orders and p89_order_items .
2.
Join these data using order_id in Spark and Python.
3.
Now fetch selected columns from the joined data: order_id, order_date and the amount collected on the order.
4.
Calculate the total orders placed for each date, and produce the output sorted by date (see the sketch below).
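A sketch of a possible solution. For step 1, two plain Sqoop imports:

sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table orders --target-dir p89_orders
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table order_items --target-dir p89_order_items

The task asks for the join in Spark and Python; for consistency with the document's other snippets, an equivalent Spark/Scala sketch of steps 2-4 is shown (column positions assume Sqoop's default comma-delimited output):

val orders = sc.textFile("p89_orders").map(_.split(","))
  .map(o => (o(0).toInt, o))                       // key by order_id
val orderItems = sc.textFile("p89_order_items").map(_.split(","))
  .map(oi => (oi(1).toInt, oi))                    // key by order_item_order_id
val joined = orders.join(orderItems)               // step 2
// step 3: order_id, order_date, order_item_subtotal (amount collected)
val selected = joined.map { case (id, (o, oi)) => (id, o(1), oi(4).toFloat) }
// step 4: count orders per date from the orders side alone (the join has one
// row per order item, which would overcount), sorted by date
val ordersPerDate = orders.map { case (_, o) => (o(1), 1) }
  .reduceByKey(_ + _)
  .sortByKey()
ordersPerDate.collect.foreach(println)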
Problem Scenario 93 : You have to run your Spark application locally with 8 threads, or locally on 8 cores. Replace XXX with the correct values.
spark-submit --class com.hadoopexam.MyTask XXX \
--deploy-mode cluster $SPARK_HOME/lib/hadoopexam.jar 10
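local[8] runs Spark locally with 8 worker threads, so XXX becomes --master local[8]; the --deploy-mode cluster flag is kept exactly as the question gives it:

spark-submit --class com.hadoopexam.MyTask \
  --master local[8] \
  --deploy-mode cluster $SPARK_HOME/lib/hadoopexam.jar 10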
Problem Scenario 6 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Compression Codec : org.apache.hadoop.io.compress.SnappyCodec
Please accomplish following.
1.
Import the entire database such that it can be used as Hive tables; they must be created in the default schema.
2.
Also make sure each table's data is split across 3 files, e.g. part-00000, part-00001, part-00002.
3.
Store all the generated Java files in a directory called java_output to evaluate them further.
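One plausible Sqoop command for all three steps; the Snappy codec comes from the scenario header, --outdir keeps the generated Java files, and --num-mappers 3 yields three part files per table:

sqoop import-all-tables \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --hive-import \
  --hive-overwrite \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --num-mappers 3 \
  --outdir java_output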