Project 3

Task 1 Design and implementation of HBase table

Implement as a single HBase table a database that contains information described by the following conceptual schema.

(1) Create HBase script solution1.hb with HBase shell commands that create HBase table and load sample data into the table. Load into the table information about at least two accidents such that each involved one person and one car.

When ready use HBase shell to process a script file solution1.hb and to save a report from processing in a file solution1.rpt.

Deliverables
A file solution1.rpt that contains a report from processing of solution1.hb script with the statements that create HBase table and load sample data.

Task 2 Querying and manipulating data in HBase table

Consider a conceptual schema given below. The schema represents a simple database domain where students submit assignments and each submission consists of several files and it is related to one subject.

Download a file task2.hb with HBase shell commands and use HBase shell to process it. Processing of a script task2.hb creates HBase table task2 and loads some data into it.

Use HBase shell to implement the following queries and data manipulations on the HBase table created in the previous step. Save the queries and data manipulations in a file solution2.hb.

(1) Find all information about a subject that has code 312, list two versions per cell.
(2) Find all information about a submission of assignment 1 performed by a student 007 in a subject 312, list one version per cell.
(3) Delete a column family FILES.
(4) Add a column family ENROLMENT that contains information about dates when the subjects have been enrolled by the students and allow for 2 versions in each cell of the column family.
(5) Increase the total number of versions in each cell of a column family ENROLMENT.

When ready, start HBase shell and process a script file solution2.hb with HBase command shell.
When processing is completed, copy the contents of the Command window with a listing from processing of the script and paste the results into a file solution2.rpt. Save the file. When ready, submit a file solution2.rpt.

Deliverables
A file solution2.rpt with a listing from processing of a script file solution2.hb.

Task 3 Data processing with Pig Latin

Consider the following conceptual schema of a data warehouse.

Download a file task3.zip published on Moodle together with a specification of Assignment 3 and unzip it. You should obtain a folder TASK3 with the following files: customer.tbl, order_details.tbl, order.tbl, product.tbl, salesperson.tbl. The files contain data dumped from a data warehouse whose conceptual schema is given above.

Use an editor to examine the contents of the *.tbl files. Note that each file has a header with information about the meanings of data in each column. The header is not a data component of the file. Remove the headers and transfer the files into HDFS.

Create a Pig Latin script solution3.pig that implements the following queries.

(1) Find the first and the last name (first-name, last-name) of sales people who handled the orders submitted by the customers located in Mexico.
(2) Find the total number of sales people who handled the orders submitted in 1996.
(3) Find the summarizations of prices (unit-price) per ordered product (product-id).
(4) Find the identifiers of orders (order-id) that included both Ikura and Tofu.

When ready, use the pig command line interface to process a script solution3.pig and to save a report from processing in a file solution3.rpt.

Deliverables
A file solution3.rpt with a report from processing of Pig Latin script solution3.pig.

Task 4 Data processing with Spark

In this task we use the files uploaded to HDFS in Task 3 of this Assignment. If you have not uploaded the files, then download a file task3.zip published on Moodle together with a specification of Assignment 3 and unzip it.
You should obtain a folder TASK3 with the following files: customer.tbl, order_details.tbl, order.tbl, product.tbl, salesperson.tbl.

When ready, create a script solution4.sc that implements the following Spark-shell operations:

(1) Create a DataFrame named orderDetailsDF that contains information about the details of orders included in a file order_details.tbl.
(2) List all order details where quantity is greater than 50.
(3) Find the total number of orders submitted in Germany.
(4) Find the total number of orders per each country.
(5) Find the 5 most expensive products (use attribute unit-price).

When ready, start Spark-shell and process a script solution4.sc in Spark-shell using the :paste command. Save a report in a file solution4.rpt.

Deliverables
A file solution4.rpt with a report from processing of a file solution4.sc.
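
As an illustration of how the first two Spark-shell operations of Task 4 might look, here is a minimal sketch. It is not a reference solution. The column names and types, the '|' field separator and the HDFS path are all assumptions made for the example. The real values come from the header line removed from order_details.tbl and from wherever the files were placed in HDFS. The sketch relies on the `spark` session that Spark-shell provides.

```scala
// Sketch for Spark-shell; `spark` is the SparkSession created by the shell.
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.col

// ASSUMPTION: hypothetical column names and types. Replace them with the
// ones listed in the header that was removed from order_details.tbl.
val orderDetailsSchema = StructType(Seq(
  StructField("order_id", IntegerType),
  StructField("product_id", IntegerType),
  StructField("unit_price", DoubleType),
  StructField("quantity", IntegerType),
  StructField("discount", DoubleType)
))

// ASSUMPTION: '|' as the field separator and a hypothetical HDFS location.
val orderDetailsDF = spark.read
  .option("sep", "|")
  .schema(orderDetailsSchema)
  .csv("/user/bigdata/TASK3/order_details.tbl")

// List all order details where quantity is greater than 50.
orderDetailsDF.filter(col("quantity") > 50).show()
```

Declaring the schema explicitly, rather than inferring it, keeps the numeric columns typed so that the comparison on quantity is numeric. The remaining operations would follow the same pattern with DataFrames built from the other *.tbl files.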