predict the type of leukemia | My Assignment Tutor

Q1 [25 marks]

[10 marks] Describe briefly the following classification techniques:
- K-nearest neighbor
- Decision tree

[15 marks] A healthcare provider wishes to use data mining to predict the type of leukemia for patients, whether ALL or AML. For this classification task, you are to use RapidMiner for prototyping the system. Below is the main RapidMiner process.

a. Given the following RapidMiner operators:
- decision tree
- apply model
- classification performance
describe how to connect these operators as a subprocess (of the validation operator). Feel free to sketch a diagram to show the different connections.

b. If you want to evaluate the k-nearest neighbors algorithm (using the K-NN operator), describe what you need to change in the subprocess.

c. After running the main process, below is the performance output you received. What is the accuracy of the evaluated algorithm?

Q2 [25 marks]

[10 marks] Describe briefly the following:
- Cross validation
- The difference between a model and a pattern

[15 marks] Suppose you are employed as a data mining consultant for the British University in Dubai. Describe how data mining can help the university by giving specific examples of how techniques such as clustering, classification, association rule mining, and anomaly detection can be applied.

Q3 [25 marks]

[a] [10 marks] Describe briefly the following:
- Parallel coordinates
- Scatter plot arrays

[b] [15 marks] Dubai stock exchange (DFM) wants to cluster "similar" stocks, based on the evolution of stock prices over time. In particular, the price for each stock was collected every second for one week (note that DFM works for 4 hours per day and only opens 5 days per week). DFM wants to use Dynamic Time Warping for measuring the proximity. Answer the following questions:

How many attributes are in the data set? (Choose one answer and justify.)
- 4 attributes
- 10 attributes
- 20 attributes
- 72000 attributes
- None of the above

A lower bound function LB was proposed. Trace how 1-nearest neighbor will work with the lower bound function, using the table below (hint: add columns best_so_far and DTW_computed).

Candidates  DTW  LB
S1          10   8
S2          6    4
S3          2    0
S4          7    5
S5          3    1
S6          4    2

Q4 [25 marks]

[10 marks] Describe briefly the following:
- Anomaly detection
- Association rule mining

[15 marks] Determining a suitable representation and a suitable proximity measure are crucial decisions in many data mining tasks. For each of the following cases: (i) describe briefly any preprocessing needed and how you will represent the data (i.e. how you would store the data); (ii) choose a suitable proximity measure and justify your answer.
- A dataset of movie ratings: each record contains user_id (integer), movie_id (integer), time stamp (integer), and rating (integer from 1 to 5). [5 marks]
- A dataset of Gulf News articles: each record contains date (date), title (text), and body (text). [5 marks]
- A dataset of student marks: each record contains student marks (integer) for different modules. [5 marks]
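
As an illustration of the 1-nearest-neighbor trace asked for in Q3 [b], the pruning logic can be sketched in Python as below. This is a minimal sketch, not the assignment's reference solution. It assumes the DTW and LB values are those listed in the Q3 table, that LB never exceeds the true DTW distance, and that candidates are visited in table order. The names `CANDIDATES` and `nn_with_lower_bound` are hypothetical, and the DTW value is read from the table where a real system would run the expensive computation.

```python
# Values copied from the Q3 table: (candidate, DTW distance, lower bound LB).
CANDIDATES = [
    ("S1", 10, 8),
    ("S2", 6, 4),
    ("S3", 2, 0),
    ("S4", 7, 5),
    ("S5", 3, 1),
    ("S6", 4, 2),
]


def nn_with_lower_bound(candidates):
    """1-NN search that skips the DTW computation whenever LB rules a candidate out.

    Returns (best_name, best_so_far, trace), where each trace row is
    (candidate, LB, DTW_computed, best_so_far after this candidate).
    """
    best_so_far = float("inf")
    best_name = None
    trace = []
    for name, dtw, lb in candidates:
        # Assumption: LB <= true DTW, so LB >= best_so_far means the
        # candidate cannot beat the current best and can be pruned.
        if lb >= best_so_far:
            trace.append((name, lb, False, best_so_far))
            continue
        # Only here would the full DTW distance actually be computed.
        if dtw < best_so_far:
            best_so_far = dtw
            best_name = name
        trace.append((name, lb, True, best_so_far))
    return best_name, best_so_far, trace


if __name__ == "__main__":
    nearest, distance, rows = nn_with_lower_bound(CANDIDATES)
    print("candidate  LB  DTW_computed  best_so_far")
    for name, lb, computed, best in rows:
        print(f"{name:<9}  {lb:<2}  {str(computed):<12}  {best}")
    print("nearest neighbor:", nearest, "at distance", distance)
```

The sketch prunes when LB is greater than or equal to best_so_far. Using a strict comparison instead would only change how ties are handled, not which neighbor is returned.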

