Data Analysis for Enterprise Modelling | My Assignment Tutor

MIDDLESEX UNIVERSITY
COURSEWORK 2, 2020/21
CST2330
Data Analysis for Enterprise Modelling
Roman Belavkin

This assignment is worth 50% of the overall grade. The submission date is Week 24, Friday, April 16, 2021, 19:00. You should do the assignment individually.

Contents
1 Data clustering and classification (20%)
    1.1 Analysis and classification of market conditions
    1.2 Clustering of cryptocurrency returns
2 Data modelling and prediction (20%)
3 Presentation (10%)

Software required

You are recommended to use the R language and R-Studio, an integrated development environment for R. They are available on the University computers from Apps Anywhere and can also be installed on personal computers. Free copies are available online.

You may need the following libraries in R-Studio:
• DBI: if you need to connect to and read databases.
• xts: for extensible time series objects.
• factoextra: for visualisation of clusters in Task 1.
• kohonen: for self-organising maps in Task 1.
• pls: for more sophisticated types of linear regression in Task 2.
• neuralnet: for artificial neural networks in Task 2.

Data required

This coursework requires the log.returns dataset, which you should have created in Task 3 of Coursework 1 (or in the lab exercises of Weeks 10 and 11) from the crypto-candles dataset. The crypto-candles dataset can be downloaded from the data folder on the course's webpage (My Learning), where it is available in two alternative formats:
    crypto-candles.csv
    crypto-candles.db
Please refer to the instructions of Coursework 1 or the lab exercises of Weeks 10–11.

1 Data clustering and classification (20%)

1.1 Analysis and classification of market conditions

In this task you have to:
1. Classify the data into days when the market was bullish (most assets go up in price) or bearish (most assets go down in price). Include plots visualising these clusters. 5 marks
2. Describe the behaviour of some main cryptocurrency pairs, such as BTC/USD or ETH/USD, on the days with bullish and bearish trends. 2 marks
3. Identify outliers (points that lie far from cluster centres) and look at the corresponding dates. Search past news headlines to see if there were any interesting events on those dates that could explain the unusual market movements. 3 marks

You have to support your analysis and conclusions by plots. Include your R code in the Appendix.

Additional details

To complete this task, you need to analyse the log.returns dataset, where columns (variables) are different cryptocurrency pairs, and rows (observations) are different trading days, such as the sample shown below:

                 tBTCUSD      tETHUSD      tEOSUSD      tEOSBTC
    2019-01-03  -0.031232550 -0.044525926 -0.075494326 -0.045635933
    2019-01-04   0.007767325  0.038988534  0.014945587  0.009496558
    2019-01-05  -0.010932127 -0.001010101 -0.015746082 -0.006012085
    2019-01-06   0.063509080  0.015791781  0.068055167  0.003557608
    2019-01-07  -0.013160785 -0.038340722 -0.038148675 -0.025697777
    2019-01-08  -0.003384510 -0.012546890  0.003315349  0.008504637

You can use one or more of the following methods:
• Principal component analysis (see Ex. 5, Lab 14).
• k-means or hierarchical clustering (see Ex. 3, Lab 15).
• Self-organising map (see Ex. 4, Lab 16).

1.2 Clustering of cryptocurrency returns

In this task you have to:
1. Identify groups (4 or more) of cryptocurrency pairs that have similar log-returns. Mention examples of pairs in each group. Include plots visualising these clusters. 5 marks
2. Identify a group of so-called 'stable coins', and use your visualisations to explain how they differ from the other groups. 2 marks
3. Choose any cryptocurrency pair that you think looks interesting or different on your graphs. Search for information about it online to find some possible explanations for your observation. 3 marks

Support your conclusions by plots.
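As a rough illustration of Task 1.2, grouping pairs by similar log-returns can be sketched with base R alone. This is only a sketch under stated assumptions: the data are simulated, the names PAIR.A to PAIR.F and the object sim.returns are hypothetical stand-ins for the real log.returns columns, and k = 3 is chosen to suit the toy data (the task itself asks for four or more groups).

```r
# Illustrative only: a small simulated stand-in for log.returns
# (rows = trading days, columns = pairs). The pair names and numbers
# are invented for this sketch, not taken from crypto-candles.
set.seed(1)
n.days <- 60
market <- rnorm(n.days, sd = 0.03)           # common market movement
sim.returns <- cbind(
  PAIR.A =  market + rnorm(n.days, sd = 0.005),
  PAIR.B =  market + rnorm(n.days, sd = 0.005),
  PAIR.C = -market + rnorm(n.days, sd = 0.005),
  PAIR.D = -market + rnorm(n.days, sd = 0.005),
  PAIR.E = rnorm(n.days, sd = 0.0005),       # almost flat, stable-coin-like
  PAIR.F = rnorm(n.days, sd = 0.0005)
)

# Transpose so that rows (observations) are pairs and columns are days
pairs.by.day <- t(sim.returns)

# k-means on the pairs; k = 3 is an arbitrary choice for this toy data
km <- kmeans(pairs.by.day, centers = 3, nstart = 20)
print(km$cluster)

# Project the pairs onto the first two principal components for plotting
pca <- prcomp(pairs.by.day)
plot(pca$x[, 1:2], col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "Pairs grouped by k-means")
text(pca$x[, 1:2], labels = rownames(pairs.by.day), pos = 3)
```

With real data, the same steps would start from t(log.returns); the factoextra library listed above offers more polished cluster plots than this base-graphics version.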
Include your R code in the Appendix.

Additional details

To complete this task, you need to analyse the transposed version of the log.returns dataset, where columns (variables) are different trading days, and rows (observations) are different cryptocurrency pairs, such as the sample shown below:

              2019-01-03   2019-01-04   2019-01-05   2019-01-06
    tBTCUSD  -0.03123255   0.007767325 -0.010932127  0.063509080
    tETHUSD  -0.04452593   0.038988534 -0.001010101  0.015791781
    tEOSUSD  -0.07549433   0.014945587 -0.015746082  0.068055167
    tEOSBTC  -0.04563593   0.009496558 -0.006012085  0.003557608

As before, you can use one or more of the following methods:
• Principal component analysis (see Ex. 5, Lab 14).
• k-means or hierarchical clustering (see Ex. 4, Lab 15).
• Self-organising map (see Ex. 5, Lab 16).

2 Data modelling and prediction (20%)

In this task you have to:
1. Choose and arrange a subset of the log-returns data for modelling and prediction. 3 marks
2. Split the arranged subset into the training and testing sets. 2 marks
3. Use one or more techniques to train the models on the training set and then evaluate their predictions on the testing set. 10 marks
4. Measure and compare the performance of two or more models. 5 marks

Additional details

To complete this task, you need to choose the log-returns of any one cryptocurrency pair from the log-returns dataset and prepare the sets for training models and testing their predictions. Here you can choose any of the following arrangements of the data:

• You can use log-returns of several previous days of the same cryptocurrency as predictors of the next day's log-return. The example below shows the arrangement for IOT/BTC: the first three columns are predictors (log-returns on 3 consecutive days) and the last column is the response (log-return on the 4th day):

                 tIOTBTC      tIOTBTC.3    tIOTBTC.2    tIOTBTC.1
    2019-01-06  -0.010913609 -0.015048654 -0.009534739 -0.031631134
    2019-01-07  -0.015048654 -0.009534739 -0.031631134 -0.020084266
    2019-01-08  -0.009534739 -0.031631134 -0.020084266 -0.014136072
    2019-01-09  -0.031631134 -0.020084266 -0.014136072  0.016436308

  (see Ex. 2, Lab 19).

• You can use log-returns of other cryptocurrencies as predictors. The example below shows the arrangement using log-returns of BTC/USD, ETH/USD and IOT/BTC of the same day as predictors of IOT/BTC on the following day:

                 tBTCUSD      tETHUSD      tIOTBTC      tIOTBTC.1
    2019-01-03  -0.031232550 -0.044525926 -0.010913609 -0.015048654
    2019-01-04   0.007767325  0.038988534 -0.015048654 -0.009534739
    2019-01-05  -0.010932127 -0.001010101 -0.009534739 -0.031631134
    2019-01-06   0.063509080  0.015791781 -0.031631134 -0.020084266
    2019-01-07  -0.013160785 -0.038340722 -0.020084266 -0.014136072
    2019-01-08  -0.003384510 -0.012546890 -0.014136072  0.016436308

  (see Ex. 3, Lab 19).

• You can use more columns to take into account more data from the past and from different cryptocurrency pairs as predictors.

After arranging the data, you have to split it into the training and testing sets using a 70%–30% or 80%–20% ratio (see Ex. 2, 3, Lab 19).

You can use one or more of the following techniques for modelling the data:
• Multiple linear regression (see Ex. 2, 3, Lab 19).
• Principal component and partial least squares regression (see Lab 20).
• Artificial neural networks (see Lab 21).

You can use any of the following measures to compare the performance of the models on the test data:
• Root mean-squared error (RMSE) (see Ex. 2, Lab 17, or Labs 19–20).
• Correlation of predicted and desired responses (see Ex. 2, Lab 17 or Labs 18–20).
• Mean rate of return (see Ex. 4, Lab 18 or Labs 19–20).

3 Presentation (10%)

Your report should be well presented. A good guide is the Publication Manual of the American Psychological Association. At the very least, your report should be a clear, typed or neatly handwritten document with good spelling and grammar, written in easy-to-understand English. There is no word limit, but a useful report should be just long enough to describe the work. Tables, graphs, careful labelling and numbering are all well-established and effective presentation tools.

Things to avoid are:
• Including images or diagrams that you did not create yourself or did not obtain permission to use from the author (even if the image is from the Internet).
• Including graphs or diagrams that you do not explain.
• Forgetting to label the axes on the charts.
• Including material irrelevant to the work.

Assignment Submissions

Submit your report online using My Learning by Week 24, Friday, April 16, 2021, 19:00.
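
For reference, the Task 2 workflow described earlier (arranging lagged log-returns, splitting into training and testing sets, fitting a model and measuring RMSE and correlation) might be sketched as follows. This is a minimal illustration, not the method from the labs: the series r is simulated, the column names y, lag1, lag2 and lag3 are hypothetical, and the chronological 80%–20% split is an assumption.

```r
# Illustrative only: a simulated log-return series stands in for one
# column of log.returns; the numbers are invented for this sketch.
set.seed(2)
r <- rnorm(200, sd = 0.02)

# Arrange the data: three previous days as predictors, next day as response.
# embed() puts the most recent value first, so column 1 is the response.
lagged <- as.data.frame(embed(r, 4))
names(lagged) <- c("y", "lag1", "lag2", "lag3")   # hypothetical names

# Chronological 80%/20% split (an assumption; the labs may split differently)
n.train <- floor(0.8 * nrow(lagged))
train <- lagged[1:n.train, ]
test  <- lagged[(n.train + 1):nrow(lagged), ]

# Multiple linear regression fitted on the training set only
fit  <- lm(y ~ lag1 + lag2 + lag3, data = train)
pred <- predict(fit, newdata = test)

# Performance on the testing set
rmse <- sqrt(mean((test$y - pred)^2))   # root mean-squared error
rho  <- cor(pred, test$y)               # correlation of predicted and desired
cat("RMSE:", rmse, " correlation:", rho, "\n")
```

Because the simulated series is pure noise, the correlation here should be close to zero; the point of the sketch is the arrangement and evaluation steps, which would be repeated for each model being compared.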

