BUS5CA Customer Analytics and Social Media | My Assignment Tutor

BUS5CA Customer Analytics and Social Media
Semester 2 2020
Assignment 4 (Supplementary)
Sentiment Analysis and Customer Segmentation

Release Date: 8th January 2021 (Friday) @ 12:00pm
Due Date: 11th January 2021 (Monday) @ 5:00pm
Assignment Type: Individual
Weight: 30%

Format of Submission:
Reports (electronic form) and electronic submissions of analytics files (SAS files and R scripts) to be emailed to the subject coordinator by the due date.

Important Notes:
You should complete ALL the tasks given in this assignment. You must achieve at least 50% of the total marks for each case study in order to PASS this supplementary assessment.
No late submission is allowed. If your submission is not received by the due date (as mentioned above), this assignment will be regarded as a FAIL.

1. Case Study A (Sentiment Analysis): As a data scientist working for a movie review firm, you are tasked to develop a sentiment analytics engine for Twitter, which is used to predict consumers’ review sentiments. The aim is to develop both dictionary-based and machine learning-based sentiment analytics scripts using a number of R libraries and the SAS Sentiment Analysis Studio (which were covered in the workshop activities in Week 3 and Week 4 in Semester 2). You are required to use the developed engine to predict movie reviewers’ sentiments and benchmark various algorithms and analytics tools.

2. Case Study B (Customer Profiling): Customer segmentation is the process of splitting customers into different groups with similar characteristics for potential business value proposition. Many companies find that segmenting their customers enables them to communicate and engage with their customers more effectively. Moore Bank is conducting an analysis of the existing customer profiles and the marketing campaign data to identify the target customers who are most likely to subscribe to long-term deposits.
As a member of the data analytics team, you are tasked to analyse historical data and develop predictive models for marketing purposes. Your manager has designed a pilot project focusing on clustering-based customer segmentation and profiling to discover consumer insights.

Case Study A (15%)

Sentiment analysis is a technique that aims to gauge the attitudes of customers in relation to topics, products and services of interest. It is a pivotal technology for providing insights to enhance the business bottom line in campaign tracking, customer-centric marketing strategy and brand awareness. Sentiment analytics approaches are used to produce sentiment categories such as ‘positive’, ‘negative’ and ‘neutral’. More specific human emotions are also a topic of interest. There are two major streams of methods for developing a sentiment analytics engine: the dictionary-based and the machine learning-based approaches. In this part of the assignment, you are required to perform sentiment analytics based on both approaches.

Task Requirements:
As a data scientist, you are required to perform a number of data analytics tasks. You are tasked to develop both dictionary-based and machine learning-based sentiment analytics engines using the R programming language and apply them to predict the sentiments of movie review tweets from a sample of data. You are also required to use the SAS Sentiment Analysis Studio to compare the results.

To achieve the above, you need to carry out the following data analytics tasks:

Task 1: Develop a dictionary-based sentiment analytics engine based on the R libraries ‘syuzhet’ and ‘tidytext’ to analyse the different emotions from the review tweets (5%).
• Analyse and aggregate the eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) from the review tweets file ‘movie_tweets.csv’ using the function ‘get_nrc_sentiment’.
(You are required to plot a chart to visualise these emotions using the R library ‘ggplot2’.)
• Find the top 5 most frequent words in all the movie reviews for each of the eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust). Analyse and discuss the results.

Task 2: Develop a machine learning-based model using the R libraries ‘tm’ and ‘e1071’ and evaluate the predictive accuracies of two classifiers (5%).
• Develop R scripts and import the data sets from the folder ‘movie_tweets’ for training and testing.
• Use both the negative tweets and the positive tweets from the subfolder ‘training’ as the training dataset; and use the rest of the negative tweets and the positive tweets from the subfolder ‘testing’ as the testing dataset.
(Hint: You may need to use the as.character() function to convert a dataframe column from factors to characters.)
• Develop a machine learning-based sentiment analytics engine and predict sentiment categories (only as ‘positive’ and ‘negative’) using ‘tm’ and ‘e1071’ with the Naïve Bayes classifier and the SVM classifier.
• Evaluate the testing accuracies and report the predicted results.

Task 3: Develop a statistical model using SAS Sentiment Analysis Studio and evaluate the accuracies (5%).
• Use the same data folder, ‘movie_tweets’, which contains ‘negative’ and ‘positive’ tweets for training and testing.
• Build a statistical model using SAS Sentiment Analysis with three different advanced settings.
You need to change configurations in the advanced model to obtain the best training accuracy and keep a record of how you managed to improve the accuracy.
(Hint: Refer to the SAS Sentiment Analysis Studio tutorial in Semester 2.)
• Evaluate and compare the testing accuracies for different models and report the results.
• Compare the results obtained in this task with the previous predictive results using R (from Task 2) and discuss.

You are required to:
a) Prepare a report for Case Study A with all the analytics results for the above three key tasks. (You can use an appendix for any additional screenshots which you feel are important for the report.) The report should be named as: _Assignment4_CaseStudyA_Report.doc
b) Save the R script for both Task 1 and Task 2 as: _Assignment4_CaseStudyA.r
c) Save the SAS Sentiment Studio project as: _Assignment4_CaseStudyA_SAS.zip
(The detailed saving procedures can be found in the Assignment 1 Additional Technical Support file given in Semester 2.)

Case Study B (15%)

As a member of the data analytics team for Moore Bank, you are tasked to analyse the historical data of their existing customer base and develop predictive models for marketing purposes. Your manager has designed a pilot project focusing on clustering-based customer segmentation and profiling to discover consumer insights.

Dataset:
The dataset required for this assignment (bank_data) is given in CSV format. You should import the given dataset into your SAS project with the File Import node from the Sample tab.

Task Requirements:
The project that you are undertaking in this case study is seeking knowledge and insights relating to:
• The demographics-based segments and their profiles;
• The representative behavioural profiles for each segment;
• How the produced segments can be mapped to a broader concept of segments in the Australian community.

A number of analytics tasks are designed by the team to achieve the above objectives.
You are expected to use SAS to perform clustering and segment profiling, with the support of R programming, for this assignment. You are required to relate the segments and profiles to the Roy Morgan value segments. Please use the following link to further understand these value segments: http://www.roymorgan.com/products/values-segments.

Task 1: Customer segmentation based on demographics data (5%)
By using SAS Enterprise Miner, conduct clustering and segment profiling based on the demographics data (Age, Career, Marital_Status, Education).
• What are the key demographics segments for the whole dataset? Describe the main profiles and then map them into the Roy Morgan segments.
• What are the most important variables based on each segment? (Target: Subscribed)
• Are there differences in segments between customers who subscribed to a long-term deposit and those who did not?
[Hint: Try 5-7 clusters, interpret them and map them into the Roy Morgan segments. To identify variable importance, you need to set “Subscribed” as the target. To understand the difference in segments, you may need to perform clustering separately for the subscribed customers and the non-subscribed group. In order to do this, you may need the Filter node from the Sample tab under SAS Enterprise Miner.]

Task 2: Customer segmentation based on behavioural data (5%)
Considering the behavioural variables in the data (Default_Credit, Mortgage, Personal_Loan), you are required to conduct clustering and segment profiling.
• What are the key behavioural segments for the whole dataset? Describe the main profiles.
• What are the important variables based on each segment? (Target: Subscribed)
• Are there differences in segments between customers who subscribed to a long-term deposit and those who did not?
[Hint: Use no more than 5 clusters.
You should adopt the same approach as in Task 1.]

Task 3: Cross cluster analysis – demographics to behavioural segments (5%)
For each individual (both subscribers and non-subscribers), record the corresponding demographics and behavioural clusters (based on Task 1 and Task 2 above). Perform a cross cluster analysis in R by using demographics clusters as rows and behavioural clusters as columns in a table.
[Hint: To do this, you may need to export your segment results from Task 1 and Task 2 (with the Save Data node from the Utility tab, saved in .csv format) and use the R table and probability table functions. You should make sure that your segment results from SAS include the customer index (the row number) and the target variable (“Subscribed”).]
• Are there any significant associations between the two types of segments? Discuss.
[Hint: Investigate the cross table and identify combined segments with major associations.]
• Is there a relationship between the outcome (Subscribed) and the combined demographics and behavioural segments identified? Explain the combined segments produced from demographics and behavioural clusters and their associations with the outcome (Subscribed).
[Hint: Look at the lift of “yes” for Variable 8 (Subscribed) compared with the average, for each selected combined segment.]

Lift calculation example:
The lift for the combined segment of demographic segment 1 and behavioural segment 1
= (Frequency of subscribers in the combined segment of demographic segment 1 and behavioural segment 1)
/ (Frequency of the whole population in the combined segment of demographic segment 1 and behavioural segment 1)

You are required to:
a) Prepare a report with answers for the above three key tasks.
(You can use an appendix for any additional screenshots which you feel are important for the report.)
The report should be named as: _Assignment4_CaseStudyB_Report.doc
b) Save the SAS project for Task 1 and Task 2 above as SPK files with the names below:
_Assignment4_CaseStudyB_Task1.spk
_Assignment4_CaseStudyB_Task2.spk
c) Zip the two SPK files and name the zipped file as: _Assignment4_CaseStudyB_SAS.zip
d) Save the R code for Task 3 as: _Assignment4_CaseStudyB.r

General Report Guidelines

1. The report should consist of a table of contents, an introduction, logically organised sections/topics (such as ‘case study A’, ‘case study B’), a conclusion and a list of references where necessary.
2. Choose a fitting sequence of sections/topics for the body of the report. Two sections for the two case studies are essential; you may add other sub-sections deemed relevant.
3. You should include diagrams, tables and charts from the analytics solutions to effectively present your results. (Consider using Alt + Print Screen to capture screenshots if needed.)
4. Page limit: For each case study, five (5) pages for the main report writing but not more than ten (10) pages including appendices.
5. Reports should be written in Microsoft Word (font size 11) and submitted as a Word file.
6. Final submission will comprise six (6) separate files:
a. _Assignment4_CaseStudyA_Report.doc;
b. _Assignment4_CaseStudyB_Report.doc;
c. _Assignment4_CaseStudyA.r;
d. _Assignment4_CaseStudyB.r;
e. _Assignment4_CaseStudyA_SAS.zip;
f. _Assignment4_CaseStudyB_SAS.zip.

Important: You should submit all the reports, R scripts and SAS files via email to the subject coordinator.

Marking Rubrics

A grade will be awarded to each of the tasks and then an overall mark determined for the entire assessment. The rubric below gives you an idea of what you must achieve to earn a certain ‘grade’. As a general rule, to meet a ‘C’, you must first satisfy the requirements of a ‘D’.
And for an ‘A’, you must first satisfy the requirements of a ‘B’, which must of course first meet the requirements of a ‘C’ and so on.

The marking rubric for this assignment is given below. Each criterion is assessed at four levels: Pass, Credit, Distinction and High Distinction.

Criterion: Case Study A Task 1: Develop dictionary-based sentiment analytic engine and analyse emotions (5 marks)
• Pass: Limited effort to structure and present insights for emotions from tweets. Limited knowledge of the R programming.
• Credit: Fair effort to structure and present insights for emotions from tweets. Fair knowledge of the R programming.
• Distinction: Excellent effort to structure and present insights for emotions from tweets. Excellent knowledge of the R programming.
• High Distinction: Exceptional effort to structure and present insights for emotions from tweets. Comprehensive knowledge of the R programming.

Criterion: Case Study A Task 2: Develop machine learning-based sentiment analytic engine and evaluate predictive accuracies using R (5 marks)
• Pass: Limited effort to structure and present information and insights. Limited knowledge of the R programming.
• Credit: Fair effort to structure and present information and insights. Fair knowledge of the R programming.
• Distinction: Excellent effort to structure and present information and insights. Excellent knowledge of the R programming.
• High Distinction: Exceptional effort to structure and present information and insights. Comprehensive knowledge of the R programming.

Criterion: Case Study A Task 3: Develop sentiment analytic engine using SAS Sentiment Analysis Studio (5 marks)
• Pass: Limited effort to structure and present information and insights. Limited knowledge of SAS Sentiment Studio.
• Credit: Fair effort to structure and present information and insights. Fair knowledge of SAS Sentiment Studio.
• Distinction: Excellent effort to structure and present information and insights. Excellent knowledge of SAS Sentiment Studio.
• High Distinction: Exceptional effort to structure and present information and insights. Comprehensive knowledge of SAS Sentiment Studio.

Criterion: Case Study B Task 1: Segmentation / profiling on demographics data (5 marks)
• Pass: Limited effort to address questions and present information and insights. Limited knowledge of SAS Enterprise Miner.
• Credit: Fair effort to address questions and present information and insights. Fair knowledge of SAS Enterprise Miner.
• Distinction: Excellent effort to address questions and present information and insights. Excellent knowledge of SAS Enterprise Miner.
• High Distinction: Exceptional effort to address questions and present information and insights. Comprehensive knowledge of SAS Enterprise Miner.

Criterion: Case Study B Task 2: Segmentation / profiling on behavioural data (5 marks)
• Pass: Limited effort to address questions and present information and insights. Limited knowledge of SAS Enterprise Miner.
• Credit: Fair effort to address questions and present information and insights. Fair knowledge of SAS Enterprise Miner.
• Distinction: Excellent effort to address questions and present information and insights. Excellent knowledge of SAS Enterprise Miner.
• High Distinction: Exceptional effort to address questions and present information and insights. Comprehensive knowledge of SAS Enterprise Miner.

Criterion: Case Study B Task 3: Cross cluster analysis (5 marks)
• Pass: Limited effort to address questions and present information and insights. Limited knowledge of R and other supporting tools.
• Credit: Fair effort to address questions and present information and insights. Fair knowledge of R and other supporting tools.
• Distinction: Excellent effort to address questions and present information and insights. Excellent knowledge of R and other supporting tools.
• High Distinction: Exceptional effort to address questions and present information and insights. Comprehensive knowledge of R and other supporting tools.
Important information
• The university’s standard plagiarism and collusion policies apply to this assignment.

Appendix: Attribute Information for Case Study B

This section contains a description of the attributes of the dataset.
{‘name of the column’: ‘description’}

Input variables:
1 – Age (numeric)
2 – Career: career type (categorical: ‘admin.’, ‘blue-collar’, ‘entrepreneur’, ‘housemaid’, ‘management’, ‘retired’, ‘self-employed’, ‘services’, ‘student’, ‘technician’, ‘unemployed’, ‘unknown’)
3 – Marital_Status: marital status (categorical: ‘divorced’, ‘married’, ‘single’; note: ‘divorced’ means divorced or widowed)
4 – Education (categorical: ‘primary’, ‘secondary’, ‘tertiary’, ‘unknown’)
5 – Default_Credit: has credit in default? (binary: ‘no’, ‘yes’)
6 – Mortgage: has home loan? (binary: ‘no’, ‘yes’)
7 – Personal_Loan: has personal loan? (binary: ‘no’, ‘yes’)

Output variable (desired target):
8 – Subscribed: has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)

The dataset is adapted from:
S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.
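
Illustrative note on the cross cluster table and lift (Case Study B, Task 3):
The arithmetic behind the cross table and the lift calculation can be sketched on a tiny example. The sketch below is written in Python purely to show the calculation; the assignment itself asks for R's table and probability table functions. The records, cluster numbers and function names are all hypothetical and are not taken from bank_data. The segment ratio follows the lift example given in Task 3 (subscribers in the combined segment divided by all customers in that segment), and the final step divides that ratio by the overall subscription rate, which is one possible reading of the hint to compare against the average.

```python
from collections import Counter

# Hypothetical toy records: (demographic_cluster, behavioural_cluster, subscribed).
# These values are made up for illustration only.
records = [
    (1, 1, "yes"), (1, 1, "no"), (1, 1, "yes"), (1, 2, "no"),
    (2, 1, "no"), (2, 1, "no"), (2, 2, "yes"), (2, 2, "no"),
]


def cross_table(rows):
    """Count customers per (demographic, behavioural) cluster pair."""
    return Counter((d, b) for d, b, _ in rows)


def segment_rates(rows):
    """Subscribers in each combined segment divided by the segment's size."""
    totals = cross_table(rows)
    subscribers = Counter((d, b) for d, b, s in rows if s == "yes")
    return {seg: subscribers[seg] / n for seg, n in totals.items()}


def lift_vs_average(rows):
    """Segment rate divided by the overall subscription rate.

    Assumes at least one subscriber exists, so the overall rate is non-zero.
    """
    overall = sum(1 for *_, s in rows if s == "yes") / len(rows)
    return {seg: rate / overall for seg, rate in segment_rates(rows).items()}


table = cross_table(records)
rates = segment_rates(records)
lifts = lift_vs_average(records)

for seg in sorted(table):
    print(seg, table[seg], round(rates[seg], 3), round(lifts[seg], 3))
```

A value above 1 in the last column would indicate a combined segment whose subscription rate is higher than the overall average, under the assumptions stated above.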
