Big Data in Recommender Systems

COMP3210/COMP6210 Workshop Week 7

Tutorial Questions

1. Can you provide some examples of big data in recommender systems?
Answer: Large volumes of user-item interaction data (e.g., users' clicks on and comments about items) on e-commerce websites (e.g., yelp.com) and online social networks (e.g., Facebook). For example, in one minute, Yelp users post 26,380 reviews and Facebook users share 2,460,000 pieces of content.

2. What are the main challenges in streaming recommender systems?
Answer: The overload problem, learning users' long-term preferences, and capturing user preference drift.

3. What are the main strategies used for handling streaming data in recommender systems?
Answer: Effective sampling, reservoir maintenance, and user preference drift detection. (A minimal sketch of reservoir sampling is given at the end of this worksheet.)

4. What are the main target problems and the corresponding recommendation algorithms at Netflix?
Answer: Ranking videos is the core problem at Netflix. Top-N video ranking and continue-watching ranking are the two main recommendation algorithms used to address this problem.

Practical Questions

1. Design a Map-Reduce job that counts the total number of occurrences of a certain word (such as 'MapReduce') in 'MapReduce_wiki.txt'.

Input: We use the input file 'MapReduce_wiki.txt' from 'Map-Reduce Example for Python'. The running command in the command prompt (cmd.exe) is:

python Practical2_Q1_Solution.py MapReduce_wiki.txt > output_Practical2_Q1.txt

Output: The total number of occurrences of 'MapReduce' in 'MapReduce_wiki.txt' is 12.

Code (a single-step job):

"""Practical 2, Question 1: Single-step job

Description: Design a Map-Reduce job that counts the total number of
occurrences of a certain word in the input.

This is a single-step job, which only needs to subclass MRJob and
override a few methods. If you want to learn more, visit
https://mrjob.readthedocs.io/en/latest/guides/writing-mrjobs.html#single-step-jobs
"""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

# Here, we count the total number of occurrences of 'MapReduce' in the input file.
query_word = 'MapReduce'


class MRMostUsedWord(MRJob):

    # step 1: mapper, count the occurrences of query_word in each mapper's input
    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            if word == query_word:
                yield word, 1

    # step 2: reducer, count the total occurrences of query_word in the whole input
    def reducer(self, word, counts):
        yield word, sum(counts)


if __name__ == '__main__':
    MRMostUsedWord.run()
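If you want to run the Question 1 job from another Python script rather than the command line, mrjob also supports launching a job programmatically via make_runner(). The sketch below assumes the job class above is saved as Practical2_Q1_Solution.py (the file name used in the running command); note that mrjob requires the job class to live in a different module from the script that launches it.

# Minimal sketch: run the Question 1 job programmatically with mrjob's
# default inline runner. Assumes the MRMostUsedWord class above is saved
# in Practical2_Q1_Solution.py, as in the running command.
from Practical2_Q1_Solution import MRMostUsedWord

job = MRMostUsedWord(args=['MapReduce_wiki.txt'])
with job.make_runner() as runner:
    runner.run()
    # parse_output() turns the job's raw output back into (key, value) pairs
    for word, count in job.parse_output(runner.cat_output()):
        print(word, count)  # expected: MapReduce 12, matching the output above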
2. Design a Map-Reduce job that finds the most commonly used word in 'MapReduce_wiki.txt'.

Input: We again choose 'MapReduce_wiki.txt' as the input file. The running command in the command prompt (cmd.exe) is:

python Practical2_Q2_Solution.py MapReduce_wiki.txt > output_Practical2_Q2.txt

Output: The most commonly used word in 'MapReduce_wiki.txt' is 'the', with 24 occurrences.

Code (a multi-step job):

"""Practical 2, Question 2: Multi-step job

Description: Design a Map-Reduce job that finds the most commonly used
word in the input.

This is a multi-step job, which needs to override steps() to return a
list of MRSteps. From:
https://mrjob.readthedocs.io/en/latest/guides/quickstart.html#writing-your-second-job
"""
from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")


class MRMostUsedWord(MRJob):

    # This is a multi-step job, so we need to override the steps() function.
    def steps(self):
        # step 1: mapper, get each word in each line
        # step 2: combiner, count the words after each mapper (decreases total data transfer)
        # step 3: reducer, count the total occurrences of each word in the whole input
        # step 4: reducer, find the most commonly used (maximum count) word in the whole input
        return [
            MRStep(mapper=self.mapper_get_words,
                   combiner=self.combiner_count_words,
                   reducer=self.reducer_count_words),
            MRStep(reducer=self.reducer_find_max_word)
        ]

    # step 1: mapper, get each word in each line
    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    # step 2: combiner, count the words after each mapper
    # this step decreases total data transfer; you can remove it, but the job will take longer
    def combiner_count_words(self, word, counts):
        # optimization: sum the words we've seen so far
        yield (word, sum(counts))

    # step 3: reducer, count the total occurrences of each word in the whole input
    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer;
        # num_occurrences comes first so we can easily use Python's max() function.
        yield None, (sum(counts), word)

    # step 4: reducer, find the most commonly used (maximum count) word in the whole input
    # discard the key; it is just None
    def reducer_find_max_word(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding the max results in key=count, value=word
        yield max(word_count_pairs)


if __name__ == '__main__':
    MRMostUsedWord.run()
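Returning to Tutorial Question 3: the 'reservoir maintenance' strategy is typically built on reservoir sampling, which keeps a fixed-size uniform random sample of an unbounded stream of user-item interactions. The sketch below is a minimal illustration of the classic algorithm (Algorithm R), not code from the unit; the stream of (user, item) pairs and the reservoir size k are assumptions made for the example.

import random


def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length.

    After n items have arrived, every item remains in the reservoir with
    probability k/n (classic Algorithm R), so the sample stays uniform no
    matter how long the stream runs.
    """
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = random.randrange(n)      # uniform index in [0, n)
            if j < k:
                reservoir[j] = item      # replace a random slot
    return reservoir


# Hypothetical usage: sample 1,000 user-item interactions from a click stream.
# interactions = (("user_42", "item_7"), ...)   # an unbounded stream of events
# sample = reservoir_sample(interactions, k=1000)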
