Toxic Comment Detection in Online Discussion

Student Name:
Student ID:

Table of Contents
Hypothesis
The Problem/ Short Description of your idea
The project aim(s)
The project objective
How you plan to conduct your research
Project plan
Data Preprocessing
Visualize the data
Prepare Train and Test Datasets
Encoding
Word Vectorization
Model Creation
Evaluate the model
Training set
Test set
Precision and recall
F-1 score
Classification report
Insights from the Project

Hypothesis
In today's world, many platforms connect people, and social media is the most significant of them. My hypothesis is that data in sentence form can be classified as toxic or not toxic.

The Problem/ Short Description of your idea
To detect whether a sentence is toxic, we use many parameters, from which we build a multiclass model that identifies the toxicity.

Research Questions
Q1. What are our main objectives and findings?
Q2. Are this model and this language useful for better prediction?
Q3. Does toxic comment detection provide a better solution to complex problems in society?

The project aim(s)
My aim in this project is to focus on data preprocessing and feature engineering, and to ensure that the data I use is correct and sufficient to produce the outcome.

The project objective
Our research objective is to find a better way to classify toxic material. In this research I collect data from several sources and use the Python language.

How you plan to conduct your research
The research follows these steps: literature review, design, coding, testing, implementation, and maintenance.

Project plan
I am using a dataset labelled for toxicity, with labels such as Severe Toxic, Obscene, Threat, Insult, and Identity Hate.

Data Preprocessing
Step 1: Checking for missing values
The first step is to check both the training data and the test data for missing values.

Step 2: Text normalization
After confirming that there are no missing values (or removing any that exist), I normalize the text data, because data collected as raw sentences is inconsistent in form (for example in capitalization and punctuation).

Step 3: Lemmatization
In this step, the different inflected forms of a word are grouped together and reduced to a single base form (the lemma); for example, "running" and "ran" both become "run".

Step 4: Stopword removal
This is one of the most critical text preprocessing steps for text classification: very common words such as "the" and "is", which carry little meaning on their own, are removed.

Step 5: Tokenization & indexing
Tokenization breaks each sentence into individual words (tokens), and indexing maps every unique word to an integer so that the models can process it.

Step 6: Padding
Some sequences are much longer than others, so they cannot be converted directly into the fixed-length vectors the model expects. Padding (and truncating) brings every sequence to the same length.

Visualize the data
Data visualization represents information in graphical form, using elements such as graphs, maps, and charts.

Prepare Train and Test Datasets
The data is split into a training set and a test set. The training data is used to fit the model, and the test data is used to evaluate its predictions.

Encoding
Encoding transforms categorical values into numerical values that the model can understand. The following encoding methods are available in the sklearn (scikit-learn) library:
1- One Hot Encoding: each category becomes a binary vector containing a single 1, which represents the data in a more expressive way without implying an order between categories. Many ML algorithms cannot work with categorical data directly.
2- Label Encoding: converting labels into numeric form is called label encoding; it puts the data into a machine-readable format.

Word Vectorization
The process of converting a text document into a numerical feature vector is known as word vectorization.
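The preprocessing steps, the two encoding methods and word vectorization described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the stopword list and the lemma lookup table are tiny hypothetical stand-ins (a real pipeline would use a full stopword list and a proper lemmatizer), tokenization is simple whitespace splitting, index 0 is reserved for padding, a bag-of-words count vector is used as one simple form of word vectorization, and all function names are illustrative.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical, deliberately tiny resources for illustration only.
STOPWORDS = {"the", "is", "a", "an", "and", "are", "you", "to", "of", "this"}
LEMMAS = {"comments": "comment", "idiots": "idiot", "running": "run", "ran": "run"}


def drop_missing(texts: List[Optional[str]]) -> List[str]:
    """Step 1: remove missing (None) or blank entries."""
    return [t for t in texts if t is not None and t.strip()]


def normalize(text: str) -> str:
    """Step 2: lowercase, replace non-letters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def lemmatize(tokens: List[str]) -> List[str]:
    """Step 3: map each word to its base form via the toy lookup table."""
    return [LEMMAS.get(t, t) for t in tokens]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Step 4: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Step 5 (indexing): give each unique word an integer id; 0 is kept for padding."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for t in tokens:
            if t not in vocab:
                vocab[t] = len(vocab) + 1
    return vocab


def pad(seq: List[int], max_len: int) -> List[int]:
    """Step 6: truncate long sequences and right-pad short ones with 0."""
    seq = seq[:max_len]
    return seq + [0] * (max_len - len(seq))


def preprocess(
    texts: List[Optional[str]], max_len: int = 5
) -> Tuple[Dict[str, int], List[List[str]], List[List[int]]]:
    """Run steps 1-6; return the vocabulary, the tokens and the padded id sequences."""
    clean = drop_missing(texts)
    # Step 5 (tokenization): whitespace split of the normalized text.
    token_lists = [remove_stopwords(lemmatize(normalize(t).split())) for t in clean]
    vocab = build_vocab(token_lists)
    sequences = [pad([vocab[t] for t in tokens], max_len) for tokens in token_lists]
    return vocab, token_lists, sequences


def label_encode(labels: List[str]) -> Dict[str, int]:
    """Label encoding: map each distinct label (in sorted order) to an integer."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def one_hot(labels: List[str]) -> List[List[int]]:
    """One-hot encoding: one binary column per distinct label."""
    mapping = label_encode(labels)
    return [[1 if mapping[label] == i else 0 for i in range(len(mapping))] for label in labels]


def bag_of_words(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Word vectorization: count how often each vocabulary word occurs."""
    vector = [0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vector[vocab[t] - 1] += 1  # ids start at 1, list positions at 0
    return vector


if __name__ == "__main__":
    raw = ["You are idiots!!", None, "Thanks, the comments are helpful.", "   "]
    vocab, tokens, sequences = preprocess(raw)
    print(vocab)
    print(sequences)
    print(one_hot(["toxic", "clean", "toxic"]))
    print(bag_of_words(tokens[1], vocab))
```

For the two non-missing inputs "You are idiots!!" and "Thanks, the comments are helpful.", this sketch builds the vocabulary {idiot: 1, thanks: 2, comment: 3, helpful: 4} and the padded sequences [1, 0, 0, 0, 0] and [2, 3, 4, 0, 0]. In a real project, scikit-learn's OneHotEncoder, LabelEncoder and CountVectorizer or TfidfVectorizer cover the encoding and vectorization steps.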
Model Creation
There are three perspectives of a data model, discussed below:
Conceptual Model: this perspective defines what the data model needs to represent, by defining and organizing the business concepts.
Logical Model: the logical model describes how the model can be implemented, including its tables, columns, and so on.
Physical Model: the physical model defines how the data model is implemented with a particular database management system.

A pipeline is used to create the model: the data is first organized into a structured format, and then a library is used to load it from CSV and other formats. We use two datasets and create graphs to find insights in them. Both datasets are split into training and test sets in a 70%/30% ratio, using random seed 42.

Evaluate the model

Training set
After training a deep learning model, we can visualize its accuracy and loss over the entire training process.

Test set
The test set gives confidence when assessing the performance of the deep learning models on data they did not see during training.

Precision and recall
Precision is the fraction of the results returned by the model that are correct. Recall is the fraction of all relevant (truly positive) results that the model returns. The F-1 score combines the two as their harmonic mean and lies between 0 and 1.

F-1 score
The F-1 score is a classification metric that helps us evaluate an algorithm's performance: F-1 = 2 x (precision x recall) / (precision + recall). It is high only when both precision and recall are high.

Classification report
After training and fitting a machine learning model, evaluating its performance is very important. A classification report summarizes the precision, recall, F-1 score, and support (number of true instances) for each class.

Insights from the Project
In this project I worked on two different deep learning models and applied them to Natural Language Processing.
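The 70%/30% split with seed 42 and the evaluation metrics above can be sketched without any third-party library. This is an assumption-laden illustration, not the author's evaluation code: labels are assumed to be integers (0 = non-toxic, 1 = toxic in the binary case), the function names are hypothetical, and shuffling with Python's random.Random(42) does not reproduce the exact split that scikit-learn's train_test_split(..., test_size=0.3, random_state=42) would produce.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_30(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then keep 70% for training and 30% for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F-1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct share of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of real positives that were found
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def report(y_true: Sequence[int], y_pred: Sequence[int], class_names: List[str]) -> str:
    """A plain-text classification report: one row per class, labels 0..n-1."""
    rows = [f"{'class':<12}{'precision':>10}{'recall':>10}{'f1-score':>10}{'support':>10}"]
    for label, name in enumerate(class_names):
        p, r, f = precision_recall_f1(y_true, y_pred, positive=label)
        support = sum(1 for t in y_true if t == label)
        rows.append(f"{name:<12}{p:>10.2f}{r:>10.2f}{f:>10.2f}{support:>10}")
    return "\n".join(rows)


if __name__ == "__main__":
    train, test = split_70_30(list(range(10)))
    print(len(train), len(test))  # 7 3
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
    print(report(y_true, y_pred, ["non-toxic", "toxic"]))
```

In the example, the toxic class has 3 true positives, 1 false positive and 1 false negative, so precision, recall and F-1 are all 0.75. scikit-learn's precision_score, recall_score, f1_score and classification_report compute the same quantities on real predictions.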
