EXPLORATORY VISUALIZATION MODEL FOR MEASURING THE DIGITAL DIVIDE IN ASIAN AND EUROPEAN COUNTRIES

CHAPTER ONE
INTRODUCTION

Introduction
For several decades, the digital divide has been one of the most widely discussed universal phenomena. However, there is no clear definition of the term ‘digital divide’, owing to the fact that in the mid-1990s an extensive assortment of contentious interpretations of, and approaches to, the digital divide began to emerge. The widely accepted technocratic definition of the digital divide is that it is essentially the dissimilarity in the provision of access to technological services, whereas information sociologists perceive the digital divide as an expression of numerous social, geographical, economic and informational divides (Van Dijk, 2017). In July 1995, the US Department of Commerce’s National Telecommunications and Information Administration (NTIA) carried out the initial investigation into the digital divide in a survey that came to be known as ‘Falling Through the Net’. Falling Through the Net is largely credited with laying the groundwork for what is currently referred to as Information and Communication Technologies (Kiely and Salazar, 2018). Although the digital divide is fuelled by a wide array of elements, the Internet reigns supreme among its key drivers at the moment. Internet infrastructure, compounded with the quality of Internet performance, has gained great significance in all facets of human life, whether in the economic, social or political sphere. In recognition of the impact of the digital divide and the role of Internet performance in widening digital divides across various societies and regions, the PingER project was brought into existence for the chief purpose of measuring Internet end-to-end performance in different regions across the globe (White and Cottrell, 2016).
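To make the notion of end-to-end performance measurement concrete: monitoring of the PingER kind rests on repeated ping probes, from which summary metrics such as minimum, average and maximum round-trip time and packet loss are derived. The sketch below is a minimal, self-contained illustration of that summarization step; the probe values and the helper function are hypothetical illustrations, not actual PingER data or code.

```python
# Minimal sketch of ping-style summary statistics, of the kind gathered
# by end-to-end monitoring projects such as PingER. The sample values
# below are hypothetical, not actual PingER measurements.
from statistics import mean

def summarize_pings(rtts_ms):
    """Summarize one batch of ping probes.

    rtts_ms: list of round-trip times in milliseconds;
             None marks a probe that received no reply (lost packet).
    """
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "min_rtt": min(replies),
        "avg_rtt": round(mean(replies), 1),
        "max_rtt": max(replies),
        "loss_pct": round(100 * lost / len(rtts_ms), 1),
    }

# Ten probes to a hypothetical remote host; two went unanswered.
sample = [212.0, 215.5, None, 210.3, 298.7, 214.1, None, 211.9, 213.4, 216.0]
stats = summarize_pings(sample)
```

Aggregating many such batches, per monitored host and per month, is what yields the large longitudinal repository described below.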
The formulation of the PingER project was largely informed by the notion that effective measurement of the digital divide in different regions would undoubtedly play a key role in guiding decision making processes targeted at tackling the problem of the digital divide (Wenwei and Fang, 2018). The PingER project monitors the end-to-end performance of a wide array of Internet links globally. Two and a half decades since the PingER project was initiated, it now boasts an extensive array of data repositories of Internet performance measurements sourced from numerous data collection sites around the globe. The PingER project data repository is made up of millions of datasets collected since 1998. Despite the fact that all data collected by the PingER project are freely available online, the huge volume and complexity of the data make it difficult for ordinary individuals to analyse and interpret the data in a manner that enables them to generate meaningful insights that could be beneficial in the fight against the digital divide. Most individuals tasked with the responsibility of tackling the digital divide lack any meaningful data science knowledge. In recognition of this fact, the administrators of the PingER project sought to employ various data science and information visualization techniques with the goal of making the data understandable to the general populace. Information visualization has acquired a huge sense of value in the day-to-day lives of all human beings. This is because accurate statistical models, compounded with information visualization, enable humans to devise a wide array of techniques and decisions. This, in turn, eases the burden brought about by various daily challenges by generating accurate predictions and key insights.
Therefore, visualization often proves to be quite helpful in human decision making processes targeted at tackling various problems (Hoeber, 2018). It would be quite impossible for individuals to comprehend the meaning of different forms of data, especially when the volumes of the data in question are extremely large. The value of data in human decision making processes cannot be overlooked, but raw data on its own cannot relay any meaningful information. Individuals have to rely on general guidelines derived from various statistical models, and presented through various information visualization techniques, to guide them towards a specific understanding which will enable them to develop a precise action plan. The PingER project is undoubtedly the best practical example of a scenario whereby raw data could be useless, especially because extensive volumes of data are involved; on its own, such data cannot yield meaningful clues to guide the next course of action with regard to tackling the problem of the digital divide between Asia and Europe. In acknowledgement of the extensive negative impact of the digital divide on the general populace, the administrators of the PingER project sought to break down the huge volumes of complex data that are continuously generated by the project by employing a wide array of information visualization techniques. According to a significant number of research studies (e.g., Plaisant and Carpendale, 2011; Balliet and Heimlich, 2016), information visualization is a very robust technique for the exploration of huge volumes of data, especially because it amalgamates the superior visual capabilities of human beings with the computational prowess of digital resources. From a general perspective, exploratory visualization is defined as a process whereby a data science practitioner creates different graphics while dealing with complex or relatively unknown datasets.
The exploratory visualization process commences with the effective collection of data and culminates with the development of cutting-edge concepts and propositions (Howe and Heer, 2015). The concepts and propositions developed from exploratory visualization are used for further analysis and debate. In acknowledgement of the extensive scope and complexity of the data, the quantification of the digital divide between Asian countries and European countries employed various exploratory visualization techniques. The utilization of exploratory visualization techniques in the analysis and interpretation of PingER data on Internet performance between Asian countries and European countries provides optimal support for effective monitoring of the digital divide, as well as for proper decision making processes with regard to how to tackle the problem of the digital divide between countries situated in both continents. The most vital element, in addition to gathering accurate data, is the creation of an accurate supposition that would be helpful in any decision making process. In most cases, information visualization serves as an instrument for three distinct functions, namely communication, analysis and exploration (Shneiderman, 2013). When used as an instrument for communication, information visualization aims to plainly and accurately relay complex notions to the viewer in a simpler manner that makes their comprehension much easier. As for the analytic function, information visualization is employed as a tool for testing different suppositions, with the aim of drawing comparisons or contrasting different elements in order to generate key insights about a specific problem. The exploration function of information visualization is mostly employed as an instrument for generating key concepts that can lead to the formation of meaningful and helpful suppositions about datasets that are not well comprehended by their users.
Exploratory visualization models are increasingly gaining relevance as among the most important forms of information visualization across various facets of human life. A study on an exploratory visualization model for measuring the digital divide in Asian and European countries is quite helpful in generating meaningful insights about exploratory visualization models. Such a study could also shed light on how exploratory visualization models could be effectively employed in generating suppositions and facts that could be useful in tackling a wide variety of problems across various facets of human existence.

Motivation
Exploratory visualization models are rapidly gaining relevance across a wide variety of professional fields owing to a dramatic increase in the complexity of raw data (Ellis and Mansmann, 2010). Previous work (e.g., Espinosa and Money, 2013) placed heavy emphasis on the development of methodologies and systems that could be used in the visualization of huge volumes of data. Consequently, little interest has been directed towards forming a deeper understanding of the general process of exploratory visualization and the practical effects of exploratory visualization models. There is a strong necessity for developing a deeper understanding of the process of exploratory visualization. This is because exploratory visualization models are gaining relevance across different fields and, as we look towards the future, are most likely to be used by more people across different facets of human life. Information visualization is a potent technique for the effective exploration of complex datasets, especially because it amalgamates the loftier visual capabilities of human beings with the computational prowess of different technologies.
If the development of exploratory visualization models, as well as their effects, were comprehended at the level of the measurement of the digital divide between Asian countries and European countries, this could benefit all types of individuals and organizations. This is because the measurement of the digital divide between Asian countries and European countries is a classic example of the challenges posed by complex and extensive volumes of data. Consequently, exploratory visualization models can be effectively implemented to alleviate the challenges posed by complex, extensive volumes of data. The findings of the study would be helpful in developing an extensive comprehension of exploratory visualization models. This is important because a deep understanding of exploratory visualization models can aid data analysts in acquiring key knowledge of how to use exploratory visualization to acquire an overview of the data they are working with. Also, exploratory visualization models provide organizational administrators, and different groups of individuals tasked with the responsibility of making decisions, with the practical apparatus that enables them to comprehend the issues they are dealing with and how best to tackle such issues; hence it is quite important to gain an extensive understanding of exploratory visualization models.

Problem statement
The PingER project data repository is made up of millions of datasets collected since 1998. The huge volumes of complex data have raised several challenges with regard to the analysis and interpretation of different sets of data, thereby forcing the administrators of the PingER project to implement various information visualization techniques with the goal of easing the burden of analysing and interpreting the extensive and complex data.
However, despite noting all these interesting facts about the problem of the digital divide and the role of the PingER project in measuring it, this study places more emphasis on the exploratory visualization model used in measuring the digital divide between Asian countries and European countries in PingER. Exploratory visualization models are rapidly gaining relevance across a wide variety of professional fields owing to a dramatic rise in the complexity and extensiveness of the raw data gathered by different individuals and entities on a daily basis. However, little interest has been directed towards forming a deeper understanding of the general process of exploratory visualization and the practical effects of exploratory visualization models. Thus, there is a pressing need for further studies on exploratory visualization methods. The measurement of the digital divide between Asian countries and European countries is a prime example of a situation where extensive and complex volumes of data are involved (Ordu and Simsek, 2015). Consequently, the measurement of the digital divide between Asian countries and European countries forms a strong basis for developing an extensive and precise understanding of all key concepts in the development and implementation of exploratory visualization models. This study is highly effective in developing a generalized understanding of exploratory visualization models. This is because the measurement of the digital divide between Asian and European countries is not limited to outstanding exploratory visualization algorithms but takes into account the entire process of exploratory visualization modelling.
Objective of the research
The primary aim of this study is to develop an exploratory visualization model for measuring the digital divide between Asian and European countries, with the following key objectives in mind:
1. To create a clear understanding of what the digital divide is and the importance of an exploratory visualization model in measuring the digital divide between Asian countries and European countries.
2. To establish an accurate process of exploratory visualization model implementation in the analysis and interpretation of complex, huge volumes of data.
3. To develop an exploratory visualization model for measuring the digital divide in Asian and European countries.
4. To evaluate the developed exploratory visualization model.

Thesis approach and contributions
The study sought to evaluate an exploratory visualization model for measuring the digital divide in Asian countries and European countries. However, the study directed its focus away from the problem of the digital divide itself and towards the exploratory visualization model used in measuring the digital divide between Asian countries and European countries. In seeking to meet its objectives, vital information was sourced from two directions. Firstly, the general process of exploratory visualization was delineated through a comprehensive literature review, which resulted in the generation of a fused exploratory visualization process model. Next, the exploratory visualization model was further put to the test through an in-depth analysis of PingER data for European countries and Asian countries. The analysis of the data obtained was done on the Python platform using several analysis techniques, including but not limited to regression analysis. The findings of the research were presented using various information visualization techniques, after which the findings were discussed in detail and, finally, a clear outline of key recommendations was given.
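The regression step mentioned above can be sketched as a simple ordinary least-squares trend fit over yearly performance values. The function and the yearly median round-trip times below are hypothetical illustrations, not the study's actual analysis code or PingER figures.

```python
# Illustrative sketch of the regression step: fitting a linear trend to
# yearly median round-trip times for one monitored region. The values
# below are hypothetical, not actual PingER measurements.
def fit_trend(years, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical median RTTs (ms) from a European node to an Asian region.
years = [2014, 2015, 2016, 2017, 2018]
rtts = [310.0, 295.0, 284.0, 270.0, 261.0]
slope, intercept = fit_trend(years, rtts)
# A negative slope indicates falling (i.e., improving) latency over time.
```

Comparing such slopes across regions is one way a regression can quantify whether a performance gap is widening or narrowing.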
The primary contributions of this thesis include:
1. The quantification of the digital divide between Asian countries and European countries: this study provides an in-depth analysis of key metrics of PingER data with the chief purpose of quantifying the digital divide between Asia and Europe.
2. Integration of data visualization in relaying information provided by PingER data: considering that PingER data occur in huge volumes and in an often incomprehensible form, the implementation of exploratory visualization processes in the analysis and interpretation of PingER data will generate key insights that support decision making processes aimed at tackling the digital divide.
3. Visualization in data quality space: by presenting the different categories of data sourced from the PingER project in a separate view, this project has the potential to make the interpretation of PingER data much easier.
4. Development of a precise and accurate process model of exploratory visualization: based on relevant models and characterizations in previous literature, this study develops a comprehensive exploratory visualization model using PingER data, with the goal of measuring the digital divide between Asian countries and European countries.

Organization of the thesis
This thesis commences with a clear definition of the digital divide and its relationship with Internet performance, an introduction to the concept of exploratory visualization, and a clear outline of the reasons why information visualization, specifically exploratory visualization, is an important process in generating key insights that support decision making, especially when the data involved are extensive and complex. In Chapter 2, the literature related to the exploratory visualization model for measuring the digital divide between Asian and European countries is discussed. Chapter 3 offers in-depth insight into the methodology that was used while carrying out the study.
Chapter 4 provides an in-depth analysis of the available data and a clear presentation of the findings. Chapter 5 provides conclusions and recommendations for the implementation of the findings of the current research, as well as recommendations for future studies.

CHAPTER TWO
LITERATURE REVIEW

2.1. Data Visualization
In the recent past, the world has observed a dramatic rise in the volumes of data that different organizations collect and process on a daily basis. Owing to this dramatic increase, the volumes of data currently available across different information platforms, including the Internet, have skyrocketed. Despite the fact that a huge percentage of the data generated on a daily basis is often freely accessible to all interested users, the magnitude of the data often raises a lot of difficulties for interested users in terms of visualization, exploration and ultimate utilization of such datasets (Van Der Aalst, 2016). The realization of the challenges raised by huge amounts of data, as well as the acknowledgement of the value of data and appropriate data interpretation to scientific studies, prompted different experts to develop various techniques for processing huge volumes of data and presenting them in a manner that can be easily comprehended by any user (Ramakrishnan and Shahabi, 2014). Most notably, the advancement of Information and Communication Technology (ICT) has enabled experts in the tech landscape to develop computer software with the ability to process huge volumes of data and visualize them in an understandable manner. In the 21st century, each and every individual working within a particular entity, whether a commercial entity or a non-profit, is increasingly becoming dependent on insights generated from the wide variety of datasets collected by their organization on a daily basis.
The insights generated from such datasets help different individuals in making the right decisions, taking the right courses of action in tackling different problems, and in overall operational efficiency. Since a huge proportion of the data collected by organizations is large and complex, it is quite impossible to generate any meaningful insight without enlisting some kind of aid. So, the best possible way of generating meaningful insight from data, and avoiding the possibility of missing key correlations, rests in in-depth, innovative analysis of the available data as well as the use of easy-to-comprehend data visualizations. In a general context, data visualization entails a wide array of activities, including the design, development and use of graphical representations of processed datasets generated using recommended computer software. In most cases, data visualization generates visual representations of the data in question, thus enabling the users of the data to see the data analytics in visual forms that make it simpler for them to comprehend the data (Lewin and Singh, 2018). In summary, data visualization is helpful in the discovery of patterns, in the process of understanding the information presented by the data, and in the formation of opinions with regard to the data in question. The concept of exploring data using visualization was brought into the limelight in the early 1970s by the renowned statistician Francis Anscombe, who designed a set of four small datasets famously referred to as Anscombe's quartet. Anscombe's achievement was a clear demonstration that complex datasets can easily be comprehended when presented in a graphical format. Several decades down the line, visual science has undergone tremendous evolution, so much so that there is currently no doubt regarding the effectiveness of data visualization in relaying information or elucidating complex datasets to a particular audience.
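Anscombe's point can be reproduced numerically. Two of his four published datasets are listed below; they share, to two or three decimal places, the same mean of y and the same correlation with x, yet a plot reveals one to be roughly linear with noise and the other to be a smooth curve. This is precisely why plotting, and not summary statistics alone, is essential.

```python
# Anscombe's quartet: four datasets with nearly identical summary
# statistics but very different shapes when plotted. Two of the four
# sets are reproduced here to illustrate the point numerically.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]          # shared x values
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

mean1 = sum(y1) / len(y1)                # ~7.50 for both sets
mean2 = sum(y2) / len(y2)
r1, r2 = pearson(x, y1), pearson(x, y2)  # ~0.816 for both sets
```

Despite these matching statistics, set 1 scatters around a straight line while set 2 traces a parabola, a difference that is invisible without a plot.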
However, it is important to note that data visualization can only be effective if the visualizations are attuned in the correct manner, such that they exploit the brain's detection capabilities (Bikakis, 2018). Proper data visualization tends to increase data comprehension speeds, as well as the rate at which the information relayed by the data is understood, especially because visual acuity makes use of the human eye, which has one of the strongest connections to the brain, the central information processing point. Owing to the increase in the volume of data available online, the entire globe is currently witnessing a massive surge of attention directed towards data visualization and its ability to relay key pieces of information accurately and efficiently. While taking note of the rising interest directed towards data visualization in recent times, it is important to note that a huge proportion of this newly found interest stems from the fact that data visualization is rapidly being acknowledged as a fundamental element of research communication. Despite the fact that the concept of data visualization is a relatively new element in research, it enables researchers to analyse, transform and display complex datasets. Healy (2018) notes that the capabilities of data visualization are exceedingly valuable, especially in the current information-driven environment, which is often characterized by massive, complex datasets being generated on a daily basis. Lindquist asserts that data visualization is continuously emerging as the most valuable sense-making, analytical and information-relaying tool for effectively apprehending and tackling the complexities presented by huge volumes of data.
The foundation of data visualization is rooted in the fact that, if the most appropriate data visualization is selected and implemented in the correct manner, it has the potential to accurately reveal the advancement and extensiveness of the underlying issues presented by the datasets in question, and possible interventions for such issues, while at the same time creating room for further exploration of the datasets (Williamson, 2016).

2.1.2. Standardization of data visualization processes in order to achieve the goals of data visualization
The creation of accurate data visualizations is a complex process that demands optimum attention and precision. Data visualization is fundamentally a complex form of visual communication and, similar to verbal communication, visual communication is largely dependent on semantics and structural accuracy. In this regard, the need to fully comprehend the guidelines of proper data visualization development cannot be over-emphasised. It is also important to take note of the fact that there is a huge difference between a mere graphical representation of data and an effective visualization of huge, complex datasets. Nearly two decades ago, a huge proportion of the data scientists who attempted to use data visualization did so in a poor and absurd manner (Tang and Li, 2018). Despite the fact that huge improvements have been recorded with regard to the use of data visualization in the recent past, renowned experts in the field have recently highlighted the value of literacy in all the data visualization techniques developed in recent times. Despite the fact that there is no universal consensus among data visualization practitioners regarding the doctrines of proper data visualization, there are certain guidelines which are generally recognized as good practice in the field.
Currently, there are three generally accepted guiding principles related to the design of strong, effective data visualizations. These are the need to comprehend the data, the development of a comprehensive understanding of what you intend to reveal in the visualization and, finally, the development of a clear and precise comprehension of your chosen visualization format in terms of its advantages and disadvantages. These guiding principles are elaborated as follows:

i. The need to comprehend the data
The most vital dimension of comprehending data is the acquisition of accurate knowledge regarding associations or arrangements within the dataset in question. From a general perspective, datasets are categorized into two distinct groups, namely discrete data and continuous data. Discrete data denotes discrete things that do not have any inherent pattern relative to each other. On the other hand, continuous data is characterized by a particular methodical arrangement. Visualization guidelines dictate that discrete and continuous data should be exhibited in different manners so as to ensure that their correlations can easily be identified. Considering the fact that continuous data is systematically interlinked, visual forms such as line graphs or family trees are most likely to be helpful in guiding viewers towards the discovery and comprehension of their relationships in a rapid manner (Dur, 2014). As for discrete data, they could be presented using pie charts or any other form of nominal or ordinal scale. It is also important to note that the two aforementioned groups are the most renowned groups of data, but recent times have seen the development of further distinctions of data types. Despite all the variations that exist in the newly developed data types, the common factor that must always be considered when developing data visualizations is the ability to understand the various arrangements of the data as well as the relationships between such pieces of data.
ii. Development of a comprehensive understanding of what you intend to reveal
During any data visualization process, one must always consider the demographics of the target audience as well as the intended purpose of the data visualization process. These two facts must always remain at the top of any data scientist's mind. The aforementioned aspects tend to inform a data scientist's ability to filter the representation of the whole dataset so as not to overwhelm his or her audience. The power of data visualization often stems from its ability to attract the attention and processing capabilities of the viewer, as well as from its usability traits (Steele and Iliinsky, 2011). However, for data visualization to have strong focus, it is important to put the context of the viewer into consideration, as well as his or her inspiration, level of attention and the time that he or she has. As a universal rule, data visualizations have to be kept simple while at the same time allowing adequate room for a specific plot or chronological organization of information.

iii. Developing a clear and precise comprehension of your chosen visualization format in terms of its advantages and disadvantages
In recent times, there has been an intense upsurge in journals seeking to highlight some of the universally accepted best data visualization practices. Such journals have often attempted to provide a step-by-step guide for the creation of highly effective data visualizations. While there are many rules that have been brought forward by renowned data visualization experts, the most important thing that data visualization designers must always keep in mind is that they must fully comprehend the strengths and weaknesses that different data visualization formats present (Angus and Wiles, 2015).

2.2. Techniques for data visualization
There is no doubt that a lot of truth lies in the statement that a single picture is worth a thousand words.
The substance and reality of this statement is especially evident whenever any individual seeks to comprehend a certain dataset or, perhaps, attempts to mine certain key discernments from a particular dataset. Considering the fact that, in recent times, a huge proportion of datasets occur in large and complex states, visualizations are undoubtedly quite helpful whenever an individual or a group of people is attempting to ascertain the key associations that exist among thousands of variables within a particular dataset. In order to come up with meaningful data visualizations, there are several basics that must be taken into consideration, including the size of the data involved, the type of the data involved and the column composition of the dataset in question (Helfman and Goldberg, 2016). It is also important to note that, in today's rapidly moving society, data visualizations have to be designed in a manner that facilitates quick delivery through different platforms, allowing individuals to easily gain access to such visualizations and explore them on their own at their desired times. Considering the fact that data occur in different forms and magnitudes, data visualizations need to be developed based on the configuration of the data in question (Kosara, 2016). In this regard, it is important to take note of the fact that data visualization techniques occur in different forms, some of which can only be applied to simple and small datasets and are often referred to as basic data visualizations, and others which pertain to the visualization of large and complex datasets. Before embarking on a comprehensive study of data visualization techniques, whether in the context of small and simple datasets or large and complex datasets, it is important to take note of the fact that data visualization is not entirely the same as scientific visualization.
Scientific visualization often makes use of animations, simulations and complex computer-generated graphics in creating visual designs of configurations, procedures and processes that are not entirely visual in nature. On the other hand, data visualization tends to present and exhibit different sets of information in a manner that is inclined towards encouraging proper interpretations, assortments and associations within the dataset. Unlike scientific visualization, data visualization is known for its ability to utilize human capabilities in the recognition of patterns and the analysis of various trends that exist within the dataset in question, while at the same time exploiting the human ability to retrieve significant volumes of information within the shortest time span (Shishkin and Skatkov, 2016).

2.2.1. Basic data visualization techniques
Basic data visualization techniques are those that are often implemented when tackling small and simple datasets. The common basic data visualization techniques include:

Line charts
Line charts are often used to reveal the association that exists between two or more different variables, and are often used to track the trajectory of certain variations over a certain period of time. It is also important to note that the usefulness of line charts mostly rests in their ability to draw comparisons between numerous subjects or items over a particular time span (Tu and Chen, 2017).

Bar charts
In most cases, bar charts are used when drawing comparisons between the quantities of a wide variety of groups. The value of each group is epitomized using a single bar, and the bars can be oriented either vertically or horizontally, with the span of each bar acting as a representation of its value. It is also important to note that a simple bar chart often comes in handy when values appear to be quite dissimilar, so much so that the variations between each bar can be discerned by the ordinary human eye.
However, when the bars are too close together, or when the dataset contains such large values that the bars would need to be extremely long, it becomes difficult to employ bar charts (Xu and Nandi, 2016). 

Scatter plots. Scatter plots are two-dimensional plots that reveal the joint variation between two data objects. In most cases, scatter plots are helpful when seeking to ascertain how spread out a particular dataset might be. 

Pie charts. In recent times, there has been much discussion about the usefulness of pie charts, which are used to draw comparisons between different sections of a whole. The discussion stems from the fact that pie charts are difficult to interpret, especially because it is challenging for the human eye to estimate areas and compare visual angles. 

Figure 2.1 Common steps in data visualization. Source: (Nayak and Lenka, 2016) 

2.3 Exploratory visualization 

Exploratory visualization is the process of creating imagery and graphical representations of statistical components to aid data presentation. It is mostly used to show graphical representations of the analysed data in order to uncover the underlying relationships within the dataset. A preset statistical model is not obligatory when using exploratory visualization techniques, since by definition it must provide more than formal modeling or hypothesis-testing tasks (Li, 2018). The growth of exploratory visualization was channelled by exploratory data analysis (EDA), championed by John Tukey in the mid-20th century, around 1960. Tukey held the view that data analysis was the backbone of research and that it needed to be the most considered part of the work, deserving far greater attention. 
He envisioned that data had to be analysed comprehensively with adequate technical resources, a vision that contributed to the development of open-source statistical programming languages: S, S-PLUS and R. These developments advanced data analytics through the dissemination of robust statistics and nonparametric statistics, necessitated by tests of the median, mode, mean, standard deviation and quartiles, whose findings were orthogonal to the primary analysis task (Cox, 2017). According to Tukey, the main objectives of EDA were to: 

Suggest hypotheses about the causes of observed phenomena and their parameters. 
Assess the conditions under which the prevailing assumptions will hold. 
Support the selection of appropriate data-analytic tools. 
Provide a basis for further data collection and statistical procedures. 

The comprehensiveness of EDA procedures sets it in a different class from initial data analysis (IDA). 

2.3.1. Extension of Exploratory Data Analysis (EDA) 

With the surge of machine learning and the development of data analytics, there is a gap in the technical services required to link the two. EDA has continued to advance since its invention by Tukey, witnessed in the emergence of numerous open-source programming languages, led by Python, which can handle sophisticated data-analytics requirements (Cox, 2017). 

Figure 2.31 Data science process chart 

From the chart above it is deducible that data visualization is the last step of data handling prior to decision making. This shows how data visualization holds a central position in data handling, and how it depends on the data processing, models and algorithms employed. 
Also from the chart, it is deducible that exploratory data analysis is the engine of data processing, given its fundamental links with the prime parts of the chart: data processing, data cleansing and, lastly, the models and algorithms employed. Exploratory data analysis gives the analyst the opportunity to: 

Verify the data and ascertain the relationships that exist within the datasets. 
Check for unanticipated structures within the data that need to be altered, removed or changed accordingly. 
Ensure that the data process is governed by data-driven insights and not motivated by stakeholders' assumptions. 
Provide data-based context for the problem at hand, so that the statistical outputs can be put to maximum use (Cox, 2017). 

EDA methods 

From the definition of EDA and its application to real-world problems, its methods can be divided along two major axes. Tukey's work on exploratory data analysis did not impose a clear division on the structure of EDA; on closer inspection, however, the methods divide into non-graphical versus graphical, and into univariate versus multivariate (mainly bivariate). Non-graphical methods are based purely on numerical computation, while graphical methods present data using appropriate visualization tools in diagrammatic or pictorial form (Cox, 2017). 

Figure 2.32 EDA methods 

The chart above displays the EDA method interfaces and how parts and sub-parts are interlinked, from descriptive statistics to their appropriate visualizations. 

Dimensionality reduction and cluster analysis for EDA and EDV methods 

Dimensionality reduction 

In the last three to four years, more data was produced and recorded than in any comparable period in the world's history. 
These huge volumes of data, recorded continuously, imply that data is being produced in ever more dimensions, increasing day by day (Reddy and Baker, 2020). This can be attributed to the surging use of online and digital technologies such as Facebook, WhatsApp and Google, which allow machines to interact with humans in their daily routines. For example, the amount of data Facebook collects per minute is vast, considering that it has to store all of its users' personal information, likes and comments, and to resurface the most-liked items as favourites. Because such large amounts of data are generated by machines at regular intervals, it is necessary to have a system that sorts the data so that only important data are retained while superfluous data are discarded, saving storage space and computational time. In response to these mind-boggling volumes of machine data, dimensionality reduction was developed to assist in analysing data and presenting statistical inference through visualization. 

Why is dimensionality reduction required? 

The benefits of applying dimensionality reduction to large data include: 

It sieves the data so that only pertinent data are retained by the machine, saving storage space; voluminous data contain rows and columns that are truly irrelevant given the target analysis objectives (Reddy and Baker, 2020). 
Reducing the scope of the data the machine is dealing with allows it to work faster, also saving computational time. 
Removing extraneous variables from the data removes multicollinearity. The process is based chiefly on deleting columns and rows with redundant information (Reddy and Baker, 2020). 
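As a minimal sketch of the idea (using scikit-learn's PCA on a synthetic dataset; the data and variable names are illustrative, not from PingER), dimensionality reduction can compress many correlated columns into a handful of components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic dataset: 200 rows, 3 latent signals spread across 30 correlated columns.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 30))
data = latent @ mixing + 0.05 * rng.normal(size=(200, 30))

# Keep just enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(data)

print(data.shape)     # original dimensionality: (200, 30)
print(reduced.shape)  # far fewer columns than 30
```

The reduced table carries almost all of the original variance in a fraction of the columns, which is precisely the storage and computation saving described above.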
Dimensionality reduction techniques 

Figure 2.33 Dimensionality reduction techniques chart 

Missing Value Ratio: used when the dataset has too many missing values; variables with the most missing values are dropped. 
Low Variance Filter: used when some variables in the dataset are nearly constant; variables with low variance are dropped. 
High Correlation Filter: identifies variables within the dataset with high mutual correlation and drops them accordingly. 
Random Forest: one of the most widely used dimensionality reduction methods; unlike the others, it selects variables based on their importance, keeping the most important variables and removing the rest (Becht and Newell, 2019). 
Factor Analysis: used mostly when the variables in the dataset are highly correlated; it groups variables based on their correlation with each other. 
Principal Component Analysis (PCA): mostly used with linear data; it projects the data onto a smaller set of component groups which are then analysed accordingly. 
Independent Component Analysis (ICA): also used with linear data, but unlike PCA it extracts statistically independent components from the dataset. 
ISOMAP: best suited to non-linear datasets. 
t-SNE: like ISOMAP, best suited to non-linear datasets, but with improved visualization. 
UMAP: best suited to high-dimensional data. 

Cluster analysis 

Clustering is a type of unsupervised machine learning in which the observations in a dataset are classified and grouped based on given similarities. It is normally used to identify strata of similar objects in a multivariate dataset so as to reveal the generative features of the existing data. 
Unsupervised machine learning relies on drawing statistical inference from datasets comprising input data without labelled responses (Granato and Maggio, 2018). The classification divides the data into groups containing similar objects, separated from dissimilar ones, thereby making it more statistically tractable for data visualization. 

Why perform cluster analysis? 

As mentioned above, clustering groups data into statistical clusters based on shared properties, so the intrinsic grouping of a dataset is highly valuable when the data are unlabelled. It creates room for extracting value from large sets of structured and unstructured data and eases the process of finding the logical patterns that exist within the dataset. 

Types of clustering analysis 

Partitioning algorithms: these divide the dataset into K groups, where K is the number of groups selected by the data analyst. 
Hierarchical clustering: unlike partitioning algorithms, the analyst does not have to select the desired number of groups, since hierarchical clustering is grounded in a tree model of the data based on a dendrogram. 
Fuzzy clustering: more complex than the other types of clustering, since it allows a member of one cluster to also be a member of other clusters. 
Density-based methods: these define clusters by density, implying that dense regions show more similarity than sparse regions (Kassambara, 2017). 

Importance of exploratory visualization 

Given the voluminous amounts of data processed daily, it is difficult to keep accurate track of data without the assistance of exploratory visualization. The human brain is a powerful tool with the ability to solve computational problems, but it is limited by memory and tends to forget easily. 
Exploratory visualization therefore provides ample imagery that enables the human brain to understand data trends much better than the mere presentation of pre-computed reports (Kumar and Johnson, 2020). Exploratory visualization matched with exploratory data analysis (EDA) forms a complete suite of data science techniques that comprehensively analyse and present data. 

2.4 Measuring the digital divide 

As mentioned earlier, the digital divide is defined as the dissimilarity in the provision of access to technological services, while information sociologists perceive it as an expression of numerous social, geographical, economic and informational divides. In simple terms, it is the technological inequality among the world's nations, with some better placed with internet services while others have poor internet-related services. Earlier research on the digital divide centred on simple technical measures of the internet environment: the number of users able to access the internet, the number of computers per given number of people (usually per hundred), the number of people owning computers per geographical unit (usually per square mile), and breakdowns of internet users by age, gender and educational background (Büchi and Latzer, 2016). These analyses showed that nations such as Japan, the United States, Singapore, Taiwan and the majority of European countries had better access to internet services, on the above-mentioned technical variables, than most developing countries in Africa, South America and parts of Asia. These methods of measuring the digital divide also included the cost of accessing the internet in each country. The internet service fee is a significant factor in determining the digital divide, since the cost of broadband varies from country to country. 
The technology divide has emerged as a phenomenon covering three distinct aspects: 

The global divide: the inequalities in access to Information and Communication Technologies in a worldwide context, between countries or between continents. 
The social divide: the inequalities in access to Information and Communication Technologies in a societal context, between different sections of a country's communal organization. 
The democratic divide: the variation between individuals who have access to Information and Communication Technologies and those who do not, with respect to the ability to engage as fully in public issues as those with full access to ICTs (Grant and Eynon, 2017). 

This study focused on measuring the digital divide for Asian and European countries using exploratory visualization. Unlike previous studies of the divide between the two continents, this study employed techniques based on five internet performance metrics (Abdelsalam and Zampognaro, 2017): 

Packet loss: occurs when some of the data packets sent fail to reach the intended destination. 
Round-trip time (RTT): the time it takes a client to receive a response from the server. 
Throughput: the rate at which data is successfully delivered over the link, commonly divided into TCP throughput and UDP throughput, which are affected differently by latency. 
Out-of-order packets: a chronic internet problem that occurs when packets arrive at the client in a different order from the one in which the server sent them. 
Duplicate packets: occur when identical copies of a packet are delivered. 

The five metrics mentioned above are critical in the provision of internet services. 
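As a minimal sketch of how these metrics can be summarised from raw ping replies (the reply list below is hypothetical and the variable names are illustrative, not the PingER format):

```python
# Hypothetical ping replies as (sequence_number, rtt_ms); None marks a lost reply.
replies = [(1, 180.2), (2, 175.9), (4, 190.3), (3, 188.1), (5, None),
           (6, 181.4), (6, 181.4), (7, 179.0)]

sent = 8  # pings sent by the monitoring side

rtts = [rtt for _, rtt in replies if rtt is not None]
received_seqs = [seq for seq, rtt in replies if rtt is not None]

# Packet loss: fraction of sent packets for which no (unique) reply arrived.
packet_loss = 1 - len(set(received_seqs)) / sent
# Round-trip time summary.
min_rtt, avg_rtt, max_rtt = min(rtts), sum(rtts) / len(rtts), max(rtts)
# Out-of-order: a packet arrives with a lower sequence number than its predecessor.
out_of_order = sum(1 for a, b in zip(received_seqs, received_seqs[1:]) if b < a)
# Duplicates: the same sequence number seen more than once.
duplicates = len(received_seqs) - len(set(received_seqs))

print(f"loss={packet_loss:.0%} rtt(min/avg/max)={min_rtt}/{avg_rtt:.1f}/{max_rtt}")
print(f"out_of_order={out_of_order} duplicates={duplicates}")
```

For this sample, the sketch reports 25% loss, one out-of-order packet and one duplicate, which is the kind of per-link summary the PingER metrics capture.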
The digital divide arises when some countries or geographical regions receive better internet services than others. Internet speed is influenced by proximity to the nearest server, which means that if an internet server were positioned in Africa, African users would experience faster browsing than internet users located outside Africa; the greater the distance, the poorer the internet speed (Kharat and Kulkarni, 2019). Servers have therefore been positioned at strategic geographical locations to ensure adequate speed for all, while the other performance metrics continue to affect internet activities differently. 

2.5 Related work 

2.5.1 Network management 

Network management is the process of managing network faults in a bid to increase efficiency and create an error-free network. To achieve this target, network management rests on four critical aspects: 

Fault identification: finding faults in a network or network connection, using tools such as troubleshooting utilities. 
Performance management: measuring internet performance on a computer, for example using OpManager and its network configuration management module. 
Network provisioning: measuring the load handled by a network in order to determine the necessary network trend (Ellsworth and Newcombe, 2018). 
Maintaining QoS in network management. 

Network management is a crucial factor in improving internet access and therefore plays a central role in the formation of the digital divide. 

CHAPTER THREE 
METHODOLOGY 

3.1. PingER Framework 

Nearly three decades ago, the world woke up to the realization that internet performance is closely linked to key regional economic growth metrics. 
This realization created a deep-rooted need for an elaborate and precise framework for monitoring and understanding the performance of internet links across the globe, and from that need the PingER project was born. The PingER project was brought into existence to identify digital infrastructural inadequacies, uneven distribution of digital resources and internet routing problems across regions, in order to generate possible solutions for the future. Internet performance metrics across the globe are obtained via the PingER framework, which was established by the SLAC National Accelerator Laboratory in the United States of America. The PingER project boasts very extensive deployment: the statistics indicate over 700 remote hosts distributed across more than 115 nations, monitored from hosts in over 30 nations. These hosts correspond to over 2,200 monitoring/remote host pairs across the globe, which take an estimated 200,000 ping measurements every single day (Mal & Cottrell, 2016). 

3.1.1. The PingER framework performance monitoring methodology 

The PingER framework is made up of three host types: monitoring hosts, remote hosts and archive hosts. A monitoring host is a computer that runs software referred to as the PingER Monitoring Agent. At the moment, the PingER project has 50 monitoring hosts spread across 23 nations in different regions of the globe. Remote hosts, on the other hand, are website servers with stable uptime. Remote hosts are monitored by the 50 monitoring agents at steady, designated intervals. 
Unlike monitoring hosts, remote hosts do not require any software, but they must be pingable at all times from the monitoring agents. The PingER project currently has almost 700 remote hosts spread across 170 nations in different regions of the globe. In total, the PingER framework comprises roughly 10,000 Monitoring Agent/remote host pairs actively monitoring and measuring the performance of internet links across the globe (Pan & Leslie, 2016). The PingER framework measures internet performance at a regular thirty-minute interval. The measurement sequence is triggered from each monitoring agent by sending a set of 100-byte and 1-kilobyte ping requests to a particular group of remote hosts. A monitoring agent stops sending pings once it has collected 10 ping replies from a remote host, or once the total number of ping requests sent to that host reaches 30. The PingER framework records data from each set of pings. In general, the raw data obtained from the PingER framework consists of the names and IP addresses of the Monitoring Agents and the target remote sites, together with timestamps, packet sizes, the minimum, average and maximum Round Trip Times (RTT) of the ping responses, and the individual ping RTTs with their sequence values (Sampson & Cottrell, 2017). The archive host performs the vital role of retrieving all raw data collected by Monitoring Agents on a day-to-day basis; fundamentally, the archive host is the central data storage warehouse located at SLAC headquarters. As things stand, the data obtained by the PingER project through the PingER framework is the true definition of big and complex data: the volume of compressed PingER datasets is currently estimated at over 60 GB, spread across numerous flat files. 
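A simplified sketch of the measurement loop described above (illustrative only; the real Monitoring Agent differs, and `send_ping` here is a hypothetical stand-in that simulates an ICMP echo rather than performing one):

```python
import random

def send_ping(host, size_bytes):
    """Stand-in for a real ICMP echo: returns an RTT in ms, or None on timeout."""
    if random.random() < 0.1:          # simulate ~10% packet loss
        return None
    return random.uniform(150.0, 250.0)

def measure(host, size_bytes, max_replies=10, max_requests=30):
    """Ping until 10 replies are collected or 30 requests have been sent."""
    rtts, sent = [], 0
    while len(rtts) < max_replies and sent < max_requests:
        sent += 1
        rtt = send_ping(host, size_bytes)
        if rtt is not None:
            rtts.append(rtt)
    return {
        "host": host, "packet_size": size_bytes,
        "sent": sent, "received": len(rtts),
        "min_rtt": min(rtts) if rtts else None,
        "avg_rtt": sum(rtts) / len(rtts) if rtts else None,
        "max_rtt": max(rtts) if rtts else None,
    }

# One measurement cycle: each remote host is probed with 100-byte and 1000-byte pings.
for host in ["remote-host-1.example", "remote-host-2.example"]:
    for size in (100, 1000):
        print(measure(host, size))
```

The stopping rule (10 replies or 30 requests, whichever comes first) bounds the probing load on each remote host while still yielding a usable RTT and loss sample every half hour.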
The raw PingER data is processed to generate key performance metrics, which the general public can access through the PingER website. 

Figure 3.1. General build of the PingER framework 

3.2. Data discovery and selection 

Big data is currently generating unparalleled opportunities for commercial entities to obtain comprehensive, faster insights that ultimately inform all their decision-making processes, with regard to enhancing client experience and accelerating innovation. However, due to its extensiveness and complexity, big data cannot generate any meaningful value in its original state. Consequently, most organizations dealing with big and complex data have resorted to visualization-based data discovery tools. Generally speaking, such tools enable users to integrate data from various sources and perform analytics whose results are presented in a persuasive, collaborative and easily comprehensible format. For analytics and visualization purposes, business data is subdivided into three distinct classes: 

Descriptive data 

Descriptive data is the simplest of the three classes of business analytics data. Owing to its simple, raw nature, up to 90% of existing businesses in the world use descriptive data to identify the underlying trends in their data (Englander, 2012). The main purpose of descriptive analytics is to discover the reasons behind a prized success or failure of an organization in a given period. It answers the question 'what happened?' and is important in shaping the actions a corporation will take by examining its previous actions and their respective results. 
If some actions taken by the corporation produced poor results, management will be keen to establish and take different actions that will secure better results. In machine learning, descriptive data can be used to review previous choices made when training models that subsequently failed at the testing stage; such training methodologies will then be revised or not used again (Shkedi, 2011). 

Predictive data 

Unlike descriptive data, which focuses on the past, predictive data is centred on predicting what the future holds. It answers the question 'what could happen in the future, given previous trends?'. Analysing data successfully enables analysts to predict the future effectively and comprehensively (Crawford & Schultz, 2014). This enables business enterprises, corporations and interested organizations to set realistic, achievable goals in real time, allowing effective scheduling and the containment of expectations. Predictive analytics can be used to study the data as a kind of crystal-ball problem-solving tool (Hazen & Jones-Farmer, 2014). 

Prescriptive data 

Prescriptive data is an extension of predictive data. It is not a perfect predictor of the future, able to forecast soccer results or winning lottery numbers, but it is good enough to suggest the best decisions to a business organization. Whereas predictive data answers the question 'what could happen in the future, given previous trends?', prescriptive data describes what the business should do. This is the most important aspect, as it contains the actions that need to be taken to achieve the set results and goals (Chalamall & Papotti, 2014). 
Unlike descriptive and predictive data, prescriptive data is not simple in nature, and as a result very few companies and organizations across the globe use it: fewer than 7% of all businesses in the world use prescriptive data, since it is expensive and requires complex expertise (Soltanpoor & Sellis, 2016). 

Steps of data preprocessing 

By definition, data preprocessing is the process of preparing data for machine use through sorting, cleaning, editing, wrangling and data reduction. Different data require different attention: some may have missing values, others varied formats, while others may contain duplicate data, which amounts to unneeded data. This implies that one has to inspect the data keenly to determine the kind of treatment it requires (Alasadi and Bhaya, 2017). The data involved in this study was sourced from PingER and comprises five metrics, namely TCP throughput, packet loss, out-of-order packets, duplicate packets and average round-trip time, each spanning the years 2010 to 2020, for the digital divide between the European and Asian continents. Data is preprocessed in the following ways: 

Data quality assessment 
Feature aggregation 
Feature sampling 
Dimensionality reduction 
Feature encoding 

Data quality assessment 

This is ideally the first process in data preprocessing. It involves assessing the data to ascertain whether it contains all the necessary information, while at the same time inspecting for: 

Missing values: each metric's data collected from the PingER website was inspected for missing values. 
Fortunately, no rows or columns had missing values; had there been any, the affected column or row would have been deleted, or the missing value estimated. 
Inconsistent values: values that are too large, too small, or that do not match the information expected in a particular row or column. Since the data was sourced from a reputable website, all values were consistent (Kahn and Liaw, 2016). 

Feature aggregation 

The data from the PingER website contains varied information spanning a long period of time, so unnecessary information can be included. Feature aggregation comes in as a handy tool that replaces groups of values with aggregated values in order to improve data perspective. This reduces the data size and consequently saves space and processing time (ZHANG and Lei, 2019). 

Feature sampling 

Similar to feature aggregation, the data from the PingER website was sampled narrowly to contain only the information required: only the data from 2010 to 2020, and only the rows and columns relevant to the needs of the study, thus saving space and processing time. 

Dimensionality reduction 

Like feature sampling and feature aggregation, dimensionality reduction is aimed at reducing the space and time taken for data preprocessing. Unlike them, however, it is not focused on reducing the number of rows and columns, but on reducing the number of features in the data by mapping the higher-dimensional feature space into a lower-dimensional one. 
Note that a high-dimensional space takes more planes to map the output, while a lower-dimensional one takes far fewer, making it easy to map a 2D image, aided by the removal of irrelevant features and noise (Becht and Newell, 2019). 

Feature encoding 

The whole idea behind data preprocessing is to present correct and accurate data in a manner that is easily understood by machines. The data sourced from the PingER website was robust, implying that it was suitable for machine consumption; it required only minimal editing and encoding, achieved through ample selection of data for the five metrics. 

Implementation of the visualization model 

Visualization 

As mentioned in the previous chapters, visualization is the process of creating images, diagrams and animations that convey a message or a desired meaning. Its importance is that it quickens and eases the understanding of a given concept by aligning thoughts and ideas to the way the human brain works best. 

Visualization model 

Visualization is an important aspect of data science and is used not only to create diagrams and images but also to manipulate data, creating varied imagery from the same data and thus providing a sounder understanding of the analysed data. A visualization model provides this vibrant data manipulation by linking hypothesis to experiment, and insight to revised hypothesis (Tappini et al., 2019). 
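The preprocessing steps above can be sketched as follows (assuming pandas and an illustrative, hypothetical table of yearly country metrics; the real PingER flat files differ):

```python
import pandas as pd

# Hypothetical long-format table of one PingER metric (average RTT, ms).
df = pd.DataFrame({
    "country": ["Germany", "Germany", "India", "India", "Japan", "Japan"],
    "year":    [2010, 2011, 2010, 2011, 2010, 2011],
    "avg_rtt": [45.0, 43.5, 210.0, None, 80.0, 78.5],
})

# Data quality assessment: count missing values per column.
missing = df.isna().sum()

# Feature sampling: keep only the study window (2010-2020).
df = df[df["year"].between(2010, 2020)]

# Fill any missing value with that country's mean (one simple estimation choice).
df["avg_rtt"] = df.groupby("country")["avg_rtt"].transform(
    lambda s: s.fillna(s.mean()))

# Feature aggregation: one aggregated value per country.
per_country = df.groupby("country")["avg_rtt"].mean()
print(missing["avg_rtt"], per_country["India"])
```

Each step shrinks or tidies the table before it reaches the visualization stage, mirroring the assessment, sampling and aggregation sequence described in this section.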
Figure 3.3 Data visualization model 

Several types of data are involved in this process: 

Control data: the data that triggers and steers all the modules in the system. 
User input/output: the system's common input operations and tools, such as the keyboard, and output tools such as the screen and speakers (for sound and sonification), which are transformed into metadata for the system modules. 
Internal data: data already existing in the system. 
External data: data that can be transferred into the system. 
Storable data: data that can be stored, maintained and accessed within the system. 
Graphics data: internal data that can be manipulated into graphical items (2D or 3D). 
Picture data: data with limited graphical primitives, such as 2D images. 

A Visualization Technique module takes internal data sourced from the external model and transforms it into the best-matching Base Graphics System module. 

THE VISUALIZATION PROCESS 

Data visualization is a process involving several key steps: data generation, data enrichment, visualization mapping, rendering and, finally, display. 

Figure 3.4 The data visualization process 

Of the five steps mentioned above, the first and the last stand outside the core of the visualization process. 
The data from PingER is already generated, so it is simply uploaded into the data analysis tool. 
The second step, data enrichment, is essentially an improvement on data cleansing, in which operations such as domain transformations, interpolation, sampling and noise filtering are performed to aid computation. 
The third step is the central part of the visualization process, where only the comprehensive, mutually consistent data from PingER were mapped to visual primitives and attributes. 
In the second-to-last step, the rendering phase, the primary images of the data are fitted to graphics primitives and attributes, with lighting operations and anti-aliasing filtering applied to create high-end 3D images. 
The final step is the display of the finished visualization, with options to copy, store and rename it. 

To suit the needs of the data, cluster analysis was selected for this study, since it had the best properties for displaying the disparities that exist between the European and Asian countries in terms of the digital divide. 

Evaluation 

Evaluation is the process of determining the merit, worth and significance of a project, tool or utility. 

Evaluation of the visualization 

The selection and execution of the visualization model and its processes were sound. The exclusion of noisy, duplicate, incorrect and inconsistent data ensured that only precise data were included in the calculation and computation of the visualization. The data selection from the PingER website was exact and exceptional, since only data within the required 2010-2020 range, for the relevant European and Asian countries and the five metrics, were selected. In addition, cluster analysis was chosen as the visualization model because the data was compact and well separated, which fits clustering computation well, since clusters are relatively scalable (Kassambara, 2017). 
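As a minimal illustration of this approach (hypothetical country metrics and assumed scikit-learn tooling; the actual study uses the full PingER repository), k-means can group countries by their internet performance profiles:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

countries = ["Germany", "France", "Sweden", "India", "Pakistan", "Vietnam"]
# Columns: avg RTT (ms), packet loss (%), TCP throughput (Mbps) -- illustrative values.
metrics = np.array([
    [40.0, 0.2, 90.0],
    [45.0, 0.3, 85.0],
    [38.0, 0.1, 95.0],
    [210.0, 3.5, 12.0],
    [250.0, 4.8, 8.0],
    [190.0, 2.9, 15.0],
])

# Standardize so that no single metric dominates the distance computation.
X = StandardScaler().fit_transform(metrics)

# Two clusters: one per performance profile.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for country, label in zip(countries, labels):
    print(country, "-> cluster", label)
```

With data this well separated, the two clusters fall out cleanly, one per performance profile, which is exactly the kind of disparity between country groups that the exploratory visualization in this study is intended to surface.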
List of References

Abdelsalam, A. and Zampognaro, F., 2017. TCP Wave: A new reliable transport approach for future internet. Computer Networks, 112, pp.122-143.
Alasadi, S.A. and Bhaya, W.S., 2017. Review of data preprocessing techniques in data mining. Journal of Engineering and Applied Sciences, 12(16), pp.4102-4107.
Aldana, C.H. and Shukla, A.K., Qualcomm Inc, 2015. Methods and systems for enhanced round trip time (RTT) exchange. U.S. Patent 9,154,971.
Angori, L., Didimo, W., Montecchiani, F., Pagliuca, D. and Tappini, A., 2019, September. ChordLink: A new hybrid visualization model. In International Symposium on Graph Drawing and Network Visualization (pp. 276-290). Springer, Cham.
Angus, D. and Wiles, J., 2015. Acquired codes of meaning in data visualization and infographics: Beyond perceptual primitives. IEEE Transactions on Visualization and Computer Graphics, 22(1), pp.509-518.
Balliet, R.N. and Heimlich, J., 2016. Investigating aspects of data visualization literacy using 20 information visualizations and 273 science museum visitors. Information Visualization, 15(3), pp.198-213.
Becht, E. and Newell, E.W., 2019. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology, 37(1), pp.38-44.
Bikakis, N., 2018. Big data visualization tools. arXiv preprint arXiv:1801.08336.
Büchi, M. and Latzer, M., 2016. Modeling the second-level digital divide: A five-country study of social differences in Internet use. New Media & Society, 18(11), pp.2703-2722.
Chalamalla, A. and Papotti, P., 2014, June. Descriptive and prescriptive data cleaning. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (pp. 445-456).
Cox, V., 2017. Exploratory data analysis. In Translating Statistics to Make Decisions (pp. 47-74). Apress, Berkeley, CA.
Crawford, K. and Schultz, J., 2014. Big data and due process: Toward a framework to redress predictive privacy harms. Boston College Law Review, 55, p.93.
Dur, B.I.U., 2014. Data visualization and infographics in visual communication design education at the age of information. Journal of Arts and Humanities, 3(5), pp.39-50.
Ellis, G. and Mansmann, F., 2010. Mastering the information age: solving problems with visual analytics.
Ellsworth, J.L. and Newcombe, C.R., Amazon Technologies Inc, 2018. Forward-based resource delivery network management techniques. U.S. Patent 9,893,957.
Espinosa, J.A. and Money, W., 2013, January. Big data: Issues and challenges moving forward. In 2013 46th Hawaii International Conference on System Sciences (pp. 995-1004). IEEE.
García, S. and Herrera, F., 2016. Big data preprocessing: methods and prospects. Big Data Analytics, 1(1), p.9.
Granato, D. and Maggio, R.M., 2018. Use of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for multivariate association between bioactive compounds and functional properties in foods: A critical perspective. Trends in Food Science & Technology, 72, pp.83-90.
Grant, L. and Eynon, R., 2017. Digital divides and social justice in technology-enhanced learning. In Technology Enhanced Learning (pp. 157-168). Springer, Cham.
Hazen, B.T. and Jones-Farmer, L.A., 2014. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. International Journal of Production Economics, 154, pp.72-80.
Healy, K., 2018. Data visualization: a practical introduction. Princeton University Press.
Helfman, J. and Goldberg, J., Oracle International Corp, 2016. Filtering for data visualization techniques. U.S. Patent 9,477,732.
Hoeber, O., 2018, March. Information visualization for interactive information retrieval. In Proceedings of the 2018 Conference on Human Information Interaction & Retrieval (pp. 371-374).
Howe, B. and Heer, J., 2015. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE Transactions on Visualization and Computer Graphics, 22(1), pp.649-658.
Kahn, M.G. and Liaw, S.T., 2016. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. eGEMs, 4(1).
Kassambara, A., 2017. Practical guide to cluster analysis in R: Unsupervised machine learning (Vol. 1). STHDA.
Kharat, P. and Kulkarni, M., 2019. Congestion controlling schemes for high-speed data networks: A survey. Journal of High Speed Networks, 25(1), pp.41-60.
Kiely, D. and Salazar, S., 2018. Falling Through the Net: The Digital Divide in Western Australia (No. FWA11). Bankwest Curtin Economics Centre (BCEC), Curtin Business School.
Kosara, R., 2016. Presentation-oriented visualization techniques. IEEE Computer Graphics and Applications, 36(1), pp.80-85.
Kumar, A. and Johnson, A., 2020, May. Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 590-599).
Lewin, B.A. and Singh, A.K., New BIS Safe Luxco SARL, 2018. Methods, apparatus and systems for data visualization and related applications. U.S. Patent 9,870,629.
Li, J., 2018. An Exploratory Analysis of Individual Long-Term Google Search and Browsing History.
Liu, D. and Liu, Y., 2015. Duplicate detectable opportunistic forwarding in duty-cycled wireless sensor networks. IEEE/ACM Transactions on Networking, 24(2), pp.662-673.
Mal, A. and Cottrell, L., 2016, January. Analysis and clustering of PingER network data. In 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence) (pp. 268-273). IEEE.
Nayak, G.K. and Lenka, R.K., 2016, December. Big data visualization: Tools and challenges. In 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) (pp. 656-660). IEEE.
Ordu, M.D. and Simsek, B., 2015. Examining the global digital divide: a cross-country analysis. Communications of the IBIMA, 2015, p.1.
Pan, A. and Leslie, R., 2016, January. Application for the emulation of PingER on android devices. In 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence) (pp. 537-541). IEEE.
Plaisant, C. and Carpendale, S., 2011. Empirical studies in information visualization: Seven scenarios. IEEE Transactions on Visualization and Computer Graphics, 18(9), pp.1520-1536.
Quevedo, D.E. and Nesic, D., 2011. Packetized predictive control of stochastic systems over bit-rate limited channels with packet loss. IEEE Transactions on Automatic Control, 56(12), pp.2854-2868.
Ramakrishnan, R. and Shahabi, C., 2014. Big data and its technical challenges. Communications of the ACM, 57(7), pp.86-94.
Reddy, G.T. and Baker, T., 2020. Analysis of dimensionality reduction techniques on big data. IEEE Access, 8, pp.54776-54788.
Sampson, R. and Cottrell, L., 2017, January. Implementation of PingER on android. In 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence (pp. 306-312). IEEE.
Shishkin, Y.E. and Skatkov, A.V., 2016. Big Data visualization in decision making. In Science in Progress (pp. 203-205).
Shkedi, R., 2011. Method and stored program for accumulating descriptive profile data along with source information for use in targeting third-party advertisements. U.S. Patent 7,979,307.
Shneiderman, B., 2013. Improving healthcare with interactive visualization. Computer, 46(5), pp.58-66.
Soltanpoor, R. and Sellis, T., 2016, September. Prescriptive analytics for big data. In Australasian Database Conference (pp. 245-256). Springer, Cham.
Steele, J. and Iliinsky, N., 2011. Designing data visualizations. O'Reilly Media, Inc.
Tang, N. and Li, G., 2018, April. DeepEye: Towards automatic data visualization. In 2018 IEEE 34th International Conference on Data Engineering (ICDE) (pp. 101-112). IEEE.
Tu, C. and Chen, B., 2017. Is there a robust technique for selecting aspect ratios in line charts? IEEE Transactions on Visualization and Computer Graphics, 24(12), pp.3096-3110.
Van Der Aalst, W., 2016. Data science in action. In Process Mining (pp. 3-23). Springer, Berlin, Heidelberg.
Van Dijk, J.A., 2017. Digital divide: Impact of access. The International Encyclopedia of Media Effects, pp.1-11.
Wenwei, L. and Fang, C., 2018. Bridging the digital divide: measuring digital literacy. Economics: The Open-Access, Open-Assessment E-Journal, 12(2018-23), pp.1-20.
White, B. and Cottrell, L., 2016, January. Analysis and clustering of PingER network data. In 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence) (pp. 268-273). IEEE.
Williamson, B., 2016. Digital education governance: data visualization, predictive analytics, and 'real-time' policy instruments. Journal of Education Policy, 31(2), pp.123-141.
Xu, L. and Nandi, A., 2016. Graphical perception in animated bar charts. arXiv preprint arXiv:1604.00080.
Zhang, X. and Zhu, L., 2019, June. Fast salient object detection based on multi-scale feature aggregation. In 2019 Chinese Control and Decision Conference (CCDC) (pp. 5734-5738). IEEE.