Machine Learning On Big Data Opportunities And Challenges- Future Research Direction For Phd Scholars - Phdassistance

MACHINE LEARNING ON BIG DATA: OPPORTUNITIES AND CHALLENGES - FUTURE RESEARCH DIRECTION FOR PHD SCHOLARS An Academic presentation by Dr. Nancy Agnes, Head, Technical Operations, Phdassistance Group www.phdassistance.com Email: info@phdassistance.com

TODAY'S DISCUSSION Outline: In brief; Introduction; Machine learning; Big data; Data preprocessing opportunities and challenges; Evaluation opportunities and challenges; Future research; Conclusion

In-Brief Machine Learning (ML) is increasingly used in a variety of applications. It has risen to prominence in recent years, owing in part to the emergence of big data. When it comes to big data, ML algorithms have never been more promising. Big data allows machine learning algorithms to discover finer-grained patterns and make more timely and precise predictions than ever before; however, it also poses significant challenges to machine learning, such as model scalability and distributed computing.

INTRODUCTION In various fields such as computer vision, speech recognition, natural language comprehension, neuroscience, fitness, and the Internet of Things, ML techniques have had enormous societal impacts. The emergence of the era of big data has stirred up interest in Machine Learning. Big data has never before offered machine learning algorithms so much promise, or challenged them so much, in gaining new insights into a variety of business applications and human behaviours. Contd...

On the one hand, big data provides ML algorithms with unparalleled amounts of data from which to derive underlying patterns and create predictive models; on the other hand, conventional ML algorithms face crucial challenges such as scalability in order to fully unlock the value of big data. With the ever-expanding world of big data, ML must develop and grow in order to turn big data into actionable intelligence. Contd...

ML aims to answer the question of how to build a computer system that improves itself over time. The problem of learning from experience with respect to certain tasks and performance metrics is referred to as an ML problem. Users may use ML techniques to deduce underlying structure and make predictions from large datasets. Contd...

ML thrives on strong computational environments, efficient learning techniques (algorithms), and rich and/or large data. As a result, ML has a lot of potential and is an essential part of big data analytics.

Fig. 1. A framework of machine learning on big data (MLBiD)

MACHINE LEARNING Data pre-processing, learning, and assessment are common stages of Machine Learning. Data pre-processing aids in the transformation of raw data into the "right form" for further learning steps. Via data cleaning, extraction, transformation, and fusion, the pre-processing phase transforms such data into a form that can be used as inputs to learning. Contd...

Using the pre-processed input data, the learning step selects learning algorithms and tunes model parameters to produce the desired outputs. Some learning methods, especially representational learning, can also be used for data pre-processing. After that, the trained models are evaluated to see how well they perform. Machine learning can be characterised by the nature of the learning input, the goal of the learning task, and the timing of data availability. Contd...

ML can be divided into three major categories based on the nature of the input (feedback) available to a learning system: supervised learning, unsupervised learning, and reinforcement learning. ML can also be divided into two types, representational learning and task learning, depending on whether the learning goal is to learn the features themselves or to learn particular tasks using input features. Each Machine Learning Algorithm can therefore be classified in a variety of ways.

Fig. 2. A multi-dimensional taxonomy of machine learning

BIG DATA Volume, velocity, variety, veracity, and value are the five dimensions of big data. Starting from the bottom, we organised the five dimensions into a stack of big, data, and value layers. The data layer is integral to big data, and the value dimension characterises the influence of big data on real-world applications. Contd...

The lower layer is more reliant on technical advancements, while the higher layer is more focused on applications that leverage big data's strategic strength. Established machine learning paradigms and algorithms must be modified to realise the potential of big data analytics and to process big data efficiently. We identify key opportunities and challenges in this section. We go through them individually for each of the three phases of machine learning: preprocessing, learning, and assessment. Contd...

Fig. 3. Big data stack

DATA PREPROCESSING OPPORTUNITIES AND CHALLENGES: DATA REDUNDANCY When two or more data samples represent the same object, duplication occurs. Data duplication or inconsistency can have a significant impact on machine learning. Although a variety of techniques for detecting duplicates have been developed over the last 20 years, traditional methods such as pairwise similarity comparison are no longer feasible for big data. Contd...
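The scaling problem with pairwise comparison can be illustrated with a small sketch. It contrasts the number of comparisons an all-pairs scan needs with a hash-based grouping of normalised records, which finds exact duplicates (after normalisation) in a single pass. The function names, the normalisation rule, and the sample records are assumptions made for illustration; hashing is a substitute technique chosen for brevity, not one the presentation prescribes, and it does not catch fuzzy duplicates.

```python
from collections import defaultdict


def normalise(record):
    """Hypothetical normalisation: lower-case and collapse whitespace."""
    return " ".join(record.lower().split())


def pairwise_comparisons(n):
    """Number of comparisons an all-pairs duplicate scan needs for n records."""
    return n * (n - 1) // 2


def find_duplicate_groups(records):
    """Group record indices that share a normalised key, in one pass."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[normalise(record)].append(index)
    # Keep only the keys that were seen more than once.
    return [indices for indices in groups.values() if len(indices) > 1]


# Illustrative records (assumed, not from the presentation).
records = ["Alice Smith", "alice   smith", "Bob Jones", "Carol White", "BOB JONES"]
duplicate_groups = find_duplicate_groups(records)
```

For one million records, `pairwise_comparisons` gives roughly 5 x 10^11 comparisons, whereas the grouping pass touches each record once.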

Furthermore, the conventional presumption that duplicated pairs are rarer than non-duplicated pairs is no longer true. Dynamic Time Warping can be much faster than current Euclidean distance algorithms in this regard. DATA HETEROGENEITY Big data promises to include multi-view data from a variety of repositories, in a variety of formats, and from a variety of population samples, and thus is highly heterogeneous. Contd...

The views in such multi-view heterogeneous data are not equally valuable for a given learning task. As a result, combining all of the features and treating them as equally relevant is unlikely to result in optimal learning outcomes. Big data offers the possibility of simultaneously learning from different views and then assembling multiple findings by learning the relevance of feature views to the task. Such an approach is expected to be robust to data outliers and to be able to address optimization and convergence problems. Contd...

DATA DISCRETIZATION Most current discretization methods would be ineffective when dealing with large amounts of data. Traditional discretization approaches have been parallelized in big data platforms to solve big data problems, with a distributed variant of the entropy minimization discretizer based on the Minimum Description Length Principle improving both efficiency and accuracy. Contd...
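To make the entropy minimization idea concrete, the sketch below finds the single cut point on one numeric feature that minimises the weighted class entropy, and then applies a Minimum Description Length stopping test in the commonly cited Fayyad and Irani form. It is a serial, single-feature illustration under those assumptions, not the distributed variant mentioned above; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, weighted_entropy) for the entropy-minimising
    binary split, or None if all values are identical."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or weighted < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, weighted)
    return best


def mdlp_accepts(values, labels, cut):
    """MDL stopping test: accept the cut only if its information gain
    exceeds the cost of encoding the split."""
    left = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(labels), entropy(left), entropy(right)
    gain = ent - (len(left) * ent1 + len(right) * ent2) / n
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (math.log2(n - 1) + delta) / n


# Illustrative data: two well-separated classes on one feature.
values = [1, 2, 3, 4, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cut, _ = best_cut(values, labels)
accepted = mdlp_accepts(values, labels, cut)
```

A full discretizer would apply `best_cut` recursively to each side of an accepted cut; a distributed version would additionally have to compute the class counts per candidate boundary across partitions.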

DATA LABELLING Active learning can be used as an optimization technique for labelling tasks in crowd-sourced databases, reducing the number of questions posed to the crowd and enabling crowd-sourced applications to scale. Designing active Learning Algorithms for a crowd-sourced dataset, on the other hand, presents a number of practical challenges, including generality, scalability, and usability. Another problem is that such a dataset cannot cover all user-specific contexts, resulting in performance that is often inferior to user-centric training. Contd...
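One common way active learning reduces the number of questions is uncertainty sampling: only the items the current model is least sure about are sent for labelling. The sketch below assumes a binary classifier that outputs a positive-class probability per item; the function name, the item ids, and the probabilities are hypothetical, and the presentation does not say which selection strategy is used.

```python
def select_for_labelling(probabilities, budget):
    """Return the `budget` item ids whose predicted positive-class
    probability is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:budget]


# Hypothetical model outputs for five unlabelled items.
predicted = {"a": 0.97, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.80}
to_ask_crowd = select_for_labelling(predicted, budget=2)
```

Here only items "b" and "d" would be posed to the crowd; the confidently predicted items are left unlabelled.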

IMBALANCED DATA Traditional stratified random sampling approaches have tackled the problem of unbalanced data. However, if iterations of sub-sample generation and error metrics measurement are needed, the process can take a long time. Furthermore, conventional sampling methods are unable to efficiently support data sampling over a user-specified subset of data that includes value-based sampling. Big data therefore calls for parallel data sampling.
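As a point of reference, stratified random sampling with equal allocation per class can be sketched as follows. This is a serial, in-memory illustration, so it does not address the parallel sampling requirement; the function name, the fixed seed, and the 95/5 class split are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(labels, per_class, seed=0):
    """Draw up to `per_class` indices from each class, uniformly at random."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    sample = []
    for indices in by_class.values():
        take = min(per_class, len(indices))
        sample.extend(rng.sample(indices, take))
    return sorted(sample)


# Illustrative imbalanced label set: 95 negatives, 5 positives.
labels = ["neg"] * 95 + ["pos"] * 5
balanced = stratified_sample(labels, per_class=5)
```

The resulting sample holds five indices from each class; repeating the draw and re-measuring error for many sub-samples is the iteration cost described above.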

FUTURE RESEARCH This paper provides a summary of the opportunities and challenges of machine learning on big data. Big data opens new possibilities for inspiring revolutionary and novel ML technologies to solve many associated technological problems and generate real-world impacts, while also posing multiple challenges for conventional ML in terms of scalability, adaptability, and usability. Contd...

These opportunities and challenges can be used to evaluate current research in this field. According to the components of the MLBiD framework, we also highlight some open research issues in ML on big data, as shown in Table.

CONCLUSION In conclusion, machine learning is needed to address the challenges faced by big data and to discover hidden patterns, information, and insights from big data in order to transform its potential into real value for business decision-making and scientific exploration. The combination of machine learning and big data points to a bright future in a modern frontier.

Contact Us UNITED KINGDOM +44-1143520021 INDIA +91-4448137070 EMAIL info@phdassistance.com
