Mastering Java machine learning : mastering and implementing advanced techniques in machine learning / Dr. Uday Kamath, Krishna Choppella.
- Format:
- Book
- Author/Creator:
- Kamath, Uday, author.
- Choppella, Krishna, author.
- Language:
- English
- Subjects (All):
- Machine learning.
- Java (Computer program language).
- Physical Description:
- 1 online resource (556 pages) : illustrations
- Edition:
- 1st ed.
- Place of Publication:
- Birmingham, [England] ; Mumbai, [India] : Packt, 2017.
- Biography/History:
- Kamath, Uday: Dr. Uday Kamath is the chief data scientist at BAE Systems Applied Intelligence. He specializes in scalable machine learning and has spent 20 years in the domain of AML, fraud detection in financial crime, cyber security, and bioinformatics, to name a few. Dr. Kamath is responsible for key products in areas focusing on the behavioral, social networking and big data machine learning aspects of analytics at BAE AI. He received his PhD at George Mason University, under the able guidance of Dr. Kenneth De Jong, where his dissertation research focused on machine learning for big data and automated sequence mining.
- Choppella, Krishna: Krishna Choppella builds tools and client solutions in his role as a solutions architect for analytics at BAE Systems Applied Intelligence. He has been programming in Java for 20 years. His interests are data science, functional programming, and distributed computing.
- Summary:
- Become an advanced practitioner with this progressive set of master classes on application-oriented machine learning.
- Key Features:
- Comprehensive coverage of key topics in machine learning with an emphasis on both the theoretical and practical aspects.
- More than 15 open source Java tools in a wide range of techniques, with code and practical usage.
- More than 10 real-world case studies in machine learning highlighting techniques ranging from data ingestion up to analyzing the results of experiments, all preparing the user for the practical, real-world use of tools and data analysis.
- Book Description:
- Java is one of the main languages used by practicing data scientists; much of the Hadoop ecosystem is Java-based, and it is certainly the language that most production systems in Data Science are written in. If you know Java, Mastering Machine Learning with Java is your next step on the path to becoming an advanced practitioner in Data Science. This book aims to introduce you to an array of advanced techniques in machine learning, including classification, clustering, anomaly detection, stream learning, active learning, semi-supervised learning, probabilistic graph modeling, text mining, deep learning, and big data batch and stream machine learning. Accompanying each chapter are illustrative examples and real-world case studies that show how to apply the newly learned techniques using sound methodologies and the best Java-based tools available today. On completing this book, you will have an understanding of the tools and techniques for building powerful machine learning models to solve data science problems in just about any domain.
- What you will learn:
- Master key Java machine learning libraries, and what kind of problem each can solve, with theory and practical guidance.
- Explore powerful techniques in each major category of machine learning such as classification, clustering, anomaly detection, graph modeling, and text mining.
- Apply machine learning to real-world data with methodologies, processes, applications, and analysis.
- Techniques and experiments developed around the latest specializations in machine learning, such as deep learning, stream data mining, and active and semi-supervised learning.
- Build high-performing, real-time, adaptive predictive models for batch- and stream-based big data learning using the latest tools and methodologies.
- Get a deeper understanding of technologies leading towards a more powerful AI applicable in various domains such as Security, Financial Crime, Internet of Things, social networking, and so on.
- Who this book is for:
- This book will appeal to anyone with a serious interest in topics in Data Science or those already working in related areas: ideally, intermediate-level data analysts and data scientists with experience in Java. Preferably, you will have experience with the fundamentals of machine learning and now have a desire to explore the area further, are up to grappling with the mathematical complexities of its algorithms, and you wish to learn the complete ins and outs of practical machine learning.
- Contents:
- Cover
- Copyright
- Credits
- Foreword
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Machine Learning Review
- Machine learning - history and definition
- What is not machine learning?
- Machine learning - concepts and terminology
- Machine learning - types and subtypes
- Datasets used in machine learning
- Machine learning applications
- Practical issues in machine learning
- Machine learning - roles and process
- Roles
- Process
- Machine learning - tools and datasets
- Datasets
- Summary
- Chapter 2: Practical Approach to Real-World Supervised Learning
- Formal description and notation
- Data quality analysis
- Descriptive data analysis
- Basic label analysis
- Basic feature analysis
- Visualization analysis
- Univariate feature analysis
- Multivariate feature analysis
- Data transformation and preprocessing
- Feature construction
- Handling missing values
- Outliers
- Discretization
- Data sampling
- Is sampling needed?
- Undersampling and oversampling
- Training, validation, and test set
- Feature relevance analysis and dimensionality reduction
- Feature search techniques
- Feature evaluation techniques
- Filter approach
- Wrapper approach
- Embedded approach
- Model building
- Linear models
- Linear Regression
- Naïve Bayes
- Logistic Regression
- Non-linear models
- Decision Trees
- K-Nearest Neighbors (KNN)
- Support vector machines (SVM)
- Ensemble learning and meta learners
- Bootstrap aggregating or bagging
- Boosting
- Model assessment, evaluation, and comparisons
- Model assessment
- Model evaluation metrics
- Confusion matrix and related metrics
- ROC and PRC curves
- Gain charts and lift curves
- Model comparisons
- Comparing two algorithms
- Comparing multiple algorithms
- Case Study - Horse Colic Classification
- Business problem
- Machine learning mapping
- Data analysis
- Label analysis
- Features analysis
- Supervised learning experiments
- Weka experiments
- RapidMiner experiments
- Results, observations, and analysis
- References
- Chapter 3: Unsupervised Machine Learning Techniques
- Issues in common with supervised learning
- Issues specific to unsupervised learning
- Feature analysis and dimensionality reduction
- Notation
- Linear methods
- Principal component analysis (PCA)
- Random projections (RP)
- Multidimensional Scaling (MDS)
- Nonlinear methods
- Kernel Principal Component Analysis (KPCA)
- Manifold learning
- Clustering
- Clustering algorithms
- k-Means
- DBSCAN
- Mean shift
- Expectation maximization (EM) or Gaussian mixture modeling (GMM)
- Hierarchical clustering
- Self-organizing maps (SOM)
- Spectral clustering
- Affinity propagation
- Clustering validation and evaluation
- Internal evaluation measures
- External evaluation measures
- Outlier or anomaly detection
- Outlier algorithms
- Statistical-based
- Distance-based methods
- Density-based methods
- Clustering-based methods
- High-dimensional-based methods
- One-class SVM
- Outlier evaluation techniques
- Supervised evaluation
- Unsupervised evaluation
- Real-world case study
- Tools and software
- Data collection
- Data sampling and transformation
- Observations on feature analysis and dimensionality reduction
- Clustering models, results, and evaluation
- Observations and clustering analysis
- Outlier models, results, and evaluation
- Chapter 4: Semi-Supervised and Active Learning
- Semi-supervised learning
- Representation, notation, and assumptions
- Semi-supervised learning techniques
- Self-training SSL
- Co-training SSL or multi-view SSL
- Cluster and label SSL
- Transductive graph label propagation
- Transductive SVM (TSVM)
- Case study in semi-supervised learning
- Datasets and analysis
- Experiments and results
- Active learning
- Representation and notation
- Active learning scenarios
- Active learning approaches
- Uncertainty sampling
- Version space sampling
- Query by disagreement (QBD)
- Advantages and limitations
- Data distribution sampling
- How does it work?
- Case study in active learning
- Data Collection
- Models, results, and evaluation
- Pool-based scenarios
- Stream-based scenarios
- Analysis of active learning results
- Chapter 5: Real-Time Stream Machine Learning
- Assumptions and mathematical notations
- Basic stream processing and computational techniques
- Stream computations
- Sliding windows
- Sampling
- Concept drift and drift detection
- Data management
- Partial memory
- Full memory
- Detection methods
- Adaptation methods
- Incremental supervised learning
- Modeling techniques
- Linear algorithms
- Non-linear algorithms
- Ensemble algorithms
- Validation, evaluation, and comparisons in online setting
- Model validation techniques
- Incremental unsupervised learning using clustering
- Partition based
- Hierarchical based and micro clustering
- Density based
- Grid based
- Validation and evaluation techniques
- Unsupervised learning using outlier detection
- Partition-based clustering for outlier detection
- Inputs and outputs
- Distance-based clustering for outlier detection
- Validation and evaluation techniques
- Case study in stream learning
- Clustering experiments
- Outlier detection experiments
- Analysis of stream learning results
- Chapter 6: Probabilistic Graph Modeling
- Probability revisited
- Concepts in probability
- Conditional probability
- Chain rule and Bayes' theorem
- Random variables, joint, and marginal distributions
- Marginal independence and conditional independence
- Factors
- Distribution queries
- Graph concepts
- Graph structure and properties
- Subgraphs and cliques
- Path, trail, and cycles
- Bayesian networks
- Representation
- Definition
- Reasoning patterns
- Independencies, flow of influence, D-Separation, I-Map
- Inference
- Elimination-based inference
- Propagation-based techniques
- Sampling-based techniques
- Learning
- Learning parameters
- Learning structures
- Markov networks and conditional random fields
- Parameterization
- Independencies
- Conditional random fields
- Specialized networks
- Tree augmented network
- Input and output
- Markov chains
- Hidden Markov models
- Most probable path in HMM
- Posterior decoding in HMM
- Tools and usage
- OpenMarkov
- Weka Bayesian Network GUI
- Case study
- Feature analysis
- Analysis of results
- Chapter 7: Deep Learning
- Multi-layer feed-forward neural network
- Inputs, neurons, activation function, and mathematical notation
- Multi-layered neural network
- Structure and mathematical notations
- Activation functions in NN
- Training neural network
- Limitations of neural networks
- Vanishing gradients, local optimum, and slow training
- Deep learning
- Building blocks for deep learning
- Rectified linear activation function
- Restricted Boltzmann Machines
- Autoencoders
- Unsupervised pre-training and supervised fine-tuning
- Deep feed-forward NN
- Deep Autoencoders
- Deep Belief Networks
- Deep learning with dropouts
- Sparse coding
- Convolutional Neural Network
- CNN Layers
- Recurrent Neural Networks
- Basic data handling
- Multi-layer perceptron
- Convolutional Network
- Variational Autoencoder
- DBN
- Parameter search using Arbiter
- Results and analysis
- Chapter 8: Text Mining and Natural Language Processing
- NLP, subfields, and tasks
- Text categorization
- Part-of-speech tagging (POS tagging)
- Text clustering
- Information extraction and named entity recognition
- Sentiment analysis and opinion mining
- Coreference resolution
- Word sense disambiguation
- Machine translation
- Semantic reasoning and inferencing
- Text summarization
- Automating question and answers
- Issues with mining unstructured data
- Text processing components and transformations
- Document collection and standardization
- Notes:
- Includes bibliographical references at the end of each chapter and index.
- Description based on online resource; title from PDF title page (ebrary, viewed August 11, 2017).
- ISBN:
- 1-78588-855-2
- OCLC:
- 994715866