Java data science cookbook : explore the power of MLlib, DL4j, Weka, and more / Rushdi Shams.
- Format:
- Book
- Author/Creator:
- Shams, Rushdi, author.
- Language:
- English
- Subjects (All):
- Java (Computer program language).
- Physical Description:
- 1 online resource (366 pages)
- Edition:
- 1st edition
- Place of Publication:
- Birmingham, [England] ; Mumbai, [India] : Packt, 2017.
- System Details:
- text file
- Biography/History:
- Shams Rushdi: Rushdi Shams has a Ph.D. from Western University, Canada, on the application of machine learning to Natural Language Processing (NLP) problem areas. Before starting work as a machine learning and NLP specialist in industry, he taught undergraduate and graduate courses. He maintains a YouTube channel named "Learn with Rushdi" for learning computer technologies.
- Summary:
- Recipes to help you overcome your data science hurdles using Java. About This Book: This book provides modern recipes in small steps to help an apprentice cook become a master chef in data science. Use these recipes to obtain, clean, analyze, and learn from your data. Learn how to get your data science applications to production and enterprise environments effortlessly. Who This Book Is For: This book is for Java developers who are familiar with the fundamentals of data science and want to improve their skills to become a pro. What You Will Learn: Find out how to clean and make datasets ready so you can acquire actual insights by removing noise and outliers. Develop the skills to use modern machine learning techniques to retrieve information and transform data to knowledge. Familiarize yourself with cutting-edge techniques to store and search large volumes of data and retrieve information from large amounts of data in text format. Develop basic skills to apply big data and deep learning technologies on large volumes of data. Evolve your data visualization skills and gain valuable insights from your data. Get to know a step-by-step formula to develop an industry-standard, large-scale, real-life data product. Gain the skills to visualize data and interact with users through data insights. In Detail: If you are looking to build data science models that are good for production, Java has come to the rescue. With the aid of strong libraries such as MLlib, Weka, DL4j, and more, you can efficiently perform all the data science tasks you need to. This unique book provides modern recipes to solve your common and not-so-common data science-related problems. We start with recipes to help you obtain, clean, index, and search data. Then you will learn a variety of techniques to analyze, learn from, and retrieve information from data. You will also understand how to handle big data, learn deeply from data, and visualize data. Finally, you will work through unique recipes that solve your problems while taking data science to production, writing distributed data science applications, and much more: things that will come in handy at work. Style and approach: This book contains short yet very effective recipes to solve most common problems. Some recipes cater to very specific, rare pain points. The recipes cover different data sets and work very close to real production environments.
- Contents:
- Cover
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Obtaining and Cleaning Data
- Introduction
- Retrieving all filenames from hierarchical directories using Java
- Getting ready
- How to do it…
- Retrieving all filenames from hierarchical directories using Apache Commons IO
- Reading contents from text files all at once using Java 8
- Reading contents from text files all at once using Apache Commons IO
- Extracting PDF text using Apache Tika
- Cleaning ASCII text files using Regular Expressions
- Parsing Comma Separated Value (CSV) Files using Univocity
- Parsing Tab Separated Value (TSV) file using Univocity
- Parsing XML files using JDOM
- Writing JSON files using JSON.simple
- Reading JSON files using JSON.simple
- How to do it …
- Extracting web data from a URL using JSoup
- Extracting web data from a website using Selenium Webdriver
- Reading table data from a MySQL database
- Chapter 2: Indexing and Searching Data
- Indexing data with Apache Lucene
- How it works…
- Searching indexed data with Apache Lucene
- Chapter 3: Analyzing Data Statistically
- Generating descriptive statistics
- Generating summary statistics
- Generating summary statistics from multiple distributions
- There's more….
- Computing frequency distribution
- Counting word frequency in a string
- Counting word frequency in a string using Java 8
- Computing simple regression
- Computing ordinary least squares regression
- Computing generalized least squares regression
- Calculating covariance of two sets of data points
- Calculating Pearson's correlation of two sets of data points
- Conducting a paired t-test
- Conducting a Chi-square test
- Conducting the one-way ANOVA test
- Conducting a Kolmogorov-Smirnov test
- Chapter 4: Learning from Data - Part 1
- Creating and saving an Attribute-Relation File Format (ARFF) file
- Cross-validating a machine learning model
- Classifying unseen test data
- Classifying unseen test data with a filtered classifier
- Generating linear regression models
- Generating logistic regression models
- Clustering data points using the KMeans algorithm
- Clustering data from classes
- Learning association rules from data
- Selecting features/attributes using the low-level method, the filtering method, and the meta-classifier method
- Chapter 5: Learning from Data - Part 2
- Applying machine learning on data using Java Machine Learning (Java-ML) library
- Classifying data points using the Stanford classifier
- Classifying data points using Massive Online Analysis (MOA)
- Getting ready
- Classifying multilabeled data points using Mulan
- Chapter 6: Retrieving Information from Text Data
- Detecting tokens (words) using Java
- Detecting sentences using Java
- Detecting tokens (words) and sentences using OpenNLP
- Retrieving lemma, part-of-speech, and recognizing named entities from tokens using Stanford CoreNLP
- Measuring text similarity with Cosine Similarity measure using Java 8
- Extracting topics from text documents using Mallet
- Classifying text documents using Mallet
- Classifying text documents using Weka
- Chapter 7: Handling Big Data
- Training an online logistic regression model using Apache Mahout
- Applying an online logistic regression model using Apache Mahout
- Solving simple text mining problems with Apache Spark
- Clustering using KMeans algorithm with MLlib
- Creating a linear regression model with MLlib
- Classifying data points with Random Forest model using MLlib
- Chapter 8: Learn Deeply from Data
- Creating a Word2vec neural net using Deep Learning for Java (DL4j)
- There's more
- Creating a Deep Belief neural net using Deep Learning for Java (DL4j)
- Creating a deep autoencoder using Deep Learning for Java (DL4j)
- How to do it….
- How it works…
- Chapter 9: Visualizing Data
- Plotting a 2D sine graph
- Plotting histograms
- Plotting a bar chart
- Plotting box plots or whisker diagrams
- Plotting scatter plots
- Plotting donut plots
- Plotting area graphs
- Index.
- Notes:
- Includes index.
- Description based on online resource; title from PDF title page (ebrary, viewed April 11, 2017).
- ISBN:
- 9781787127654
- 1787127656
- OCLC:
- 983204794