Java data analysis : data mining, big data analysis, NoSQL, and data visualization / John R. Hubbard.
- Format:
- Book
- Author/Creator:
- Hubbard, John R., author.
- Language:
- English
- Subjects (All):
- Java (Computer program language).
- Physical Description:
- 1 online resource (1 volume) : illustrations
- Edition:
- 1st edition
- Place of Publication:
- Birmingham, England ; Mumbai, [India] : Packt Publishing, 2017.
- System Details:
- text file
- Summary:
- Get the most out of the popular Java libraries and tools to perform efficient data analysis. About This Book: Get your basics right for data analysis with Java and make sense of your data through effective visualizations. Use various Java APIs and tools such as RapidMiner and WEKA for effective data analysis and machine learning. This is your companion to understanding and implementing a solid data analysis solution using Java. Who This Book Is For: If you are a student, a Java developer, or a budding data scientist who wishes to learn the fundamentals of data analysis and learn to perform data analysis with Java, this book is for you. Some familiarity with elementary statistics and relational databases will be helpful but is not mandatory to get the most out of this book. A firm understanding of Java is required. What You Will Learn: Develop Java programs that analyze data sets of nearly any size, including text. Implement important machine learning algorithms such as regression, classification, and clustering. Interface with and apply standard open source Java libraries and APIs to analyze and visualize data. Process data from both relational and non-relational databases and from time-series data. Employ Java tools to visualize data in various forms. Understand multimedia data analysis algorithms and implement them in Java. In Detail: Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the aim of discovering useful information. Java is one of the most popular languages to perform your data analysis tasks. This book will help you learn the tools and techniques in Java to conduct data analysis without any hassle. After getting a quick overview of what data science is and the steps involved in the process, you'll learn statistical data analysis techniques and implement them using popular Java APIs and libraries. Through practical examples, you will also learn machine learning concepts such as classification and regression.
In the process, you'll familiarize yourself with tools such as RapidMiner and WEKA and see how these Java-based tools can be used effectively for analysis. You will also learn how to analyze text and other types of multimedia, and to work with relational, NoSQL, and time-series data. This book will also show you how you can use different Java-based libraries to create insightful and easy-to-understand plots and graphs. By the end of this book, you will have a solid understanding of...
- Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Introduction to Data Analysis
- Origins of data analysis
- The scientific method
- Actuarial science
- Calculated by steam
- A spectacular example
- Herman Hollerith
- ENIAC
- VisiCalc
- Data, information, and knowledge
- Why Java?
- Java Integrated Development Environments
- Summary
- Chapter 2: Data Preprocessing
- Data types
- Variables
- Data points and datasets
- Null values
- Relational database tables
- Key fields
- Key-value pairs
- Hash tables
- File formats
- Microsoft Excel data
- XML and JSON data
- Generating test datasets
- Metadata
- Data cleaning
- Data scaling
- Data filtering
- Sorting
- Merging
- Hashing
- Chapter 3: Data Visualization
- Tables and graphs
- Scatter plots
- Line graphs
- Bar charts
- Histograms
- Time series
- Java implementation
- Moving average
- Data ranking
- Frequency distributions
- The normal distribution
- A thought experiment
- The exponential distribution
- Java example
- Chapter 4: Statistics
- Descriptive statistics
- Random sampling
- Random variables
- Probability distributions
- Cumulative distributions
- The binomial distribution
- Multivariate distributions
- Conditional probability
- The independence of probabilistic events
- Contingency tables
- Bayes' theorem
- Covariance and correlation
- The standard normal distribution
- The central limit theorem
- Confidence intervals
- Hypothesis testing
- Chapter 5: Relational Databases
- The relation data model
- Relational databases
- Foreign keys
- Relational database design
- Creating a database
- SQL commands
- Inserting data into the database
- Database queries
- SQL data types
- JDBC
- Using a JDBC PreparedStatement
- Batch processing
- Database views
- Subqueries
- Table indexes
- Chapter 6: Regression Analysis
- Linear regression
- Linear regression in Excel
- Computing the regression coefficients
- Variation statistics
- Java implementation of linear regression
- Anscombe's quartet
- Polynomial regression
- Multiple linear regression
- The Apache Commons implementation
- Curve fitting
- Chapter 7: Classification Analysis
- Decision trees
- What does entropy have to do with it?
- The ID3 algorithm
- Java Implementation of the ID3 algorithm
- The Weka platform
- The ARFF filetype for data
- Java implementation with Weka
- Bayesian classifiers
- Support vector machine algorithms
- Logistic regression
- K-Nearest Neighbors
- Fuzzy classification algorithms
- Chapter 8: Cluster Analysis
- Measuring distances
- The curse of dimensionality
- Hierarchical clustering
- Weka implementation
- K-means clustering
- K-medoids clustering
- Affinity propagation clustering
- Chapter 9: Recommender Systems
- Utility matrices
- Similarity measures
- Cosine similarity
- A simple recommender system
- Amazon's item-to-item collaborative filtering recommender
- Implementing user ratings
- Large sparse matrices
- Using random access files
- The Netflix prize
- Chapter 10: NoSQL Databases
- The Map data structure
- SQL versus NoSQL
- The Mongo database system
- The Library database
- Java development with MongoDB
- The MongoDB extension for geospatial databases
- Indexing in MongoDB
- Why NoSQL and why MongoDB?
- Other NoSQL database systems
- Chapter 11: Big Data Analysis with Java
- Scaling, data striping, and sharding
- Google's PageRank algorithm
- Google's MapReduce framework
- Some examples of MapReduce applications
- The WordCount example
- Scalability
- Matrix multiplication with MapReduce
- MapReduce in MongoDB
- Apache Hadoop
- Hadoop MapReduce
- Appendix: Java Tools
- The command line
- Java
- NetBeans
- MySQL
- MySQL Workbench
- Accessing the MySQL database from NetBeans
- The Apache Commons Math Library
- The javax JSON Library
- The Weka libraries
- MongoDB
- Index
- Notes:
- Includes index.
- Description based on online resource; title from PDF title page (ebrary, viewed October 18, 2017).
- OCLC:
- 1008968666