Learning data mining with Python : use Python to manipulate data and build predictive models / Robert Layton.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Layton, Robert, 1986- author.
Language:
English
Subjects (All):
Python (Computer program language).
Physical Description:
1 online resource (348 pages)
Edition:
Second edition.
Other Title:
Use Python to manipulate data and build predictive models
Place of Publication:
Birmingham, [England] ; Mumbai, [India] : Packt Publishing, 2017.
System Details:
text file
Summary:
Harness the power of Python to develop data mining applications, analyze data, delve into machine learning, explore object detection using Deep Neural Networks, and create insightful predictive models.

About This Book
Use a wide variety of Python libraries for practical data mining purposes.
Learn how to find, manipulate, analyze, and visualize data using Python.
Step-by-step instructions on data mining techniques with Python that have real-world applications.

Who This Book Is For
If you are a Python programmer who wants to get started with data mining, then this book is for you. If you are a data analyst who wants to leverage the power of Python to perform data mining efficiently, this book will also help you. No previous experience with data mining is expected.

What You Will Learn
Apply data mining concepts to real-world problems
Predict the outcome of sports matches based on past results
Determine the author of a document based on their writing style
Use APIs to download datasets from social media and other online services
Find and extract good features from difficult datasets
Create models that solve real-world problems
Design and develop data mining applications using a variety of datasets
Perform object detection in images using Deep Neural Networks
Find meaningful insights from your data through intuitive visualizations
Compute on big data, including real-time data from the internet

In Detail
This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. This book covers a large number of libraries available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands-on experience with complex data types including text, images, and graphs. You will also discover object detection using Deep Neural Networks, which is one of the big, difficult areas of machine learning right now. With restructured examples and code samples updated for the latest edition of Python, each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have great insights into using Python for data mining and an understanding of the algorithms as well as their implementations.

Style and approach
This book will be your comprehensive guide to learning the various data mining techniques and implementing them in Python. A variety of real-world datasets is used to explain data mining techniques in a very crisp...
Contents:
Cover
Copyright
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Getting Started with Data Mining
Introducing data mining
Using Python and the Jupyter Notebook
Installing Python
Installing Jupyter Notebook
Installing scikit-learn
A simple affinity analysis example
What is affinity analysis?
Product recommendations
Loading the dataset with NumPy
Downloading the example code
Implementing a simple ranking of rules
Ranking to find the best rules
A simple classification example
What is classification?
Loading and preparing the dataset
Implementing the OneR algorithm
Testing the algorithm
Summary
Chapter 2: Classifying with scikit-learn Estimators
scikit-learn estimators
Nearest neighbors
Distance metrics
Loading the dataset
Moving towards a standard workflow
Running the algorithm
Setting parameters
Preprocessing
Standard pre-processing
Putting it all together
Pipelines
Chapter 3: Predicting Sports Winners with Decision Trees
Collecting the data
Using pandas to load the dataset
Cleaning up the dataset
Extracting new features
Decision trees
Parameters in decision trees
Using decision trees
Sports outcome prediction
Random forests
How do ensembles work?
Setting parameters in Random Forests
Applying random forests
Engineering new features
Chapter 4: Recommending Movies Using Affinity Analysis
Affinity analysis
Algorithms for affinity analysis
Overall methodology
Dealing with the movie recommendation problem
Obtaining the dataset
Loading with pandas
Sparse data formats
Understanding the Apriori algorithm and its implementation
Looking into the basics of the Apriori algorithm
Implementing the Apriori algorithm
Extracting association rules
Evaluating the association rules
Chapter 5: Features and scikit-learn Transformers
Feature extraction
Representing reality in models
Common feature patterns
Creating good features
Feature selection
Selecting the best individual features
Feature creation
Principal Component Analysis
Creating your own transformer
The transformer API
Implementing a Transformer
Unit testing
Chapter 6: Social Media Insight using Naive Bayes
Disambiguation
Downloading data from a social network
Loading and classifying the dataset
Creating a replicable dataset from Twitter
Text transformers
Bag-of-words models
n-gram features
Other text features
Naive Bayes
Understanding Bayes' theorem
Naive Bayes algorithm
How it works
Applying of Naive Bayes
Extracting word counts
Converting dictionaries to a matrix
Evaluation using the F1-score
Getting useful features from models
Chapter 7: Follow Recommendations Using Graph Mining
Classifying with an existing model
Getting follower information from Twitter
Building the network
Creating a graph
Creating a similarity graph
Finding subgraphs
Connected components
Optimizing criteria
Chapter 8: Beating CAPTCHAs with Neural Networks
Artificial neural networks
An introduction to neural networks
Creating the dataset
Drawing basic CAPTCHAs
Splitting the image into individual letters
Creating a training dataset
Training and classifying
Back-propagation
Predicting words
Improving accuracy using a dictionary
Ranking mechanisms for word similarity
Putting it all together
Chapter 9: Authorship Attribution
Attributing documents to authors
Applications and use cases
Authorship attribution
Getting the data
Using function words
Counting function words
Classifying with function words
Support Vector Machines
Classifying with SVMs
Kernels
Character n-grams
Extracting character n-grams
The Enron dataset
Accessing the Enron dataset
Creating a dataset loader
Evaluation
Chapter 10: Clustering News Articles
Trending topic discovery
Using a web API to get data
Reddit as a data source
Extracting text from arbitrary websites
Finding the stories in arbitrary websites
Extracting the content
Grouping news articles
The k-means algorithm
Evaluating the results
Extracting topic information from clusters
Using clustering algorithms as transformers
Clustering ensembles
Evidence accumulation
Implementation
Online learning
Chapter 11: Object Detection in Images using Deep Neural Networks
Object classification
Use cases
Application scenario
Deep neural networks
Intuition
Implementing deep neural networks
An Introduction to TensorFlow
Using Keras
Convolutional Neural Networks
GPU optimization
When to use GPUs for computation
Running our code on a GPU
Setting up the environment
Application
Creating the neural network
Chapter 12: Working with Big Data
Big data
Applications of big data
MapReduce
The intuition behind MapReduce
A word count example
Hadoop MapReduce
Applying MapReduce
Naive Bayes prediction
The mrjob package
Extracting the blog posts
Training Naive Bayes
Training on Amazon's EMR infrastructure
Appendix: Next Steps...
Getting Started with Data Mining
Scikit-learn tutorials
Extending the Jupyter Notebook
More datasets
Other Evaluation Metrics
More application ideas
Classifying with scikit-learn Estimators
Scalability with the nearest neighbor
More complex pipelines
Comparing classifiers
Automated Learning
Predicting Sports Winners with Decision Trees
More complex features
Dask
Research
Recommending Movies Using Affinity Analysis
New datasets
The Eclat algorithm
Collaborative Filtering
Extracting Features with Transformers
Adding noise
Vowpal Wabbit
word2vec
Social Media Insight Using Naive Bayes
Spam detection
Natural language processing and part-of-speech tagging
Discovering Accounts to Follow Using Graph Mining
More complex algorithms
NetworkX
Beating CAPTCHAs with Neural Networks
Better (worse?) CAPTCHAs
Deeper networks
Reinforcement learning
Authorship Attribution
Increasing the sample size
Blogs dataset
Local n-grams
Clustering News Articles
Clustering Evaluation
Temporal analysis
Real-time clusterings
Classifying Objects in Images Using Deep Learning
Mahotas
Magenta
Working with Big Data
Courses on Hadoop
Pydoop
Recommendation engine
W.I.L.L
More resources
Kaggle competitions
Coursera
Index.
Notes:
Includes bibliographical references and index.
Description based on online resource; title from PDF title page (ebrary, viewed July 12, 2017).
ISBN:
9781787129566
178712956X
OCLC:
987331258
