Python data science essentials : become an efficient data science practitioner by thoroughly understanding the key concepts of Python / Alberto Boschetti, Luca Massaron.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Boschetti, Alberto, author.
Massaron, Luca, author.
Series:
Community experience distilled.
Language:
English
Subjects (All):
Python (Computer program language).
Scripting languages (Computer science).
Physical Description:
1 online resource (258 p.)
Edition:
1st edition
Other Title:
Become an efficient data science practitioner by thoroughly understanding the key concepts of Python
Place of Publication:
Birmingham, England ; Mumbai, [India] : Packt Publishing, 2015.
Language Note:
English
System Details:
text file
Summary:
If you are an aspiring data scientist and you have at least a working knowledge of data analysis and Python, this book will get you started in data science. Data analysts with experience of R or MATLAB will also find the book to be a comprehensive reference to enhance their data manipulation and machine learning skills.
Contents:
Cover; Copyright; Credits; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
Chapter 1: First Steps; Introducing data science and Python; Installing Python; Python 2 or Python 3?; Step-by-step installation; A glance at the essential Python packages; NumPy; SciPy; pandas; Scikit-learn; IPython; Matplotlib; Statsmodels; Beautiful Soup; NetworkX; NLTK; Gensim; PyPy; The installation of packages; Package upgrades; Scientific distributions; Anaconda; Enthought Canopy; PythonXY; WinPython; Introducing IPython; The IPython Notebook; Datasets and code used in the book; Scikit-learn toy datasets; MLdata.org public repository; LIBSVM data examples; Loading data directly from CSV or text files; Scikit-learn sample generators; Summary
Chapter 2: Data Munging; The data science process; Data loading and preprocessing with pandas; Fast and easy data loading; Dealing with problematic data; Dealing with big datasets; Accessing other data formats; Data preprocessing; Data selection; Working with categorical and textual data; A special type of data - text; Data processing with NumPy; NumPy's n-dimensional array; The basics of NumPy ndarray objects; Creating NumPy arrays; From lists to unidimensional arrays; Controlling the memory size; Heterogeneous lists; From lists to multidimensional arrays; Resizing arrays; Arrays derived from NumPy functions; Getting an array directly from a file; Extracting data from pandas; NumPy fast operation and computations; Matrix operations; Slicing and indexing with NumPy arrays; Stacking NumPy arrays; Summary
Chapter 3: Data Science Pipeline; Introducing EDA; Feature creation; Dimensionality reduction; Covariance matrix; Principal Component Analysis (PCA); A variation of PCA for big data - randomized PCA; Latent Factor Analysis (LFA); Linear Discriminant Analysis (LDA); Latent Semantical Analysis (LSA); Independent Component Analysis (ICA); Kernel PCA; Restricted Boltzmann Machine (RBM); Detection and treatment of outliers; Univariate outlier detection; EllipticEnvelope; OneClassSVM; Scoring functions; Multilabel classification; Binary classification; Regression; Testing and validating; Cross-validation; Using cross-validation iterators; Sampling and bootstrapping; Hyper-parameters optimization; Building custom scoring functions; Reducing grid search runtime; Feature selection; Univariate selection; Recursive elimination; Stability and L1-based selection; Summary
Chapter 4: Machine Learning; Linear and logistic regression; Naive Bayes; The k-Nearest Neighbors; Advanced nonlinear algorithms; SVM for classification; SVM for regression; Tuning SVM; Ensemble strategies; Pasting by random samples; Bagging with weak ensembles; Random Subspaces and Random Patches; Sequences of models - AdaBoost; Gradient tree boosting (GTB); Dealing with big data; Creating some big datasets as examples; Scalability with volume; Keeping up with velocity
Notes:
Includes index.
Description based on online resource; title from PDF title page (ebrary, viewed June 8, 2015).
OCLC:
910639609
