Python feature engineering cookbook / Soledad Galli.

EBSCOhost Academic eBook Collection (North America) Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Galli, Soledad, author.
Language:
English
Subjects (All):
Python (Computer program language).
Application software--Development.
Application software.
Machine learning.
Physical Description:
1 online resource (386 pages)
Edition:
Second edition.
Place of Publication:
Birmingham, England : Packt Publishing, Limited, [2022]
Biography/History:
Soledad Galli is a lead data scientist with more than 10 years of experience in world-class academic institutions and renowned businesses. She has researched, developed, and put into production machine learning models for insurance claims, credit risk assessment, and fraud prevention. Soledad received a Data Science Leaders' award in 2018 and was named one of LinkedIn's voices in data science and analytics in 2019. She is passionate about enabling people to step into and excel in data science, which is why she mentors data scientists and speaks regularly at data science meetings. She also teaches online courses on machine learning on a prestigious Massive Open Online Course platform, which have reached more than 10,000 students worldwide.
Summary:
Python Feature Engineering Cookbook, Second Edition will give you the practice, tools, and techniques to streamline your feature engineering pipelines and simplify and improve the quality of your code. With more than 70 methods to transform or create variables, you will find solutions tailored to different datasets and machine learning models.
Contents:
Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Chapter 1: Imputing Missing Data
Technical requirements
Removing observations with missing data
How to do it...
How it works...
Performing mean or median imputation
Imputing categorical variables
Replacing missing values with an arbitrary number
Finding extreme values for imputation
Marking imputed values
Performing multivariate imputation by chained equations
See also
Estimating missing data with nearest neighbors
Chapter 2: Encoding Categorical Variables
Creating binary variables through one-hot encoding
There's more...
Performing one-hot encoding of frequent categories
Replacing categories with counts or the frequency of observations
Replacing categories with ordinal numbers
Performing ordinal encoding based on the target value
Implementing target mean encoding
Encoding with the Weight of Evidence
Grouping rare or infrequent categories
Performing binary encoding
Chapter 3: Transforming Numerical Variables
Transforming variables with the logarithm function
Getting ready
Transforming variables with the reciprocal function
Using the square root to transform variables
Using power transformations
Performing Box-Cox transformation
Performing Yeo-Johnson transformation
Chapter 4: Performing Variable Discretization
Performing equal-width discretization
Implementing equal-frequency discretization
Discretizing the variable into arbitrary intervals
Performing discretization with k-means clustering
Implementing feature binarization
Using decision trees for discretization
Chapter 5: Working with Outliers
Visualizing outliers with boxplots
Finding outliers using the mean and standard deviation
Finding outliers with the interquartile range proximity rule
Removing outliers
Capping or censoring outliers
Capping outliers using quantiles
Chapter 6: Extracting Features from Date and Time Variables
Extracting features from dates with pandas
Extracting features from time with pandas
Capturing the elapsed time between datetime variables
Working with time in different time zones
Automating feature extraction with Feature-engine
Chapter 7: Performing Feature Scaling
Standardizing the features
Scaling to the maximum and minimum values
Scaling with the median and quantiles
Performing mean normalization
Implementing maximum absolute scaling
Scaling to vector unit length
Chapter 8: Creating New Features
Combining features with mathematical functions
Comparing features to reference variables
Performing polynomial expansion
Combining features with decision trees
Creating periodic features from cyclical variables
Creating spline features
Chapter 9: Extracting Features from Relational Data with Featuretools
Setting up an entity set and creating features automatically
Creating features with general and cumulative operations
Combining numerical features
Extracting features from date and time
Extracting features from text
Creating features with aggregation primitives
Chapter 10: Creating Features from a Time Series with tsfresh
Extracting features automatically from a time series
Creating and selecting features for a time series
Tailoring feature creation to different time series
Creating pre-selected features
Embedding feature creation in a scikit-learn pipeline
Chapter 11: Extracting Features from Text Variables
Counting characters, words, and vocabulary
Estimating text complexity by counting sentences
Creating features with bag-of-words and n-grams
Implementing term frequency-inverse document frequency
Cleaning and stemming text variables
Index
About Packt
Other Books You May Enjoy
Notes:
Includes index.
Description based on print version record.
ISBN:
9781523151547
1523151544
9781804615393
1804615390
OCLC:
1350412247
