Python feature engineering cookbook / Soledad Galli.
- Format:
- Book
- Author/Creator:
- Galli, Soledad, author.
- Language:
- English
- Subjects (All):
- Python (Computer program language).
- Application software--Development.
- Application software.
- Machine learning.
- Physical Description:
- 1 online resource (386 pages)
- Edition:
- Second edition.
- Place of Publication:
- Birmingham, England : Packt Publishing, Limited, [2022]
- Biography/History:
- Galli Soledad: Soledad Galli is a lead data scientist with more than 10 years of experience in world-class academic institutions and renowned businesses. She has researched, developed, and put into production machine learning models for insurance claims, credit risk assessment, and fraud prevention. Soledad received a Data Science Leaders' award in 2018 and was named one of LinkedIn's voices in data science and analytics in 2019. She is passionate about enabling people to step into and excel in data science, which is why she mentors data scientists and speaks at data science meetings regularly. She also teaches online courses on machine learning on a prestigious Massive Open Online Course platform, which have reached more than 10,000 students worldwide.
- Summary:
- Python Feature Engineering Cookbook, Second Edition will give you the practice, tools, and techniques to streamline your feature engineering pipelines and simplify and improve the quality of your code. With more than 70 methods to transform or create variables, you will find solutions tailored to different datasets and machine learning models.
- Contents:
- Cover
- Title Page
- Copyright and Credits
- Contributors
- Table of Contents
- Preface
- Chapter 1: Imputing Missing Data
- Technical requirements
- Removing observations with missing data
- How to do it...
- How it works...
- Performing mean or median imputation
- Imputing categorical variables
- Replacing missing values with an arbitrary number
- Finding extreme values for imputation
- Marking imputed values
- Performing multivariate imputation by chained equations
- See also
- Estimating missing data with nearest neighbors
- Chapter 2: Encoding Categorical Variables
- Creating binary variables through one-hot encoding
- There's more...
- Performing one-hot encoding of frequent categories
- Replacing categories with counts or the frequency of observations
- Replacing categories with ordinal numbers
- Performing ordinal encoding based on the target value
- Implementing target mean encoding
- How it works…
- There's more…
- Encoding with the Weight of Evidence
- Grouping rare or infrequent categories
- Performing binary encoding
- Chapter 3: Transforming Numerical Variables
- Transforming variables with the logarithm function
- Getting ready
- Transforming variables with the reciprocal function
- Using the square root to transform variables
- Using power transformations
- Performing Box-Cox transformation
- Performing Yeo-Johnson transformation
- Chapter 4: Performing Variable Discretization
- Performing equal-width discretization
- Implementing equal-frequency discretization
- Discretizing the variable into arbitrary intervals
- Performing discretization with k-means clustering
- Implementing feature binarization
- Getting ready
- Using decision trees for discretization
- Chapter 5: Working with Outliers
- Visualizing outliers with boxplots
- Finding outliers using the mean and standard deviation
- Finding outliers with the interquartile range proximity rule
- Removing outliers
- Capping or censoring outliers
- Capping outliers using quantiles
- Chapter 6: Extracting Features from Date and Time Variables
- Extracting features from dates with pandas
- How it works…
- There's more…
- Extracting features from time with pandas
- Capturing the elapsed time between datetime variables
- Working with time in different time zones
- Automating feature extraction with Feature-engine
- Chapter 7: Performing Feature Scaling
- Standardizing the features
- Scaling to the maximum and minimum values
- Scaling with the median and quantiles
- Performing mean normalization
- Implementing maximum absolute scaling
- Scaling to vector unit length
- Chapter 8: Creating New Features
- Combining features with mathematical functions
- Comparing features to reference variables
- How to do it…
- Performing polynomial expansion
- Combining features with decision trees
- Creating periodic features from cyclical variables
- Creating spline features
- Chapter 9: Extracting Features from Relational Data with Featuretools
- Setting up an entity set and creating features automatically
- Creating features with general and cumulative operations
- Combining numerical features
- Extracting features from date and time
- Extracting features from text
- Creating features with aggregation primitives
- Chapter 10: Creating Features from a Time Series with tsfresh
- Extracting features automatically from a time series
- Creating and selecting features for a time series
- Tailoring feature creation to different time series
- Creating pre-selected features
- Embedding feature creation in a scikit-learn pipeline
- Chapter 11: Extracting Features from Text Variables
- Counting characters, words, and vocabulary
- Estimating text complexity by counting sentences
- Creating features with bag-of-words and n-grams
- Implementing term frequency-inverse document frequency
- Cleaning and stemming text variables
- Index
- About Packt
- Other Books You May Enjoy
- Notes:
- Includes index.
- Description based on print version record.
- ISBN:
- 9781523151547
- 1523151544
- 9781804615393
- 1804615390
- OCLC:
- 1350412247