Handbook of regression modeling in people analytics : with examples in R and Python / Keith McNulty.
- Format:
- Book
- Author/Creator:
- McNulty, Keith, author.
- Language:
- English
- Subjects (All):
- Regression analysis.
- R (Computer program language).
- Python (Computer program language).
- Mathematical statistics.
- Physical Description:
- 1 online resource (272 pages)
- Edition:
- 1st ed.
- Place of Publication:
- Boca Raton, Florida : CRC Press, [2021]
- Summary:
- "This book is a learning resource on inferential statistics and regression analysis. It teaches how to do a wide range of statistical analyses in both R and in Python, ranging from simple hypothesis testing to advanced multivariate modelling. Although it is primarily focused on examples related to the analysis of people and talent, the methods easily transfer to any discipline. The book hits a 'sweet spot' where there is just enough mathematical theory to support a strong understanding of the methods, but with a step-by-step guide and easily reproducible examples and code, so that the methods can be put into practice immediately. This makes the book accessible to a wide readership, from public and private sector analysts and practitioners to students and researchers"-- Provided by publisher.
- Contents:
- Cover
- Half Title
- Title Page
- Copyright Page
- Contents
- Foreword by Alexis Fink
- Introduction
- 1. The Importance of Regression in People Analytics
- 1.1. Why is regression modeling so important in people analytics?
- 1.2. What do we mean by 'modeling'?
- 1.2.1. The theory of inferential modeling
- 1.2.2. The process of inferential modeling
- 1.3. The structure, system and organization of this book
- 2. The Basics of the R Programming Language
- 2.1. What is R?
- 2.2. How to start using R
- 2.3. Data in R
- 2.3.1. Data types
- 2.3.2. Homogeneous data structures
- 2.3.3. Heterogeneous data structures
- 2.4. Working with dataframes
- 2.4.1. Loading and tidying data in dataframes
- 2.4.2. Manipulating dataframes
- 2.5. Functions, packages and libraries
- 2.5.1. Using functions
- 2.5.2. Help with functions
- 2.5.3. Writing your own functions
- 2.5.4. Installing packages
- 2.5.5. Using packages
- 2.5.6. The pipe operator
- 2.6. Errors, warnings and messages
- 2.7. Plotting and graphing
- 2.7.1. Plotting in base R
- 2.7.2. Specialist plotting and graphing packages
- 2.8. Documenting your work using R Markdown
- 2.9. Learning exercises
- 2.9.1. Discussion questions
- 2.9.2. Data exercises
- 3. Statistics Foundations
- 3.1. Elementary descriptive statistics of populations and samples
- 3.1.1. Mean, variance and standard deviation
- 3.1.2. Covariance and correlation
- 3.2. Distribution of random variables
- 3.2.1. Sampling of random variables
- 3.2.2. Standard errors, the t-distribution and confidence intervals
- 3.3. Hypothesis testing
- 3.3.1. Testing for a difference in means (Welch's t-test)
- 3.3.2. Testing for a non-zero correlation between two variables (t-test for correlation)
- 3.3.3. Testing for a difference in frequency distribution between different categories in a data set (Chi-square test)
- 3.4. Foundational statistics in Python
- 3.5. Learning exercises
- 3.5.1. Discussion questions
- 3.5.2. Data exercises
- 4. Linear Regression for Continuous Outcomes
- 4.1. When to use it
- 4.1.1. Origins and intuition of linear regression
- 4.1.2. Use cases for linear regression
- 4.1.3. Walkthrough example
- 4.2. Simple linear regression
- 4.2.1. Linear relationship between a single input and an outcome
- 4.2.2. Minimising the error
- 4.2.3. Determining the best fit
- 4.2.4. Measuring the fit of the model
- 4.3. Multiple linear regression
- 4.3.1. Running a multiple linear regression model and interpreting its coefficients
- 4.3.2. Coefficient confidence
- 4.3.3. Model 'goodness-of-fit'
- 4.3.4. Making predictions from your model
- 4.4. Managing inputs in linear regression
- 4.4.1. Relevance of input variables
- 4.4.2. Sparseness ('missingness') of data
- 4.4.3. Transforming categorical inputs to dummy variables
- 4.5. Testing your model assumptions
- 4.5.1. Assumption of linearity and additivity
- 4.5.2. Assumption of constant error variance
- 4.5.3. Assumption of normally distributed errors
- 4.5.4. Avoiding high collinearity and multicollinearity between input variables
- 4.6. Extending multiple linear regression
- 4.6.1. Interactions between input variables
- 4.6.2. Quadratic and higher-order polynomial terms
- 4.7. Learning exercises
- 4.7.1. Discussion questions
- 4.7.2. Data exercises
- 5. Binomial Logistic Regression for Binary Outcomes
- 5.1. When to use it
- 5.1.1. Origins and intuition of binomial logistic regression
- 5.1.2. Use cases for binomial logistic regression
- 5.1.3. Walkthrough example
- 5.2. Modeling probabilistic outcomes using a logistic function
- 5.2.1. Deriving the concept of log odds
- 5.2.2. Modeling the log odds and interpreting the coefficients
- 5.2.3. Odds versus probability
- 5.3. Running a multivariate binomial logistic regression model
- 5.3.1. Running and interpreting a multivariate binomial logistic regression model
- 5.3.2. Understanding the fit and goodness-of-fit of a binomial logistic regression model
- 5.3.3. Model parsimony
- 5.4. Other considerations in binomial logistic regression
- 5.5. Learning exercises
- 5.5.1. Discussion questions
- 5.5.2. Data exercises
- 6. Multinomial Logistic Regression for Nominal Category Outcomes
- 6.1. When to use it
- 6.1.1. Intuition for multinomial logistic regression
- 6.1.2. Use cases for multinomial logistic regression
- 6.1.3. Walkthrough example
- 6.2. Running stratified binomial models
- 6.2.1. Modeling the choice of Product A versus other products
- 6.2.2. Modeling other choices
- 6.3. Running a multinomial regression model
- 6.3.1. Defining a reference level and running the model
- 6.3.2. Interpreting the model
- 6.3.3. Changing the reference
- 6.4. Model simplification, fit and goodness-of-fit for multinomial logistic regression models
- 6.4.1. Gradual safe elimination of variables
- 6.4.2. Model fit and goodness-of-fit
- 6.5. Learning exercises
- 6.5.1. Discussion questions
- 6.5.2. Data exercises
- 7. Proportional Odds Logistic Regression for Ordered Category Outcomes
- 7.1. When to use it
- 7.1.1. Intuition for proportional odds logistic regression
- 7.1.2. Use cases for proportional odds logistic regression
- 7.1.3. Walkthrough example
- 7.2. Modeling ordinal outcomes under the assumption of proportional odds
- 7.2.1. Using a latent continuous outcome variable to derive a proportional odds model
- 7.2.2. Running a proportional odds logistic regression model
- 7.2.3. Calculating the likelihood of an observation being in a specific ordinal category
- 7.2.4. Model diagnostics
- 7.3. Testing the proportional odds assumption
- 7.3.1. Sighting the coefficients of stratified binomial models
- 7.3.2. The Brant-Wald test
- 7.3.3. Alternatives to proportional odds models
- 7.4. Learning exercises
- 7.4.1. Discussion questions
- 7.4.2. Data exercises
- 8. Modeling Explicit and Latent Hierarchy in Data
- 8.1. Mixed models for explicit hierarchy in data
- 8.1.1. Fixed and random effects
- 8.1.2. Running a mixed model
- 8.2. Structural equation models for latent hierarchy in data
- 8.2.1. Running and assessing the measurement model
- 8.2.2. Running and interpreting the structural model
- 8.3. Learning exercises
- 8.3.1. Discussion questions
- 8.3.2. Data exercises
- 9. Survival Analysis for Modeling Singular Events Over Time
- 9.1. Tracking and illustrating survival rates over the study period
- 9.2. Cox proportional hazard regression models
- 9.2.1. Running a Cox proportional hazard regression model
- 9.2.2. Checking the proportional hazard assumption
- 9.3. Frailty models
- 9.4. Learning exercises
- 9.4.1. Discussion questions
- 9.4.2. Data exercises
- 10. Alternative Technical Approaches in R and Python
- 10.1. 'Tidier' modeling approaches in R
- 10.1.1. The broom package
- 10.1.2. The parsnip package
- 10.2. Inferential statistical modeling in Python
- 10.2.1. Ordinary Least Squares (OLS) linear regression
- 10.2.2. Binomial logistic regression
- 10.2.3. Multinomial logistic regression
- 10.2.4. Structural equation models
- 10.2.5. Survival analysis
- 10.2.6. Other model variants
- 11. Power Analysis to Estimate Required Sample Sizes for Modeling
- 11.1. Errors, effect sizes and statistical power
- 11.2. Power analysis for simple hypothesis tests
- 11.3. Power analysis for linear regression models
- 11.4. Power analysis for log-likelihood regression models
- 11.5. Power analysis for hierarchical regression models
- 11.6. Power analysis using Python
- 12. Further Exercises for Practice
- 12.1. Analyzing graduate salaries
- 12.1.1. The graduates data set
- 12.1.2. Discussion questions
- 12.1.3. Data exercises
- 12.2. Analyzing a recruiting process
- 12.2.1. The recruiting data set
- 12.2.2. Discussion questions
- 12.2.3. Data exercises
- 12.3. Analyzing the drivers of performance ratings
- 12.3.1. The employee_performance data set
- 12.3.2. Discussion questions
- 12.3.3. Data exercises
- 12.4. Analyzing promotion differences between groups
- 12.4.1. The promotion data set
- 12.4.2. Discussion questions
- 12.4.3. Data exercises
- 12.5. Analyzing feedback on learning programs
- 12.5.1. The learning data set
- 12.5.2. Discussion questions
- 12.5.3. Data exercises
- References
- Glossary
- Index.
- Notes:
- Includes bibliographical references and index.
- Description based on print version record.
- ISBN:
- 1-00-319415-X
- 1-003-19415-X
- 1-000-42789-7
- 9781003194156
- OCLC:
- 1257666817