R for Health Data Science / Ewen Harrison, Riinu Pius.
- Format:
- Book
- Author/Creator:
- Harrison, Ewen, author.
- Pius, Riinu, author.
- Language:
- English
- Subjects (All):
- Computational biology.
- Physical Description:
- 1 online resource (364 pages)
- Edition:
- 1st ed.
- Place of Publication:
- Boca Raton : Chapman and Hall/CRC, 2020.
- Language Note:
- In English.
- Summary:
- In this age of information, the manipulation, analysis, and interpretation of data have become a fundamental part of professional life; nowhere more so than in the delivery of healthcare. From the understanding of disease and the development of new treatments, to the diagnosis and management of individual patients, the use of data and technology is now an integral part of the business of healthcare. Those working in healthcare interact daily with data, often without realising it. The conversion of this avalanche of information to useful knowledge is essential for high-quality patient care. R for Health Data Science includes everything a healthcare professional needs to go from R novice to R guru. By the end of this book, you will be taking a sophisticated approach to health data science with beautiful visualisations, elegant tables, and nuanced analyses. Features: provides an introduction to the fundamentals of R for healthcare professionals; highlights the most popular statistical approaches to health data science; written to be as accessible as possible with minimal mathematics; emphasises the importance of truly understanding the underlying data through the use of plots; includes numerous examples that can be adapted for your own data; helps you create publishable documents and collaborate across teams. With this book, you are in safe hands: Prof. Harrison is a clinician and Dr. Pius is a data scientist, bringing 25 years' combined experience of using R at the coal face. This content has been taught to hundreds of individuals from a variety of backgrounds, from rank beginners to experts moving to R from other platforms.
- Contents:
- Cover
- Half Title
- Title Page
- Copyright Page
- Dedication
- Contents
- Preface
- About the Authors
- I. Data wrangling and visualisation
- 1. Why we love R
- 1.1. Help, what's a script?
- 1.2. What is RStudio?
- 1.3. Getting started
- 1.4. Getting help
- 1.5. Work in a Project
- 1.6. Restart R regularly
- 1.7. Notation throughout this book
- 2. R basics
- 2.1. Reading data into R
- 2.1.1. Import Dataset interface
- 2.1.2. Reading in the Global Burden of Disease example dataset
- 2.2. Variable types and why we care
- 2.2.1. Numeric variables (continuous)
- 2.2.2. Character variables
- 2.2.3. Factor variables (categorical)
- 2.2.4. Date/time variables
- 2.3. Objects and functions
- 2.3.1. data frame/tibble
- 2.3.2. Naming objects
- 2.3.3. Function and its arguments
- 2.3.4. Working with objects
- 2.3.5. <- and =
- 2.3.6. Recap: object, function, input, argument
- 2.4. Pipe - %>%
- 2.4.1. Using . to direct the pipe
- 2.5. Operators for filtering data
- 2.5.1. Worked examples
- 2.6. The combine function: c()
- 2.7. Missing values (NAs) and filters
- 2.8. Creating new columns - mutate()
- 2.8.1. Worked example/exercise
- 2.9. Conditional calculations - if_else()
- 2.10. Create labels - paste()
- 2.11. Joining multiple datasets
- 2.11.1. Further notes about joins
- 3. Summarising data
- 3.1. Get the data
- 3.2. Plot the data
- 3.3. Aggregating: group_by(), summarise()
- 3.4. Add new columns: mutate()
- 3.4.1. Percentages formatting: percent()
- 3.5. summarise() vs mutate()
- 3.6. Common arithmetic functions - sum(), mean(), median(), etc.
- 3.7. select() columns
- 3.8. Reshaping data - long vs wide format
- 3.8.1. Pivot values from rows into columns (wider)
- 3.8.2. Pivot values from columns to rows (longer)
- 3.8.3. separate() a column into multiple columns
- 3.9. arrange() rows
- 3.9.1. Factor levels
- 3.10. Exercises
- 3.10.1. Exercise - pivot_wider()
- 3.10.2. Exercise - group_by(), summarise()
- 3.10.3. Exercise - full_join(), percent()
- 3.10.4. Exercise - mutate(), summarise()
- 3.10.5. Exercise - filter(), summarise(), pivot_wider()
- 4. Different types of plots
- 4.1. Get the data
- 4.2. Anatomy of ggplot explained
- 4.3. Set your theme - grey vs white
- 4.4. Scatter plots/bubble plots
- 4.5. Line plots/time series plots
- 4.5.1. Exercise
- 4.6. Bar plots
- 4.6.1. Summarised data
- 4.6.2. Countable data
- 4.6.3. colour vs fill
- 4.6.4. Proportions
- 4.6.5. Exercise
- 4.7. Histograms
- 4.8. Box plots
- 4.9. Multiple geoms, multiple aes()
- 4.9.1. Worked example - three geoms together
- 4.10. All other types of plots
- 4.11. Solutions
- 4.12. Extra: Advanced examples
- 5. Fine tuning plots
- 5.1. Get the data
- 5.2. Scales
- 5.2.1. Logarithmic
- 5.2.2. Expand limits
- 5.2.3. Zoom in
- 5.2.4. Exercise
- 5.2.5. Axis ticks
- 5.3. Colours
- 5.3.1. Using the Brewer palettes:
- 5.3.2. Legend title
- 5.3.3. Choosing colours manually
- 5.4. Titles and labels
- 5.4.1. Annotation
- 5.4.2. Annotation with a superscript and a variable
- 5.5. Overall look - theme()
- 5.5.1. Text size
- 5.5.2. Legend position
- 5.6. Saving your plot
- II. Data analysis
- 6. Working with continuous outcome variables
- 6.1. Continuous data
- 6.2. The Question
- 6.3. Get and check the data
- 6.4. Plot the data
- 6.4.1. Histogram
- 6.4.2. Quantile-quantile (Q-Q) plot
- 6.4.3. Boxplot
- 6.5. Compare the means of two groups
- 6.5.1. t-test
- 6.5.2. Two-sample t-tests
- 6.5.3. Paired t-tests
- 6.5.4. What if I run the wrong test?
- 6.6. Compare the mean of one group: one sample t-tests
- 6.6.1. Interchangeability of t-tests
- 6.7. Compare the means of more than two groups
- 6.7.1. Plot the data
- 6.7.2. ANOVA
- 6.7.3. Assumptions
- 6.8. Multiple testing
- 6.8.1. Pairwise testing and multiple comparisons
- 6.9. Non-parametric tests
- 6.9.1. Transforming data
- 6.9.2. Non-parametric test for comparing two groups
- 6.9.3. Non-parametric test for comparing more than two groups
- 6.10. Finalfit approach
- 6.11. Conclusions
- 6.12. Exercises
- 6.12.1. Exercise
- 6.12.2. Exercise
- 6.12.3. Exercise
- 6.12.4. Exercise
- 6.13. Solutions
- 7. Linear regression
- 7.1. Regression
- 7.1.1. The Question (1)
- 7.1.2. Fitting a regression line
- 7.1.3. When the line fits well
- 7.1.4. The fitted line and the linear equation
- 7.1.5. Effect modification
- 7.1.6. R-squared and model fit
- 7.1.7. Confounding
- 7.1.8. Summary
- 7.2. Fitting simple models
- 7.2.1. The Question (2)
- 7.2.2. Get the data
- 7.2.3. Check the data
- 7.2.4. Plot the data
- 7.2.5. Simple linear regression
- 7.2.6. Multivariable linear regression
- 7.2.7. Check assumptions
- 7.3. Fitting more complex models
- 7.3.1. The Question (3)
- 7.3.2. Model fitting principles
- 7.3.3. AIC
- 7.3.4. Get the data
- 7.3.5. Check the data
- 7.3.6. Plot the data
- 7.3.7. Linear regression with finalfit
- 7.3.8. Summary
- 7.4. Exercises
- 7.4.1. Exercise
- 7.4.2. Exercise
- 7.4.3. Exercise
- 7.4.4. Exercise
- 7.5. Solutions
- 8. Working with categorical outcome variables
- 8.1. Factors
- 8.2. The Question
- 8.3. Get the data
- 8.4. Check the data
- 8.5. Recode the data
- 8.6. Should I convert a continuous variable to a categorical variable?
- 8.6.1. Equal intervals vs quantiles
- 8.7. Plot the data
- 8.8. Group factor levels together - fct_collapse()
- 8.9. Change the order of values within a factor - fct_relevel()
- 8.10. Summarising factors with finalfit
- 8.11. Pearson's chi-squared and Fisher's exact tests
- 8.11.1. Base R
- 8.12. Fisher's exact test
- 8.13. Chi-squared / Fisher's exact test using finalfit
- 8.14. Exercises
- 8.14.1. Exercise
- 8.14.2. Exercise
- 8.14.3. Exercise
- 9. Logistic regression
- 9.1. Generalised linear modelling
- 9.2. Binary logistic regression
- 9.2.1. The Question (1)
- 9.2.2. Odds and probabilities
- 9.2.3. Odds ratios
- 9.2.4. Fitting a regression line
- 9.2.5. The fitted line and the logistic regression equation
- 9.2.6. Effect modification and confounding
- 9.3. Data preparation and exploratory analysis
- 9.3.1. The Question (2)
- 9.3.2. Get the data
- 9.3.3. Check the data
- 9.3.4. Recode the data
- 9.3.5. Plot the data
- 9.3.6. Tabulate data
- 9.4. Model assumptions
- 9.4.1. Linearity of continuous variables to the response
- 9.4.2. Multicollinearity
- 9.5. Fitting logistic regression models in base R
- 9.6. Modelling strategy for binary outcomes
- 9.7. Fitting logistic regression models with finalfit
- 9.7.1. Criterion-based model fitting
- 9.8. Model fitting
- 9.8.1. Odds ratio plot
- 9.9. Correlated groups of observations
- 9.9.1. Simulate data
- 9.9.2. Plot the data
- 9.9.3. Mixed effects models in base R
- 9.10. Exercises
- 9.10.1. Exercise
- 9.10.2. Exercise
- 9.10.3. Exercise
- 9.10.4. Exercise
- 9.11. Solutions
- 10. Time-to-event data and survival
- 10.1. The Question
- 10.2. Get and check the data
- 10.3. Death status
- 10.4. Time and censoring
- 10.5. Recode the data
- 10.6. Kaplan Meier survival estimator
- 10.6.1. KM analysis for whole cohort
- 10.6.2. Model
- 10.6.3. Life table
- 10.7. Kaplan Meier plot
- 10.8. Cox proportional hazards regression
- 10.8.1. coxph()
- 10.8.2. finalfit()
- 10.8.3. Reduced model
- 10.8.4. Testing for proportional hazards
- 10.8.5. Stratified models
- 10.8.6. Correlated groups of observations
- 10.8.7. Hazard ratio plot
- 10.9. Competing risks regression
- 10.10. Summary
- 10.11. Dates in R
- 10.11.1. Converting dates to survival time
- 10.12. Exercises
- 10.12.1. Exercise
- 10.12.2. Exercise
- 10.13. Solutions
- III. Workflow
- 11. The problem of missing data
- 11.1. Identification of missing data
- 11.1.1. Missing completely at random (MCAR)
- 11.1.2. Missing at random (MAR)
- 11.1.3. Missing not at random (MNAR)
- 11.2. Ensure your data are coded correctly: ff_glimpse()
- 11.2.1. The Question
- 11.3. Identify missing values in each variable: missing_plot()
- 11.4. Look for patterns of missingness: missing_pattern()
- 11.5. Including missing data in demographics tables
- 11.6. Check for associations between missing and observed data
- 11.6.1. For those who like an omnibus test
- 11.7. Handling missing data: MCAR
- 11.7.1. Common solution: row-wise deletion
- 11.7.2. Other considerations
- 11.8. Handling missing data: MAR
- 11.8.1. Common solution: Multivariate Imputation by Chained Equations (mice)
- 11.9. Handling missing data: MNAR
- 11.10. Summary
- 12. Notebooks and Markdown
- 12.1. What is a Notebook?
- 12.2. What is Markdown?
- 12.3. What is the difference between a Notebook and an R Markdown file?
- 12.4. Notebook vs HTML vs PDF vs Word
- 12.5. The anatomy of a Notebook / R Markdown file
- 12.5.1. YAML header
- 12.5.2. R code chunks
- 12.5.3. Setting default chunk options
- 12.5.4. Setting default figure options
- 12.5.5. Markdown elements
- 12.6. Interface and outputting
- 12.6.1. Running code and chunks, knitting
- 12.7. File structure and workflow
- 12.7.1. Why go to all this bother?
- 13. Exporting and reporting
- 13.1. Which format should I use?
- 13.2. Working in a .R file
- 13.3. Demographics table
- 13.4. Logistic regression table
- Notes:
- Description based on: online resource; title from PDF information screen (Routledge, viewed December 29, 2022).
- ISBN:
- 1-000-22610-7
- 0-367-85542-9
- 1-000-22616-6
- 9780367855420
- OCLC:
- 1222803132