Data analysis for the life sciences with R / Rafael A. Irizarry, Michael I. Love.
Veterinary: Atwood Library (Campus) QH323.5 .I75 2017
Available
- Format:
- Book
- Author/Creator:
- Irizarry, Rafael A., author.
- Language:
- English
- Subjects (All):
- Life sciences--Statistical methods.
- Life sciences.
- Physical Description:
- xxi, 353 pages ; 26 cm
- Place of Publication:
- Boca Raton, FL : CRC Press, [2017]
- Summary:
- Genomics is being driven by new measurement technologies that permit us to observe certain molecular entities for the first time. These observations are leading to discoveries analogous to identifying microorganisms and other breakthroughs permitted by the invention of the microscope. Choice examples of these technologies are next generation sequencing and microarrays. This book was written for the many life science researchers who are becoming data analysts due to the emergence of these new types of data. Although the content of the book is mostly focused on advanced statistical concepts, the basics are covered to make sure readers have a strong grounding on the fundamental statistical concepts required for all data analysis. The book begins with statistical inference and then proceeds to an introduction to linear models and matrix algebra, high-dimensional data, distance and dimension reduction, and batch effects and factor analysis. The emphasis is on using a computer to perform data analysis. All sections of this book are reproducible as they were made with R markdown documents that include the code used to produce the book's figures, tables and results. Book jacket.
- Contents:
- 1 Getting Started 1
- 1.1 Installing R 1
- 1.2 Installing RStudio 1
- 1.3 Learn R Basics 1
- 1.4 Installing Packages 2
- 1.5 Importing Data into R 3
- 1.6 Exercises 5
- 1.7 Brief Introduction to dplyr 6
- 1.8 Exercises 8
- 1.9 Mathematical Notation 8
- 2 Inference 13
- 2.1 Introduction 13
- 2.2 Random Variables 14
- 2.3 The Null Hypothesis 15
- 2.4 Distributions 16
- 2.5 Probability Distribution 17
- 2.6 Normal Distribution 19
- 2.7 Exercises 21
- 2.8 Populations, Samples and Estimates 22
- 2.9 Exercises 23
- 2.10 Central Limit Theorem and t-distribution 24
- 2.11 Exercises 28
- 2.12 Central Limit Theorem in Practice 30
- 2.13 Exercises 33
- 2.14 t-tests in Practice 36
- 2.15 The t-distribution in Practice 37
- 2.16 Confidence Intervals 39
- 2.17 Power Calculations 45
- 2.18 Exercises 51
- 2.19 Monte Carlo Simulation 54
- 2.20 Parametric Simulations for the Observations 58
- 2.21 Exercises 58
- 2.22 Permutation Tests 60
- 2.23 Exercises 62
- 2.24 Association Tests 63
- 2.25 Exercises 67
- 3 Exploratory Data Analysis 69
- 3.1 Quantile Quantile Plots 69
- 3.2 Boxplots 72
- 3.3 Scatterplots and Correlation 74
- 3.4 Stratification 74
- 3.5 Bivariate Normal Distribution 75
- 3.6 Plots to Avoid 78
- 3.7 Misunderstanding Correlation (Advanced) 91
- 3.8 Exercises 93
- 3.9 Robust Summaries 94
- 3.10 Wilcoxon Rank Sum Test 99
- 3.11 Exercises 100
- 4 Matrix Algebra 103
- 4.1 Motivating Examples 103
- 4.2 Exercises 109
- 4.3 Matrix Notation 110
- 4.4 Solving Systems of Equations 110
- 4.5 Vectors, Matrices, and Scalars 111
- 4.6 Exercises 113
- 4.7 Matrix Operations 113
- 4.8 Exercises 117
- 4.9 Examples 118
- 4.10 Exercises 122
- 5 Linear Models 125
- 5.1 Exercises 125
- 5.2 The Design Matrix 127
- 5.3 Exercises 134
- 5.4 The Mathematics Behind lm() 135
- 5.5 Exercises 137
- 5.6 Standard Errors 139
- 5.7 Exercises 145
- 5.8 Interactions and Contrasts 146
- 5.9 Linear Model with Interactions 156
- 5.10 Analysis of Variance 160
- 5.11 Exercises 166
- 5.12 Collinearity 168
- 5.13 Rank 169
- 5.14 Removing Confounding 170
- 5.15 Exercises 170
- 5.16 The QR Factorization (Advanced) 172
- 5.17 Going Further 175
- 6 Inference for High Dimensional Data 177
- 6.1 Introduction 177
- 6.2 Exercises 179
- 6.3 Inference in Practice 180
- 6.4 Exercises 183
- 6.5 Procedures 184
- 6.6 Error Rates 184
- 6.7 The Bonferroni Correction 187
- 6.8 False Discovery Rate 189
- 6.9 Direct Approach to FDR and q-values (Advanced) 195
- 6.10 Exercises 198
- 6.11 Basic Exploratory Data Analysis 201
- 6.12 Exercises 206
- 7 Statistical Models 209
- 7.1 The Binomial Distribution 209
- 7.2 The Poisson Distribution 209
- 7.3 Maximum Likelihood Estimation 213
- 7.4 Distributions for Positive Continuous Values 215
- 7.5 Exercises 220
- 7.6 Bayesian Statistics 224
- 7.7 Exercises 229
- 7.8 Hierarchical Models 230
- 7.9 Exercises 234
- 8 Distance and Dimension Reduction 237
- 8.1 Introduction 237
- 8.2 Euclidean Distance 237
- 8.3 Distance in High Dimensions 239
- 8.4 Exercises 241
- 8.5 Dimension Reduction Motivation 241
- 8.6 Singular Value Decomposition 246
- 8.7 Exercises 252
- 8.8 Projections 254
- 8.9 Rotations 258
- 8.10 Multi-Dimensional Scaling Plots 261
- 8.11 Exercises 266
- 8.12 Principal Component Analysis 267
- 9 Basic Machine Learning 273
- 9.1 Clustering 273
- 9.2 Exercises 279
- 9.3 Conditional Probabilities and Expectations 281
- 9.4 Exercises 283
- 9.5 Smoothing 285
- 9.6 Bin Smoothing 286
- 9.7 Loess 288
- 9.8 Exercises 290
- 9.9 Class Prediction 291
- 9.10 Cross-validation 297
- 9.11 Exercises 302
- 10 Batch Effects 305
- 10.1 Confounding 307
- 10.2 Confounding: High-Throughput Example 311
- 10.3 Exercises 312
- 10.4 Discovering Batch Effects with EDA 313
- 10.5 Gene Expression Data 314
- 10.6 Exercises 321
- 10.7 Motivation for Statistical Approaches 323
- 10.8 Adjusting for Batch Effects with Linear Models 325
- 10.9 Exercises 329
- 10.10 Factor Analysis 330
- 10.11 Exercises 333
- 10.12 Modeling Batch Effects with Factor Analysis 335
- 10.13 Exercises 342
- ISBN:
- 9781498775670
- 1498775675
- OCLC:
- 959328452