
Analysis of microarray gene expression data / Mei-Ling Ting Lee.

Holman Biotech Commons QP624.5.D726 L44 2004

Available. This item is available for access.

Levy Dental Medicine Library - Stacks QP624.5.D726 L44 2004

Available. This item is available for access.

Format:
Book
Author/Creator:
Lee, Mei-Ling Ting.
Language:
English
Subjects (All):
DNA microarrays--Statistical methods.
DNA microarrays.
Gene expression--Statistical methods.
Gene expression.
Oligonucleotide Array Sequence Analysis--methods.
Gene Expression--methods.
Statistics.
Medical Subjects:
Oligonucleotide Array Sequence Analysis--methods.
Gene Expression--methods.
Physical Description:
xvi, 371 pages : illustrations (some color) ; 25 cm
Place of Publication:
Boston : Kluwer Academic, [2004]
Summary:
After genomic sequencing, microarray technology has emerged as a widely used platform for genomic studies in the life sciences. Microarray technology provides a systematic way to survey DNA and RNA variation. With the abundance of data produced from microarray studies, however, the ultimate impact of the studies on biology will depend heavily on data mining and statistical analysis. The contribution of this book is to provide readers with an integrated presentation of various topics on analyzing microarray data.
Contents:
Part I Genome Probing Using Microarrays
2. DNA, RNA, Proteins, and Gene Expression 7
2.1 The Molecules of Life 7
2.2 Genes 8
2.3 DNA 9
2.4 RNA 12
2.5 The Genetic Code 13
2.6 Proteins 14
2.7 Gene Expression and Microarrays 15
2.8 Complementary DNA (cDNA) 16
2.9 Nucleic Acid Hybridization 16
3. Microarray Technology 19
3.1 Transcriptional Profiling 20
3.1.1 Sequencing-based Transcriptional Profiling 20
3.1.2 Hybridization-based Transcriptional Profiling 22
3.2 Microarray Technological Platforms 23
3.3 Probe Selection and Synthesis 24
3.4 Array Manufacturing 30
3.5 Target Labeling 31
3.6 Hybridization 34
3.7 Scanning and Image Analysis 35
3.8 Microarray Data 36
3.8.1 Spotted Array Data 36
3.8.2 In-situ Oligonucleotide Array Data 37
3.9 So I Have My Microarray Data - What's Next? 39
3.9.1 Confirming Microarray Results 39
3.9.2 Northern Blot Analysis 40
3.9.3 Reverse-transcription PCR and Quantitative Real-time RT-PCR 40
4. Inherent Variability in Array Data 45
4.1 Genetic Populations 45
4.2 Variability in Gene Expression Levels 47
4.2.1 Variability Due to Specimen Sampling 47
4.2.2 Variability Due to Cell Cycle Regulation 48
4.2.3 Experimental Variability 48
4.3 Test the Variability by Replication 50
4.3.1 Duplicated Spots 50
4.3.2 Multiple Arrays and Biological Replications 51
5. Background Noise 53
5.1 Pixel-by-pixel Analysis of Individual Spots 53
5.2 General Models for Background Noise 56
5.2.1 Additive Background Noise 57
5.2.2 Correction for Background Noise 58
5.2.3 Example: Replication Test Data Set 59
5.2.4 Noise Models for GeneChip Arrays 62
5.2.5 Elusive Nature of Background Noise 63
6. Transformation and Normalization 67
6.1 Data Transformations 67
6.1.1 Logarithmic Transformation 67
6.1.2 Square Root Transformation 68
6.1.3 Box-Cox Transformation Family 69
6.1.4 Affine Transformation 69
6.1.5 The Generalized-log Transformation 71
6.2 Data Normalization 72
6.2.1 Normalization Across G Genes 74
6.2.2 Example: Mouse Juvenile Cystic Kidney Data Set 75
6.2.3 Normalization Across G Genes and N Samples 77
6.2.4 Color Effects and MA Plots 78
6.2.5 Normalization Based on LOWESS Function 80
6.2.6 Normalization Based on Rank-invariant Genes 82
6.2.7 Normalization Based on a Sample Pool 82
6.2.8 Global Normalization Using ANOVA Models 82
6.2.9 Other Normalization Issues 83
7. Missing Values in Array Data 85
7.1 Missing Values in Array Data 85
7.1.1 Sources of Problem 85
7.2 Statistical Classification of Missing Data 86
7.3 Missing Values in Replicated Designs 88
7.4 Imputation of Missing Values 89
8. Saturated Intensity Readings 93
8.1 Saturated Intensity Readings 93
8.2 Multiple Power-levels for Spotted Arrays 93
8.2.1 Imputing Saturated Intensity Readings 95
8.3 High Intensities in Oligonucleotide Arrays 97
Part II Statistical Models and Analysis
9. Experimental Design 103
9.1 Factors Involved in Experiments 103
9.2 Types of Design Structures 106
9.3 Common Practice in Microarray Studies 112
9.3.1 Reference Design 112
9.3.2 Time-course Experiment 114
9.3.3 Color Reversal 115
9.3.4 Loop Design 116
9.3.5 Example: Time-course Loop Design 117
10. ANOVA Models for Microarray Data 121
10.1 A Basic Log-linear Model 121
10.2 ANOVA With Multiple Factors 123
10.2.1 Main Effects 123
10.2.2 Interaction Effects 123
10.3 A Generic Fixed-Effects ANOVA Model 124
10.3.1 Estimation for Interaction Effects 126
10.4 Two-stage Estimation Procedures 126
10.5 Identifying Differentially Expressed Genes 130
10.5.1 Standard MSE-based Approach 130
10.5.2 Other Approaches 132
10.5.3 Modified MSE-based Approach 132
10.6 Mixed-effects Models 135
10.7 ANOVA for Split-plot Design 136
10.8 Log Intensity Versus Log Ratio 138
11. Multiple Testing in Microarray Studies 143
11.1 Hypothesis Testing for Any Individual Gene 143
11.2 Multiple Testing for the Entire Gene Set 144
11.2.1 Framework for Multiple Testing 144
11.2.2 Test Statistic for Each Gene 145
11.2.3 Two Error Control Criteria in Multiple Testing 146
11.2.4 Implementation Algorithms 147
11.2.5 Example of Multiple Testing Algorithms 152
12. Permutation Tests in Microarray Data 157
12.2 Permutation Tests in Microarray Studies 160
12.2.1 Exchangeability in Microarray Designs 160
12.2.2 Limitation of Having Few Permutations 162
12.2.3 Pooling Test Results Across Genes 162
12.3 Lipopolysaccharide-E. coli Data Set 163
12.3.1 Statistical Model 164
12.3.2 Permutation Testing and Results 166
13. Bayesian Methods for Microarray Data 171
13.1 Mixture Model for Gene Expression 171
13.1.1 Variations on the Mixture Model 173
13.1.2 Example of Gamma Models 175
13.2 Mixture Model for Differential Expression 176
13.2.1 Mixture Model for Color Ratio Data 176
13.2.2 Relation of Mixture Model to ANOVA Model 180
13.2.3 Bayes Interpretation of Mixture Model 182
13.3 Empirical Bayes Methods 183
13.3.1 Example of Empirical Bayes Fitting 184
13.4 Hierarchical Bayes Models 187
13.4.1 Example of Hierarchical Modeling 189
14. Power and Sample Size Considerations 193
14.1 Test Hypotheses in Microarray Studies 194
14.2 Distributions of Estimated Differential Expression 196
14.3 Summary Measures of Estimated Differential Expression 196
14.4 Multiple Testing Framework 197
14.5 Dependencies of Estimation Errors 199
14.6 Familywise Type I Error Control 200
14.6.1 Type I Error Control: the Sidak Approach 201
14.6.2 Type I Error Control: the Bonferroni Approach 203
14.7 Familywise Type II Error Control 204
14.7.1 Type II Error Control: the Sidak Approach 206
14.7.2 Type II Error Control: the Bonferroni Approach 206
14.8 Contrast of Planning and Implementation in Multiple Testing 207
14.9 Power Calculations for Different Summary Measures 208
14.9.1 Designs with Linear Summary Measure 208
14.9.2 Numerical Example for Linear Summary 210
14.9.3 Designs with Quadratic Summary Measure 211
14.9.4 Numerical Example for Quadratic Summary 213
14.10 A Bayesian Perspective on Power and Sample Size 214
14.10.1 Connection to Local Discovery Rates 215
14.10.2 Representative Local True Discovery Rate 215
14.10.3 Numerical Example for TDR and FDR 216
14.11 Applications to Standard Designs 216
14.11.1 Treatment-control Designs 217
14.11.2 Sample Size for a Treatment-control Design 218
14.11.3 Multiple-treatment Designs 221
14.11.4 Power Table for a Multiple-treatment Design 224
14.11.5 Time-course and Similar Multiple-treatment Designs 227
14.12 Relation Between Power, Replication and Design 228
14.12.1 Effects of Replication 228
14.12.2 Controlling Sources of Variability 229
14.13 Assessing Power from Microarray Pilot Studies 230
14.13.1 Example 1: Juvenile Cystic Kidney Disease 230
14.13.2 Example 2: Opioid Dependence 231
Part III Unsupervised Exploratory Analysis
15. Cluster Analysis 237
15.1 Distance and Similarity Measures 238
15.2 Distance Measures 239
15.2.1 Properties of Distance Measures 239
15.2.2 Minkowski Distance Measures 240
15.2.3 Mahalanobis Distance 241
15.3 Similarity Measures 241
15.3.1 Inner Product 241
15.3.2 Pearson Correlation Coefficient 242
15.3.3 Spearman Rank Correlation Coefficient 243
15.4 Inter-cluster Distance 243
15.4.1 Mahalanobis Inter-cluster Distance 244
15.4.2 Neighbor-based Inter-cluster Distance 244
15.5 Hierarchical Clustering 244
15.5.1 Single Linkage Method 245
15.5.2 Complete Linkage Method 245
15.5.3 Average Linkage Clustering 245
15.5.4 Centroid Linkage Method 246
15.5.5 Median Linkage Clustering 246
15.5.6 Ward's Clustering Method 246
15.5.7 Applications 246
15.5.8 Comparisons of Clustering Algorithms 247
15.6 K-means Clustering 247
15.7 Bayesian Cluster Analysis 248
15.8 Two-way Clustering Methods 248
15.9 Reliability of Clustering Patterns for Microarray Data 249
16. Principal Components and Singular Value Decomposition 251
16.1 Principal Component Analysis 251
16.1.1 Applications of Dominant Principal Components 253
16.2 Singular-value Decomposition 254
16.3 Computational Procedures for SVD 255
16.4 Eigengenes and Eigenarrays 256
16.5 Fraction of Eigenexpression 256
16.6 Generalized Singular Value Decomposition 257
16.7 Robust Singular Value Decomposition 257
17. Self-Organizing Maps 261
17.1 The Basic Logic of a SOM 261
17.2 The SOM Updating Algorithm 265
17.3 Program GENECLUSTER 267
17.4 Supervised SOM 268
17.5 Applications 268
17.5.1 Using SOM to Cluster Genes 268
17.5.2 Using SOM to Cluster Tumors 269
17.5.3 Multiclass Cancer Diagnosis 270
Part IV Supervised Learning Methods
18. Discrimination and Classification 277
18.1 Fisher's Linear Discriminant Analysis 278
18.2 Maximum Likelihood Discriminant Rules 279
18.3 Bayesian Classification 280
18.4 k-Nearest Neighbor Classifier 281
18.5 Neighborhood Analysis 282
18.6 A Gene-casting Weighted Voting Scheme 283
18.7 Example: Classification of Leukemia Samples 284
19. Artificial Neural Networks 287
19.1 Single-layer Neural Network 288
19.1.1 Separating Hyperplanes 288
19.1.2 Class Labels 289
19.1.3 Decision Rules 290
19.1.4 Risk Functions 290
19.1.5 Gradient Descent Procedures 290
19.1.6 Rosenblatt's Perceptron Method 291
19.2 General Structure of Multilayer Neural Networks 292
19.3 Training a Multilayer Neural Network 294
19.3.1 Sigmoid Functions 294
19.3.2 Mathematical Formulation 295
19.3.3 Training Algorithm 296
19.4 Cancer Classification Using Neural Networks 298
20. Support Vector Machines 301
20.1 Geometric Margins for Linearly Separable Groups 301
20.2 Convex Optimization in the Dual Space 305
20.3 Support Vectors 306
20.4 Linearly Nonseparable Groups 307
20.5 Nonlinear Separating Boundary 308
20.5.1 Kernel Functions 309
20.5.2 Kernels Defined by Symmetric Functions 309
20.5.3 Use of SVM for Classifying Genes 310
20.6.1 Functional Classification of Genes 311
20.6.2 SVM and One-versus-All Classification Scheme 313
Sample Size Table for Treatment-control Designs 317
Power Table for Multiple-treatment Designs 327
Notes:
Includes bibliographical references (pages [351]-365) and indexes.
ISBN:
0792370872
1402077890
1402077882
OCLC:
54081725

