My Account Log in

1 option

Algorithms in bioinformatics : a practical introduction / Wing-Kin Sung.

Van Pelt Library QH324.2 .S86 2010
Loading location information...

Available This item is available for access.

Log in to request item
Format:
Book
Author/Creator:
Sung, Wing-Kin.
Contributor:
Hazel M. Hussong Fund.
Series:
Chapman and Hall/CRC mathematical & computational biology series
Chapman & Hall/CRC mathematical and computational biology series
Language:
English
Subjects (All):
Bioinformatics.
Genetic algorithms.
Physical Description:
xvii, 381 pages : illustrations (some color) ; 25 cm.
Place of Publication:
Boca Raton, FL : Chapman & Hall/CRC, [2010]
Summary:
Algorithms in Bioinformatics: A practical Introduction provides an in-depth introduction to the algorithmic techniques applied in bioinformatics. For each topic, the author clearly details the biological motivation, precisely defines the corresponding computational problems and includes detailed examples to illustrate each algorithm.
The book begins with basic molecular biology concepts. It then describes ways to measure sequence similarity, presents simple applications of the suffix tree, and discusses the problem of searching sequence databases. After introducing methods for aligning multiple biological sequences and genomes, the text explores applications of the phylogenetic tree, methods for comparing phylogenetic trees, the problem of genome rearrangement, and the problem of motif finding. It also covers methods for predicting the secondary structure of RNA and for reconstructing the peptide sequence using mass spectrometry. The final chapter examines the computational problem related to population genetics.
Features
Presents a comprehensive overview of principles and methods in bioinformatics
Covers numerous applications of algorithms in bioinformatics
Discusses the practical issues and actual performance of using various methods with real biological data
Assumes no prior knowledge of molecular biology
Offers supplementary material on the author’s Web site
Contents:
1 Introduction to Molecular Biology 1
1.1 DNA, RNA, and Protein 1
1.1.1 Proteins 1
1.1.2 DNA 4
1.1.3 RNA 9
1.2 Genome, Chromosome, and Gene 10
1.2.1 Genome 10
1.2.2 Chromosome 10
1.2.3 Gene 11
1.2.4 Complexity of the Organism versus Genome Size 11
1.2.5 Number of Genes versus Genome Size 11
1.3 Replication and Mutation of DNA 12
1.4 Central Dogma (from DNA to Protein) 13
1.4.1 Transcription (Prokaryotes) 13
1.4.3 Transcription (Eukaryote) 14
1.4.3 Translation 15
1.5 Post-Translation Modification (PTM) 17
1.6 Population Genetics 18
1.7 Basic Biotechnological Tools 18
1.7.1 Restriction Enzymes 19
1.7.2 Sonication 19
1.7.3 Cloning 19
1.7.4 PCR 20
1.7.5 Gel Electrophoresis 22
1.7.6 Hybridization 23
1.7.7 Next Generation DNA Sequencing 24
1.8 Brief History of Bioinformatics 26
1.9 Exercises 27
2 Sequence Similarity 29
2.2 Introduction 29
2.2 Global Alignment Problem 30
2.2.1 Needleman-Wunsch Algorithm 32
2.2.2 Running Time Issue 34
2.2.3 Space Efficiency Issue 35
2.2.4 More on Global Alignment 38
2.3 Local Alignment 39
2.4 Semi-Global Alignment 41
2.5 Gap Penalty 42
2.5.1 General Gap Penalty Model 43
2.5.2 Affine Gap Penalty Model 43
2.5.3 Convex Gap Model 45
2.6 Scoring Function 50
2.6.1 Scoring Function for DNA 50
2.6.2 Scoring Function for Protein 51
2.7 Exercises 53
3 Suffix Tree 57
3.1 Introduction 57
3.2 Suffix Tree 57
3.3 Simple Applications of a Suffix Tree 59
3.3.1 Exact String Matching Problem 59
3.3.2 Longest Repeated Substring Problem 60
3.3.3 Longest Common Substring Problem 60
3.3.4 Longest Common Prefix (LCP) 61
3.3.5 Finding a Palindrome 62
3.3.6 Extracting the Embedded Suffix Tree of a String from the Generalized Suffix Tree 63
3.3.7 Common Substring of 2 or More Strings 64
3.4 Construction of a Suffix Tree 65
3.4.1 Step 1: Construct the Odd Suffix Tree 68
3.4.2 Step 2: Construct the Even Suffix Tree 69
3.4.3 Step 3: Merge the Odd and the Even Suffix Trees 70
3.5 Suffix Array 72
3.5.1 Construction of a Suffix Array 73
3.5.2 Exact String Matching Using a Suffix Array 73
3.6 FM-Index 76
3.6.1 Definition 77
3.6.2 The occ Data Structure 78
3.6.3 Exact String Matching Using the FM-Index 79
3.7 Approximate Searching Problem 81
3.8 Exercises 82
4 Genome Alignment 87
4.1 Introduction 87
4.2 Maximum Unique Match (MUM) 88
4.2.1 How to Find MUMs 89
4.3 MUMmer1: LCS 92
4.3.1 Dynamic Programming Algorithm in 0(n²) Time 93
4.3.2 An O (n log n)-Time Algorithm 93
4.4 MUMmer2 and MUMmer3 96
4.4.1 Reducing Memory Usage 97
4.4.2 Employing a New Alternative Algorithm for Finding MUMs 97
4.4.3 Clustering Matches 97
4.4.4 Extension of the Definition of MUM 98
4.5 Mutation Sensitive Alignment 99
4.5.1 Concepts and Definitions 99
4.5.2 The Idea of the Heuristic Algorithm 100
4.5.3 Experimental Results 102
4.6 Dot Plot for Visualizing the Alignment 103
4.7 Further Reading 105
4.8 Exercises 105
5 Database Search 109
5.1 Introduction 109
5.1.1 Biological Database 109
5.1.2 Database Searching 109
5.1.3 Types of Algorithms 110
5.2 Smith-Waterman Algorithm 111
5.3 FastA 111
5.3.1 FastP Algorithm 112
5.3.2 FastA Algorithm 113
5.4 Blast 114
5.4.1 Blast1 115
5.4.2 Blast2 116
5.4.3 Blast1 versus Blast2 118
5.4.4 Blast versus FastA 118
5.4.5 Statistics for Local Alignment 119
5.5 Variations of the Blast Algorithm 120
5.5.1 MegaBlast 120
5.5.2 Blat 121
5.5.3 PatternHunter 121
5.5.4 PSI-Blast (Position-Specific Iterated Blast) 123
5.6 Q-gram Alignment based on Suffix A Rrays (QUASAR) 124
5.6.1 Algorithm 124
5.6.2 Speeding Up and Reducing the Space for QUASAR 127
5.6.3 Time Analysis 127
5.7 Locality-Sensitive Hashing 128
5.8 BWT-SW 130
5.8.1 Aligning Query Sequence to Suffix Tree 130
5.8.2 Meaningful Alignment 133
5.9 Are Existing Database Searching Methods Sensitive Enough? 136
5.10 Exercises 136
6 Multiple Sequence Alignment 139
6.1 Introduction 139
6.2 Formal Definition of the Multiple Sequence Alignment Problem 139
6.3 Methods for Solving the MSA Problem 141
6.4 Dynamic Programming Method 142
6.5 Center Star Method 143
6.6 Progressive Alignment Method 146
6.6.1 ClustalW 147
6.6.2 Profile-Profile Alignment 147
6.6.3 Limitation of Progressive Alignment Construction 149
6.7 Iterative Method 149
6.7.1 Muscle 150
6.7.2 Log-Expectation (LE) Score 151
6.8 Further Reading 151
6.9 Exercises 152
7 Phylogeny Reconstruction 155
7.1 Introduction 155
7.1.1 Mitochondrial DNA and Inheritance 155
7.1.2 The Constant Molecular Clock 155
7.1.3 Phylogeny 156
7.1.4 Applications of Phylogeny 157
7.1.5 Phylogenetic Tree Reconstruction 158
7.2 Character-Based Phylogeny Reconstruction Algorithm 159
7.2.1 Maximum Parsimony 159
7.2.2 Compatibility 165
7.2.3 Maximum Likelihood Problem 172
7.3 Distance-Based Phylogeny Reconstruction Algorithm 178
7.3.1 Additive Metric and Ultrametric 179
7.3.2 Unweighted Pan Group Method with Arithmetic Mean (UPGMA) 184
7.3.3 Additive Tree Reconstruction 187
7.3.4 Nearly Additive Tree Reconstruction 189
7.3.5 Can We Apply Distance-Based Methods Given a Character-State Matrix? 190
7.4 Bootstrapping 191
7.5 Can Tree Reconstruction Methods Infer the Correct Tree? 192
7.6 Exercises 193
8 Phylogeny Comparison 199
8.1 Introduction 199
8.2 Similarity Measurement 200
8.2.1 Computing MAST by Dynamic Programming 201
8.2.2 MAST for Unrooted Trees 202
8.3 Dissimilarity Measurements 203
8.3.1 Robinson-Foulds Distance 204
8.3.2 Nearest Neighbor Interchange Distance (NNI) 209
8.3.3 Subtree Transfer Distance (STT) 210
8.3.4 Quartet Distance 211
8.4 Consensus Tree Problem 214
8.4.1 Strict Consensus Tree 215
8.4.2 Majority Rule Consensus Tree 216
8.4.3 Median Consensus Tree 218
8.4.4 Greedy Consensus Tree 218
8.4.5 R* Tree 219
8.5 Further Reading 220
8.6 Exercises 222
9 Genome Rearrangement 225
9.1 Introduction 225
9.2 Types of Genome Rearrangements 225
9.3 Computational Problems 227
9.4 Sorting an Unsigned Permutation by Reversals 227
9.4.1 Upper and Lower Bound on an Unsigned Reversal Distance 228
9.4.2 4-Approximation Algorithm for Sorting an Unsigned Permutation 229
9.4.3 2-Approximation Algorithm for Sorting an Unsigned Permutation 230
9.5 Sorting a Signed Permutation by Reversals 232
9.5.1 Upper Bound on Signed Reversal Distance 232
9.5.2 Elementary Intervals, Cycles, and Components 233
9.5.3 The Hannenhalli-Pevzner Theorem 238
9.6 Further Reading 243
9.7 Exercises 244
10 Motif Finding 247
10.1 Introduction 247
10.2 Identifying Binding Regions of TFs 248
10.3 Motif Model 250
10.4 The Motif Finding Problem 252
10.5 Scanning for Known Motifs 253
10.6 Statistical Approaches 254
10.6.1 Gibbs Motif Sampler 255
10.6.2 MEME 257
10.7 Combinatorial Approaches 260
10.7.1 Exhaustive Pattern-Driven Algorithm 261
10.7.2 Sample-Driven Approach 262
10.7.3 Suffix Tree-Based Algorithm 263
10.7.4 Graph-Based Method 265
10.8 Scoring Function 266
10.9 Motif Ensemble Methods 267
10.9.1 Approach of MotifVoter 268
10.9.2 Motif Filtering by the Discriminative and Consensus Criteria 268
10.9.3 Sites Extraction and Motif Generation 270
10.10 Can Motif Finders Discover the Correct Motifs? 271
10.11 Motif Finding Utilizing Additional Information 274
10.11.1 Regulatory Element Detection Using Correlation with Expression 274
10.11.2 Discovery of Regulatory Elements by Phylogenetic Footprinting 277
10.12 Exercises 279
11 RNA Secondary Structure
Prediction 281
11.1 Introduction 281
11.1.1 Base Interactions in RNA 282
11.1.2 RNA Structures 282
11.2 Obtaining RNA Secondary Structure Experimentally 285
11.3 RNA Structure Prediction Based on Sequence Only 286
11.4 Structure Prediction with the Assumption That There is No Pseudoknot 286
11.5 Nussinov Folding Algorithm 288
11.6 ZUKER Algorithm 290
11.6.1 Time Analysis 292
11.6.2 Speeding up Multi-Loops 292
11.6.3 Speeding up Internal Loops 294
11.7 Structure Prediction with Pseudoknots 296
11.7.1 Definition of a Simple Pseudoknot 296
11.7.2 Akutsu's Algorithm for Predicting an RNA Secondary Structure with Simple Pseudoknots 297
11.8 Exercises 300
12 Peptide Sequencing 305
12.1 Introduction 305
12.2 Obtaining the Mass Spectrum of a Peptide 306
12.3 Modeling the Mass Spectrum of a Fragmented Peptide 310
12.3.1 Amino Acid Residue Mass 310
12.3.2 Fragment Ion Mass 310
12.4 De Novo Peptide Sequencing Using Dynamic Programming 312
12.4.1 Scoring by Considering y-Ions 313
12.4.2 Scoring by Considering y-Ions and b-Ions 314
12.5 De Novo Sequencing Using Graph-Based Approach 317
12.6 Peptide Sequencing via Database Search 319
12.7 Further Reading 320
12.8 Exercises 321
13 Population Genetics 323
13.1 Introduction 323
13.1.1 Locus, Genotype, Allele, and SNP 323
13.1.2 Genotype Frequency and Allele Frequency 324
13.1.3 Haplotype and Phenotype 325
13.1.4 Technologies for Studying the Human Population . 325
13.1.5 Bioinformatics Problems 325
13.2 Hardy-Weinberg Equilibrium 326
13.3 Linkage Disequilibrium 327
13.3.1 D and D' 328
13.3.2 r² 328
13.4 Genotype Phasing 328
13.4.1 Clark's Algorithm 329
13.4.2 Perfect Phylogeny Haplotyping Problem 330
13.4.3 Maximum Likelihood Approach 334
13.4.4 Phase Algorithm 336
13.5 Tag SNP Selection 337
13.5.1 Zhang et al's Algorithm 338
13.5.2 IdSelect 339
13.6 Association Study 339
13.6.1 Categorical Data Analysis 340
13.6.2 Relative Risk and Odds Ratio 341
13.6.3 Linear Regression 342
13.6.4 Logistic Regression 343
13.7 Exercises 344.
Notes:
Includes bibliographical references and index.
Local Notes:
Acquired for the Penn Libraries with assistance from the Hazel M. Hussong Fund.
ISBN:
9781420070330
1420070339
OCLC:
429634761
Publisher Number:
99937177475

The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.

My Account

Shelf Request an item Bookmarks Fines and fees Settings

Guides

Using the Library Catalog Using Articles+ Library Account