Mathematical foundations for data analysis / Jeff M. Phillips.

Springer Nature - Springer Mathematics and Statistics eBooks 2021 English International Available online

Format:: Book
Author/Creator:: Phillips, Jeff M., author.
Series:: Springer Series in the Data Sciences
Language:: English
Subjects (All):: Data mining--Mathematics.; Data mining.; Machine learning--Mathematics.; Machine learning.
Physical Description:: 1 online resource (299 pages).
Place of Publication:: Cham, Switzerland : Springer, [2021]
Summary:: This textbook, suitable for an early undergraduate up to a graduate course, provides an overview of many basic principles and techniques needed for modern data analysis. In particular, this book was designed and written as preparation for students planning to take rigorous Machine Learning and Data Mining courses. It introduces key conceptual tools necessary for data analysis, including concentration of measure and PAC bounds, cross validation, gradient descent, and principal component analysis. It also surveys basic techniques in supervised (regression and classification) and unsupervised learning (dimensionality reduction and clustering) through an accessible, simplified presentation. Students are recommended to have some background in calculus, probability, and linear algebra. Some familiarity with programming and algorithms is useful to understand advanced topics on computational techniques.
Contents:: Intro; Preface; Acknowledgements; Contents; 1 Probability Review; 1.1 Sample Spaces; 1.2 Conditional Probability and Independence; 1.3 Density Functions; 1.4 Expected Value; 1.5 Variance; 1.6 Joint, Marginal, and Conditional Distributions; 1.7 Bayes' Rule; 1.7.1 Model Given Data; 1.8 Bayesian Inference; Exercises; 2 Convergence and Sampling; 2.1 Sampling and Estimation; 2.2 Probably Approximately Correct (PAC); 2.3 Concentration of Measure; 2.3.1 Markov Inequality; 2.3.2 Chebyshev Inequality; 2.3.3 Chernoff-Hoeffding Inequality; 2.3.4 Union Bound and Examples; 2.4 Importance Sampling; 2.4.1 Sampling Without Replacement with Priority Sampling; 3 Linear Algebra Review; 3.1 Vectors and Matrices; 3.2 Addition and Multiplication; 3.3 Norms; 3.4 Linear Independence; 3.5 Rank; 3.6 Square Matrices and Properties; 3.7 Orthogonality; 4 Distances and Nearest Neighbors; 4.1 Metrics; 4.2 Lp Distances and their Relatives; 4.2.1 Lp Distances; 4.2.2 Mahalanobis Distance; 4.2.3 Cosine and Angular Distance; 4.2.4 KL Divergence; 4.3 Distances for Sets and Strings; 4.3.1 Jaccard Distance; 4.3.2 Edit Distance; 4.4 Modeling Text with Distances; 4.4.1 Bag-of-Words Vectors; 4.4.2 k-Grams; 4.5 Similarities; 4.5.1 Set Similarities; 4.5.2 Normed Similarities; 4.5.3 Normed Similarities between Sets; 4.6 Locality Sensitive Hashing; 4.6.1 Properties of Locality Sensitive Hashing; 4.6.2 Prototypical Tasks for LSH; 4.6.3 Banding to Amplify LSH; 4.6.4 LSH for Angular Distance; 4.6.5 LSH for Euclidean Distance; 4.6.6 Min Hashing as LSH for Jaccard Distance; 5 Linear Regression; 5.1 Simple Linear Regression; 5.2 Linear Regression with Multiple Explanatory Variables; 5.3 Polynomial Regression; 5.4 Cross-Validation; 5.4.1 Other ways to Evaluate Linear Regression Models; 5.5 Regularized Regression; 5.5.1 Tikhonov Regularization for Ridge Regression; 5.5.2 Lasso; 5.5.3 Dual Constrained Formulation; 5.5.4 Matching Pursuit; 6 Gradient Descent; 6.1 Functions; 6.2 Gradients; 6.3 Gradient Descent; 6.3.1 Learning Rate; 6.4 Fitting a Model to Data; 6.4.1 Least Mean Squares Updates for Regression; 6.4.2 Decomposable Functions; 7 Dimensionality Reduction; 7.1 Data Matrices; 7.1.1 Projections; 7.1.2 Sum of Squared Errors Goal; 7.2 Singular Value Decomposition; 7.2.1 Best Rank-k Approximation of a Matrix; 7.3 Eigenvalues and Eigenvectors; 7.4 The Power Method; 7.5 Principal Component Analysis; 7.6 Multidimensional Scaling; 7.6.1 Why does Classical MDS work?; 7.7 Linear Discriminant Analysis; 7.8 Distance Metric Learning; 7.9 Matrix Completion; 7.10 Random Projections; 8 Clustering; 8.1 Voronoi Diagrams; 8.1.1 Delaunay Triangulation; 8.1.2 Connection to Assignment-Based Clustering; 8.2 Gonzalez's Algorithm for k-Center Clustering; 8.3 Lloyd's Algorithm for k-Means Clustering; 8.3.1 Lloyd's Algorithm; 8.3.2 k-Means++; 8.3.3 k-Mediod Clustering; 8.3.4 Soft Clustering; 8.4 Mixture of Gaussians; 8.4.1 Expectation-Maximization; 8.5 Hierarchical Clustering; 8.6 Density-Based Clustering and Outliers; 8.6.1 Outliers; 8.7 Mean Shift Clustering; 9 Classification; 9.1 Linear Classifiers; 9.1.1 Loss Functions; 9.1.2 Cross-Validation and Regularization; 9.2 Perceptron Algorithm; 9.3 Support Vector Machines and Kernels; 9.3.1 The Dual: Mistake Counter; 9.3.2 Feature Expansion; 9.3.3 Support Vector Machines; 9.4 Learnability and VC dimension; 9.5 kNN Classifiers; 9.6 Decision Trees; 9.7 Neural Networks; 9.7.1 Training with Back-propagation; 10 Graph Structured Data; 10.1 Markov Chains; 10.1.1 Ergodic Markov Chains; 10.1.2 Metropolis Algorithm; 10.2 PageRank; 10.3 Spectral Clustering on Graphs; 10.3.1 Laplacians and their EigenStructures; 10.4 Communities in Graphs; 10.4.1 Preferential Attachment; 10.4.2 Betweenness; 10.4.3 Modularity; 11 Big Data and Sketching; 11.1 The Streaming Model; 11.1.1 Mean and Variance; 11.1.2 Reservoir Sampling; 11.2 Frequent Items; 11.2.1 Warm-Up: Majority; 11.2.2 Misra-Gries Algorithm; 11.2.3 Count-Min Sketch; 11.2.4 Count Sketch; 11.3 Matrix Sketching; 11.3.1 Covariance Matrix Summation; 11.3.2 Frequent Directions; 11.3.3 Row Sampling; 11.3.4 Random Projections and Count Sketch Hashing; Index.
Notes:: Description based on print version record.
ISBN:: 3-030-62341-6
OCLC:: 1244535120
