Data analysis : a gentle introduction for future data scientists / Graham Upton and Dan Brawn.
- Format:
- Book
- Author/Creator:
- Upton, Graham J. G., author.
- Brawn, Dan, author.
- Series:
- Oxford scholarship online.
- Language:
- English
- Subjects (All):
- Mathematical statistics.
- Probabilities.
- Physical Description:
- 1 online resource (161 pages)
- Edition:
- First edition.
- Place of Publication:
- Oxford : Oxford University Press, 2023.
- Summary:
- Provides an approachable guide to the basic ideas and techniques of probability and statistics, and to more advanced techniques such as generalised linear models, classification using logistic regression, and support-vector machines.
- Contents:
- Cover
- Titlepage
- Copyright
- Contents
- Preface
- 1 First steps
- 1.1 Types of data
- 1.2 Sample and population
- 1.2.1 Observations and random variables
- 1.2.2 Sampling variation
- 1.3 Methods for sampling a population
- 1.3.1 The simple random sample
- 1.3.2 Cluster sampling
- 1.3.3 Stratified sampling
- 1.3.4 Systematic sampling
- 1.4 Oversampling and the use of weights
- 2 Summarizing data
- 2.1 Measures of location
- 2.1.1 The mode
- 2.1.2 The mean
- 2.1.3 The trimmed mean
- 2.1.4 The Winsorized mean
- 2.1.5 The median
- 2.2 Measures of spread
- 2.2.1 The range
- 2.2.2 The interquartile range
- 2.3 Boxplot
- 2.4 Histograms
- 2.5 Cumulative frequency diagrams
- 2.6 Step diagrams
- 2.7 The variance and standard deviation
- 2.8 Symmetric and skewed data
- 3 Probability
- 3.1 Probability
- 3.2 The rules of probability
- 3.3 Conditional probability and independence
- 3.4 The total probability theorem
- 3.5 Bayes' theorem
- 4 Probability distributions
- 4.1 Notation
- 4.2 Mean and variance of a probability distribution
- 4.3 The relation between sample and population
- 4.4 Combining means and variances
- 4.5 Discrete uniform distribution
- 4.6 Probability density function
- 4.7 The continuous uniform distribution
- 5 Estimation and confidence
- 5.1 Point estimates
- 5.1.1 Maximum likelihood estimation (mle)
- 5.2 Confidence intervals
- 5.3 Confidence interval for the population mean
- 5.3.1 The normal distribution
- 5.3.2 The Central Limit Theorem
- 5.3.3 Construction of the confidence interval
- 5.4 Confidence interval for a proportion
- 5.4.1 The binomial distribution
- 5.4.2 Confidence interval for a proportion (large sample case)
- 5.4.3 Confidence interval for a proportion (small sample)
- 5.5 Confidence bounds for other summary statistics
- 5.5.1 The bootstrap.
- 5.6 Some other probability distributions
- 5.6.1 The Poisson and exponential distributions
- 5.6.2 The Weibull distribution
- 5.6.3 The chi-squared (χ²) distribution
- 6 Models, p-values, and hypotheses
- 6.1 Models
- 6.2 p-values and the null hypothesis
- 6.2.1 Two-sided or one-sided?
- 6.2.2 Interpreting p-values
- 6.2.3 Comparing p-values
- 6.2.4 Link with confidence interval
- 6.3 p-values when comparing two samples
- 6.3.1 Do the two samples come from the same population?
- 6.3.2 Do the two populations have the same mean?
- 7 Comparing proportions
- 7.1 The 2 × 2 table
- 7.2 Some terminology
- 7.2.1 Odds, odds ratios, and independence
- 7.2.2 Relative risk
- 7.2.3 Sensitivity, specificity, and related quantities
- 7.3 The R × C table
- 7.3.1 Residuals
- 7.3.2 Partitioning
- 8 Relations between two continuous variables
- 8.1 Scatter diagrams
- 8.2 Correlation
- 8.2.1 Testing for independence
- 8.3 The equation of a line
- 8.4 The method of least squares
- 8.5 A random dependent variable, Y
- 8.5.1 Estimation of σ²
- 8.5.2 Confidence interval for the regression line
- 8.5.3 Prediction interval for future values
- 8.6 Departures from linearity
- 8.6.1 Transformations
- 8.6.2 Extrapolation
- 8.6.3 Outliers
- 8.7 Distinguishing x and Y
- 8.8 Why 'regression'?
- 9 Several explanatory variables
- 9.1 AIC and related measures
- 9.2 Multiple regression
- 9.2.1 Two variables
- 9.2.2 Collinearity
- 9.2.3 Using a dummy variable
- 9.2.4 The use of multiple dummy variables
- 9.2.5 Model selection
- 9.2.6 Interactions
- 9.2.7 Residuals
- 9.3 Cross-validation
- 9.3.1 k-fold cross-validation
- 9.3.2 Leave-one-out cross-validation (LOOCV)
- 9.4 Reconciling bias and variability
- 9.5 Shrinkage
- 9.5.1 Standardization
- 9.6 Generalized linear models (GLMs)
- 9.6.1 Logistic regression
- 9.6.2 Loglinear models.
- 10 Classification
- 10.1 Naive Bayes classification
- 10.2 Classification using logistic regression
- 10.3 Classification trees
- 10.4 The random forest classifier
- 10.5 k-nearest neighbours (kNN)
- 10.6 Support-vector machines
- 10.7 Ensemble approaches
- 10.8 Combining variables
- 11 Last words
- Further reading
- Index.
- Notes:
- Also issued in print: 2023.
- Includes bibliographical references and index.
- Description based on online resource and publisher information; title from PDF title page (viewed on September 4, 2023).
- ISBN:
- 9780191980855
- 0191980854
- 9780192885791
- 0192885790
- OCLC:
- 1394117533