Data analysis : a gentle introduction for future data scientists / Graham Upton and Dan Brawn.

Oxford Scholarship Online: Mathematics Available online

Format:
Book
Author/Creator:
Upton, Graham J. G., author.
Brawn, Dan, author.
Series:
Oxford scholarship online.
Language:
English
Subjects (All):
Mathematical statistics.
Probabilities.
Physical Description:
1 online resource (161 pages)
Edition:
First edition.
Place of Publication:
Oxford : Oxford University Press, 2023.
Summary:
Provides a very approachable guide to the basic ideas and techniques of probability and statistics, and to more advanced techniques such as generalised linear models, classification using logistic regression, and support-vector machines.
Contents:
Cover
Titlepage
Copyright
Contents
Preface
1 First steps
1.1 Types of data
1.2 Sample and population
1.2.1 Observations and random variables
1.2.2 Sampling variation
1.3 Methods for sampling a population
1.3.1 The simple random sample
1.3.2 Cluster sampling
1.3.3 Stratified sampling
1.3.4 Systematic sampling
1.4 Oversampling and the use of weights
2 Summarizing data
2.1 Measures of location
2.1.1 The mode
2.1.2 The mean
2.1.3 The trimmed mean
2.1.4 The Winsorized mean
2.1.5 The median
2.2 Measures of spread
2.2.1 The range
2.2.2 The interquartile range
2.3 Boxplot
2.4 Histograms
2.5 Cumulative frequency diagrams
2.6 Step diagrams
2.7 The variance and standard deviation
2.8 Symmetric and skewed data
3 Probability
3.1 Probability
3.2 The rules of probability
3.3 Conditional probability and independence
3.4 The total probability theorem
3.5 Bayes' theorem
4 Probability distributions
4.1 Notation
4.2 Mean and variance of a probability distribution
4.3 The relation between sample and population
4.4 Combining means and variances
4.5 Discrete uniform distribution
4.6 Probability density function
4.7 The continuous uniform distribution
5 Estimation and confidence
5.1 Point estimates
5.1.1 Maximum likelihood estimation (mle)
5.2 Confidence intervals
5.3 Confidence interval for the population mean
5.3.1 The normal distribution
5.3.2 The Central Limit Theorem
5.3.3 Construction of the confidence interval
5.4 Confidence interval for a proportion
5.4.1 The binomial distribution
5.4.2 Confidence interval for a proportion (large sample case)
5.4.3 Confidence interval for a proportion (small sample)
5.5 Confidence bounds for other summary statistics
5.5.1 The bootstrap
5.6 Some other probability distributions
5.6.1 The Poisson and exponential distributions
5.6.2 The Weibull distribution
5.6.3 The chi-squared (χ²) distribution
6 Models, p-values, and hypotheses
6.1 Models
6.2 p-values and the null hypothesis
6.2.1 Two-sided or one-sided?
6.2.2 Interpreting p-values
6.2.3 Comparing p-values
6.2.4 Link with confidence interval
6.3 p-values when comparing two samples
6.3.1 Do the two samples come from the same population?
6.3.2 Do the two populations have the same mean?
7 Comparing proportions
7.1 The 2 × 2 table
7.2 Some terminology
7.2.1 Odds, odds ratios, and independence
7.2.2 Relative risk
7.2.3 Sensitivity, specificity, and related quantities
7.3 The R × C table
7.3.1 Residuals
7.3.2 Partitioning
8 Relations between two continuous variables
8.1 Scatter diagrams
8.2 Correlation
8.2.1 Testing for independence
8.3 The equation of a line
8.4 The method of least squares
8.5 A random dependent variable, Y
8.5.1 Estimation of σ²
8.5.2 Confidence interval for the regression line
8.5.3 Prediction interval for future values
8.6 Departures from linearity
8.6.1 Transformations
8.6.2 Extrapolation
8.6.3 Outliers
8.7 Distinguishing x and Y
8.8 Why 'regression'?
9 Several explanatory variables
9.1 AIC and related measures
9.2 Multiple regression
9.2.1 Two variables
9.2.2 Collinearity
9.2.3 Using a dummy variable
9.2.4 The use of multiple dummy variables
9.2.5 Model selection
9.2.6 Interactions
9.2.7 Residuals
9.3 Cross-validation
9.3.1 k-fold cross-validation
9.3.2 Leave-one-out cross-validation (LOOCV)
9.4 Reconciling bias and variability
9.5 Shrinkage
9.5.1 Standardization
9.6 Generalized linear models (GLMs)
9.6.1 Logistic regression
9.6.2 Loglinear models
10 Classification
10.1 Naive Bayes classification
10.2 Classification using logistic regression
10.3 Classification trees
10.4 The random forest classifier
10.5 k-nearest neighbours (kNN)
10.6 Support-vector machines
10.7 Ensemble approaches
10.8 Combining variables
11 Last words
Further reading
Index.
Notes:
Also issued in print: 2023.
Includes bibliographical references and index.
Description based on online resource and publisher information; title from PDF title page (viewed on September 4, 2023).
ISBN:
9780191980855
0191980854
9780192885791
0192885790
OCLC:
1394117533
