Fundamentals of data science : theory and practice / Jugal K. Kalita, Dhruba K. Bhattacharyya, Swarup Roy.
- Format:
- Book
- Author/Creator:
- Kalita, Jugal Kumar, author.
- Bhattacharyya, Dhruba K., author.
- Roy, Swarup (Computer scientist), author.
- Language:
- English
- Subjects (All):
- Big data.
- Physical Description:
- 1 online resource
- Place of Publication:
- London ; San Diego, CA : Academic Press, [2024]
- Contents:
- Front Cover
- Fundamentals of Data Science
- Copyright
- Contents
- Preface
- Acknowledgment
- Foreword
- 1 Introduction
- 1.1 Data, information, and knowledge
- 1.2 Data Science: the art of data exploration
- 1.2.1 Brief history
- 1.2.2 General pipeline
- 1.2.2.1 Data collection and integration
- 1.2.2.2 Data preparation
- 1.2.2.3 Learning-model construction
- 1.2.2.4 Knowledge interpretation and presentation
- 1.2.3 Multidisciplinary science
- 1.3 What is not Data Science?
- 1.4 Data Science tasks
- 1.4.1 Predictive Data Science
- 1.4.2 Descriptive Data Science
- 1.4.3 Diagnostic Data Science
- 1.4.4 Prescriptive Data Science
- 1.5 Data Science objectives
- 1.5.1 Hidden knowledge discovery
- 1.5.2 Prediction of likely outcomes
- 1.5.3 Grouping
- 1.5.4 Actionable information
- 1.6 Applications of Data Science
- 1.7 How to read the book?
- References
- 2 Data, sources, and generation
- 2.1 Introduction
- 2.2 Data attributes
- 2.2.1 Qualitative
- 2.2.1.1 Nominal
- 2.2.1.2 Binary
- 2.2.1.3 Ordinal
- 2.2.2 Quantitative
- 2.2.2.1 Discrete
- 2.2.2.2 Continuous
- 2.2.2.3 Interval
- 2.2.2.4 Ratio
- 2.3 Data-storage formats
- 2.3.1 Structured data
- 2.3.2 Unstructured data
- 2.3.3 Semistructured data
- 2.4 Data sources
- 2.4.1 Primary sources
- 2.4.2 Secondary sources
- 2.4.3 Popular data sources
- 2.4.4 Homogeneous vs. heterogeneous data sources
- 2.5 Data generation
- 2.5.1 Types of synthetic data
- 2.5.2 Data-generation steps
- 2.5.3 Generation methods
- 2.5.4 Tools for data generation
- 2.5.4.1 Software tools
- 2.5.4.2 Python libraries
- 2.6 Summary
- 3 Data preparation
- 3.1 Introduction
- 3.2 Data cleaning
- 3.2.1 Handling missing values
- 3.2.1.1 Ignoring and discarding data
- 3.2.1.2 Parameter estimation
- 3.2.1.3 Imputation
- 3.2.2 Duplicate-data detection
- 3.2.2.1 Knowledge-based methods
- 3.2.2.2 ETL method
- 3.3 Data reduction
- 3.3.1 Parametric data reduction
- 3.3.2 Sampling
- 3.3.3 Dimensionality reduction
- 3.4 Data transformation
- 3.4.1 Discretization
- 3.4.1.1 Supervised discretization
- 3.4.1.2 Unsupervised discretization
- 3.5 Data normalization
- 3.5.1 Min-max normalization
- 3.5.2 Z-score normalization
- 3.5.3 Decimal-scaling normalization
- 3.5.4 Quantile normalization
- 3.5.5 Logarithmic normalization
- 3.6 Data integration
- 3.6.1 Consolidation
- 3.6.2 Federation
- 3.7 Summary
- 4 Machine learning
- 4.1 Introduction
- 4.2 Machine Learning paradigms
- 4.2.1 Supervised learning
- 4.2.2 Unsupervised learning
- 4.2.3 Semisupervised learning
- 4.3 Inductive bias
- 4.4 Evaluating a classifier
- 4.4.1 Evaluation steps
- 4.4.1.1 Validation
- 4.4.1.2 Testing
- 4.4.1.3 K-fold crossvalidation
- 4.4.2 Handling unbalanced classes
- 4.4.3 Model generalization
- 4.4.3.1 Underfitting
- 4.4.3.2 Overfitting
- 4.4.3.3 Accurate fittings
- Notes:
- Includes bibliographical references and index.
- Electronic reproduction. Amsterdam Available via World Wide Web.
- Description based on online resource; title from digital title page (viewed on February 20, 2024).
- Local Notes:
- Acquired for the Penn Libraries with assistance from the Rosengarten Family Fund.
- ISBN:
- 9780323972635
- 0323972632
- Publisher Number:
- 99996333416
- Access Restriction:
- Restricted for use by site license.