Fundamentals of data science : theory and practice / Jugal K. Kalita, Dhruba K. Bhattacharyya, Swarup Roy.
- Format:
- Book
- Author/Creator:
- Kalita, Jugal Kumar, author.
- Bhattacharyya, Dhruba K., author.
- Roy, Swarup (Computer scientist), author.
- Language:
- English
- Subjects (All):
- Big data.
- Physical Description:
- 1 online resource
- Place of Publication:
- London ; San Diego, CA : Academic Press, [2024]
- Summary:
- Fundamentals of Data Science: Theory and Practice presents basic and advanced concepts in data science along with real-life applications. The book gives students, researchers and professionals at different levels a good understanding of the concepts of data science, machine learning, data mining and analytics. Readers will find the authors' research experiences and achievements in data science applications, along with in-depth discussions of topics that are essential for data science projects, including pre-processing, which is carried out before applying predictive and descriptive data analysis tasks, and proximity measures for numeric, categorical and mixed-type data. The authors include a systematic presentation of many predictive and descriptive learning algorithms, including recent developments that have successfully handled large datasets with high accuracy. In addition, a number of descriptive learning tasks are included.
- Contents:
- Front Cover
- Fundamentals of Data Science
- Copyright
- Contents
- Preface
- Acknowledgment
- Foreword
- 1 Introduction
- 1.1 Data, information, and knowledge
- 1.2 Data Science: the art of data exploration
- 1.2.1 Brief history
- 1.2.2 General pipeline
- 1.2.2.1 Data collection and integration
- 1.2.2.2 Data preparation
- 1.2.2.3 Learning-model construction
- 1.2.2.4 Knowledge interpretation and presentation
- 1.2.3 Multidisciplinary science
- 1.3 What is not Data Science?
- 1.4 Data Science tasks
- 1.4.1 Predictive Data Science
- 1.4.2 Descriptive Data Science
- 1.4.3 Diagnostic Data Science
- 1.4.4 Prescriptive Data Science
- 1.5 Data Science objectives
- 1.5.1 Hidden knowledge discovery
- 1.5.2 Prediction of likely outcomes
- 1.5.3 Grouping
- 1.5.4 Actionable information
- 1.6 Applications of Data Science
- 1.7 How to read the book?
- References
- 2 Data, sources, and generation
- 2.1 Introduction
- 2.2 Data attributes
- 2.2.1 Qualitative
- 2.2.1.1 Nominal
- 2.2.1.2 Binary
- 2.2.1.3 Ordinal
- 2.2.2 Quantitative
- 2.2.2.1 Discrete
- 2.2.2.2 Continuous
- 2.2.2.3 Interval
- 2.2.2.4 Ratio
- 2.3 Data-storage formats
- 2.3.1 Structured data
- 2.3.2 Unstructured data
- 2.3.3 Semistructured data
- 2.4 Data sources
- 2.4.1 Primary sources
- 2.4.2 Secondary sources
- 2.4.3 Popular data sources
- 2.4.4 Homogeneous vs. heterogeneous data sources
- 2.5 Data generation
- 2.5.1 Types of synthetic data
- 2.5.2 Data-generation steps
- 2.5.3 Generation methods
- 2.5.4 Tools for data generation
- 2.5.4.1 Software tools
- 2.5.4.2 Python libraries
- 2.6 Summary
- 3 Data preparation
- 3.1 Introduction
- 3.2 Data cleaning
- 3.2.1 Handling missing values
- 3.2.1.1 Ignoring and discarding data
- 3.2.1.2 Parameter estimation
- 3.2.1.3 Imputation
- 3.2.2 Duplicate-data detection
- 3.2.2.1 Knowledge-based methods
- 3.2.2.2 ETL method
- 3.3 Data reduction
- 3.3.1 Parametric data reduction
- 3.3.2 Sampling
- 3.3.3 Dimensionality reduction
- 3.4 Data transformation
- 3.4.1 Discretization
- 3.4.1.1 Supervised discretization
- 3.4.1.2 Unsupervised discretization
- 3.5 Data normalization
- 3.5.1 Min-max normalization
- 3.5.2 Z-score normalization
- 3.5.3 Decimal-scaling normalization
- 3.5.4 Quantile normalization
- 3.5.5 Logarithmic normalization
- 3.6 Data integration
- 3.6.1 Consolidation
- 3.6.2 Federation
- 3.7 Summary
- 4 Machine learning
- 4.1 Introduction
- 4.2 Machine Learning paradigms
- 4.2.1 Supervised learning
- 4.2.2 Unsupervised learning
- 4.2.3 Semisupervised learning
- 4.3 Inductive bias
- 4.4 Evaluating a classifier
- 4.4.1 Evaluation steps
- 4.4.1.1 Validation
- 4.4.1.2 Testing
- 4.4.1.3 K-fold crossvalidation
- 4.4.2 Handling unbalanced classes
- 4.4.3 Model generalization
- 4.4.3.1 Underfitting
- 4.4.3.2 Overfitting
- 4.4.3.3 Accurate fittings
- Notes:
- OCLC-licensed vendor bibliographic record.
- ISBN:
- 0-323-91778-X
- 0-323-97263-2
- OCLC:
- 1410389582