Data mining and data warehousing : principles and practical techniques / Parteek Bhatia.

Van Pelt Library QA76.9.D343 B435 2019

Available This item is available for access.

Format:
Book
Author/Creator:
Bhatia, Parteek, author.
Language:
English
Subjects (All):
Data mining--Textbooks.
Data mining.
Data warehousing--Textbooks.
Data warehousing.
Genre:
Textbooks.
Physical Description:
xxix, 477 pages ; 25 cm
Place of Publication:
Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2019.
Summary:
"This textbook is written to cater to the needs of undergraduate students of computer science, engineering, and information technology for a course on data mining and data warehousing. It brings together fundamental concepts of data mining and data warehousing in a single volume. Important topics including information theory, decision tree, Naïve Bayes classifier, distance metrics, partitioning clustering, associate mining, data marts and operational data store are discussed comprehensively. The text simplifies the understanding of the concepts through exercises and practical examples. Chapters such as classification, associate mining and cluster analysis are discussed in detail with their practical implementation using Weka and R language data mining tools. Advanced topics including big data analytics, relational data models, and NoSQL are discussed in detail. Unsolved problems and multiple-choice questions are interspersed throughout the book for better understanding"-- Provided by publisher.
Contents:
Machine generated contents note:
1.1. Introduction to Machine Learning
1.2. Applications of Machine Learning
1.3. Defining Machine Learning
1.4. Classification of Machine Learning Algorithms
1.4.1. Supervised learning
1.4.2. Unsupervised learning
1.4.3. Supervised and unsupervised learning in real life scenario
1.4.4. Reinforcement learning
2.1. Introduction to Data Mining
2.2. Need of Data Mining
2.3. What Can Data Mining Do and Not Do?
2.4. Data Mining Applications
2.5. Data Mining Process
2.6. Data Mining Techniques
2.6.1. Predictive modeling
2.6.2. Database segmentation
2.6.3. Link analysis
2.6.4. Deviation detection
2.7. Difference between Data Mining and Machine Learning
3.1. About Weka
3.2. Installing Weka
3.3. Understanding Fisher's Iris Flower Dataset
3.4. Preparing the Dataset
3.5. Understanding ARFF (Attribute Relation File Format)
3.5.1. ARFF header section
3.5.2. ARFF data section
3.6. Working with a Dataset in Weka
3.6.1. Removing input/output attributes
3.6.2. Histogram
3.6.3. Attribute statistics
3.6.4. ARFF Viewer
3.6.5. Visualizer
3.7. Introduction to R
3.7.1. Features of R
3.7.2. Installing R
3.8. Variable Assignment and Output Printing in R
3.9. Data Types
3.10. Basic Operators in R
3.10.1. Arithmetic operators
3.10.2. Relational operators
3.10.3. Logical operators
3.10.4. Assignment operators
3.11. Installing Packages
3.12. Loading of Data
3.12.1. Working with the Iris dataset in R
4.1. Need for Data Preprocessing
4.2. Data Preprocessing Methods
4.2.1. Data cleaning
4.2.2. Data integration
4.2.3. Data transformation
4.2.4. Data reduction
5.1. Introduction to Classification
5.2. Types of Classification
5.2.1. Posteriori classification
5.2.2. Priori classification
5.3. Input and Output Attributes
5.4. Working of Classification
5.5. Guidelines for Size and Quality of the Training Dataset
5.6. Introduction to the Decision Tree Classifier
5.6.1. Building decision tree
5.6.2. Concept of information theory
5.6.3. Defining information in terms of probability
5.6.4. Information gain
5.6.5. Building a decision tree for the example dataset
5.6.6. Drawbacks of information gain theory
5.6.7. Split algorithm based on Gini Index
5.6.8. Building a decision tree with Gini Index
5.6.9. Advantages of the decision tree method
5.6.10. Disadvantages of the decision tree
5.7. Naive Bayes Method
5.7.1. Applying Naive Bayes classifier to the 'Whether Play' dataset
5.7.2. Working of Naive Bayes classifier using the Laplace Estimator
5.8. Understanding Metrics to Assess the Quality of Classifiers
5.8.1. The boy who cried wolf
5.8.2. True positive
5.8.3. True negative
5.8.4. False positive
5.8.5. False negative
5.8.6. Confusion matrix
5.8.7. Precision
5.8.8. Recall
5.8.9. F-Measure
6.1. Building a Decision Tree Classifier in Weka
6.1.1. Steps to take when applying the decision tree classifier on the Iris dataset in Weka
6.1.2. Understanding the confusion matrix
6.1.3. Understanding the decision tree
6.1.4. Reading decision tree rules
6.1.5. Interpreting results
6.1.6. Using rules for prediction
6.2. Applying Naive Bayes
6.3. Creating the Testing Dataset
6.4. Decision Tree Operation with R
6.5. Naive Bayes Operation using R
7.1. Introduction to Cluster Analysis
7.2. Applications of Cluster Analysis
7.3. Desired Features of Clustering
7.4. Distance Metrics
7.4.1. Euclidean distance
7.4.2. Manhattan distance
7.4.3. Chebyshev distance
7.5. Major Clustering Methods/Algorithms
7.6. Partitioning Clustering
7.6.1. k-means clustering
7.6.2. Starting values for the k-means algorithm
7.6.3. Issues with the k-means algorithm
7.6.4. Scaling and weighting
7.7. Hierarchical Clustering Algorithms (HCA)
7.7.1. Agglomerative clustering
7.7.2. Divisive clustering
7.7.3. Density-based clustering
7.7.4. DBSCAN algorithm
7.7.5. Strengths of DBSCAN algorithm
7.7.6. Weakness of DBSCAN algorithm
8.1. Introduction
8.2. Clustering Fisher's Iris Dataset with the Simple k-Means Algorithm
8.3. Handling Missing Values
8.4. Results Analysis after Applying Clustering
8.4.1. Identification of centroids for each cluster
8.4.2. Concept of within cluster sum of squared error
8.4.3. Identification of the optimum number of clusters using within cluster sum of squared error
8.5. Classification of Unlabeled Data
8.5.1. Adding clusters to dataset
8.5.2. Applying the classification algorithm by using added cluster attribute as class attribute
8.5.3. Pruning the decision tree
8.6. Clustering in R using Simple k-Means
8.6.1. Comparison of clustering results with the original dataset
8.6.2. Adding generated clusters to the original dataset
8.6.3. Apply J48 on the clustered dataset
9.1. Introduction to Association Rule Mining
9.2. Defining Association Rule Mining
9.3. Representations of Items for Association Mining
9.4. The Metrics to Evaluate the Strength of Association Rules
9.4.1. Support
9.4.2. Confidence
9.4.3. Lift
9.5. The Naive Algorithm for Finding Association Rules
9.5.1. Working of the Naive algorithm
9.5.2. Limitations of the Naive algorithm
9.5.3. Improved Naive algorithm to deal with larger datasets
9.6. Approaches for Transaction Database Storage
9.6.1. Simple transaction storage
9.6.2. Horizontal storage
9.6.3. Vertical representation
9.7. The Apriori Algorithm
9.7.1. About the inventors of Apriori
9.7.2. Working of the Apriori algorithm
9.8. Closed and Maximal Itemsets
9.9. The Apriori-TID Algorithm for Generating Association Mining Rules
9.10. Direct Hashing and Pruning (DHP)
9.11. Dynamic Itemset Counting (DIC)
9.12. Mining Frequent Patterns without Candidate Generation (FP Growth)
9.12.1. Advantages of the FP-tree approach
9.12.2. Further improvements of FP growth
10.1. Association Mining with Weka
10.2. Applying Predictive Apriori in Weka
10.3. Rules Generation Similar to Classifier Using Predictive Apriori
10.4. Comparison of Association Mining CAR Rules with J48 Classifier Rules
10.5. Applying the Apriori Algorithm in Weka
10.6. Applying the Apriori Algorithm in Weka on a Real World Dataset
10.7. Applying the Apriori Algorithm in Weka on a Real World Larger Dataset
10.8. Applying the Apriori Algorithm on a Numeric Dataset
10.9. Process of Performing Manual Discretization
10.10. Applying Association Mining in R
10.11. Implementing Apriori Algorithm
10.12. Generation of Rules Similar to Classifier
10.13. Comparison of Association Mining CAR Rules with J48 Classifier Rules
10.14. Application of Association Mining on Numeric Data in R
11.1. Introduction
11.2. Web Content Mining
11.2.1. Web document clustering
11.2.2. Suffix Tree Clustering (STC)
11.2.3. Resemblance and containment
11.2.4. Fingerprinting
11.3. Web Usage Mining
11.4. Web Structure Mining
11.4.1. Hyperlink Induced Topic Search (HITS) algorithm
11.5. Introduction to Modern Search Engines
11.6. Working of a Search Engine
11.6.1. Web crawler
11.6.2. Indexer
11.6.3. Query processor
11.7. PageRank Algorithm
11.8. Precision and Recall
12.1. The Need for an Operational Data Store (ODS)
12.2. Operational Data Store
12.2.1. Types of ODS
12.2.2. Architecture of ODS
12.2.3. Advantages of the ODS
12.3. Data Warehouse
12.3.1. Historical developments in data warehousing
12.3.2. Defining data warehousing
12.3.3. Data warehouse architecture
12.3.4. Benefits of data warehousing
12.4. Data Marts
12.5. Comparative Study of Data Warehouse with OLTP and ODS
12.5.1. Data warehouses versus OLTP: similarities and distinction
13.1. Introduction to Data Warehouse Schema
13.1.1. Dimension
13.1.2. Measure
13.1.3. Fact Table
13.1.4. Multi-dimensional view of data
13.2. Star Schema
13.3. Snowflake Schema
13.4. Fact Constellation Schema (Galaxy Schema)
13.5. Comparison among Star, Snowflake and Fact Constellation Schema
14.1. Introduction to Online Analytical Processing
14.1.1. Defining OLAP
14.1.2. OLAP applications
14.1.3. Features of OLAP
14.1.4. OLAP Benefits
14.1.5. Strengths of OLAP
14.1.6. Comparison between OLTP and OLAP
14.1.7. Differences between OLAP and data mining
14.2. Representation of Multi-dimensional Data
14.2.1. Data Cube
14.3. Implementing Multi-dimensional View of Data in Oracle
14.4. Improving efficiency of OLAP by pre-computing the queries
14.5. Types of OLAP Servers
14.5.1. Relational OLAP
14.5.2. MOLAP
14.5.3. Comparison of ROLAP and MOLAP
14.6. OLAP Operations
14.6.1. Roll-up
14.6.2. Drill-down
14.6.3. Slice and dice
14.6.4. Dice
14.6.5. Pivot
15.1. The Rise of Relational Databases
15.2. Major Issues with Relational Databases
15.3. Challenges from the Internet Boom
15.3.1. The rapid growth of unstructured data
15.3.2. Types of data in the era of the Internet boom
15.4. Emergence of Big Data due to the Internet Boom
15.5. Possible Solutions to Handle Huge Amount of Data
15.6. The Emergence of Technologies for Cluster Environment
15.7. Birth of NoSQL
15.8. Defining NoSQL from the Characteristics it Shares
15.9. Some Misconceptions about NoSQL
15.10. Data Models of NoSQL
15.10.1. Key-value data model
15.10.2. Column-family data model
15.10.3. Document data model
15.10.4. Graph databases
15.11. Consistency in a Distributed Environment
15.12. CAP Theorem
15.13. Future of NoSQL
15.14. Difference between NoSQL and Relational Data Models (RDBMS).
Notes:
Includes bibliographical references and index.
ISBN:
9781108727747
1108727743
OCLC:
1055456089
Publisher Number:
99987421676
