Data mining and data warehousing : principles and practical techniques / Parteek Bhatia.

QA76.9.D343 B435 2019

Format:: Book
Author/Creator:: Bhatia, Parteek, author.
Language:: English
Subjects (All):: Data mining--Textbooks.; Data mining.; Data warehousing--Textbooks.; Data warehousing.
Genre:: Textbooks.
Physical Description:: xxix, 477 pages ; 25 cm
Place of Publication:: Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2019.
Summary:: "This textbook is written to cater to the needs of undergraduate students of computer science, engineering, and information technology for a course on data mining and data warehousing. It brings together fundamental concepts of data mining and data warehousing in a single volume. Important topics including information theory, decision tree, Naïve Bayes classifier, distance metrics, partitioning clustering, associate mining, data marts and operational data store are discussed comprehensively. The text simplifies the understanding of the concepts through exercises and practical examples. Chapters such as classification, associate mining and cluster analysis are discussed in detail with their practical implementation using Weka and R language data mining tools. Advanced topics including big data analytics, relational data models, and NoSQL are discussed in detail. Unsolved problems and multiple-choice questions are interspersed throughout the book for better understanding"-- Provided by publisher.
Contents:: Machine generated contents note: 1.1. Introduction to Machine Learning; 1.2. Applications of Machine Learning; 1.3. Defining Machine Learning; 1.4. Classification of Machine Learning Algorithms; 1.4.1. Supervised learning; 1.4.2. Unsupervised learning; 1.4.3. Supervised and unsupervised learning in real life scenario; 1.4.4. Reinforcement learning; 2.1. Introduction to Data Mining; 2.2. Need of Data Mining; 2.3. What Can Data Mining Do and Not Do?; 2.4. Data Mining Applications; 2.5. Data Mining Process; 2.6. Data Mining Techniques; 2.6.1. Predictive modeling; 2.6.2. Database segmentation; 2.6.3. Link analysis; 2.6.4. Deviation detection; 2.7. Difference between Data Mining and Machine Learning; 3.1. About Weka; 3.2. Installing Weka; 3.3. Understanding Fisher's Iris Flower Dataset; 3.4. Preparing the Dataset; 3.5. Understanding ARFF (Attribute Relation File Format); 3.5.1. ARFF header section; 3.5.2. ARFF data section; 3.6. Working with a Dataset in Weka; 3.6.1. Removing input/output attributes; 3.6.2. Histogram; 3.6.3. Attribute statistics; 3.6.4. ARFF Viewer; 3.6.5. Visualizer; 3.7. Introduction to R; 3.7.1. Features of R; 3.7.2. Installing R; 3.8. Variable Assignment and Output Printing in R; 3.9. Data Types; 3.10. Basic Operators in R; 3.10.1. Arithmetic operators; 3.10.2. Relational operators; 3.10.3. Logical operators; 3.10.4. Assignment operators; 3.11. Installing Packages; 3.12. Loading of Data; 3.12.1. Working with the Iris dataset in R; 4.1. Need for Data Preprocessing; 4.2. Data Preprocessing Methods; 4.2.1. Data cleaning; 4.2.2. Data integration; 4.2.3. Data transformation; 4.2.4. Data reduction; 5.1. Introduction to Classification; 5.2. Types of Classification; 5.2.1. Posteriori classification; 5.2.2. Priori classification; 5.3. Input and Output Attributes; 5.4. Working of Classification; 5.5. Guidelines for Size and Quality of the Training Dataset; 5.6. Introduction to the Decision Tree Classifier; 5.6.1. Building decision tree; 5.6.2. 
Concept of information theory; 5.6.3. Defining information in terms of probability; 5.6.4. Information gain; 5.6.5. Building a decision tree for the example dataset; 5.6.6. Drawbacks of information gain theory; 5.6.7. Split algorithm based on Gini Index; 5.6.8. Building a decision tree with Gini Index; 5.6.9. Advantages of the decision tree method; 5.6.10. Disadvantages of the decision tree; 5.7. Naive Bayes Method; 5.7.1. Applying Naive Bayes classifier to the 'Whether Play' dataset; 5.7.2. Working of Naive Bayes classifier using the Laplace Estimator; 5.8. Understanding Metrics to Assess the Quality of Classifiers; 5.8.1. The boy who cried wolf; 5.8.2. True positive; 5.8.3. True negative; 5.8.4. False positive; 5.8.5. False negative; 5.8.6. Confusion matrix; 5.8.7. Precision; 5.8.8. Recall; 5.8.9. F-Measure; 6.1. Building a Decision Tree Classifier in Weka; 6.1.1. Steps to take when applying the decision tree classifier on the Iris dataset in Weka; 6.1.2. Understanding the confusion matrix; 6.1.3. Understanding the decision tree; 6.1.4. Reading decision tree rules; 6.1.5. Interpreting results; 6.1.6. Using rules for prediction; 6.2. Applying Naive Bayes; 6.3. Creating the Testing Dataset; 6.4. Decision Tree Operation with R; 6.5. Naive Bayes Operation using R; 7.1. Introduction to Cluster Analysis; 7.2. Applications of Cluster Analysis; 7.3. Desired Features of Clustering; 7.4. Distance Metrics; 7.4.1. Euclidean distance; 7.4.2. Manhattan distance; 7.4.3. Chebyshev distance; 7.5. Major Clustering Methods/Algorithms; 7.6. Partitioning Clustering; 7.6.1. k-means clustering; 7.6.2. Starting values for the k-means algorithm; 7.6.3. Issues with the k-means algorithm; 7.6.4. Scaling and weighting; 7.7. Hierarchical Clustering Algorithms (HCA); 7.7.1. Agglomerative clustering; 7.7.2. Divisive clustering; 7.7.3. Density-based clustering; 7.7.4. DBSCAN algorithm; 7.7.5. Strengths of DBSCAN algorithm; 7.7.6. Weakness of DBSCAN algorithm; 8.1. Introduction; 8.2. 
Clustering Fisher's Iris Dataset with the Simple k-Means Algorithm; 8.3. Handling Missing Values; 8.4. Results Analysis after Applying Clustering; 8.4.1. Identification of centroids for each cluster; 8.4.2. Concept of within cluster sum of squared error; 8.4.3. Identification of the optimum number of clusters using within cluster sum of squared error; 8.5. Classification of Unlabeled Data; 8.5.1. Adding clusters to dataset; 8.5.2. Applying the classification algorithm by using added cluster attribute as class attribute; 8.5.3. Pruning the decision tree; 8.6. Clustering in R using Simple k-Means; 8.6.1. Comparison of clustering results with the original dataset; 8.6.2. Adding generated clusters to the original dataset; 8.6.3. Apply J48 on the clustered dataset; 9.1. Introduction to Association Rule Mining; 9.2. Defining Association Rule Mining; 9.3. Representations of Items for Association Mining; 9.4. The Metrics to Evaluate the Strength of Association Rules; 9.4.1. Support; 9.4.2. Confidence; 9.4.3. Lift; 9.5. The Naive Algorithm for Finding Association Rules; 9.5.1. Working of the Naive algorithm; 9.5.2. Limitations of the Naive algorithm; 9.5.3. Improved Naive algorithm to deal with larger datasets; 9.6. Approaches for Transaction Database Storage; 9.6.1. Simple transaction storage; 9.6.2. Horizontal storage; 9.6.3. Vertical representation; 9.7. The Apriori Algorithm; 9.7.1. About the inventors of Apriori; 9.7.2. Working of the Apriori algorithm; 9.8. Closed and Maximal Itemsets; 9.9. The Apriori-TID Algorithm for Generating Association Mining Rules; 9.10. Direct Hashing and Pruning (DHP); 9.11. Dynamic Itemset Counting (DIC); 9.12. Mining Frequent Patterns without Candidate Generation (FP Growth); 9.12.1. Advantages of the FP-tree approach; 9.12.2. Further improvements of FP growth; 10.1. Association Mining with Weka; 10.2. Applying Predictive Apriori in Weka; 10.3. Rules Generation Similar to Classifier Using Predictive Apriori; 10.4. 
Comparison of Association Mining CAR Rules with J48 Classifier Rules; 10.5. Applying the Apriori Algorithm in Weka; 10.6. Applying the Apriori Algorithm in Weka on a Real World Dataset; 10.7. Applying the Apriori Algorithm in Weka on a Real World Larger Dataset; 10.8. Applying the Apriori Algorithm on a Numeric Dataset; 10.9. Process of Performing Manual Discretization; 10.10. Applying Association Mining in R; 10.11. Implementing Apriori Algorithm; 10.12. Generation of Rules Similar to Classifier; 10.13. Comparison of Association Mining CAR Rules with J48 Classifier Rules; 10.14. Application of Association Mining on Numeric Data in R; 11.1. Introduction; 11.2. Web Content Mining; 11.2.1. Web document clustering; 11.2.2. Suffix Tree Clustering (STC); 11.2.3. Resemblance and containment; 11.2.4. Fingerprinting; 11.3. Web Usage Mining; 11.4. Web Structure Mining; 11.4.1. Hyperlink Induced Topic Search (HITS) algorithm; 11.5. Introduction to Modern Search Engines; 11.6. Working of a Search Engine; 11.6.1. Web crawler; 11.6.2. Indexer; 11.6.3. Query processor; 11.7. PageRank Algorithm; 11.8. Precision and Recall; 12.1. The Need for an Operational Data Store (ODS); 12.2. Operational Data Store; 12.2.1. Types of ODS; 12.2.2. Architecture of ODS; 12.2.3. Advantages of the ODS; 12.3. Data Warehouse; 12.3.1. Historical developments in data warehousing; 12.3.2. Defining data warehousing; 12.3.3. Data warehouse architecture; 12.3.4. Benefits of data warehousing; 12.4. Data Marts; 12.5. Comparative Study of Data Warehouse with OLTP and ODS; 12.5.1. Data warehouses versus OLTP: similarities and distinction; 13.1. Introduction to Data Warehouse Schema; 13.1.1. Dimension; 13.1.2. Measure; 13.1.3. Fact Table; 13.1.4. Multi-dimensional view of data; 13.2. Star Schema; 13.3. Snowflake Schema; 13.4. Fact Constellation Schema (Galaxy Schema); 13.5. Comparison among Star, Snowflake and Fact Constellation Schema; 14.1. Introduction to Online Analytical Processing; 14.1.1. 
Defining OLAP; 14.1.2. OLAP applications; 14.1.3. Features of OLAP; 14.1.4. OLAP Benefits; 14.1.5. Strengths of OLAP; 14.1.6. Comparison between OLTP and OLAP; 14.1.7. Differences between OLAP and data mining; 14.2. Representation of Multi-dimensional Data; 14.2.1. Data Cube; 14.3. Implementing Multi-dimensional View of Data in Oracle; 14.4. Improving efficiency of OLAP by pre-computing the queries; 14.5. Types of OLAP Servers; 14.5.1. Relational OLAP; 14.5.2. MOLAP; 14.5.3. Comparison of ROLAP and MOLAP; 14.6. OLAP Operations; 14.6.1. Roll-up; 14.6.2. Drill-down; 14.6.3. Slice and dice; 14.6.4. Dice; 14.6.5. Pivot; 15.1. The Rise of Relational Databases; 15.2. Major Issues with Relational Databases; 15.3. Challenges from the Internet Boom; 15.3.1. The rapid growth of unstructured data; 15.3.2. Types of data in the era of the Internet boom; Contents note continued: 15.4. Emergence of Big Data due to the Internet Boom; 15.5. Possible Solutions to Handle Huge Amount of Data; 15.6. The Emergence of Technologies for Cluster Environment; 15.7. Birth of NoSQL; 15.8. Defining NoSQL from the Characteristics it Shares; 15.9. Some Misconceptions about NoSQL; 15.10. Data Models of NoSQL; 15.10.1. Key-value data model; 15.10.2. Column-family data model; 15.10.3. Document data model; 15.10.4. Graph databases; 15.11. Consistency in a Distributed Environment; 15.12. CAP Theorem; 15.13. Future of NoSQL; 15.14. Difference between NoSQL and Relational Data Models (RDBMS).
Notes:: Includes bibliographical references and index.
ISBN:: 9781108727747; 1108727743
OCLC:: 1055456089
Publisher Number:: 99987421676
