Text Mining : Concepts, Implementation, and Big Data Challenge.

Springer eBooks EBA - Intelligent Technologies and Robotics Collection 2024 Available online

Format:
Book
Author/Creator:
Jo, Taeho.
Series:
Studies in Big Data Series
Studies in Big Data Series ; v.45
Language:
English
Physical Description:
1 online resource (451 pages)
Edition:
2nd ed.
Place of Publication:
Cham : Springer, 2025.
Summary:
This popular book, updated as a textbook for classroom use, discusses text mining and the ways this type of data mining can be used to find implicit knowledge in text collections. The author provides guidelines for implementing text mining systems in Java, along with the underlying concepts and approaches. The book starts with detailed text preprocessing techniques and then covers the concepts, techniques, implementation, and evaluation of text categorization. It then moves on to more advanced topics, including text summarization, text segmentation, topic mapping, and automatic text management. The book features exercises and code to help readers quickly learn and apply the material.
Contents:
Intro
Preface
Contents
Part I Foundation
1 Introduction
1.1 Definition of Text Mining
1.2 Texts
1.2.1 Text Components
1.2.2 Text Formats
1.3 Data Mining Tasks
1.3.1 Classification
1.3.2 Clustering
1.3.3 Association
1.4 Data Mining Types
1.4.1 Relational Data Mining
1.4.2 Web Mining
1.4.3 Big Data Mining
1.5 Summary
References
2 Text Indexing
2.1 Overview of Text Indexing
2.2 Steps of Text Indexing
2.2.1 Tokenization
2.2.2 Stemming
2.2.3 Stop-Word Removal
2.2.4 Term Weighting
2.3 Text Indexing: Implementation
2.3.1 Class Definition
2.3.2 Stemming Rule
2.3.3 Method Implementations
2.4 Additional Steps
2.4.1 Index Filtering
2.4.2 Index Expansion
2.4.3 Index Optimization
2.5 Summary
3 Text Encoding
3.1 Overview of Text Encoding
3.2 Feature Selection
3.2.1 Wrapper Approach
3.2.2 Principal Component Analysis
3.2.3 Independent Component Analysis
3.2.4 Singular Value Decomposition
3.3 Feature Value Assignment
3.3.1 Assignment Schemes
3.3.2 Similarity Computation
3.4 Issues of Text Encoding
3.4.1 Huge Dimensionality
3.4.2 Sparse Distribution
3.4.3 Poor Transparency
3.5 Summary
4 Text Association
4.1 Overview of Text Association
4.2 Data Association
4.2.1 Functional View
4.2.2 Support and Confidence
4.2.3 Apriori Algorithm
4.3 Word Association
4.3.1 Word Text Matrix
4.3.2 Functional View
4.3.3 Simple Example
4.4 Text Association
4.4.1 Functional View
4.4.2 Simple Example
4.5 Overall Summary
Part II Text Categorization
5 Text Categorization: Conceptual View
5.1 Definition of Text Categorization
5.2 Data Classification
5.2.1 Binary Classification
5.2.2 Multiple Classification
5.2.3 Classification Decomposition
5.2.4 Regression
5.3 Classification Types
5.3.1 Hard vs. Soft Classification
5.3.2 Flat vs. Hierarchical Classification
5.3.3 Single vs. Multiple Viewed Classification
5.3.4 Independent vs. Dependent Classification
5.4 Variants of Text Categorization
5.4.1 Spam Mail Filtering
5.4.2 Sentimental Analysis
5.4.3 Information Filtering
5.4.4 Topic Routing
5.5 Summary and Further Discussions
6 Text Categorization: Approaches
6.1 Machine Learning
6.2 Lazy Learning
6.2.1 K-Nearest Neighbor
6.2.2 Radius Nearest Neighbor
6.2.3 Distance-Based Nearest Neighbor
6.2.4 Attribute Discriminated Nearest Neighbor
6.3 Probabilistic Learning
6.3.1 Bayes Rule
6.3.2 Bayes Classifier
6.3.3 Naive Bayes
6.3.4 Bayesian Learning
6.4 Kernel-Based Classifier
6.4.1 Perceptron
6.4.2 Kernel Functions
6.4.3 Support Vector Machine
6.4.4 Optimization Constraints
6.5 Summary and Further Discussions
7 Text Categorization: Implementation
7.1 System Architecture
7.2 Class Definitions
7.2.1 Classes: Word, Text, and PlainText
7.2.2 Interface and Class: Classifier and KNearestNeighbor
7.2.3 Class: TextClassificationAPI
7.3 SubsectionTitle
7.3.1 Class: Word
7.3.2 Class: PlainText
7.3.3 Class: KNearestNeighbor
7.3.4 Class: TextClassificationAPI
7.4 Graphic User Interface and Demonstration
7.4.1 Class: TextClassificationGUI
7.4.2 Preliminary Tasks and Encoding
7.4.3 Classification Process
7.4.4 System Upgrading
7.5 Summary and Further Discussions
8 Text Categorization: Evaluation
8.1 Evaluation Overview
8.2 Text Collections
8.2.1 NewsPage.com
8.2.2 20NewsGroups
8.2.3 Reuter21578
8.2.4 OSHUMED
8.3 F1 Measure
8.3.1 Contingency Table
8.3.2 Micro-Averaged F1
8.3.3 Macro-Averaged F1
8.3.4 Example
8.4 Statistical t-Test
8.4.1 Student t-Distribution
8.4.2 Unpaired Difference Inference
8.4.3 Paired Difference Inference
8.4.4 Example
8.5 Summary and Further Discussions
Part III Text Clustering
9 Text Clustering: Conceptual View
9.1 Definition of Text Clustering
9.2 Data Clustering
9.2.1 SubSubsectionTitle
9.2.2 Association vs. Clustering
9.2.3 Classification vs. Clustering
9.2.4 Constraint Clustering
9.3 Clustering Types
9.3.1 Static vs. Dynamic Clustering
9.3.2 Crisp vs. Fuzzy Clustering
9.3.3 SubsectionTitle
9.3.4 Single vs. Multiple Viewed Clustering
9.4 Derived Tasks from Text Clustering
9.4.1 Cluster Naming
9.4.2 Subtext Clustering
9.4.3 Automatic Sampling for Text Categorization
9.4.4 Redundant Project Detection
9.5 Summary and Further Discussions
10 Text Clustering: Approaches
10.1 Unsupervised Learning
10.2 Simple Clustering Algorithms
10.2.1 AHC Algorithm
10.2.2 Divisive Clustering Algorithm
10.2.3 Single-Pass Algorithm
10.2.4 Growing Algorithm
10.3 K-Means Algorithm
10.3.1 Crisp K-Means Algorithm
10.3.2 Fuzzy K-Means Algorithm
10.3.3 Gaussian Mixture
10.3.4 K Medoid Algorithm
10.4 Competitive Learning
10.4.1 Kohonen Networks
10.4.2 Learning Vector Quantization
10.4.3 Two-Dimensional Self-Organizing Map
10.4.4 Neural Gas
10.5 Summary and Further Discussions
11 Text Clustering: Implementation
11.1 System Architecture
11.2 Class Definitions
11.2.1 Classes in Text Categorization System
11.2.2 Class: Cluster
11.2.3 Interface: ClusterAnalyzer
11.2.4 Class: AHCAlgorithm
11.3 Method Implementations
11.3.1 Methods in Previous Classes
11.3.2 Class: Cluster
11.3.3 Class: AHC Algorithm
11.4 Class: ClusterAnalysisAPI
11.4.1 Class: ClusterAnalysisAPI
11.4.2 Class: ClusterAnalyzerGUI
11.4.3 Demonstration
11.4.4 System Upgrading
11.5 Summary and Further Discussions
Reference
12 Text Clustering: Evaluation
12.1 Introduction
12.2 Cluster Validations
12.2.1 Intra-cluster and Inter-cluster Similarities
12.2.2 Internal Validation
12.2.3 Relative Validation
12.2.4 External Validation
12.3 Clustering Index
12.3.1 Computation Process
12.3.2 Evaluation of Crisp Clustering
12.3.3 Evaluation of Fuzzy Clustering
12.3.4 Evaluation of Hierarchical Clustering
12.4 Parameter Tuning
12.4.1 Clustering Index for Unlabeled Documents
12.4.2 Simple Clustering Algorithm with Parameter Tuning
12.4.3 K Means Algorithm with Parameter Tuning
12.4.4 Evolutionary Clustering Algorithm
12.5 Summary and Further Discussions
Part IV Advanced Topics
13 Text Summarization
13.1 Definition of Text Summarization
13.2 Text Summarization Types
13.2.1 Manual Versus Automatic Text Summarization
13.2.2 Single Versus Multiple Text Summarization
13.2.3 Flat Versus Hierarchical Text Summarization
13.2.4 Abstraction Versus Query-Based Summarization
13.3 Approaches to Text Summarization
13.3.1 Heuristic Approaches
13.3.2 Mapping into Classification Task
13.3.3 Sampling Schemes
13.3.4 Application of Machine Learning Algorithms
13.4 Combination with Other Text Mining Tasks
13.4.1 Summary-Based Classification
13.4.2 Summary-Based Clustering
13.4.3 Topic-Based Summarization
13.4.4 Text Expansion
13.5 Summary and Further Discussions
14 Text Segmentation
14.1 Definition of Text Segmentation
14.2 Text Segmentation Type
14.2.1 Spoken Versus Written Text Segmentation
14.2.2 Ordered Versus Unordered Text Segmentation
14.2.3 Exclusive Versus Overlapping Segmentation
14.2.4 Flat Versus Hierarchical Text Segmentation
14.3 Machine Learning-Based Approaches
14.3.1 Heuristic Approaches
14.3.2 Mapping into Classification
14.3.3 Encoding Adjacent Paragraph Pairs
14.3.4 Application of Machine Learning
14.4 Derived Tasks
14.4.1 Temporal Topic Analysis
14.4.2 Subtext Retrieval
14.4.3 Subtext Synthesization
14.4.4 Virtual Text
14.5 Summary and Further Discussions
15 Taxonomy Generation
15.1 Definition of Taxonomy Generation
15.2 Relevant Tasks to Taxonomy Generation
15.2.1 Keyword Extraction
15.2.2 Word Categorization
15.2.3 Word Clustering
15.2.4 Topic Routing
15.3 Taxonomy Generation Schemes
15.3.1 Index-Based Scheme
15.3.2 Clustering-Based Scheme
15.3.3 Association-Based Scheme
15.3.4 Link Analysis-Based Scheme
15.4 Taxonomy Governance
15.4.1 Taxonomy Maintenance
15.4.2 Taxonomy Growth
15.4.3 Taxonomy Integration
15.4.4 Ontology
15.5 Summary and Further Discussions
16 Dynamic Document Organization
16.1 Definition of Dynamic Document Organization
16.2 Online Clustering
16.2.1 Online Clustering in Functional View
16.2.2 Online K Means Algorithm
16.2.3 Online Unsupervised KNN Algorithm
16.2.4 Online Fuzzy Clustering
16.3 Dynamic Organization
16.3.1 Execution Process
16.3.2 Maintenance Mode
16.3.3 Creation Mode
16.3.4 Additional Tasks
16.4 Issues of Dynamic Document Organization
16.4.1 Text Representation
16.4.2 Binary Decomposition
16.4.3 Transition into Creation Mode
16.4.4 Variants of DDO System
16.4.5 Summary and Further Discussions
Part V Word Mining
17 Word Encoding
17.1 Introduction
17.2 Word Encoding
17.2.1 Text Indexing
17.2.2 Text Index Structure
17.2.3 Word Indexing
17.2.4 Inverted Index
17.3 Word Representation
17.3.1 Text Representation
Notes:
Description based on publisher supplied metadata and other sources.
ISBN:
9783031759765
3031759761
OCLC:
1481791252
