Statistical data mining and knowledge discovery / edited by Hamparsum Bozdogan.

LIBRA QA76.9.D343 S685 2004

Available from offsite location. This item is stored in our repository but can be checked out.

Format:
Book
Contributor:
Bozdogan, H. (Hamparsum), 1945-
Language:
English
Subjects (All):
Data mining--Statistical methods.
Data mining.
Computer algorithms.
Knowledge acquisition (Expert systems).
Physical Description:
588 pages : illustrations ; 25 cm
Place of Publication:
Boca Raton, Fla. : Chapman & Hall/CRC, [2004]
Summary:
This volume brings together a stellar panel of experts to discuss and disseminate recent developments in data analysis techniques for data mining and knowledge extraction. This carefully edited collection provides a practical, multidisciplinary perspective on using statistical techniques.
Contents:
1 The Role of Bayesian and Frequentist Multivariate Modeling in Statistical Data Mining / S. James Press 1
1.2 Is Data Mining Science? 2
1.3 Genesis of Data Mining 3
1.4 The Data Cube and Databases 3
1.5 Structured Query Language 5
1.6 Statistical Problems with Data Mining 6
1.7 Some DM Approaches to Dimension Reduction 7
1.8 Prior Distributions in Data Mining 9
1.9 Some New DM Applications 10
2 Intelligent Statistical Data Mining with Information Complexity and Genetic Algorithms / Hamparsum Bozdogan 15
2.2 What is Information Complexity: ICOMP? 17
2.3 Information Criteria for Multiple Regression Models 31
2.4 A GA for the Regression Modeling 36
2.5 Numerical Examples 41
3 Econometric and Statistical Data Mining, Prediction and Policy-Making / Arnold Zellner 57
3.2 Brief Comments on Scientific Method and Data Mining 59
3.3 The Structural Econometric Modeling, Time Series Analysis (SEMTSA) Approach 61
3.4 Methods Employed in Data Analysis, Modeling and Forecasting 67
3.5 Disaggregation and the Marshallian Macroeconomic Model 71
3.6 A Complete Marshallian Macroeconomic Model 74
4 Data Mining Strategies for the Detection of Chemical Warfare Agents / Jeffrey L. Solka, Edward J. Wegman, David J. Marchette 79
5 Disclosure Limitation Methods Based on Bounds for Large Contingency Tables With Applications to Disability / Adrian Dobra, Elena A. Erosheva, Stephen E. Fienberg 93
5.2 Example: National Long Term Care Survey Data 95
5.3 Technical Background on Cell Entry Bounds 96
5.4 Decomposable Frontiers 99
5.5 "Greedy" Frontiers 103
5.6 Bounds 108
6 Partial Membership Models with Application to Disability Survey Data / Elena A. Erosheva 117
6.1 Motivation 118
6.2 Functional Disability Data 119
6.3 Full Versus Partial Membership 123
6.4 Bayesian Estimation of the GoM Model 125
6.5 Analysis and Comparison 127
7 Automated Scoring of Polygraph Data / Aleksandra B. Slavkovic 135
7.3 Statistical Models for Classification and Prediction 139
7.4 The Data 141
7.5 Statistical Analysis 144
8 Missing Value Algorithms in Decision Trees / Hyunjoong Kim, Sumer Yates 155
8.2 The Seven Algorithms 156
8.3 The Simulation Study 159
8.4 Results 162
9 Unsupervised Learning from Incomplete Data Using a Mixture Model Approach / Lynette Hunt, Murray Jorgensen 173
9.2 Clustering by Mixture Models 175
9.3 Applications 182
10 Improving the Performance of Radial Basis Function (RBF) Classification Using Information Criteria / Zhenqiu Liu, Hamparsum Bozdogan 193
10.2 Regression Trees 197
10.3 New Kernel Functions 201
10.4 The EM Algorithm 204
10.5 Hybrid Training 208
10.6 Computational Results 210
11 Use of Kernel Based Techniques for Sensor Validation in Nuclear Power Plants / Andrei V. Gribok, Aleksey M. Urmanov, J. Wesley Hines, Robert E. Uhrig 217
11.2 Collinear, Ill-Posed Problems, Regularization 218
11.3 Kernel Regression and MSET 222
11.4 Support Vector Machines 223
11.5 Data Description and Results 225
12 Data Mining and Traditional Regression / Christopher M. Hill, Linda C. Malone, Linda Trocine 233
12.2 Military Manpower Application 234
12.3 Data Mining and Traditional Regression 236
12.4 General Problems 237
12.5 Attempted Solutions 239
12.6 Regression Specific Issues 240
13 An Extended Sliced Inverse Regression / Masahiro Mizuta 251
13.2 Algorithms for SIR Model 252
13.3 Relative Projection Pursuit 254
13.4 SIRrpp 254
14 Using Genetic Programming to Improve the Group Method of Data Handling in Time Series Prediction / M. Hiassat, M.F. Abbod, N. Mort 257
14.2 The Data 258
14.3 Financial Data 259
14.4 Weather Data 259
14.5 Processing of Data 260
14.6 The Group Method of Data Handling (GMDH) 261
14.7 Genetic Programming (GP) 262
14.8 GP-GMDH 263
14.9 Results and Discussion 264
15 Data Mining for Monitoring Plant Devices Using GMDH and Pattern Classification / B.R. Upadhyaya, B. Lu 269
15.2 Description of the Method 273
15.3 Analysis and Results 277
16 Statistical Modeling and Data Mining to Identify Consumer Preferences / Francois Boussu, Jean Jacques Denimal 281
16.2 Data Mining Method 283
16.3 Application to a Textile Data Set 288
17 Testing for Structural Change Over Time of Brand Attribute Perceptions in Market Segments / Sara Dolnicar, Friedrich Leisch 297
17.2 The Managerial Problem 298
17.3 Results from Traditional Analysis 299
17.4 The PBMS and DynPBMS Approaches 300
18 Kernel PCA for Feature Extraction with Information Complexity / Zhenqiu Liu, Hamparsum Bozdogan 309
18.2 Kernel Functions 312
18.3 Kernel PCA 314
18.4 EM for Kernel PCA and On-line PCA 318
18.5 Choosing the Number of Components with Information Complexity 319
18.6 Computational Results 320
19 Global Principal Component Analysis for Dimensionality Reduction in Distributed Data Mining / Hairong Qi, Tse-Wei Wang, J. Douglas Birdwell 323
19.2 Principal Component Analysis 326
19.3 Global PCA for Distributed Homogeneous Databases 327
19.4 Global PCA for Distributed Heterogeneous Databases 330
19.5 Experiments and Results 331
20 A New Metric for Categorical Data / S. H. Al-Harbi, G. P. McKeown, V. J. Rayward-Smith 339
20.2 Dissimilarity Measure 340
20.3 D_CV Metric 343
20.4 Synthetic Examples 345
20.5 Exploiting the D_CV Metric 348
21 Ordinal Logistic Modeling Using ICOMP as a Goodness-of-Fit Criterion / J. Michael Lanning, Hamparsum Bozdogan 353
21.2 Model Selection Criteria 356
21.3 Ordinal Logistic Regression 359
21.4 Example Problem: Diabetes Severity 367
22 Comparing Latent Class Factor Analysis with the Traditional Approach in Data Mining / Jay Magidson, Jeroen Vermunt 373
22.2 The Basic LC Factor Model 375
23 On Cluster Effects in Mining Complex Econometric Data / M. Ishaq Bhatti 385
23.2 The Model 387
23.3 An Algorithm for Full Maximum Likelihood Estimation 389
23.4 Application of the Model 392
23.5 Fixed Coefficient Regression Models 394
24 Neural Network-Based Data Mining Techniques for Steel Making / Ravindra K. Sarma, Amar Gupta, Sanjeev Vadhavkar 401
24.2 Productivity from Information Technology (PROFIT) Initiative 403
24.3 Description of Predictive Model 406
24.4 NNRUN - ANN Training Suite 407
24.5 Results and Analysis 409
25 Solving Data Clustering Problem as a String Search Problem / V. Olman, D. Xu, Y. Xu 415
25.2 Mathematical Framework 417
25.3 Stability of MST Structure Under Noise 421
25.4 Statistical Assessment of Identified Clusters 422
25.5 Applications 423
26 Behavior-Based Recommender Systems as Value-Added Services for Scientific Libraries / Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, Anke Thede 433
26.2 Recommender Services for Legacy Library Systems 435
26.3 Ehrenberg's Repeat-Buying Theory for Libraries 439
26.4 A Recommender System for the Library of the Universität Karlsruhe (TH) 448
27 GTP (General Text Parser) Software for Text Mining / Justin T. Giles, Ling Wo, Michael W. Berry 455
27.2 Model Facilitated by GTP 456
27.3 GTP Usage and Files Generated 457
27.4 Overview of GTP Options 458
27.5 Query Processing with GTPQUERY 464
27.7 Versions of GTP and GTPQUERY 469
27.8 Code Evolution 470
27.9 Future Work 470
28 Implication Intensity: From the Basic Statistical Definition to the Entropic Version / Julien Blanchard, Pascale Kuntz, Fabrice Guillet, Regis Gras 473
28.2 First Definitions 475
28.3 Entropic Version 476
28.4 Experimental Results 478
29 Use of a Secondary Splitting Criterion in Classification Forest Construction / Chang-Yung Yu, Heping Zhang 487
29.2 A Secondary Node-Splitting Criterion 488
29.3 The Formation of a Deterministic Forest 488
29.4 Comparison Data 489
30 A Method Integrating Self-Organizing Maps to Predict the Probability of Barrier Removal / Zhicheng Zhang, Frederic Vanderhaegen 497
30.2 A Method Integrating Self-Organizing Maps Algorithm 498
30.3 Experimental Results 503
31 Cluster Analysis of Imputed Financial Data Using an Augmentation-Based Algorithm / H. Bensmail, R. P. DeGennaro 513
31.2 Data and Preliminary Tests 514
31.3 Clustering and Bayesian Data Augmentation 518
31.4 Bayesian Model Selection for Choosing the Number of Clusters 523
31.5 Analysis of Financial Data 523
32 Data Mining in Federal Agencies / David L. Banks, Robert T. Olszewski 529
32.1 Data Quality 529
32.2 Indexing Data 534
32.3 Screening for Structure with Locally Low Dimension 537
32.4 Estimating Exposure 545
33 STING: Evaluation of Scientific & Technological Innovation and Progress / S. Sirmakessis, K. Markellos, P. Markellou, G. Mayritsakis, K. Perdikouri, A. Tsakalidis, Georgia Panagopoulou 549
33.2 Methodology for the Analysis of Patents 550
33.3 System Description 559
33.4 Technology Indicators 563
34 The Semantic Conference Organizer / Kevin Heinrich, Michael W. Berry, Jack J. Dongarra, Sathish Vadhiyar 571
34.2 Latent Semantic Indexing 572
34.3 Software Issues 573
34.4 Creating a Conference 575
34.5 Future Extensions 579.
Notes:
Includes bibliographical references and index.
ISBN:
1584883448
OCLC:
52134776
