Statistical data mining and knowledge discovery / edited by Hamparsum Bozdogan.
LIBRA QA76.9.D343 S685 2004
Available from offsite location
- Format:
- Book
- Language:
- English
- Subjects (All):
- Data mining--Statistical methods.
- Data mining.
- Computer algorithms.
- Knowledge acquisition (Expert systems).
- Physical Description:
- 588 pages : illustrations ; 25 cm
- Place of Publication:
- Boca Raton, Fla. : Chapman & Hall/CRC, [2004]
- Summary:
- This volume brings together a stellar panel of experts to discuss and disseminate recent developments in data analysis techniques for data mining and knowledge extraction. This carefully edited collection provides a practical, multidisciplinary perspective on using statistical techniques.
- Contents:
- 1 The Role of Bayesian and Frequentist Multivariate Modeling in Statistical Data Mining / S. James Press 1
- 1.2 Is Data Mining Science? 2
- 1.3 Genesis of Data Mining 3
- 1.4 The Data Cube and Databases 3
- 1.5 Structured Query Language 5
- 1.6 Statistical Problems with Data Mining 6
- 1.7 Some DM Approaches to Dimension Reduction 7
- 1.8 Prior Distributions in Data Mining 9
- 1.9 Some New DM Applications 10
- 2 Intelligent Statistical Data Mining with Information Complexity and Genetic Algorithms / Hamparsum Bozdogan 15
- 2.2 What is Information Complexity: ICOMP? 17
- 2.3 Information Criteria for Multiple Regression Models 31
- 2.4 A GA for the Regression Modeling 36
- 2.5 Numerical Examples 41
- 3 Econometric and Statistical Data Mining, Prediction and Policy-Making / Arnold Zellner 57
- 3.2 Brief Comments on Scientific Method and Data Mining 59
- 3.3 The Structural Econometric Modeling, Time Series Analysis (SEMTSA) Approach 61
- 3.4 Methods Employed in Data Analysis, Modeling and Forecasting 67
- 3.5 Disaggregation and the Marshallian Macroeconomic Model 71
- 3.6 A Complete Marshallian Macroeconomic Model 74
- 4 Data Mining Strategies for the Detection of Chemical Warfare Agents / Jeffrey L. Solka, Edward J. Wegman, David J. Marchette 79
- 5 Disclosure Limitation Methods Based on Bounds for Large Contingency Tables With Applications to Disability / Adrian Dobra, Elena A. Erosheva, Stephen E. Fienberg 93
- 5.2 Example: National Long Term Care Survey Data 95
- 5.3 Technical Background on Cell Entry Bounds 96
- 5.4 Decomposable Frontiers 99
- 5.5 "Greedy" Frontiers 103
- 5.6 Bounds 108
- 6 Partial Membership Models with Application to Disability Survey Data / Elena A. Erosheva 117
- 6.1 Motivation 118
- 6.2 Functional Disability Data 119
- 6.3 Full Versus Partial Membership 123
- 6.4 Bayesian Estimation of the GoM Model 125
- 6.5 Analysis and Comparison 127
- 7 Automated Scoring of Polygraph Data / Aleksandra B. Slavkovic 135
- 7.3 Statistical Models for Classification and Prediction 139
- 7.4 The Data 141
- 7.5 Statistical Analysis 144
- 8 Missing Value Algorithms in Decision Trees / Hyunjoong Kim, Sumer Yates 155
- 8.2 The Seven Algorithms 156
- 8.3 The Simulation Study 159
- 8.4 Results 162
- 9 Unsupervised Learning from Incomplete Data Using a Mixture Model Approach / Lynette Hunt, Murray Jorgensen 173
- 9.2 Clustering by Mixture Models 175
- 9.3 Applications 182
- 10 Improving the Performance of Radial Basis Function (RBF) Classification Using Information Criteria / Zhenqiu Liu, Hamparsum Bozdogan 193
- 10.2 Regression Trees 197
- 10.3 New Kernel Functions 201
- 10.4 The EM Algorithm 204
- 10.5 Hybrid Training 208
- 10.6 Computational Results 210
- 11 Use of Kernel Based Techniques for Sensor Validation in Nuclear Power Plants / Andrei V. Gribok, Aleksey M. Urmanov, J. Wesley Hines, Robert E. Uhrig 217
- 11.2 Collinear, Ill-Posed Problems, Regularization 218
- 11.3 Kernel Regression and MSET 222
- 11.4 Support Vector Machines 223
- 11.5 Data Description and Results 225
- 12 Data Mining and Traditional Regression / Christopher M. Hill, Linda C. Malone, Linda Trocine 233
- 12.2 Military Manpower Application 234
- 12.3 Data Mining and Traditional Regression 236
- 12.4 General Problems 237
- 12.5 Attempted Solutions 239
- 12.6 Regression Specific Issues 240
- 13 An Extended Sliced Inverse Regression / Masahiro Mizuta 251
- 13.2 Algorithms for SIR Model 252
- 13.3 Relative Projection Pursuit 254
- 13.4 SIRrpp 254
- 14 Using Genetic Programming to Improve the Group Method of Data Handling in Time Series Prediction / M. Hiassat, M.F. Abbod, N. Mort 257
- 14.2 The Data 258
- 14.3 Financial Data 259
- 14.4 Weather Data 259
- 14.5 Processing of Data 260
- 14.6 The Group Method of Data Handling (GMDH) 261
- 14.7 Genetic Programming (GP) 262
- 14.8 GP-GMDH 263
- 14.9 Results and Discussion 264
- 15 Data Mining for Monitoring Plant Devices Using GMDH and Pattern Classification / B.R. Upadhyaya, B. Lu 269
- 15.2 Description of the Method 273
- 15.3 Analysis and Results 277
- 16 Statistical Modeling and Data Mining to Identify Consumer Preferences / Francois Boussu, Jean Jacques Denimal 281
- 16.2 Data Mining Method 283
- 16.3 Application to a Textile Data Set 288
- 17 Testing for Structural Change Over Time of Brand Attribute Perceptions in Market Segments / Sara Dolnicar, Friedrich Leisch 297
- 17.2 The Managerial Problem 298
- 17.3 Results from Traditional Analysis 299
- 17.4 The PBMS and DynPBMS Approaches 300
- 18 Kernel PCA for Feature Extraction with Information Complexity / Zhenqiu Liu, Hamparsum Bozdogan 309
- 18.2 Kernel Functions 312
- 18.3 Kernel PCA 314
- 18.4 EM for Kernel PCA and On-line PCA 318
- 18.5 Choosing the Number of Components with Information Complexity 319
- 18.6 Computational Results 320
- 19 Global Principal Component Analysis for Dimensionality Reduction in Distributed Data Mining / Hairong Qi, Tse-Wei Wang, J. Douglas Birdwell 323
- 19.2 Principal Component Analysis 326
- 19.3 Global PCA for Distributed Homogeneous Databases 327
- 19.4 Global PCA for Distributed Heterogeneous Databases 330
- 19.5 Experiments and Results 331
- 20 A New Metric for Categorical Data / S. H. Al-Harbi, G. P. McKeown, V. J. Rayward-Smith 339
- 20.2 Dissimilarity Measure 340
- 20.3 D_CV Metric 343
- 20.4 Synthetic Examples 345
- 20.5 Exploiting the D_CV Metric 348
- 21 Ordinal Logistic Modeling Using ICOMP as a Goodness-of-Fit Criterion / J. Michael Lanning, Hamparsum Bozdogan 353
- 21.2 Model Selection Criteria 356
- 21.3 Ordinal Logistic Regression 359
- 21.4 Example Problem: Diabetes Severity 367
- 22 Comparing Latent Class Factor Analysis with the Traditional Approach in Data Mining / Jay Magidson, Jeroen Vermunt 373
- 22.2 The Basic LC Factor Model 375
- 23 On Cluster Effects in Mining Complex Econometric Data / M. Ishaq Bhatti 385
- 23.2 The Model 387
- 23.3 An Algorithm for Full Maximum Likelihood Estimation 389
- 23.4 Application of the Model 392
- 23.5 Fixed Coefficient Regression Models 394
- 24 Neural Network-Based Data Mining Techniques for Steel Making / Ravindra K. Sarma, Amar Gupta, Sanjeev Vadhavkar 401
- 24.2 Productivity from Information Technology (PROFIT) Initiative 403
- 24.3 Description of Predictive Model 406
- 24.4 NNRUN - ANN Training Suite 407
- 24.5 Results and Analysis 409
- 25 Solving Data Clustering Problem as a String Search Problem / V. Olman, D. Xu, Y. Xu 415
- 25.2 Mathematical Framework 417
- 25.3 Stability of MST Structure Under Noise 421
- 25.4 Statistical Assessment of Identified Clusters 422
- 25.5 Applications 423
- 26 Behavior-Based Recommender Systems as Value-Added Services for Scientific Libraries / Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, Anke Thede 433
- 26.2 Recommender Services for Legacy Library Systems 435
- 26.3 Ehrenberg's Repeat-Buying Theory for Libraries 439
- 26.4 A Recommender System for the Library of the Universität Karlsruhe (TH) 448
- 27 GTP (General Text Parser) Software for Text Mining / Justin T. Giles, Ling Wo, Michael W. Berry 455
- 27.2 Model Facilitated by GTP 456
- 27.3 GTP Usage and Files Generated 457
- 27.4 Overview of GTP Options 458
- 27.5 Query Processing with GTPQUERY 464
- 27.7 Versions of GTP and GTPQUERY 469
- 27.8 Code Evolution 470
- 27.9 Future Work 470
- 28 Implication Intensity: From the Basic Statistical Definition to the Entropic Version / Julien Blanchard, Pascale Kuntz, Fabrice Guillet, Regis Gras 473
- 28.2 First Definitions 475
- 28.3 Entropic Version 476
- 28.4 Experimental Results 478
- 29 Use of a Secondary Splitting Criterion in Classification Forest Construction / Chang-Yung Yu, Heping Zhang 487
- 29.2 A Secondary Node-Splitting Criterion 488
- 29.3 The Formation of a Deterministic Forest 488
- 29.4 Comparison Data 489
- 30 A Method Integrating Self-Organizing Maps to Predict the Probability of Barrier Removal / Zhicheng Zhang, Frederic Vanderhaegen 497
- 30.2 A Method Integrating Self-Organizing Maps Algorithm 498
- 30.3 Experimental Results 503
- 31 Cluster Analysis of Imputed Financial Data Using an Augmentation-Based Algorithm / H. Bensmail, R. P. DeGennaro 513
- 31.2 Data and Preliminary Tests 514
- 31.3 Clustering and Bayesian Data Augmentation 518
- 31.4 Bayesian Model Selection for Choosing the Number of Clusters 523
- 31.5 Analysis of Financial Data 523
- 32 Data Mining in Federal Agencies / David L. Banks, Robert T. Olszewski 529
- 32.1 Data Quality 529
- 32.2 Indexing Data 534
- 32.3 Screening for Structure with Locally Low Dimension 537
- 32.4 Estimating Exposure 545
- 33 STING: Evaluation of Scientific & Technological Innovation and Progress / S. Sirmakessis, K. Markellos, P. Markellou, G. Mayritsakis, K. Perdikouri, A. Tsakalidis, Georgia Panagopoulou 549
- 33.2 Methodology for the Analysis of Patents 550
- 33.3 System Description 559
- 33.4 Technology Indicators 563
- 34 The Semantic Conference Organizer / Kevin Heinrich, Michael W. Berry, Jack J. Dongarra, Sathish Vadhiyar 571
- 34.2 Latent Semantic Indexing 572
- 34.3 Software Issues 573
- 34.4 Creating a Conference 575
- 34.5 Future Extensions 579.
- Notes:
- Includes bibliographical references and index.
- ISBN:
- 1584883448
- OCLC:
- 52134776