2 options
Data smart : using data science to transform information into insight / John W. Foreman.
- Format:
- Book
- Author/Creator:
- Foreman, John W.
- Language:
- English
- Subjects (All):
- Data mining.
- Web sites--Design.
- Web sites.
- Web usage mining.
- Genre:
- Electronic books.
- Physical Description:
- 1 online resource (431 pages) : illustrations
- Place of Publication:
- Indianapolis : Wiley, [2014]
- System Details:
- text file
- Summary:
- Most people are approaching data science all wrong. Here's how to do it right. Not to disillusion you, but data scientists are not mystical practitioners of magical arts. Data science is something you can do. Really. This book shows you the significant data science techniques, how they work, how to use them, and how they benefit your business, large or small. It's not about coding or database technologies. It's about turning raw data into insight you can act upon, and doing it as quickly and painlessly as possible. Roll up your sleeves and let's get going. Relax - it's just a spreadsheet, Visit the companion website at www.wiley.com/go/datasmart to download spreadsheets for each chapter, and follow them as you learn about: Artificial intelligence using the general linear model, ensemble methods, and naive Bayes, Clustering via k-means, spherical k-means, and graph modularity, Mathematical optimization, including non-linear programming and genetic algorithms, Working with time series data and forecasting with exponential smoothing, Using Monte Carlo simulation to quantify and address risk, Detecting outliers in single or multiple dimensions, Exploring the data-science-focused R language Book jacket.
- Contents:
- 1 Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask 1
- Some Sample Data 2
- Moving Quickly with the Control Button 2
- Copying Formulas and Data Quickly 4
- Formatting Cells 5
- Paste Special Values 7
- Inserting Charts 8
- Locating the Find and Replace Menus 9
- Formulas for Locating and Pulling Values 10
- Using VLOOKUP to Merge Data 12
- Filtering and Sorting 13
- Using PivotTables 16
- Using Array Formulas 19
- Solving Stuff with Solver 20
- OpenSolver: I Wish We Didn't Need This, but We Do 26
- Wrapping Up 27
- 2 Cluster Analysis Part I: Using K-Means to Segment Your Customer Base 29
- Girls Dance with Girls, Boys Scratch Their Elbows 30
- Getting Real: K-Means Clustering Subscribers in E-mail Marketing 35
- Joey Bag O' Donuts Wholesale Wine Emporium 36
- The Initial Dataset 36
- Determining What to Measure 38
- Start with Four Clusters 41
- Euclidean Distance: Measuring Distances as the Crow Flies 41
- Distances and Cluster Assignments for Everybody!. 44
- Solving for the Cluster Centers 46
- Making Sense of the Results 49
- Getting the Top Deals by Cluster 50
- The Silhouette: A Good Way to Let Different K Values Duke It Out 53
- How about Five Clusters? 60
- Solving for Five Clusters 60
- Getting the Top Deals for All Five Clusters 61
- Computing the Silhouette for 5-Means Clustering 64
- K-Medians Clustering and Asymmetric Distance Measurements 66
- Using K-Medians Clustering 66
- Getting a More Appropriate Distance Metric 67
- Putting It All in Excel 69
- The Top Deals for the 5-Medians Clusters 70
- Wrapping Up 75
- 3 Naive Bayes and the Incredible Lightness of Being an Idiot 77
- When You Name a Product Mandrill, You're Going to Get Some Signal and Some Noise 77
- The World's Fastest Intro to Probability Theory 79
- Totaling Conditional Probabilities 80
- Joint Probability, the Chain Rule, and Independence 80
- What Happens in a Dependent Situation? 81
- Bayes Rule 82
- Using Bayes Rule to Create an Al Model 83
- High-Level Class Probabilities Are Often Assumed to Be Equal 84
- A Couple More Odds and Ends 85
- Let's Get This Excel Party Started 87
- Removing Extraneous Punctuation 87
- Splitting on Spaces 88
- Counting Tokens and Calculating Probabilities 92
- And We Have a Model! Let's Use It 94
- Wrapping Up 98
- 4 Optimization Modeling: Because That "Fresh Squeezed" Orange Juice Ain't Gonna Blend Itself 101
- Why Should Data Scientists Know Optimization? 102
- Starting with a Simple Trade-Off 103
- Representing the Problem as a Polytope 103
- Solving by Sliding the Level Set 105
- The Simplex Method: Rooting around the Corners 106
- Working in Excel 108
- There's a Monster at the End of This Chapter 117
- Fresh from the Grove to Your Glass...with a Pit Stop Through a Blending Model 118
- You Use a Blending Model 119
- Let's Start with Some Specs 119
- Coming Back to Consistency 121
- Putting the Data into Excel 121
- Setting Up the Problem in Solver 124
- Lowering Your Standards 126
- Dead Squirrel Removal: The Minimax Formulation 131
- If-Then and the "Big M" Constraint 133
- Multiplying Variables: Cranking Up the Volume to 11 137
- Modeling Risk 144
- Normally Distributed Data 145
- Wrapping Up 154
- 5 Cluster Analysis Part II: Network Graphs and Community Detection 155
- What Is a Network Graph? 156
- Visualizing a Simple Graph 157
- Brief Introduction to Gephi 159
- Gephi Installation and File Preparation 160
- Laying Out the Graph 162
- Node Degree 165
- Pretty Printing 166
- Touching the Graph Data 168
- Building a Graph from the Wholesale Wine Data 170
- Creating a Cosine Similarity Matrix 172
- Producing an r-Neighborhood Graph 174
- How Much Is an Edge Worth? Points and Penalties in Graph Modularity 178
- What's a Point and What's a Penalty? 179
- Setting Up the Score Sheet 183
- Let's Get Clustering! 185
- Split Number 1 185
- Split 2: Electric Boogaloo 190
- And...Split 3: Split with a Vengeance 192
- Encoding and Analyzing the Communities 193
- There and Back Again: A Gephi Tale 197
- Wrapping Up 202
- 6 The Granddaddy of Supervised Artificial Intelligence-Regression 205
- Wait, What? You're Pregnant? 205
- Don't Kid Yourself. 206
- Predicting Pregnant Customers at RetailMart Using Linear Regression 207
- The Feature Set 207
- Assembling the Training Data 209
- Creating Dummy Variables 210
- Let's Bake Our Own Linear Regression 213
- Linear Regression Statistics: R-Squared, F Tests, t Tests 221
- Making Predictions on Some New Data and Measuring Performance 230
- Predicting Pregnant Customers at RetailMart Using Logistic Regression 239
- First You Need a Link Function 240
- Hooking Up the Logistic Function and Reoptimizing 241
- Baking an Actual Logistic Regression 244
- Model Selection-Comparing the Performance of the Linear and Logistic Regressions 245
- For More Information 248
- Wrapping Up 249
- 7 Ensemble Models: A Whole Lot of Bad Pizza 251
- Using the Data from Chapter 6 252
- Bagging: Randomize, Train, Repeat 254
- Decision Stump Is an Unsexy Term for a Stupid Predictor 254
- Doesn't Seem So Stupid to Me! 255
- You Need More Power! 257
- Let's Train It 258
- Evaluating the Bagged Model 267
- Boosting: If You Get It Wrong, Just Boost and Try Again 272
- Training the Model-Every Feature Gets a Shot 272
- Evaluating the Boosted Model 280
- Wrapping Up 283
- 8 Forecasting: Breathe Easy; You Can't Win 285
- The Sword Trade Is Hopping 286
- Getting Acquainted with Time Series Data 286
- Starting Slow with Simple Exponential Smoothing 288
- Setting Up the Simple Exponential Smoothing Forecast 290
- You Might Have a Trend 296
- Holt's Trend-Corrected Exponential Smoothing 299
- Setting Up Holt's Trend-Corrected Smoothing in a Spreadsheet 300
- So Are You Done? Looking at Autocorrelations 306
- Multiplicative Holt-Winters Exponential Smoothing 313
- Setting the Initial Values for Level, Trend, and Seasonality 315
- Getting Rolling on the Forecast 319
- And...Optimize! 324
- Please Tell Me We're Done Now!!! 326
- Putting a Prediction Interval around the Forecast 327
- Creating a Fan Chart for Effect 331
- Wrapping Up 333
- 9 Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant 335
- Outliers Are (Bad?) People, Too 335
- The Fascinating Case of Hadlum v. Hadlum 336
- Tukey Fences 337
- Applying Tukey Fences in a Spreadsheet 338
- The Limitations of This Simple Approach 340
- Terrible at Nothing, Bad at Everything 341
- Preparing Data for Graphing 342
- Creating a Graph 345
- Getting the k Nearest Neighbors 347
- Graph Outlier Detection Method 1: Just Use the Indegree 348
- Graph Outlier Detection Method 2: Getting Nuanced with k-Distance 351
- Graph Outlier Detection Method 3: Local Outlier Factors Are Where It's At 353
- Wrapping Up 358
- 10 Moving from Spreadsheets into R 361
- Getting Up and Running with R 362
- Some Simple Hand-Jamming 363
- Reading Data into R 370
- Doing Some Actual Data Science 372
- Spherical K-Means on Wine Data in Just a Few Lines 372
- Building Al Models on the Pregnancy Data 378
- Forecasting in R 385
- Looking at Outlier Detection 389
- Wrapping Up 394.
- Notes:
- Includes bibliographical references and index.
- Description based on online resource; title from PDF title page (ebrary, viewed December 9, 2013).
- Local Notes:
- Electronic reproduction. Palo Alto, Calif. : ebrary, 2013. Available via World Wide Web. Access may be limited to ebrary affiliated libraries.
- ISBN:
- 9781118661482
- OCLC:
- 865512819
- Access Restriction:
- Restricted for use by site license.
The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.