Doing data science / Cathy O’Neil and Rachel Schutt.

O'Reilly Online Learning: Academic/Public Library Edition. Available online.

Format:
Book
Author/Creator:
O'Neil, Cathy, author.
Schutt, Rachel, 1976- author.
Language:
English
Subjects (All):
Big data.
Information science.
Data structures (Computer science).
Cyberinfrastructure.
Database management.
Electronic data processing.
Physical Description:
1 online resource (379 pages) : illustrations
Other Title:
Doing data science: straight talk from the frontline
Place of Publication:
Sebastopol, CA : O'Reilly, [2014]
Language Note:
English
System Details:
text file
Summary:
Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.
Topics include:
Statistical inference, exploratory data analysis, and the data science process
Algorithms
Spam filters, Naive Bayes, and data wrangling
Logistic regression
Financial modeling
Recommendation engines and causality
Data visualization
Social networks and data journalism
Data engineering, MapReduce, Pregel, and Hadoop
Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.
Contents:
Copyright
Table of Contents
Preface
Motivation
Origins of the Class
Origins of the Book
What to Expect from This Book
How This Book Is Organized
How to Read This Book
How Code Is Used in This Book
Who This Book Is For
Prerequisites
Supplemental Reading
About the Contributors
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
Chapter 1. Introduction: What Is Data Science?
Big Data and Data Science Hype
Getting Past the Hype
Why Now?
Datafication
The Current Landscape (with a Little History)
Data Science Jobs
A Data Science Profile
Thought Experiment: Meta-Definition
OK, So What Is a Data Scientist, Really?
In Academia
In Industry
Chapter 2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process
Statistical Thinking in the Age of Big Data
Statistical Inference
Populations and Samples
Populations and Samples of Big Data
Big Data Can Mean Big Assumptions
Modeling
Exploratory Data Analysis
Philosophy of Exploratory Data Analysis
Exercise: EDA
The Data Science Process
A Data Scientist's Role in This Process
Thought Experiment: How Would You Simulate Chaos?
Case Study: RealDirect
How Does RealDirect Make Money?
Exercise: RealDirect Data Strategy
Chapter 3. Algorithms
Machine Learning Algorithms
Three Basic Algorithms
Linear Regression
k-Nearest Neighbors (k-NN)
k-means
Exercise: Basic Machine Learning Algorithms
Solutions
Summing It All Up
Thought Experiment: Automated Statistician
Chapter 4. Spam Filters, Naive Bayes, and Wrangling
Thought Experiment: Learning by Example
Why Won't Linear Regression Work for Filtering Spam?
How About k-nearest Neighbors?
Naive Bayes
Bayes Law
A Spam Filter for Individual Words
A Spam Filter That Combines Words: Naive Bayes
Fancy It Up: Laplace Smoothing
Comparing Naive Bayes to k-NN
Sample Code in bash
Scraping the Web: APIs and Other Tools
Jake's Exercise: Naive Bayes for Article Classification
Sample R Code for Dealing with the NYT API
Chapter 5. Logistic Regression
Thought Experiments
Classifiers
Runtime
You
Interpretability
Scalability
M6D Logistic Regression Case Study
Notes:
Includes index.
Description based on online resource; title from PDF title page (ebrary, viewed November 2, 2013).
ISBN:
9781306810784
1306810787
9781449363895
144936389X
9781449363901
1449363903
OCLC:
868083954
