Agile data science / Russell Jurney.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Jurney, Russell.
Contributor:
Loukides, Michael Kosta, editor.
Treseler, Mary, editor.
Language:
English
Subjects (All):
Apache Hadoop.
Data mining.
Agile software development.
Physical Description:
1 online resource (177 p.)
Edition:
First edition.
Place of Publication:
Beijing : O'Reilly Media, 2013.
Language Note:
English
System Details:
text file
Summary:
Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.
Create analytics applications by using the agile big data development methodology
Build value from your data in a series of agile sprints, using the data-value stack
Gain insight by using several data structures to extract multiple features from a single dataset
Visualize data with charts, and expose different aspects through interactive reports
Use historical data to predict the future, and translate predictions into action
Get feedback from users after each sprint to keep your project on track
Contents:
Intro
Copyright
Table of Contents
Preface
Who This Book Is For
How This Book Is Organized
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Part I. Setup
Chapter 1. Theory
Agile Big Data
Big Words Defined
Agile Big Data Teams
Recognizing the Opportunity and Problem
Adapting to Change
Agile Big Data Process
Code Review and Pair Programming
Agile Environments: Engineering Productivity
Collaboration Space
Private Space
Personal Space
Realizing Ideas with Large-Format Printing
Chapter 2. Data
Email
Working with Raw Data
Raw Email
Structured Versus Semistructured Data
SQL
NoSQL
Serialization
Extracting and Exposing Features in Evolving Schemas
Data Pipelines
Data Perspectives
Networks
Time Series
Natural Language
Probability
Conclusion
Chapter 3. Agile Tools
Scalability = Simplicity
Agile Big Data Processing
Setting Up a Virtual Environment for Python
Serializing Events with Avro
Avro for Python
Collecting Data
Data Processing with Pig
Installing Pig
Publishing Data with MongoDB
Installing MongoDB
Installing MongoDB's Java Driver
Installing mongo-hadoop
Pushing Data to MongoDB from Pig
Searching Data with ElasticSearch
Installation
ElasticSearch and Pig with Wonderdog
Reflecting on our Workflow
Lightweight Web Applications
Python and Flask
Presenting Our Data
Installing Bootstrap
Booting Bootstrap
Visualizing Data with D3.js and nvd3.js
Chapter 4. To the Cloud!
Introduction
GitHub
dotCloud
Echo on dotCloud
Python Workers
Amazon Web Services
Simple Storage Service
Elastic MapReduce
MongoDB as a Service
Instrumentation
Google Analytics
Mortar Data.
Part II. Climbing the Pyramid
Chapter 5. Collecting and Displaying Records
Putting It All Together
Collect and Serialize Our Inbox
Process and Publish Our Emails
Presenting Emails in a Browser
Serving Emails with Flask and pymongo
Rendering HTML5 with Jinja2
Agile Checkpoint
Listing Emails
Listing Emails with MongoDB
Anatomy of a Presentation
Searching Our Email
Indexing Our Email with Pig, ElasticSearch, and Wonderdog
Searching Our Email on the Web
Chapter 6. Visualizing Data with Charts
Good Charts
Extracting Entities: Email Addresses
Extracting Emails
Visualizing Time
Chapter 7. Exploring Data with Reports
Building Reports with Multiple Charts
Linking Records
Extracting Keywords from Emails with TF-IDF
Chapter 8. Making Predictions
Predicting Response Rates to Emails
Personalization
Chapter 9. Driving Actions
Properties of Successful Emails
Better Predictions with Naive Bayes
P(Reply | From & To)
P(Reply | Token)
Making Predictions in Real Time
Logging Events
Index
About the Author.
Notes:
Includes index.
Description based on online resource; title from PDF title page (ebrary, viewed November 13, 2013).
ISBN:
9781449326906
1449326900
9781449326920
1449326927
9781449326913
1449326919
9781306813136
1306813131
OCLC:
868971089
