Principles and practice of big data : preparing, sharing, and analyzing complex information / Jules J. Berman.

O'Reilly Online Learning: Academic/Public Library Edition Available online
Format:
Book
Author/Creator:
Berman, Jules J., author.
Language:
English
Subjects (All):
Big data.
Physical Description:
1 online resource (482 pages)
Edition:
Second edition.
Place of Publication:
London, United Kingdom : Academic Press, imprint of Elsevier, [2018]
System Details:
text file
Summary:
Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information, Second Edition updates and expands on the first edition, bringing a set of techniques and algorithms that are tailored to Big Data projects. The book stresses the point that most data analyses conducted on large, complex data sets can be achieved without the use of specialized suites of software (e.g., Hadoop), and without expensive hardware (e.g., supercomputers). The core of every algorithm described in the book can be implemented in a few lines of code using just about any popular programming language (Python snippets are provided). Through multiple new examples, this edition demonstrates that if we understand our data, and if we know how to ask the right questions, we can learn a great deal from large and complex data collections. The book will assist students and professionals from all scientific backgrounds who are interested in stepping outside the traditional boundaries of their chosen academic disciplines. Presents new methodologies that are widely applicable to just about any project involving large and complex datasets. Offers readers informative new case studies across a range of scientific and engineering disciplines. Provides insights into semantics, identification, de-identification, vulnerabilities, and regulatory/legal issues. Utilizes a combination of pseudocode and very short snippets of Python code to show readers how they may develop their own projects without downloading or learning new software.
Contents:
Front Cover
Principles and Practice of Big Data: Preparing, sharing, and analyzing complex information
Copyright
Other Books by Jules J. Berman
Dedication
Contents
About the Author
Author's Preface to Second Edition
Author's Preface to First Edition
References
Chapter 1: Introduction
Section 1.1. Definition of Big Data
Section 1.2. Big Data Versus Small Data
Section 1.3. Whence Comest Big Data?
Section 1.4. The Most Common Purpose of Big Data Is to Produce Small Data
Section 1.5. Big Data Sits at the Center of the Research Universe
Glossary
Chapter 2: Providing Structure to Unstructured Data
Section 2.1. Nearly All Data Is Unstructured and Unusable in Its Raw Form
Section 2.2. Concordances
Section 2.3. Term Extraction
Section 2.4. Indexing
Section 2.5. Autocoding
Section 2.6. Case Study: Instantly Finding the Precise Location of Any Atom in the Universe (Some Assembly Required)
Section 2.7. Case Study (Advanced): A Complete Autocoder (in 12 Lines of Python Code)
Section 2.8. Case Study: Concordances as Transformations of Text
Section 2.9. Case Study (Advanced): Burrows Wheeler Transform (BWT)
Chapter 3: Identification, Deidentification, and Reidentification
Section 3.1. What Are Identifiers?
Section 3.2. Difference Between an Identifier and an Identifier System
Section 3.3. Generating Unique Identifiers
Section 3.4. Really Bad Identifier Methods
Section 3.5. Registering Unique Object Identifiers
Section 3.6. Deidentification and Reidentification
Section 3.7. Case Study: Data Scrubbing
Section 3.8. Case Study (Advanced): Identifiers in Image Headers
Section 3.9. Case Study: One-Way Hashes
Chapter 4: Metadata, Semantics, and Triples
Section 4.1. Metadata
Section 4.2. eXtensible Markup Language
Section 4.3. Semantics and Triples
Section 4.4. Namespaces
Section 4.5. Case Study: A Syntax for Triples
Section 4.6. Case Study: Dublin Core
Chapter 5: Classifications and Ontologies
Section 5.1. It's All About Object Relationships
Section 5.2. Classifications, the Simplest of Ontologies
Section 5.3. Ontologies, Classes With Multiple Parents
Section 5.4. Choosing a Class Model
Section 5.5. Class Blending
Section 5.6. Common Pitfalls in Ontology Development
Section 5.7. Case Study: An Upper Level Ontology
Section 5.8. Case Study (Advanced): Paradoxes
Section 5.9. Case Study (Advanced): RDF Schemas and Class Properties
Section 5.10. Case Study (Advanced): Visualizing Class Relationships
Chapter 6: Introspection
Section 6.1. Knowledge of Self
Section 6.2. Data Objects: The Essential Ingredient of Every Big Data Collection
Section 6.3. How Big Data Uses Introspection
Section 6.4. Case Study: Time Stamping Data
Section 6.5. Case Study: A Visit to the TripleStore
Section 6.6. Case Study (Advanced): Proof That Big Data Must Be Object-Oriented
Chapter 7: Standards and Data Integration
Section 7.1. Standards
Section 7.2. Specifications Versus Standards
Section 7.3. Versioning
Section 7.4. Compliance Issues
Section 7.5. Case Study: Standardizing the Chocolate Teapot
Chapter 8: Immutability and Immortality
Section 8.1. The Importance of Data That Cannot Change
Section 8.2. Immutability and Identifiers
Section 8.3. Coping With the Data That Data Creates
Section 8.4. Reconciling Identifiers Across Institutions
Section 8.5. Case Study: The Trusted Timestamp
Section 8.6. Case Study: Blockchains and Distributed Ledgers
Section 8.7. Case Study (Advanced): Zero-Knowledge Reconciliation
Chapter 9: Assessing the Adequacy of a Big Data Resource
Section 9.1. Looking at the Data
Section 9.2. The Minimal Necessary Properties of Big Data
Section 9.3. Data That Comes With Conditions
Section 9.4. Case Study: Utilities for Viewing and Searching Large Files
Section 9.5. Case Study: Flattened Data
Chapter 10: Measurement
Section 10.1. Accuracy and Precision
Section 10.2. Data Range
Section 10.3. Counting
Section 10.4. Normalizing and Transforming Your Data
Section 10.5. Reducing Your Data
Section 10.6. Understanding Your Control
Section 10.7. Statistical Significance Without Practical Significance
Section 10.8. Case Study: Gene Counting
Section 10.9. Case Study: Early Biometrics, and the Significance of Narrow Data Ranges
Chapter 11: Indispensable Tips for Fast and Simple Big Data Analysis
Section 11.1. Speed and Scalability
Section 11.2. Fast Operations, Suitable for Big Data, That Every Computer Supports
Section 11.3. The Dot Product, a Simple and Fast Correlation Method
Section 11.4. Clustering
Section 11.5. Methods for Data Persistence (Without Using a Database)
Section 11.6. Case Study: Climbing a Classification
Section 11.7. Case Study (Advanced): A Database Example
Section 11.8. Case Study (Advanced): NoSQL
Chapter 12: Finding the Clues in Large Collections of Data
Section 12.1. Denominators
Section 12.2. Word Frequency Distributions
Section 12.3. Outliers and Anomalies
Section 12.4. Back-of-Envelope Analyses
Section 12.5. Case Study: Predicting User Preferences
Section 12.6. Case Study: Multimodality in Population Data
Section 12.7. Case Study: Big and Small Black Holes
Glossary
Chapter 13: Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size
Section 13.1. The Remarkable Utility of (Pseudo)Random Numbers
Section 13.2. Repeated Sampling
Section 13.3. Monte Carlo Simulations
Section 13.4. Case Study: Proving the Central Limit Theorem
Section 13.5. Case Study: Frequency of Unlikely String of Occurrences
Section 13.6. Case Study: The Infamous Birthday Problem
Section 13.7. Case Study (Advanced): The Monty Hall Problem
Section 13.8. Case Study (Advanced): A Bayesian Analysis
Chapter 14: Special Considerations in Big Data Analysis
Section 14.1. Theory in Search of Data
Section 14.2. Data in Search of Theory
Section 14.3. Bigness Biases
Section 14.4. Data Subsets in Big Data: Neither Additive Nor Transitive
Section 14.5. Additional Big Data Pitfalls
Section 14.6. Case Study (Advanced): Curse of Dimensionality
Chapter 15: Big Data Failures and How to Avoid (Some of) Them
Section 15.1. Failure Is Common
Section 15.2. Failed Standards
Section 15.3. Blaming Complexity
Section 15.4. An Approach to Big Data That May Work for You
Section 15.5. After Failure
Section 15.6. Case Study: Cancer Biomedical Informatics Grid, a Bridge Too Far
Section 15.7. Case Study: The Gaussian Copula Function
Chapter 16: Data Reanalysis: Much More Important Than Analysis
Section 16.1. First Analysis (Nearly) Always Wrong
Section 16.2. Why Reanalysis Is More Important Than Analysis
Section 16.3. Case Study: Reanalysis of Old JADE Collider Data
Section 16.4. Case Study: Vindication Through Reanalysis
Section 16.5. Case Study: Finding New Planets From Old Data
Chapter 17: Repurposing Big Data
Section 17.1. What Is Data Repurposing?
Section 17.2. Dark Data, Abandoned Data, and Legacy Data
Section 17.3. Case Study: From Postal Code to Demographic Keystone
Section 17.4. Case Study: Scientific Inferencing From a Database of Genetic Sequences
Section 17.5. Case Study: Linking Global Warming to High-Intensity Hurricanes
Section 17.6. Case Study: Inferring Climate Trends With Geologic Data
Section 17.7. Case Study: Lunar Orbiter Image Recovery Project
Chapter 18: Data Sharing and Data Security
Section 18.1. What Is Data Sharing, and Why Don't We Do More of It?
Section 18.2. Common Complaints
Section 18.3. Data Security and Cryptographic Protocols
Section 18.4. Case Study: Life on Mars
Section 18.5. Case Study: Personal Identifiers
Chapter 19: Legalities
Section 19.1. Responsibility for the Accuracy and Legitimacy of Data
Section 19.2. Rights to Create, Use, and Share the Resource
Section 19.3. Copyright and Patent Infringements Incurred by Using Standards
Section 19.4. Protections for Individuals
Section 19.5. Consent
Section 19.6. Unconsented Data
Section 19.7. Privacy Policies
Section 19.8. Case Study: Timely Access to Big Data
Section 19.9. Case Study: The Havasupai Story
Chapter 20: Societal Issues
Section 20.1. How Big Data Is Perceived by the Public
Section 20.2. Reducing Costs and Increasing Productivity With Big Data
Section 20.3. Public Mistrust
Section 20.4. Saving Us From Ourselves
Section 20.5. Who Is Big Data?
Section 20.6. Hubris and Hyperbole
Section 20.7. Case Study: The Citizen Scientists
Section 20.8. Case Study: 1984, by George Orwell
Index
Back Cover
Notes:
Includes bibliographical references and index.
Description based on print version record.
ISBN:
9780128156100
0128156104
9780128156094
0128156090
OCLC:
1082522847