Data Warehousing in the Age of Big Data.
- Format:
- Book
- Author/Creator:
- Krishnan, Krish.
- Series:
- The Morgan Kaufmann Series on Business Intelligence
- Language:
- English
- Subjects (All):
- Big data.
- Physical Description:
- 1 online resource (371 pages)
- Edition:
- 1st ed.
- Place of Publication:
- San Diego : Elsevier Science & Technology, 2013.
- Contents:
- Front Cover
- Data Warehousing in the Age of Big Data
- Copyright Page
- Contents
- Acknowledgments
- About the Author
- Introduction
- Part 1: Big Data
- Part 2: The Data Warehousing
- Part 3: Building the Big Data - Data Warehouse
- Appendixes
- Companion website
- 1 BIG DATA
- 1 Introduction to Big Data
- Big Data
- Defining Big Data
- Why Big Data and why now?
- Big Data example
- Social Media posts
- Survey data analysis
- Survey data
- Weather data
- Twitter data
- Integration and analysis
- Additional data types
- Summary
- Further reading
- 2 Working with Big Data
- Data explosion
- Data volume
- Machine data
- Application log
- Clickstream logs
- External or third-party data
- Emails
- Contracts
- Geographic information systems and geo-spatial data
- Example: Funshots, Inc.
- Data velocity
- Amazon, Facebook, Yahoo, and Google
- Sensor data
- Mobile networks
- Social media
- Data variety
- 3 Big Data Processing Architectures
- Data processing revisited
- Data processing techniques
- Data processing infrastructure challenges
- Storage
- Transportation
- Processing
- Speed or throughput
- Shared-everything and shared-nothing architectures
- Shared-everything architecture
- Shared-nothing architecture
- OLTP versus data warehousing
- Big Data processing
- Infrastructure explained
- Data processing explained
- Telco Big Data study
- Infrastructure
- Data processing
- 4 Introducing Big Data Technologies
- Distributed data processing
- Big Data processing requirements
- Technologies for Big Data processing
- Google file system
- Hadoop
- Hadoop core components
- HDFS
- HDFS architecture
- NameNode
- DataNodes
- Image
- Journal
- Checkpoint
- HDFS startup
- Block allocation and storage in HDFS
- HDFS client
- Replication and recovery
- Communication and management
- Heartbeats
- CheckpointNode and BackupNode
- CheckpointNode
- BackupNode
- File system snapshots
- JobTracker and TaskTracker
- MapReduce
- MapReduce programming model
- MapReduce program design
- MapReduce implementation architecture
- MapReduce job processing and management
- MapReduce limitations (Version 1, Hadoop MapReduce)
- MapReduce v2 (YARN)
- YARN scalability
- Comparison between MapReduce v1 and v2
- SQL/MapReduce
- Zookeeper
- Zookeeper features
- Locks and processing
- Failure and recovery
- Pig
- Programming with Pig Latin
- Pig data types
- Running Pig programs
- Pig program flow
- Common Pig commands
- HBase
- HBase architecture
- HBase components
- Write-ahead log
- Hive
- Hive architecture
- Execution: how does Hive process queries?
- Hive data types
- Hive query language (HiveQL)
- Chukwa
- Flume
- Oozie
- HCatalog
- Sqoop
- Sqoop1
- Sqoop2
- Hadoop summary
- NoSQL
- CAP theorem
- Key-value pair: Voldemort
- Column family store: Cassandra
- Data model
- Data partitioning
- Data sorting
- Consistency management
- Write consistency
- Read consistency
- Specifying client consistency levels
- Built-in consistency repair features
- Cassandra ring architecture
- Data placement
- Peer-to-Peer: simple scalability
- Gossip protocol: node management
- Document database: Riak
- Graph databases
- NoSQL summary
- Textual ETL processing
- 5 Big Data Driving Business Value
- Case study 1: Sensor data
- Vestas
- Overview
- Producing electricity from wind
- Turning climate into capital
- Tackling Big Data challenges
- Maintaining energy efficiency in its data center
- Summary
- Case study 2: Streaming data
- Surveillance and security: TerraEchos
- The need
- The solution
- The benefit
- Advanced fiber optics combine with real-time streaming data
- Solution components
- Extending the security perimeter creates a strategic advantage
- Correlating sensor data delivers a zero false-positive rate
- Case study 3: The right prescription: improving patient outcomes with Big Data analytics
- Business objective
- Challenges
- Overview: giving practitioners new insights to guide patient care
- Challenges: blending traditional data warehouse ecosystems with Big Data
- Solution: getting ready for Big Data analytics
- Results: eliminating the "Data Trap"
- Why Aster?
- About Aurora
- Case study 4: University of Ontario Institute of Technology: leveraging key data to provide proactive patient care
- Business benefits
- Making better use of the data resource
- Smarter healthcare
- Merging human knowledge and technology
- Broadening the impact of Artemis
- Case study 5: Microsoft SQL server customer solution
- Customer profile
- Solution spotlight
- Business needs
- Solution
- Benefits
- Speeds efficiency and cuts costs
- Increases insight and advantage
- Facilitates innovation
- Case study 6: Customer-centric data integration
- Solution design
- Enabling a better cross-sell and upsell opportunity
- Example
- 2 THE DATA WAREHOUSING
- 6 Data Warehousing Revisited
- Traditional data warehousing, or data warehousing 1.0
- Data architecture
- Pitfalls of data warehousing
- Performance
- Scalability
- Architecture approaches to building a data warehouse
- Pros and cons of information factory approach
- Pros and cons of datamart BUS architecture approach
- Data warehouse 2.0
- Overview of Inmon's DW 2.0
- Overview of DSS 2.0
- 7 Reengineering the Data Warehouse
- Enterprise data warehouse platform
- Transactional systems
- Operational data store
- Staging area
- Data warehouse
- Datamarts
- Analytical databases
- Issues with the data warehouse
- Choices for reengineering the data warehouse
- Replatforming
- Platform engineering
- Data engineering
- Modernizing the data warehouse
- Case study of data warehouse modernization
- Current-state analysis
- Recommendations
- Business benefits of modernization
- The appliance selection process
- Request For Information/Request For Proposal (RFI/RFP)
- Vendor information
- Product information
- Scorecard
- Proof of concept process
- Program roadmap
- Modernization ROI
- Additional benefits
- 8 Workload Management in the Data Warehouse
- Current state
- Defining workloads
- Understanding workloads
- Data warehouse outbound
- End-user application
- Data outbound to users
- Data inbound from users
- Data warehouse inbound
- Data warehouse processing overheads
- Query classification
- Wide/Wide
- Wide/Narrow
- Narrow/Wide
- Narrow/Narrow
- Unstructured/semi-structured data
- ETL and CDC workloads
- Measurement
- Current system design limitations
- New workloads and Big Data
- Big Data workloads
- Technology choices
- 9 New Technologies Applied to Data Warehousing
- Data warehouse challenges revisited
- Data loading
- Availability
- Data volumes
- Storage performance
- Query performance
- Data transport
- Data warehouse appliance
- Appliance architecture
- Data distribution in the appliance
- Key best practices for deploying a data warehouse appliance
- Big Data appliances
- Cloud computing
- Infrastructure as a service
- Platform as a service
- Software as a service
- Cloud infrastructure
- Benefits of cloud computing for data warehouse
- Issues facing cloud computing for data warehouse
- Data virtualization
- What is data virtualization?
- Increasing business intelligence performance
- Workload distribution
- Implementing a data virtualization program
- Pitfalls to avoid when using data virtualization
- In-memory technologies
- Benefits of in-memory architectures
- 3 BUILDING THE BIG DATA - DATA WAREHOUSE
- 10 Integration of Big Data and Data Warehousing
- Components of the new data warehouse
- Data layer
- Algorithms
- Technology layer
- Integration strategies
- Data-driven integration
- Data classification
- Architecture
- Workload
- Analytics
- Physical component integration and architecture
- Data availability
- Operational costs
- External data integration
- Hadoop & RDBMS
- Big Data appliances
- Semantic framework
- Lexical processing
- Clustering
- Semantic knowledge processing
- Information extraction
- Visualization
- 11 Data-Driven Architecture for Big Data
- Metadata
- Technical metadata
- Business metadata
- Contextual metadata
- Process design-level metadata
- Program-level metadata
- Infrastructure metadata
- Core business metadata
- Operational metadata
- Business intelligence metadata
- Master data management
- Processing data in the data warehouse
- Processing complexity of Big Data
- Processing limitations
- Processing Big Data
- Gather stage
- Analysis stage
- Process stage
- Context processing
- Metadata, master data, and semantic linkage
- Types of probabilistic links
- Notes:
- Description based on publisher-supplied metadata and other sources.
- Other Format:
- Print version: Krishnan, Krish Data Warehousing in the Age of Big Data
- ISBN:
- 9780124059207
- OCLC:
- 843860813