Getting Started with Greenplum for Big Data Analytics.

By: Gollapudi, Sunila
Publisher: Olton : Packt Publishing, Limited, 2013
Copyright date: ©2013
Description: 1 online resource (199 pages)
Content type: text
Media type: computer
Carrier type: online resource
ISBN: 9781782177050
Subject(s): Anglican Communion -- Doctrines; Lord's Supper -- Anglican Communion
Genre/Form: Electronic books
Additional physical formats: Print version: Getting Started with Greenplum for Big Data Analytics
DDC classification: 006.3; 006.3/12
LOC classification: QA76.9.D32 -- G65 2013eb
Contents:
Intro -- Getting Started with Greenplum for Big Data Analytics -- Table of Contents -- Getting Started with Greenplum for Big Data Analytics -- Credits -- Foreword -- About the Author -- Acknowledgement -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers and more -- Why Subscribe? -- Free Access for Packt account holders -- Instant Updates on New Packt Books -- Preface -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Errata -- Piracy -- Questions --
1. Big Data, Analytics, and Data Science Life Cycle -- Enterprise data -- Classification -- Features -- Big Data -- So, what is Big Data? -- Multi-structured data -- Data analytics -- Data science -- Data science life cycle -- Phase 1 - state business problem -- Phase 2 - set up data -- Phase 3 - explore/transform data -- Phase 4 - model -- Phase 5 - publish insights -- Phase 6 - measure effectiveness -- References/Further reading -- Summary --
2. Greenplum Unified Analytics Platform (UAP) -- Big Data analytics - platform requirements -- Greenplum Unified Analytics Platform (UAP) -- Core components -- Greenplum Database -- Hadoop (HD) -- Chorus -- Command Center -- Modules -- Database modules -- HD modules -- Data Integration Accelerator (DIA) modules -- Core architecture concepts -- Data warehousing -- Column-oriented databases -- Parallel versus distributed computing/processing -- Shared nothing, massive parallel processing (MPP) systems, and elastic scalability -- Shared disk data architecture -- Shared memory data architecture -- Shared nothing data architecture -- Data loading patterns -- Greenplum UAP components -- Greenplum Database -- The Greenplum Database physical architecture -- The Greenplum high-availability architecture -- High-speed data loading using external tables -- External table types -- Polymorphic data storage and historic data management -- Data distribution -- Hadoop (HD) -- Hadoop Distributed File System (HDFS) -- Hadoop MapReduce -- Chorus -- Greenplum Data Computing Appliance (DCA) -- Greenplum Data Integration Accelerator (DIA) -- References/Further reading -- Summary --
3. Advanced Analytics - Paradigms, Tools, and Techniques -- Analytic paradigms -- Descriptive analytics -- Predictive analytics -- Prescriptive analytics -- Analytics classified -- Classification -- Forecasting or prediction or regression -- Clustering -- Optimization -- Simulations -- Modeling methods -- Decision trees -- Association rules -- The Apriori algorithm -- Linear regression -- Logistic regression -- The Naive Bayesian classifier -- K-means clustering -- Text analysis -- R programming -- Weka -- In-database analytics using MADlib -- References/Further reading -- Summary --
4. Implementing Analytics with Greenplum UAP -- Data loading for Greenplum Database and HD -- Greenplum data loading options -- External tables -- gpfdist -- gpload -- Hadoop (HD) data loading options -- Sqoop 2 -- Greenplum BulkLoader for Hadoop -- Using external ETL to load data into Greenplum -- Extraction, Load, and Transformation (ELT) and Extraction, Transformation, Load, and Transformation (ETLT) -- Greenplum target configuration -- Sourcing large volumes of data from Greenplum -- Unsupported Greenplum data types -- Push Down Optimization (PDO) -- Greenplum table distribution and partitioning -- Distribution -- Data skew and performance -- Optimizing the broadcast or redistribution motion for data co-location -- Partitioning -- Querying Greenplum Database and HD -- Querying Greenplum Database -- Analyzing and optimizing queries -- The ANALYZE function -- The EXPLAIN function -- Dynamic Pipelining in Greenplum -- Querying HDFS -- Hive -- Pig -- Data communication between Greenplum Database and Hadoop (using external tables) -- Data Computing Appliance (DCA) -- Storage design, disk protection, and fault tolerance -- Master server RAID configurations -- Segment server RAID configurations -- Monitoring DCA -- Greenplum Database management -- In-database analytics options (Greenplum-specific) -- Window functions -- The PARTITION BY clause -- The ORDER BY clause -- The OVER (ORDER BY…) clause -- Creating, modifying, and dropping functions -- User-defined aggregates -- Using R with Greenplum -- DBI Connector for R -- PL/R -- Using Weka with Greenplum -- Using MADlib with Greenplum -- Using Greenplum Chorus -- Pivotal -- References/Further reading -- Summary -- Index.
Summary: Standard tutorial-based approach. "Getting Started with Greenplum for Big Data Analytics" is aimed at data scientists and data analysts who have a basic knowledge of data warehousing and business intelligence platforms, are new to Big Data, and want a good grounding in how to use the Greenplum platform. It is assumed that you have some experience with database design and programming and are familiar with analytics tools such as R and Weka.
Holdings
Item type  Current library     Call number  Status     Date due  Barcode        Item holds
Ebrary     Ebrary Afghanistan               Available            EBKAF00084624
Ebrary     Ebrary Algeria                   Available
Ebrary     Ebrary Cyprus                    Available
Ebrary     Ebrary Egypt                     Available
Ebrary     Ebrary Libya                     Available
Ebrary     Ebrary Morocco                   Available
Ebrary     Ebrary Nepal                     Available            EBKNP00084624
Ebrary     Ebrary Sudan                     Available
Ebrary     Ebrary Tunisia                   Available
Total holds: 0

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2019. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.