HPCC (High-Performance Computing Cluster)
HPCC is a massively parallel processing computing platform that
solves big data problems. The platform is now open source!
HPCC vs Hadoop
• Declarative programming language: Describe what needs to be done, not
how to do it.
• Powerful: Unlike Java, ECL offers high-level primitives such as JOIN, TRANSFORM,
PROJECT, SORT, DISTRIBUTE and MAP. Higher-level code means fewer
programmers and a shorter time to deliver complete projects.
• Extensible: As new attributes are defined, they become primitives that other
programmers can use.
• Implicitly parallel: Parallelism is built into the underlying platform. The
programmer need not be concerned with it.
• Maintainable: A high-level programming language, the absence of side effects
and attribute encapsulation make for more succinct, more reliable code that is
easier to troubleshoot.
• Complete: Unlike Pig and Hive, ECL provides for a complete programming
paradigm.
• Homogeneous: One language to express data algorithms across the entire
HPCC platform, including data ETL and delivery.
The Enterprise Control Language (ECL)
• HPCC Systems Enterprise Control Language (ECL) is the query and control
language developed to manage all aspects of massive data joins, sorts
and builds. ECL differentiates HPCC from other technologies in its ability to
provide flexible data analysis on a massive scale.
• ECL is a declarative language optimized for the manipulation of massive data
sets, and it provides for modular, structured programming. ECL is also a
transparent and implicitly parallel programming language that is both
powerful and flexible, allowing for faster and more effective development
cycles through higher expressiveness, encapsulation and code reuse.
• Data analysts can "express" complex queries without the iterative,
time-consuming data transformations and sorts associated with other
programming languages. Traditional low-level languages (Java, C++, etc.)
force business requirements to be translated into functional requirements before
programming can begin. The abstract nature of ECL removes the need for
this step by making it easy to express business rules directly and succinctly.
HPCC System Architecture
• The HPCC system architecture includes two distinct cluster processing
environments, each of which can be optimized independently for its parallel
data processing purpose.
• The first of these platforms is called a Data Refinery, also referred to as Thor.
Its overall purpose is the general processing of massive volumes of raw data
of any type for any purpose, but it is typically used for data cleansing and
hygiene, ETL processing of the raw data, record linking and entity resolution,
large-scale ad-hoc complex analytics, and creation of keyed data and indexes
to support high-performance structured queries and data warehouse applications.
• A Thor cluster is similar in its function, execution environment, filesystem, and
capabilities to the Google and Hadoop MapReduce platforms.
The accompanying figure shows a representation of a physical Thor processing cluster, which functions
as a batch job execution engine for scalable data-intensive computing applications. In addition to the
Thor master and slave nodes, auxiliary and common components are needed to implement a complete
HPCC processing environment.
Roxie (rapid data delivery engine)
• The second of the parallel data processing platforms is called Roxie and
functions as a rapid data delivery engine.
• This platform is designed as an online, high-performance structured query and
analysis platform or data warehouse. It meets the parallel data access
requirements of online applications through Web services interfaces,
supporting thousands of simultaneous queries and users with sub-second
response times.
• Roxie uses a distributed indexed filesystem to process queries in parallel,
with an execution environment and filesystem optimized for
high-performance online processing.
• A Roxie cluster is similar in its function and capabilities to Hadoop with HBase
and Hive capabilities added, and provides near-real-time, predictable query
latencies.
• Both Thor and Roxie clusters use the ECL programming language for
implementing applications, increasing continuity and programmer productivity.
Roxie (continued)
• The accompanying figure shows a representation of a physical Roxie
processing cluster, which functions as an online query execution engine
for high-performance query and data warehousing applications.
• A Roxie cluster includes multiple nodes with server and worker
processes for processing queries; an additional auxiliary component
called an ESP server, which provides interfaces for external client
access to the cluster; and additional common components, which
are shared with a Thor cluster in an HPCC environment.
• Although a Thor processing cluster can be implemented and used without a
Roxie cluster, an HPCC environment that includes a Roxie cluster
should also include a Thor cluster. The Thor cluster is used to build the
distributed index files used by the Roxie cluster and to develop
online queries, which are then deployed with the index files to the
Roxie cluster.
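The division of labour described above (a batch step that builds index files, followed by an online step that answers keyed queries against them) can be sketched in a few lines. This is a single-process Python analogy, not ECL and not the HPCC implementation; the names `build_index`, `make_query` and the sample `people` records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_index(records, key_field):
    """Batch step (Thor's role in this analogy): map each key to record positions."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[key_field]].append(position)
    return dict(index)


def make_query(records, index):
    """Online step (Roxie's role in this analogy): keyed lookups on a prebuilt index."""
    def query(key):
        # No scan of the full data set: only the indexed positions are read.
        return [records[pos] for pos in index.get(key, [])]
    return query


# Hypothetical sample data.
people = [
    {"surname": "Smith", "city": "Atlanta"},
    {"surname": "Jones", "city": "Boston"},
    {"surname": "Smith", "city": "Dayton"},
]

surname_index = build_index(people, "surname")      # built once, in batch
find_by_surname = make_query(people, surname_index)  # "deployed" with the index

print(find_by_surname("Smith"))
```

The point of the sketch is only the ordering: the index must exist before the query can be served, which is why an environment with a Roxie cluster is expected to include a Thor cluster as well.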
More on ECL (data-centric programming language)
• ECL is a declarative, data-centric programming language designed in 2000 to allow a
team of programmers to process big data across a high-performance computing cluster
without being involved in many of the lower-level, imperative decisions.
• Sorting problem
// First declare a dataset with one column containing a list of strings.
// Datasets can also be binary, CSV, XML or externally defined structures.
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],
             {STRING Value;});
// Sort the dataset on its single field and write out the result.
SD := SORT(D, Value);
OUTPUT(SD);
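For readers without an HPCC environment at hand, the same sorting problem can be mirrored in Python to show what the result looks like. This is an analogy rather than a translation of ECL semantics: the `records` list and the `sort_by_value` helper are hypothetical names, and plain Python string comparison is assumed to stand in for the ordering of the `Value` field.

```python
from operator import itemgetter

# One-column dataset mirroring the inline DATASET on the slide.
records = [
    {"Value": v}
    for v in ["ECL", "Declarative", "Data", "Centric", "Programming", "Language"]
]


def sort_by_value(rows):
    """Return a new list ordered by the Value field; the input list is not modified."""
    return sorted(rows, key=itemgetter("Value"))


sorted_records = sort_by_value(records)
for row in sorted_records:
    print(row["Value"])
# Prints: Centric, Data, Declarative, ECL, Language, Programming (one per line)
```

As in the ECL version, the code states which field to order by and leaves the choice of sorting algorithm to the runtime.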
More on ECL (data-centric programming language)
• ECL primitives that act upon datasets include: SORT, ROLLUP, DEDUP, ITERATE,
PROJECT, JOIN, NORMALIZE, DENORMALIZE, PARSE, CHOOSEN, ENTH, TOPN,
DISTRIBUTE.
• Comparison to Map-Reduce
The Hadoop Map-Reduce paradigm actually consists of three phases, each of which
correlates to ECL primitives.
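The three phases can be illustrated with a small word count. The slide's own table of corresponding ECL primitives is not reproduced here, so the sketch below only shows the phases themselves, using their customary names (map, shuffle, reduce) as an assumption. It is single-process Python for illustration, not Hadoop or ECL code, and the function names and sample `lines` are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit one (word, 1) pair for every word in the input."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: bring all values for the same key together, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in grouped}


# Hypothetical input.
lines = ["Thor builds indexes", "Roxie serves indexes"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

In a real cluster the shuffle moves data between nodes so that all pairs with the same key land on the same node; here it is reduced to an in-memory grouping step.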
