The Download: Community Tech Talks
Episode 11
February 15, 2018
Welcome!
• Please share: Let others know you are here with #HPCCTechTalks
• Ask questions! We will answer as many questions as we can following each speaker.
• Look for polls at the bottom of your screen. Exit full-screen mode or refresh your screen if
you don’t see them.
• We welcome your feedback - please rate us before you leave today and visit our blog for
information after the event.
• Want to be one of our featured speakers? Let us know! techtalks@hpccsystems.com
The Download: Tech Talks #HPCCTechTalks2
Community announcements
3
Dr. Flavio Villanustre
VP Technology
RELX Distinguished Technologist
LexisNexis® Risk Solutions
Flavio.Villanustre@lexisnexisrisk.com
The Download: Tech Talks #HPCCTechTalks
• HPCC Systems Platform updates
• 6.4.10-1 is the latest gold version / Community Changelog
• 6.4.12 RC1 coming soon
• 7.0.0 Beta planned for early Q2 – among the key features:
• Spark integration
• Indexer
• Record Translation
• Session Management Improvements
• VS Code Beta version
• Roadmap items for 2018 and beyond
• Latest Blogs
• HPCC Systems/Tableau Web Data Connector v0.2 Tech Preview
• Machine Learning Demystified
• Reminder: 2018 Summer Internship Proposal Period Open
• Interested candidates can submit proposals from the Ideas List
• Visit the Student Wiki for more details
• Deadline to submit is April 6, 2018
• Program runs late May through mid-August
• Don’t delay!
Today’s speakers
4 The Download: Tech Talks #HPCCTechTalks
Raj Chandrasekaran
CTO & Co-Founder
ClearFunnel
raj@clearfunnel.com
Raj is the CTO and Co-Founder of ClearFunnel, a Big Data Analytics-as-a-Service
platform startup, where he leads Product Strategy and Solutions.
ClearFunnel focuses on enabling Marketing Analytics, Advanced Text Analytics,
Bioinformatics and Image Processing for clients in the Technology, Maritime,
Publishing and Healthcare domains.
Featured Community Speaker
Today’s speakers
5 The Download: Tech Talks #HPCCTechTalks
James McMullan
Software Engineer III
LexisNexis Risk Solutions
James.McMullan@lexisnexisrisk.com
James has a broad range of Software Engineering experience, from developing low-level
system drivers for X-Ray fluorescence equipment to mobile video games and web
applications. He is a recent addition to the LexisNexis team and is part of an internal
R&D group, where he has been working on multiple projects including HPCC Systems
& Spark benchmarks, integration projects between the HPCC Systems, Spark and
Hadoop ecosystems, and document storage systems.
Bob Foreman
Senior Software Engineer
LexisNexis Risk Solutions
Robert.Foreman@lexisnexisrisk.com
Bob Foreman has worked with the HPCC Systems technology platform and
the ECL programming language for over 5 years, and has been a technical
trainer for over 25 years. He is the developer and designer of the HPCC
Systems Online Training Courses, and is the Senior Instructor for all
classroom and Webex/Lync-based training.
Scaling Data Science Capabilities:
Leveraging a Homogeneous Big Data Ecosystem
Raj Chandrasekaran
CTO & Co-Founder
ClearFunnel
Quick poll:
Where have you had the most success in
deployment of HPCC Systems based solutions?
See poll on bottom of presentation screen
To succeed, a Big Data Analytics enterprise needs…
The Download: Tech Talks #HPCCTechTalks8
• An efficient Big Data ecosystem, which comprises the following key
capabilities:
• Big Data Processing
• Data Science: ML & AI
• Cloud Integration
• Leveraging these capabilities for Commercial Advantage
• Key Success Factor for any Start-up: Cost of Operations and Cash Flow
Big Data Processing
The Download: Tech Talks #HPCCTechTalks9
• Top of the list: Hadoop and Spark
• Lots of incremental innovations:
• Hadoop: MapReduce, Hive, HBase, Solr, Pig, Kafka, YARN, Ambari, Ranger, Knox, Atlas, …
• Spark: Hadoop’s Successor, In-Memory, Directed Acyclic Graph – DAG, Stream Processing,
Machine Learning, SparkSQL, GraphX, Support for Python, Java, R and Scala, …
• Which also means, Lots of Integrations and…
• A variety of Engineering Talent
• Still, all of the above = version 1.01 in the HPCC Systems domain
HPCC Systems Capabilities:  Big Data Processing
Data Science: ML & AI
The Download: Tech Talks #HPCCTechTalks10
• Traditionally, R & Python
• Current State:
• MLlib has a core set of machine learning algorithms, but is certainly not as complete as R or other machine learning
libraries such as MADlib
• SparkR is work-in-progress… you still need a robust ML library to implement advanced Data Science use cases
• ML is also an evolving field in the HPCC Systems domain.
• ECL-ML modules are fully parallel and cover both Supervised and Unsupervised Models
• Extensibility: ECL is natively designed to manage data, and is thereby easily extensible to implement custom ML
algorithms, including Neural Networks and Deep Learning.
• ClearFunnel Innovations using ECL-ML:
• Text Processing (Self-learning layered taxonomy, Entity and Topic Extraction, Context Analysis, Point of View Scoring)
• Image Recognition and Pattern Matching (OCR and NN based)
• Maritime Predictive Analytics (Deep Learning with Geospatial and IoT streaming data)
HPCC Systems Capabilities:  Big Data Processing  Data Science
Cloud Integration
The Download: Tech Talks #HPCCTechTalks11
• AWS: The Big Daddy of Cloud
• Core strengths are really EC2 and S3. All other AWS capabilities and micro-services have been built around these 2 foundational
technologies.
• HPCC Systems on AWS:
• HPCC Systems provides native support for AWS (one-click deployment).
• Additionally, HPCC Systems’ simple, homogeneous tech stack makes it a breeze to operate in the cloud with minimal
investment in resources and time.
• ClearFunnel Innovations:
• Spray / De-Spray data between Thor cluster and S3 at speeds of up to 2 TBPS (Netezza’s data transfer rate is 2 – 4 TB/hr)
• Failsafe job operation (recover instantly from any failures)
• Near Real-Time, Micro-batching, Monitoring, Alert, Data Delivery APIs, etc. capabilities by integrating AWS micro-services
and HPCC Systems
• Key Principles:
• Avoid creating layers of abstractions on both ends (AWS and HPCC Systems).
• Instead integrate HPCC Systems directly with core capabilities of EC2 and S3.
HPCC Systems Capabilities:  Big Data Processing  Data Science  Cloud
Leveraging HPCC Systems for Commercial Success
The Download: Tech Talks #HPCCTechTalks12
• ClearFunnel has implemented a full spectrum of complex data engineering use cases
using HPCC Systems:
• Complex and large Graph traversal across nodes
• Image Analytics
• Operational Analytics on Analog data with Near Real-Time and Stream Processing
• Pattern Detection in Bioinformatics
• NLP and advanced Text Analytics
• IoT-based sensor-data integration and analytics
• Advanced Search and Querying
• Single, homogeneous tech stack:
• ClearFunnel’s Big Data Analytics Platform runs these diverse use cases with a homogeneous tech stack,
extending HPCC Systems’ capabilities to meet virtually any Big Data processing requirement
Key Success Criteria: Cost of Big Data Operations
The Download: Tech Talks #HPCCTechTalks13
• Distinctive Cost benefits from using a Homogeneous tech stack and a highly
productive ECL language
• “Fail-fast, fail-often” and multiple iterations of solution development do not require a
lot of time, resources, or cost
• Re-use and Refactor core ML & AI modules across use cases (single language
implementation)
• Minimal Cost of Operations:
• ClearFunnel operates multiple Big Data clusters in a Production environment with
hundreds of nodes each, without any dedicated support staff - Cloud Engineer,
Infrastructure Engineer, Network Engineer, Production Support Engineer, Dev Ops
Engineer, or Tech Ops Specialist!
• Enabled by efficient automation and close integration of AWS and HPCC Systems
Quick poll:
In your opinion, which of these use cases are most
suitable for implementing in HPCC Systems?
See poll on bottom of presentation screen
Questions?
Raj Chandrasekaran
CTO & Co-Founder
ClearFunnel
raj@clearfunnel.com
https://clearfunnel.com/
The Download: Tech Talks #HPCCTechTalks15
HDFS Connector Preview
James McMullan
Software Engineer III
LexisNexis® Risk Solutions
Quick poll:
Would you be interested in
interacting with the Hadoop
ecosystem from HPCC Systems?
See poll on bottom of presentation screen
Overview
• HDFS Connector Motivations
• Why are we making the connector?
• What are our goals for the connector?
• Overview of HDFS Architecture
• How is data stored in HDFS?
• How can we interact with HDFS?
• HDFS Connector Design
• Overview of how the connector works & achieves parallelism
• HDFS Connector Demo
The Download: Tech Talks #HPCCTechTalks18
HDFS Connector Motivations
• Interact with HDFS datasets and Hadoop processes
• Existing HPCC to Hadoop (h2h) Project
• No longer maintained
• Chance to improve upon h2h
• Tighter integration with HPCC
• Fewer dependencies
• Fewer failure points
• Possibility for New Features
• Variable-length record flat files
• Hadoop File Formats?
The Download: Tech Talks #HPCCTechTalks19
HDFS Connector Goals
• Robust – Should “Just Work”
• Straightforward
• Few dependencies
• Little to no configuration
• Tightly integrated
• Datasets from HDFS should be first class citizens
• Performant
• Parallelism where possible
• Reduce data transfer costs
The Download: Tech Talks #HPCCTechTalks20
Overview of HDFS Architecture
• How are files stored in HDFS?
• Stored as blocks of data & metadata
• Blocks are usually 64 MiB
• Blocks replicated for fault tolerance
• Namenode
• File metadata
• Filesystem namespace
• Datanodes
• Blocks of data
• No knowledge of files
The Download: Tech Talks #HPCCTechTalks21
[Diagram: a Namenode holding file metadata and the filesystem namespace, with multiple Datanodes storing the data blocks]
Overview of HDFS Architecture
• Reading & Writing in HDFS
• Namenode arbitrates reads & writes
• Datanodes fulfill reads & writes
• Multiple readers / Single writer
• Client Applications
• Java Hadoop or native libHDFS libraries
• Messaging uses Google Protocol Buffers
The Download: Tech Talks #HPCCTechTalks22
[Diagram: client applications contacting the Namenode for metadata and the Datanodes directly for block reads and writes]
HDFS Connector Design – Communicating with HDFS
• Java Hadoop or native libHDFS library?
• libHDFS relies on the Java Hadoop libraries
• Both require Hadoop to be installed locally
• Google Protocol Buffers?
• Possible but a lot of work
• libHDFS3
• Part of Apache HAWQ
• Completely native implementation of libHDFS
The Download: Tech Talks #HPCCTechTalks23
HDFS Connector Design – HPCC Integration
• ECL PIPE?
• High data transfer costs
• Loosely coupled
• Leverages native ECL (see the sketch after this slide)
• Import Java Library?
• High data transfer costs
• Adds lots of dependencies
• Parallelism is difficult
• Native ECL Plugin?
• Low data transfer costs
• Fewest dependencies
• Parallelism is possible
The Download: Tech Talks #HPCCTechTalks24
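For context on the ECL PIPE option above, here is a minimal sketch of reading an HDFS file through the Hadoop CLI via PIPE. It is not the connector itself; it assumes the hdfs command is installed and on the PATH of the cluster nodes, and the file path and record layout are hypothetical:

CsvRec := RECORD
  STRING20  firstname;
  STRING20  lastname;
  UNSIGNED4 zip;
END;

// Read a CSV file straight out of HDFS by piping the Hadoop CLI's output into ECL.
// The external command does all of the HDFS work, which is why this approach is
// loosely coupled but carries high data transfer costs; how many times the command
// actually runs also depends on the target cluster.
hdfsPeople := PIPE('hdfs dfs -cat /demo/people.csv', CsvRec, CSV);

OUTPUT(CHOOSEN(hdfsPeople, 100));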
HDFS Connector Design – Reading Data in Parallel
• CSV Files & Fixed-Record Flat Files
• Break HDFS file into logical chunks
• One chunk per HPCC node
• Chunks aren’t record aligned
• Consume records that begin in our chunk
• Variable Record Flat Files
• Need record split metadata
• Create split metadata on write
• Preprocess step if no metadata
The Download: Tech Talks #HPCCTechTalks25
[Diagram: an HDFS file divided into chunks, one per HPCC node; each node consumes only the records that begin in its chunk, with split metadata marking record boundaries]
HDFS Connector Design – Writing to HDFS
• HDFS is single writer
• Single File
• Each Thor node writes its data to the file in sequence
• Requires Append mode to be enabled
• Interacts well with existing HDFS ecosystem
• Multiple File Parts
• Similar to how HPCC stores files
• Parallel writing
• Existing Hadoop applications would need to be updated
The Download: Tech Talks #HPCCTechTalks26
HDFS Connector Demo – Writing a dataset to HDFS
The Download: Tech Talks #HPCCTechTalks27
HDFS Connector Demo – Reading a dataset from HDFS
The Download: Tech Talks #HPCCTechTalks28
HDFS Connector Demo – Working with HDFS Datasets
The Download: Tech Talks #HPCCTechTalks29
Quick poll:
Do you currently use HDFS as a data
store?
See poll on bottom of presentation screen
Questions?
James McMullan
Software Engineer III
LexisNexis Risk Solutions
James.McMullan@lexisnexisrisk.com
The Download: Tech Talks #HPCCTechTalks31
ECL Tips and Cool Tricks –
Building a Relational Dataset
Bob Foreman
Senior Software Engineer
LexisNexis Risk Solutions
Quick poll:
Have you ever worked with a relational
denormalized dataset in ECL?
See poll on bottom of presentation screen
Background
• Most of our datasets on an HPCC cluster are organized in a normalized
architecture.
• A unique linking field in one dataset can be used to join with other datasets
using a one-to-one or a one-to-many relationship (see the JOIN sketch after this slide).
• At LexisNexis we affectionately refer to this architecture as the “Data Donut”
The Download: Tech Talks #HPCCTechTalks34
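To make the linking idea concrete, here is a minimal, hypothetical sketch of a one-to-many JOIN on a LexID-style linking field (the layouts and values are invented for illustration):

PersonRec := RECORD
  UNSIGNED8 lexid;     // unique linking field
  STRING20  surname;
END;

VehicleRec := RECORD
  UNSIGNED8 lexid;     // links each vehicle back to a person
  STRING10  plate;
END;

People   := DATASET([{1, 'SMITH'}, {2, 'JONES'}], PersonRec);
Vehicles := DATASET([{1, 'ABC123'}, {1, 'XYZ789'}, {2, 'QRS456'}], VehicleRec);

PersonVehicleRec := RECORD
  PersonRec;           // inherit the person fields
  STRING10 plate;
END;

// One-to-many relationship: each person joins to all of their vehicles.
PersonVehicles := JOIN(People, Vehicles,
                       LEFT.lexid = RIGHT.lexid,
                       TRANSFORM(PersonVehicleRec,
                                 SELF.plate := RIGHT.plate,
                                 SELF := LEFT));
OUTPUT(PersonVehicles);

DENORMALIZE, covered later in this talk, builds on the same kind of linking condition.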
The LN Data “Donut”
[Diagram: the Data Donut, with datasets linked through identifiers such as DID, ADL, IDL, LinkID and LexID]
The Download: Tech Talks #HPCCTechTalks35
Sometimes, analyzing or querying this normalized data can be challenging.
Enter the “denormalized” dataset!
Given a sample 3-level hierarchical relational database:
[Diagram: People at the top level, with Vehicle and Property as children of People, and Taxdata as children of each Property]
Example Data:
The Download: Tech Talks #HPCCTechTalks36
Denormalizing Related Data:
[Diagram: a single denormalized record begins with the People fields, followed by the repeated Vehicles children and, for each Property, that Property's repeated Taxdata children]
The Download: Tech Talks #HPCCTechTalks37
ChildRecord := RECORD
UNSIGNED4 person_id;
UNSIGNED8 address_id;
STRING20 per_surname;
STRING20 per_forename;
END;
ParentRecord := RECORD
UNSIGNED8 id;
STRING20 address;
STRING20 CSZ;
STRING10 postcode;
UNSIGNED2 numPeople;
DATASET(ChildRecord) children {MAXCOUNT(20)};
END;
EXPORT File_Address := DATASET('CLASS::Adr_List', ParentRecord, THOR);
Nested Child Dataset RECORD:
The Download: Tech Talks #HPCCTechTalks38
DENORMALIZE(parentoutput, childrecset, condition, transform)
parentoutput – The set of parent records already formatted as the result of the
combination.
childrecset – The set of child records to process.
condition – An expression that specifies how to match records between the
parent and child records.
transform – The TRANSFORM function to call.
The DENORMALIZE function forms flat file records from a parent and any number
of children.
The transform function must take at least two parameters: a LEFT record of the
same format as the resulting combined parent and child records, and a RIGHT
record of the same format as the childrecset. An optional integer COUNTER
parameter can be included to indicate the current iteration through the child
records. (A sketch follows after this slide.)
DENORMALIZE Function:
The Download: Tech Talks #HPCCTechTalks39
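Putting the definition above together with the ParentRecord and ChildRecord layouts from the earlier slide, a minimal DENORMALIZE sketch might look like this (the flat child file name is an assumption, and File_Address is assumed to start with empty nested children):

File_People := DATASET('CLASS::Per_List', ChildRecord, THOR);  // hypothetical flat child file

ParentRecord AddChild(ParentRecord L, ChildRecord R, INTEGER C) := TRANSFORM
  SELF.numPeople := C;                 // COUNTER tracks how many children have been folded in
  SELF.children  := L.children + R;    // append this child to the nested dataset
  SELF           := L;
END;

// One denormalized record per address, containing all of its matching people:
DenormedAddr := DENORMALIZE(File_Address, File_People,
                            LEFT.id = RIGHT.address_id,
                            AddChild(LEFT, RIGHT, COUNTER));
OUTPUT(DenormedAddr);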
Implicit Dataset Relationality
(Nested child datasets):
Parent record fields are always in memory when operating at the level of the Child.
You may only reference the related set of Child records when operating at the level of the Parent.
(See the sketch after this slide.)
[Diagram: the People → Vehicle / Property → Taxdata hierarchy from the earlier slide]
Querying Relational Data:
The Download: Tech Talks #HPCCTechTalks40
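A short sketch of these rules in action, using the File_Address definition from the earlier slide (the surname value is only an example):

// At the Parent level you can reference the related set of Child records,
// for example to keep only addresses that have a child with a given surname:
SmithAddresses := File_Address(EXISTS(children(per_surname = 'SMITH')));

// Aggregates over the nested child dataset are evaluated per parent record:
AddrSummary := TABLE(File_Address,
                     {id, address, UNSIGNED2 kids := COUNT(children)});

OUTPUT(SmithAddresses);
OUTPUT(AddrSummary);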
NORMALIZE(recordset, expression, transform)
recordset – The set of records to process.
expression – A numeric expression specifying the total number of times to call
the transform for that record.
transform – The TRANSFORM function to call for each record in the recordset.
The NORMALIZE function processes through all the records in the recordset
performing the transform function the expression number of times on each
record in turn to produce relational child records of the parent.
The transform function must take two parameters: a LEFT record of the same
format as the recordset, and an integer COUNTER specifying the number of times
to call the transform for that record. The format of the resulting recordset can be
different from the input. (A sketch follows after this slide.)
NORMALIZE Function
The Download: Tech Talks #HPCCTechTalks41
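Going the other way, a minimal NORMALIZE sketch that splits the denormalized addresses back out into a flat child dataset (using the layouts defined earlier; the child-dataset form shown second is an alternative to the expression form documented above):

ChildRecord GetChild(ParentRecord L, INTEGER C) := TRANSFORM
  SELF := L.children[C];   // pull out the Cth nested child record
END;

// Call the transform once per nested child of each parent record:
FlatPeople := NORMALIZE(File_Address, COUNT(LEFT.children), GetChild(LEFT, COUNTER));

// Alternative: iterate the nested child dataset directly (RIGHT is the child row):
FlatPeople2 := NORMALIZE(File_Address, LEFT.children,
                         TRANSFORM(ChildRecord, SELF := RIGHT));

OUTPUT(FlatPeople);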
ECL Code Demonstration
Let’s look at some ECL!
The Download: Tech Talks #HPCCTechTalks42
Summary
• Using a denormalized dataset can improve the power of your queries and help you
discover hidden relationships in the data.
• ECL provides powerful, easy-to-use support for moving from a normalized to a
denormalized format when needed.
• Knowing how to move both ways, and the best practices for doing so, is a valuable
skill for every ECL developer.
The Download: Tech Talks #HPCCTechTalks43
In closing: LOVE YOUR DATA!
The Download: Tech Talks #HPCCTechTalks44
Quick poll:
After today’s ECL Tech Tip, will you use
DENORMALIZE for any advanced query
applications?
See poll on bottom of presentation screen
Questions?
Bob Foreman
Senior Software Engineer
LexisNexis Risk Solutions
Robert.Foreman@lexisnexisrisk.com
The Download: Tech Talks #HPCCTechTalks46
• Have a new success story to share?
• Want to pitch a new use case?
• Have a new HPCC Systems application you want to demo?
• Want to share some helpful ECL tips and sample code?
• Have a new suggestion for the roadmap?
• Be a featured speaker for an upcoming episode! Email your idea to
Techtalks@hpccsystems.com
• Visit The Download Tech Talks wiki for more information:
https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+Tech+Talks
Mark your calendar for the March 15 Tech Talk -
More machine learning topics coming!
Watch our Events page for details.
Submit a talk for an upcoming episode!
47 The Download: Tech Talks #HPCCTechTalks
A copy of this presentation will be made available soon on our blog:
hpccsystems.com/blog
Thank You!

Más contenido relacionado

La actualidad más candente

ETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsC4Media
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)Rogue Wave Software
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...DataWorks Summit
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
 
Slim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkSlim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkFlink Forward
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Hybrid Apache Spark Architecture with YARN and Kubernetes
Hybrid Apache Spark Architecture with YARN and KubernetesHybrid Apache Spark Architecture with YARN and Kubernetes
Hybrid Apache Spark Architecture with YARN and KubernetesDatabricks
 
Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015Henry Saputra
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About Jesus Rodriguez
 
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiDataWorks Summit
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...DataWorks Summit/Hadoop Summit
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
RHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersRHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersJerome Marc
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...Databricks
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedJamie Grier
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 

La actualidad más candente (20)

ETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsETL Is Dead, Long-live Streams
ETL Is Dead, Long-live Streams
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 
Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Slim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkSlim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. Spark
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Hybrid Apache Spark Architecture with YARN and Kubernetes
Hybrid Apache Spark Architecture with YARN and KubernetesHybrid Apache Spark Architecture with YARN and Kubernetes
Hybrid Apache Spark Architecture with YARN and Kubernetes
 
Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
RHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersRHTE2015_CloudForms_Containers
RHTE2015_CloudForms_Containers
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture Pipeline
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 

Similar a The Download: Tech Talks by the HPCC Systems Community, Episode 11

Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Implyconfluent
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
HUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesHUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesJohn Mulhall
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learnJohn D Almon
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesIntel® Software
 
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...Precisely
 
HP Enterprises in Hana Pankaj Jain May 2016
HP Enterprises in Hana Pankaj Jain May 2016HP Enterprises in Hana Pankaj Jain May 2016
HP Enterprises in Hana Pankaj Jain May 2016INDUSCommunity
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
High-Level Synthesis for the Design of AI Chips
High-Level Synthesis for the Design of AI ChipsHigh-Level Synthesis for the Design of AI Chips
High-Level Synthesis for the Design of AI ChipsObject Automation
 
8 Things to Consider as SharePoint Moves to the Cloud
8 Things to Consider as SharePoint Moves to the Cloud8 Things to Consider as SharePoint Moves to the Cloud
8 Things to Consider as SharePoint Moves to the CloudChristian Buckley
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...HostedbyConfluent
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Big data berlin
Big data berlinBig data berlin
Big data berlinkammeyer
 
Things Every Oracle DBA Needs To Know About The Hadoop Ecosystem
Things Every Oracle DBA Needs To Know About The Hadoop EcosystemThings Every Oracle DBA Needs To Know About The Hadoop Ecosystem
Things Every Oracle DBA Needs To Know About The Hadoop EcosystemZohar Elkayam
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...Srijan Technologies
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hPrecisely
 

Similar a The Download: Tech Talks by the HPCC Systems Community, Episode 11 (20)

Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
From traditional to GitOps
From traditional to GitOpsFrom traditional to GitOps
From traditional to GitOps
 
HUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesHUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation Slides
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing Technologies
 
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
 
HP Enterprises in Hana Pankaj Jain May 2016
HP Enterprises in Hana Pankaj Jain May 2016HP Enterprises in Hana Pankaj Jain May 2016
HP Enterprises in Hana Pankaj Jain May 2016
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
High-Level Synthesis for the Design of AI Chips
High-Level Synthesis for the Design of AI ChipsHigh-Level Synthesis for the Design of AI Chips
High-Level Synthesis for the Design of AI Chips
 
8 Things to Consider as SharePoint Moves to the Cloud
8 Things to Consider as SharePoint Moves to the Cloud8 Things to Consider as SharePoint Moves to the Cloud
8 Things to Consider as SharePoint Moves to the Cloud
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
Things Every Oracle DBA Needs To Know About The Hadoop Ecosystem
Things Every Oracle DBA Needs To Know About The Hadoop EcosystemThings Every Oracle DBA Needs To Know About The Hadoop Ecosystem
Things Every Oracle DBA Needs To Know About The Hadoop Ecosystem
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-h
 

Más de HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...HPCC Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsHPCC Systems
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn HPCC Systems
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingHPCC Systems
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle ChangesHPCC Systems
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index HPCC Systems
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningHPCC Systems
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesHPCC Systems
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsHPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch HPCC Systems
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem HPCC Systems
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis ToolHPCC Systems
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony HPCC Systems
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterHPCC Systems
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...HPCC Systems
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...HPCC Systems
 

Más de HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Último

Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一F sss
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 

Último (20)

Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 

The Download: Tech Talks by the HPCC Systems Community, Episode 11

  • 1. The Download: Community Tech Talks Episode 11 February 15, 2018
  • 2. Welcome! • Please share: Let others know you are here with #HPCCTechTalks • Ask questions! We will answer as many questions as we can following each speaker. • Look for polls at the bottom of your screen. Exit full-screen mode or refresh your screen if you don’t see them. • We welcome your feedback - please rate us before you leave today and visit our blog for information after the event. • Want to be one of our featured speakers? Let us know! techtalks@hpccsystems.com The Download: Tech Talks #HPCCTechTalks2
  • 3. Community announcements 3 Dr. Flavio Villanustre VP Technology RELX Distinguished Technologist LexisNexis® Risk Solutions Flavio.Villanustre@lexisnexisrisk.com The Download: Tech Talks #HPCCTechTalks • HPCC Systems Platform updates • 6.4.10-1 is the latest gold version / Community Changelog • 6.4.12 RC1 coming soon • 7.0.0 Beta planned for early Q2 – among the key features: • Spark integration • Indexer • Record Translation • Session Management Improvements • VS Code Beta version • Roadmap items for 2018 and beyond • Latest Blogs • HPCC Systems/Tableau Web Data Connector v0.2 Tech Preview • Machine Learning Demystified • Reminder: 2018 Summer Internship Proposal Period Open • Interested candidates can submit proposals from the Ideas List • Visit the Student Wiki for more details • Deadline to submit is April 6, 2018 • Program runs late May through mid August • Don’t delay!
  • 4. Today’s speakers 4 The Download: Tech Talks #HPCCTechTalks Raj Chandrasekaran CTO & Co-Founder ClearFunnel raj@clearfunnel.com Raj is the CTO/Co-Founder of ClearFunnel, a Big Data Analytics as a Service Platform Startup, leading their Product Strategy and Solutions. ClearFunnel focuses on enabling Marketing Analytics, Advanced Text Analytics, Bio Informatics and Image Processing for various clients in Technology, Maritime, Publishing and Healthcare domains. Featured Community Speaker
  • 5. Today’s speakers 5 The Download: Tech Talks #HPCCTechTalks James McMullan Software Engineer III LexisNexis Risk Solutions James.McMullan@lexisnexisrisk.com James has a broad range of Software Engineering experience from developing low level system drivers for X-Ray fluorescence equipment to mobile video games and web applications. He is a recent addition to the LexisNexis team and is part of an internal R&D group where he has been working on multiple projects including; HPCC Systems & Spark benchmarks, integration projects between the HPCC Systems, Spark and Hadoop ecosystems, and document storage systems. Bob Foreman Senior Software Engineer LexisNexis Risk Solutions Robert.Foreman@lexisnexisrisk.com Bob Foreman has worked with the HPCC Systems technology platform and the ECL programming language for over 5 years, and has been a technical trainer for over 25 years. He is the developer and designer of the HPCC Systems Online Training Courses, and is the Senior Instructor for all classroom and Webex/Lync based training.
  • 6. Scaling Data Science Capabilities: Leveraging a Homogeneous Big Data Ecosystem Raj Chandrasekaran CTO & Co-Founder ClearFunnel
  • 7. Quick poll: Where have you had the most success in deployment of HPCC Systems based solutions? See poll on bottom of presentation screen
  • 8. To succeed, a Big Data Analytics enterprise needs… The Download: Tech Talks #HPCCTechTalks8 • An efficient Big Data ecosystem, which comprises the following key capabilities: • Big Data Processing • Data Science: ML & AI • Cloud Integration • Leveraging these capabilities for Commercial Advantage • Key-Success Factor for any Start-up: Cost of Operations and Cash flow
  • 9. Big Data Processing The Download: Tech Talks #HPCCTechTalks9 • Top of the list: Hadoop and Spark • Lots of incremental innovations: • Hadoop: MapReduce, Hive, HBase, Solr, Pig, Kafka, Yarn, Ambari, Ranger, Knox, Atlas, … • Spark: Hadoop’s Successor, In-Memory, Directed Acyclic Graph – DAG, Stream Processing, Machine Learning, SparkSQL, GraphX, Support for Python, Java, R and Scala, … • Which also means, Lots of Integrations and… • A variety of Engineering Talent • Still, all of the above = version 1.01 in the HPCC Systems domain HPCC Systems Capabilities:  Big Data Processing
  • 10. Data Science: ML & AI The Download: Tech Talks #HPCCTechTalks10 • Traditionally, R & Python • Current State: • MLLib has a core set of machine learning algorithms, but is certainly not as complete as R or other machine learning libraries such as MADLib • SparkR is work-in-progress… you still need a robust ML library to implement advanced Data Science use cases • ML is also an evolving field in the HPCC Systems domain. • ECL-ML modules are fully-parallel, and covers both - Supervised and Unsupervised Models • Extensibility: ECL is natively designed to manage data, and is thereby easily extensible to implement custom ML algorithms, including Neural Network and Deep Learning. • ClearFunnel Innovations using ECL-ML: • Text Processing (Self-learning layered taxonomy, Entity and Topic Extraction, Context Analysis, Point of View Scoring) • Image Recognition and Pattern Matching (OCR and NN based) • Maritime Predictive Analytics (Deep Learning with Geospatial and IOT streaming data) HPCC Systems Capabilities:  Big Data Processing  Data Science
  • 11. Cloud Integration The Download: Tech Talks #HPCCTechTalks11 • AWS: the big daddy of cloud • Its core strengths are really EC2 and S3; all other AWS capabilities and micro-services have been built around these two foundational technologies • HPCC Systems on AWS: • HPCC Systems provides native support for AWS (one-click deployment) • Additionally, HPCC Systems’ simple, homogeneous tech stack makes it a breeze to operate in the cloud with minimal investment in resources and time • ClearFunnel innovations: • Spray / de-spray data between a Thor cluster and S3 at rates of up to 2 TB/s (for comparison, Netezza’s data transfer rate is 2–4 TB/hr) • Fail-safe job operation (rapid recovery from failures) • Near real-time, micro-batching, monitoring, alerting and data-delivery API capabilities built by integrating AWS micro-services with HPCC Systems • Key principles: • Avoid creating layers of abstraction on both ends (AWS and HPCC Systems) • Instead, integrate HPCC Systems directly with the core capabilities of EC2 and S3 HPCC Systems Capabilities: ✓ Big Data Processing ✓ Data Science ✓ Cloud
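As an illustration of the "no extra abstraction layers" principle, one typical pattern is to land data from S3 on the cluster's landing zone (for example with the AWS CLI) and then spray it with the ECL standard library. The sketch below is a hedged example only: the IP address, paths, cluster name and logical file name are all illustrative, and the SprayVariable parameter positions follow the 6.x Standard Library Reference.

IMPORT STD;

// Step 1 (outside ECL, illustrative): aws s3 cp s3://my-bucket/events.csv /var/lib/HPCCSystems/mydropzone/
// Step 2: spray the landed CSV onto the Thor cluster as a logical file
STD.File.SprayVariable('10.0.0.10',                                   // landing zone IP (illustrative)
                       '/var/lib/HPCCSystems/mydropzone/events.csv',  // landed file
                       ,,,,                                           // default record size and CSV separators
                       'mythor',                                      // destination cluster group
                       '~clearfunnel::raw::events');                  // destination logical file name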
  • 12. Leveraging HPCC Systems for Commercial Success The Download: Tech Talks #HPCCTechTalks12 • ClearFunnel has implemented a full spectrum of complex data engineering use cases using HPCC Systems: • Large, complex graph traversal across nodes • Image analytics • Operational analytics over near real-time and stream-processed analog data • Pattern detection in bioinformatics • NLP and advanced text analytics • IoT-based sensor-data integration and analytics • Advanced search and querying • Single, homogeneous tech stack: • ClearFunnel’s Big Data Analytics Platform runs these diverse use cases on one homogeneous tech stack, extending HPCC Systems’ capabilities to meet virtually any Big Data processing requirement
  • 13. Key Success Criteria: Cost of Big Data Operations The Download: Tech Talks #HPCCTechTalks13 • Distinctive cost benefits come from using a homogeneous tech stack and the highly productive ECL language • “Fail fast, fail often” and multiple iterations of solution development do not require large investments of time, resources, or money • Reuse and refactor core ML & AI modules across use cases (single-language implementation) • Minimal cost of operations: • ClearFunnel operates multiple Big Data clusters in production, with hundreds of nodes each, without any dedicated support staff: no cloud engineer, infrastructure engineer, network engineer, production support engineer, DevOps engineer, or tech ops specialist! • This is enabled by efficient automation and close integration of AWS and HPCC Systems
  • 14. Quick poll: In your opinion, which of these use cases are most suitable for implementing in HPCC Systems? See poll on bottom of presentation screen
  • 15. Questions? Raj Chandrasekaran CTO & Co-Founder ClearFunnel raj@clearfunnel.com https://clearfunnel.com/ The Download: Tech Talks #HPCCTechTalks15
  • 16. HDFS Connector Preview James McMullan Software Engineer III LexisNexis® Risk Solutions
  • 17. Quick poll: Would you be interested in interacting with the Hadoop ecosystem from HPCC Systems? See poll on bottom of presentation screen
  • 18. Overview • HDFS Connector Motivations • Why are we making the connector? • What are our goals for the connector? • Overview of HDFS Architecture • How is data stored in HDFS? • How can we interact with HDFS? • HDFS Connector Design • Overview of how the connector works & achieves parallelism • HDFS Connector Demo The Download: Tech Talks #HPCCTechTalks18
  • 19. HDFS Connector Motivations • Interact with HDFS datasets and Hadoop processes • Existing HPCC to Hadoop (h2h) Project • No longer maintained • Chance to improve upon h2h • Tighter integration with HPCC • Fewer dependencies • Fewer failure points • Possibility for New Features • Variable length record flat files • Hadoop File Formats? The Download: Tech Talks #HPCCTechTalks19
  • 20. HDFS Connector Goals • Robust – Should “Just Work” • Straightforward • Few dependencies • Little to no configuration • Tightly integrated • Datasets from HDFS should be first-class citizens • Performant • Parallelism where possible • Reduce data transfer costs The Download: Tech Talks #HPCCTechTalks20
  • 21. Overview of HDFS Architecture • How are files stored in HDFS? • Stored as blocks of data plus metadata • Blocks are typically 64–128 MiB • Blocks are replicated for fault tolerance • Namenode • File metadata • Filesystem namespace • Datanodes • Blocks of data • No knowledge of files The Download: Tech Talks #HPCCTechTalks21 (Slide diagram: a Namenode holding metadata, with multiple Datanodes)
  • 22. Overview of HDFS Architecture • Reading & Writing in HDFS • Namenode arbitrates reads & writes • Datanodes fulfill reads & writes • Multiple readers / single writer • Client Applications • Java Hadoop or native libHDFS libraries • Messaging uses Google Protocol Buffers The Download: Tech Talks #HPCCTechTalks22 (Slide diagram: client applications communicating with the Namenode and Datanodes)
  • 23. HDFS Connector Design – Communicating with HDFS • Java Hadoop or native libHDFS library? • libHDFS relies on the Java Hadoop libraries • Both require Hadoop to be installed locally • Google Protocol Buffers? • Possible but a lot of work • libHDFS3 • Part of Apache HAWQ • Completely native implementation of libHDFS The Download: Tech Talks #HPCCTechTalks23
  • 24. HDFS Connector Design – HPCC Integration • ECL PIPE? • High data transfer costs • Loosely coupled • Leverages native ECL • Import Java Library? • High data transfer costs • Adds lots of dependencies • Parallelism is difficult • Native ECL Plugin? • Low data transfer costs • Fewest dependencies • Parallelism is possible The Download: Tech Talks #HPCCTechTalks24
  • 25. HDFS Connector Design – Reading Data in Parallel • CSV files & fixed-record flat files • Break the HDFS file into logical chunks • One chunk per HPCC node • Chunks aren’t record-aligned • Consume records that begin in our chunk • Variable-record flat files • Need record split metadata • Create split metadata on write • Preprocess step if no metadata The Download: Tech Talks #HPCCTechTalks25 (Slide diagram: chunks, consumed records and split metadata)
  • 26. HDFS Connector Design – Writing to HDFS • HDFS is single writer • Single File • Each Thor node writes its data to the file in sequence • Requires Append mode to be enabled • Interacts well with existing HDFS ecosystem • Multiple File Parts • Similar to how HPCC stores files • Parallel writing • Existing Hadoop applications would need to be updated The Download: Tech Talks #HPCCTechTalks26
  • 27. HDFS Connector Demo – Writing a dataset to HDFS The Download: Tech Talks #HPCCTechTalks27
  • 28. HDFS Connector Demo – Reading a dataset from HDFS The Download: Tech Talks #HPCCTechTalks28
  • 29. HDFS Connector Demo – Working with HDFS Datasets The Download: Tech Talks #HPCCTechTalks29
  • 30. Quick poll: Do you currently use HDFS as a data store? See poll on bottom of presentation screen
  • 31. Questions? James McMullan Software Engineer III LexisNexis Risk Solutions James.McMullan@lexisnexisrisk.com The Download: Tech Talks #HPCCTechTalks31
  • 32. ECL Tips and Cool Tricks – Building a Relational Dataset Bob Foreman Senior Software Engineer LexisNexis Risk Solutions
  • 33. Quick poll: Have you ever worked with a relational denormalized dataset in ECL? See poll on bottom of presentation screen
  • 34. Background • Most of our datasets on an HPCC cluster are organized in a normalized architecture. • A unique linking field in one dataset can be used to join with other datasets in a one-to-one or one-to-many relationship (see the sketch below). • At LexisNexis we affectionately refer to this architecture as the “Data Donut” The Download: Tech Talks #HPCCTechTalks34
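A hedged sketch of that linking pattern in ECL (the layouts, logical file names and the "did" linking field are illustrative, not the actual LexisNexis layouts):

PersonRec := RECORD
  UNSIGNED8 did;           // unique linking field
  STRING20  surname;
END;
VehicleRec := RECORD
  UNSIGNED8 did;           // same linking field; one person may have many vehicles
  STRING17  vin;
END;
people   := DATASET('~class::people',   PersonRec,  THOR);
vehicles := DATASET('~class::vehicles', VehicleRec, THOR);

LinkedRec := RECORD
  PersonRec;
  STRING17 vin;
END;
LinkedRec LinkThem(PersonRec L, VehicleRec R) := TRANSFORM
  SELF.vin := R.vin;
  SELF     := L;
END;
// One output row per person/vehicle pair that shares the linking field
owned := JOIN(people, vehicles, LEFT.did = RIGHT.did, LinkThem(LEFT, RIGHT));
OUTPUT(owned);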
  • 35. The LN Data “Donut” (linking identifiers: DID, ADL, IDL, LinkID, LexID) The Download: Tech Talks #HPCCTechTalks35 Sometimes, analyzing or querying this normalized data can be challenging. Enter the “denormalized” dataset!
  • 36. Given a sample 3-level hierarchical relational database: People at the top level, with child datasets Vehicle and Property, and Taxdata as a child of Property (example data shown on the slide). The Download: Tech Talks #HPCCTechTalks36
  • 37. Denormalizing Related Data: a single denormalized record starts with the People fields, followed by that person’s repeated Vehicles child records and repeated Property child records, each Property carrying its own nested Taxdata records, through to the end of the record. The Download: Tech Talks #HPCCTechTalks37
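Expressed as ECL record structures, that 3-level layout might look like the sketch below (field names, types and MAXCOUNT limits are illustrative; the next slide shows the simpler two-level example actually used in the demo):

TaxdataRec := RECORD
  UNSIGNED2 tax_year;
  UNSIGNED8 assessed_value;
END;
PropertyRec := RECORD
  UNSIGNED8 property_id;
  STRING40  prop_address;
  DATASET(TaxdataRec) taxdata {MAXCOUNT(20)};    // third level
END;
VehicleRec := RECORD
  STRING17 vin;
  STRING20 make;
END;
PersonRec := RECORD
  UNSIGNED4 person_id;
  STRING20  surname;
  STRING20  forename;
  DATASET(VehicleRec)  vehicles {MAXCOUNT(20)};  // second level
  DATASET(PropertyRec) property {MAXCOUNT(20)};  // second level, with nested third
END;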
  • 38. Nested Child Dataset RECORD: The Download: Tech Talks #HPCCTechTalks38
ChildRecord := RECORD
  UNSIGNED4 person_id;
  UNSIGNED8 address_id;
  STRING20 per_surname;
  STRING20 per_forename;
END;
ParentRecord := RECORD
  UNSIGNED8 id;
  STRING20 address;
  STRING20 CSZ;
  STRING10 postcode;
  UNSIGNED2 numPeople;
  DATASET(ChildRecord) children {MAXCOUNT(20)};
END;
EXPORT File_Address := DATASET('CLASS::Adr_List', ParentRecord, THOR);
  • 39. DENORMALIZE(parentoutput, childrecset, condition, transform) parentoutput – The set of parent records, already formatted as the result of the combination. childrecset – The set of child records to process. condition – An expression that specifies how to match records between the parent and child records. transform – The TRANSFORM function to call. The DENORMALIZE function forms flat-file records from a parent and any number of children. The transform function must take at least two parameters: a LEFT record in the same format as the resulting combined parent-and-child record, and a RIGHT record in the same format as the childrecset. An optional integer COUNTER parameter can be included, indicating the current iteration over the matching child records. DENORMALIZE Function: The Download: Tech Talks #HPCCTechTalks39
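A hedged, self-contained sketch of DENORMALIZE, following the pattern shown in the ECL Language Reference (the record layouts and inline data are illustrative):

ParentRec := RECORD
  UNSIGNED1 nameID;
  STRING20  name;
END;
ChildRec := RECORD
  UNSIGNED1 nameID;
  STRING20  addr;
END;
DenormedRec := RECORD
  ParentRec;
  UNSIGNED1 numRows;
  DATASET(ChildRec) children {MAXCOUNT(5)};
END;

parentDS := DATASET([{1,'Gavin'},{2,'Liz'}], ParentRec);
childDS  := DATASET([{1,'10 Malt Lane'},{1,'3 The Cottages'},{2,'5 The Square'}], ChildRec);

// Prepare the parent records with an empty nested child dataset
DenormedRec InitParent(ParentRec L) := TRANSFORM
  SELF.numRows  := 0;
  SELF.children := [];
  SELF := L;
END;
prepared := PROJECT(parentDS, InitParent(LEFT));

// Append each matching child; COUNTER is the iteration for this parent
DenormedRec AddChild(DenormedRec L, ChildRec R, INTEGER C) := TRANSFORM
  SELF.numRows  := C;
  SELF.children := L.children + R;
  SELF := L;
END;

denormed := DENORMALIZE(prepared, childDS, LEFT.nameID = RIGHT.nameID,
                        AddChild(LEFT, RIGHT, COUNTER));
OUTPUT(denormed);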
  • 40. Implicit Dataset Relationality (nested child datasets): Parent record fields are always in memory when operating at the level of the Child. You may only reference the related set of Child records when operating at the level of the Parent. (Hierarchy: People → Vehicle, Property → Taxdata.) Querying Relational Data: The Download: Tech Talks #HPCCTechTalks40
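The parent-level rule can be seen in a short, hedged sketch against the File_Address layout defined two slides earlier (the IMPORT path and filter values are illustrative); flattening in the other direction, down to the child level, is what NORMALIZE on the next slide is for.

IMPORT $;                       // assumes File_Address lives in the same module/folder
addrs := $.File_Address;

// Parent level: the related child set can be referenced as a whole
smithHouseholds := addrs(EXISTS(children(per_surname = 'SMITH')));

// Parent level: aggregate over each record's nested child dataset
counts := TABLE(addrs, {address, UNSIGNED2 kids := COUNT(children)});

OUTPUT(smithHouseholds);
OUTPUT(counts);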
  • 41. NORMALIZE(recordset, expression, transform) recordset – The set of records to process. expression – A numeric expression specifying the total number of times to call the transform for that record. transform – The TRANSFORM function to call for each record in the recordset. The NORMALIZE function processes all the records in the recordset, calling the transform function the expression number of times on each record in turn, to produce relational child records from the parent. The transform function must take two parameters: a LEFT record in the same format as the recordset, and an integer COUNTER specifying the current iteration for that record. The format of the resulting recordset can differ from the input. NORMALIZE Function The Download: Tech Talks #HPCCTechTalks41
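A hedged sketch of the counter form of NORMALIZE, again following the ECL Language Reference pattern (layouts and inline data are illustrative):

InRec := RECORD
  UNSIGNED1 numAddrs;
  STRING20  name;
  STRING20  addr1;
  STRING20  addr2;
END;
people := DATASET([{2,'Gavin','10 Malt Lane','3 The Cottages'},
                   {1,'Liz','5 The Square',''}], InRec);

OutRec := RECORD
  STRING20 name;
  STRING20 addr;
END;
OutRec OneAddr(InRec L, INTEGER C) := TRANSFORM
  SELF.name := L.name;
  SELF.addr := CHOOSE(C, L.addr1, L.addr2);   // pick the C-th address field
END;

// Call the transform numAddrs times for each input record
normalized := NORMALIZE(people, LEFT.numAddrs, OneAddr(LEFT, COUNTER));
OUTPUT(normalized);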
  • 42. ECL Code Demonstration Let’s look at some ECL! The Download: Tech Talks #HPCCTechTalks42
  • 43. Summary • Using a denormalized dataset can improve the power of your queries and help you discover hidden relationships in the data. • ECL has powerful, easy-to-use support for moving from a normalized to a denormalized format when needed. • Knowing how to move in both directions, and the best practices for doing so, is a valuable skill for every ECL developer. The Download: Tech Talks #HPCCTechTalks43
  • 44. In closing: LOVE YOUR DATA! The Download: Tech Talks #HPCCTechTalks44
  • 45. Quick poll: After today’s ECL Tech Tip, will you use DENORMALIZE for any advanced query applications? See poll on bottom of presentation screen
  • 46. Questions? Bob Foreman Senior Software Engineer LexisNexis Risk Solutions Robert.Foreman@lexisnexisrisk.com The Download: Tech Talks #HPCCTechTalks46
  • 47. • Have a new success story to share? • Want to pitch a new use case? • Have a new HPCC Systems application you want to demo? • Want to share some helpful ECL tips and sample code? • Have a new suggestion for the roadmap? • Be a featured speaker for an upcoming episode! Email your idea to Techtalks@hpccsystems.com • Visit The Download Tech Talks wiki for more information: https://wiki.hpccsystems.com/display/hpcc/HPCC+Systems+Tech+Talks Mark your calendar for the March 15 Tech Talk - More machine learning topics coming! Watch our Events page for details. Submit a talk for an upcoming episode! 47 The Download: Tech Talks #HPCCTechTalks
  • 48. A copy of this presentation will be made available soon on our blog: hpccsystems.com/blog Thank You!