Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring

•

3 recomendaciones•748 vistas

In this talk Mark Baker (CSL) will show how CSL Behring is Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI to a central Hadoop data lake at CSL Behring The challenge of merging data from disparate systems has been a leading driver behind investments in data warehousing systems, as well as, in Hadoop. While data warehousing solutions are ready-built for RDBMS integration, Hadoop adds the benefits of infinite and economical scale – not to mention the variety of structured and non-structured formats that it can handle. Whether using a data warehouse or Hadoop or both, physical data movement and consolidation is the primary method of integration. There may also be challenges with synchronizing rapidly changing data from a system of record to a consolidated Hadoop platform . This introduces the need for “data federation” , where data is integrated without copying data between systems. For historical/batch data use cases there is a replication of data across remote data hubs into a central data lake using Apache NIFI. We will demo using Apache Zeppelin for analyzing data using Apache Spark and Apache HIVE.

Tecnología

INGELA VIKSTROM, ANABEL SILVA, SANDRO PRATO
CSL Bio21 Research Scientists
Australia
INGELA VIKSTROM, ANABEL SILVA, SANDRO PRATO
CSL Bio21 Research Scientists
Australia
MARK BAKER
Head of Big Data Infrastructure
CSL Behring
ANALYZING DATA FROM MULTIPLE
MANUFACTURING SITES USING A CENTRAL
HADOOP DATA LAKE

Outline
• CSL Behring
– Introduction of CSL Behring
• CSL Behring’s products and focus
• Growth and global placement of manufacturing facilities
• Current PACE globalization initiative
• Streamlining global processes to improve efficiency
• Partnership with Hortonworks to create our Big Data
Platform
• HDP for Data lake and analytics using Zeppelin
• HDF for secure data movement from global manufacturing sites to
our central data repository SAP HANA & HDP.
• Q & A

CSL Behring’s Products and Focus
• CSL Behring
– CSL Behring is a global biotherapeutics leader
– Focused on serving patients’ needs by using the latest
technologies
• Deliver innovative therapies that are used to treat rare
and serious conditions.
– One of our “super orphan” therapies treats a condition affecting
approximately 300 patients in the U.S. and only one million
worldwide. To meet growing demand and bring more therapies to
more patients, we continue to invest in the expansion of all our
manufacturing facilities

Business Driver
PACE globalization initiative
• PACE is a global, transformation initiative that fulfills our
promise to patients by aligning our processes and
enhancing collaboration to achieve sustainable business
excellence
• Provide advanced analytics capabilities to exploit
existing and new data assets, support decision-making,
and provide predictive models
• Build user community with the right skills and right tools

Global Manufacturing Facilities
• Manufacturing Sites
• United States
– Kankakee
• Germany
– Marburg
• Switzerland
– Bern
• Australia
– Melbourne
• Historically separated by region and operated
independently

Manufacturing & Analytical Silos
13/06/20176

Future Manufacturing Data Flows
13/06/20177

Challenges
• Each Manufacturing system uses a different backend
databases and schema to log the batch execution steps
– 12 x SCADA and MES systems
• Edge servers must not impact MES system performance
– Sensitive systems required impact assessment prior to direct
data extracts
• Data must be encrypted in motion and at rest
– HIPAA compliance and EU privacy requirements
• Data must be compressed over the WAN
– Due to bandwidth constrictions on intranet
• Multiple time zones and string encodings

NiFi
• Allows the creation of custom processors for each MES
system (python).
• Uses back pressure to eliminate any full database pulls
after network/hardware outages.
• Encrypts data over the wire.
• Compresses data over the wire
• Allows data enrichment for the addition of UTC column.
• ETL functionality allows for special characters to be
transformed into data analytical tools can process ex. ṏ

Thank You
CSL Limited
45 Poplar Road
Parkville, Victoria, 3056
Australia
TEST

Más contenido relacionado

La actualidad más candente

Lightning Fast Analytics with Hive LLAP and DruidDataWorks Summit

Innovation in the Enterprise Rent-A-Car Data WarehouseDataWorks Summit

Breaking the Silos: Storage for Analytics & AIDataWorks Summit

Shaping a Digital VisionDataWorks Summit/Hadoop Summit

Modernise your EDW - Data LakeDataWorks Summit/Hadoop Summit

Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsDataWorks Summit

Hadoop: The Unintended BenefitsDataWorks Summit

Data-In-Motion UnleashedDataWorks Summit

It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit

High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit

Hadoop Journey at WalgreensDataWorks Summit

Log I am your fatherDataWorks Summit/Hadoop Summit

How Apache Spark and Apache Hadoop are being used to keep banking regulators ...DataWorks Summit

Designing data pipelines for analytics and machine learning in industrial set...DataWorks Summit

How big data and AI saved the day: critical IP almost walked out the doorDataWorks Summit

The Destiny of DataHortonworks

Hadoop for the MassesDataWorks Summit/Hadoop Summit

Accelerating Data Warehouse ModernizationDataWorks Summit/Hadoop Summit

How Hadoop Makes the Natixis Pack More Efficient DataWorks Summit/Hadoop Summit

Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceDataWorks Summit

La actualidad más candente (20)

Lightning Fast Analytics with Hive LLAP and Druid

Innovation in the Enterprise Rent-A-Car Data Warehouse

Breaking the Silos: Storage for Analytics & AI

Shaping a Digital Vision

Modernise your EDW - Data Lake

Verizon Centralizes Data into a Data Lake in Real Time for Analytics

Hadoop: The Unintended Benefits

Data-In-Motion Unleashed

It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...

High Performance Spatial-Temporal Trajectory Analysis with Spark

Hadoop Journey at Walgreens

Log I am your father

How Apache Spark and Apache Hadoop are being used to keep banking regulators ...

Designing data pipelines for analytics and machine learning in industrial set...

How big data and AI saved the day: critical IP almost walked out the door

The Destiny of Data

Hadoop for the Masses

Accelerating Data Warehouse Modernization

How Hadoop Makes the Natixis Pack More Efficient

Big SQL: Powerful SQL Optimization - Re-Imagined for open source

Similar a Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring

Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe

Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung

Chip ICT | Hgst storage brochureMarco van der Hart

Driving Faster Analytics at Symphony HealthPrecisely

Cray Urika-XA Advanced Analytics Platforminside-BigData.com

Big Data/Cloudera from Excelerate SystemsDavid Bennett

Oracle Database Appliance X5-2 Yasir El Nimr

20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge

The Future of Data Warehousing: ETL Will Never be the SameCloudera, Inc.

High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox

Predicting Patient Outcomes in Real-Time at HCASri Ambati

Foxvalley bigdataTom Rogers

Shared services - the future of HPC and big data facilities for UK researchMartin Hamilton

Big data analytics and machine intelligence v5.0Amr Kamel Deklel

Wolters Kluwer Improves Patient Outcomes with GigaSpaces XAP Amnon Raviv

Meetup 25/04/19: Big DataDigipolis Antwerpen

Automating the process of continuously prioritising data, updating and deploy...Ola Spjuth

Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...confluent

Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon

Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely

Similar a Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring (20)

Agile Big Data Analytics Development: An Architecture-Centric Approach

Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent

Chip ICT | Hgst storage brochure

Driving Faster Analytics at Symphony Health

Cray Urika-XA Advanced Analytics Platform

Big Data/Cloudera from Excelerate Systems

Oracle Database Appliance X5-2

20160331 sa introduction to big data pipelining berlin meetup 0.3

The Future of Data Warehousing: ETL Will Never be the Same

High Performance Data Analytics and a Java Grande Run Time

Predicting Patient Outcomes in Real-Time at HCA

Foxvalley bigdata

Shared services - the future of HPC and big data facilities for UK research

Big data analytics and machine intelligence v5.0

Wolters Kluwer Improves Patient Outcomes with GigaSpaces XAP

Meetup 25/04/19: Big Data

Automating the process of continuously prioritising data, updating and deploy...

Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...

Turn Data Into Actionable Insights - StampedeCon 2016

Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...

Más de DataWorks Summit

Data Science Crash CourseDataWorks Summit

Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit

Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit

HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit

Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit

Managing the Dewey Decimal SystemDataWorks Summit

Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit

HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit

Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit

Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit

Security Framework for Multitenant ArchitectureDataWorks Summit

Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit

Extending Twitter's Data Platform to Google CloudDataWorks Summit

Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit

Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit

Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit

Computer Vision: Coming to a Store Near YouDataWorks Summit

Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit

Más de DataWorks Summit (20)

Data Science Crash Course

Floating on a RAFT: HBase Durability with Apache Ratis

Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi

HBase Tales From the Trenches - Short stories about most common HBase operati...

Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...

Managing the Dewey Decimal System

Practical NoSQL: Accumulo's dirlist Example

HBase Global Indexing to support large-scale data ingestion at Uber

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

Supporting Apache HBase : Troubleshooting and Supportability Improvements

Security Framework for Multitenant Architecture

Presto: Optimizing Performance of SQL-on-Anything Engine

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...

Extending Twitter's Data Platform to Google Cloud

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger

Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...

Computer Vision: Coming to a Store Near You

Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

Último

Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar

DevEX - reference for building teams, processes, and platformsSergiu Bodiu

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3

Take control of your SAP testing with UiPath Test SuiteDianaGray10

DMCC Future of Trade Web3 - Special EditionDubai Multi Commodity Centre

What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett

DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy

WordPress Websites for Engineers: Elevate Your Brandgvaughan

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays

Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan

How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe

A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3

Commit 2024 - Secret Management made easyAlfredo García Lavilla

Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

The State of Passkeys with FIDO Alliance.pptxLoriGlavin3

Advanced Computer Architecture – An IntroductionDilum Bandara

Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring

1. INGELA VIKSTROM, ANABEL SILVA, SANDRO PRATO CSL Bio21 Research Scientists Australia INGELA VIKSTROM, ANABEL SILVA, SANDRO PRATO CSL Bio21 Research Scientists Australia MARK BAKER Head of Big Data Infrastructure CSL Behring ANALYZING DATA FROM MULTIPLE MANUFACTURING SITES USING A CENTRAL HADOOP DATA LAKE

2. Outline • CSL Behring – Introduction of CSL Behring • CSL Behring’s products and focus • Growth and global placement of manufacturing facilities • Current PACE globalization initiative • Streamlining global processes to improve efficiency • Partnership with Hortonworks to create our Big Data Platform • HDP for Data lake and analytics using Zeppelin • HDF for secure data movement from global manufacturing sites to our central data repository SAP HANA & HDP. • Q & A

3. CSL Behring’s Products and Focus • CSL Behring – CSL Behring is a global biotherapeutics leader – Focused on serving patients’ needs by using the latest technologies • Deliver innovative therapies that are used to treat rare and serious conditions. – One of our “super orphan” therapies treats a condition affecting approximately 300 patients in the U.S. and only one million worldwide. To meet growing demand and bring more therapies to more patients, we continue to invest in the expansion of all our manufacturing facilities

4. Business Driver PACE globalization initiative • PACE is a global, transformation initiative that fulfills our promise to patients by aligning our processes and enhancing collaboration to achieve sustainable business excellence • Provide advanced analytics capabilities to exploit existing and new data assets, support decision-making, and provide predictive models • Build user community with the right skills and right tools

5. Global Manufacturing Facilities • Manufacturing Sites • United States – Kankakee • Germany – Marburg • Switzerland – Bern • Australia – Melbourne • Historically separated by region and operated independently

6. Manufacturing & Analytical Silos 13/06/20176

7. Future Manufacturing Data Flows 13/06/20177

8. Challenges • Each Manufacturing system uses a different backend databases and schema to log the batch execution steps – 12 x SCADA and MES systems • Edge servers must not impact MES system performance – Sensitive systems required impact assessment prior to direct data extracts • Data must be encrypted in motion and at rest – HIPAA compliance and EU privacy requirements • Data must be compressed over the WAN – Due to bandwidth constrictions on intranet • Multiple time zones and string encodings

9. NiFi • Allows the creation of custom processors for each MES system (python). • Uses back pressure to eliminate any full database pulls after network/hardware outages. • Encrypts data over the wire. • Compresses data over the wire • Allows data enrichment for the addition of UTC column. • ETL functionality allows for special characters to be transformed into data analytical tools can process ex. ṏ

10. Thank You CSL Limited 45 Poplar Road Parkville, Victoria, 3056 Australia TEST

Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring

Similar a Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring (20)

Más de DataWorks Summit

Más de DataWorks Summit (20)

Último

Último (20)

Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NIFI and Apche Zeppelin to a central Hadoop data lake at CSL Behring