Best practices and lessons learnt
from Running NiFi at Renault
Kamelia Benchekroun - Big Data Architect
Renault
Abdelkrim Hadjidj - Solution Engineer
Hortonworks
2
Agenda
 The NiFi journey at Renault
 Best practices for running NiFi in production
 Lessons learnt at Renault
 Questions & answers
3
The NiFi journey at
Renault
4
About Us
Platform Design
Deploy and Automate
Capture and Analyze
Raw Logs and Machine
Data
Improve
Reliability
Speed
Operations
Monitor and
React
Datalake Squad Activities
The Datalake Squad:
• A passionate Team who work really hard to
deliver solutions and services and empower
Data Initiatives at Renault
Datalake Platform Metrics:
450
Connected
Users
30
Data Initiatives
and use cases
8000
Daily
Queries
3500
Service
Requests
300TB
Efficient
Storage
5
Data Lake at Renault – the beginning of the story
CFT
HDP Admin team
DataLake Renault
Node 1 …. Node 16
June 2016
- 10 Users
- Requests by emails, 1 Admin
- Manual provisioning
NFS-GWExports
6
Big Data Ingestion
7
HDF came in (Q1 2017)
Data Flow Platform
Node 1 …. Node 5
HDF Admin team HDP Admin team
DataLake Renault
Node 1 …. Node 40
2016
- 10 Users
- Requests by emails, 1 Admin
- Manual provisioning
2017
- 200 users
- Jira portal, Admins/devops
- NiFi, Automation (Jenkins for HDP)
Today
- 300 – 500 active users
- 7000 queries per day
- 40 data sources in production
8
Transit Zone RAW Zone
DATA
SOURCES
Data Lake
Data Engineers (DL)
NiFi
Other Zones
Data Ingestion Process
Data + Triggers
HDFS / KAFKA
Mail to admin if critical issues
Technical metrics to AMS
Other Zones
Other Zones
9
Data ingestion workflow
10
Dev to prod testing
/tmp/devp/${DESTINATION}
S2S
Dev Prod
11
NiFi Value for Renault
90% of data
ingestion since 2016
One Platform for
all data sources : Files,
DBs, API, Brokers, etc.
Offload CFT
100 active data
flows, 2000+
processors in
production
Accelerate time to
insights, use case
development and
improve monitoring and
governance
Also used to export
data from the Data Lake
to other
systems/sites/Cloud
Enables new use
cases: connected
plants, package
tracking, IoT
12
Packaging Traceability
13
Packaging Traceability
 POC: 2500 Packages running in the Loop
– 1 package lost = 800 € (= iPhone)
 If the solution is generalized: 600k packages
 2016 Status
– 400 K€ packaging re-investment due to packaging
losses
 2017 Status
– 100 K€ cardboard in January
 Test Expectations:
– Reduce cardboard costs and packaging losses
– Test LoRa technology in an industrial context and
Renault activities
– Validate operational added Value of LoRa technology
UMCD
Pitesti Meca
UVD
Pitesti (Dacia)
Bursa
Nissan Sunderland NMUK
Maubeuge (MCA)
Cross dock
(Hungary)
Nissan Barcelona NMISA
Douai
14
Example flow
 Pitesti -> Barcelona (full)
– 3 transports/week
– 1 truck each time
– Lead Time : 6 days
 NMUK -> Pitesti (Empty)
– once a week
– 1 truck each time
– Lead Time : 6 days
– Cross Dock in Strančice (Czech Rep.)
15
Use case scope
NETWORK & SECURITY
SENSORS
IOT PLATFORM
OBJECT VISUALISATION
BUSINESS INTELLIGENCE
IOT PLATFORM
DATA VISUALISATION
Connect
Collect
Innovate
Drive
Analyse
Objenious
Manual
Treatment
2017
Objenious
Manual
Treatment
S1 2018
Renault IT
(DLK +
DFP)
Objenious
S2 2018
Renault IT
(DLK + DFP
+ IS
integration
+ Analytics)
Telecom
WTB
Renault IT
(DLK + DFP
+ IS
integration
+ Analytics)
Sensor
Not in Scope of Objenious offer
16
Renault Data Centers
Access Control List
or
Certificates
Architecture 1
Remote Process Group
over Proxy
POST HTTP LISTEN HTTP OUTPUT PORT
(Gateway)
17
Renault Data Centers
Architecture 2
Remote Process Group
over Proxy
OUTPUT PORT
MQTT
AMQP
Websocket…
POST HTTPs LISTEN HTTPs
Or
Any other Operator (Worldwide)
(Gateway)
IOT Platform
- Device management/provisioning
- Network & Security
18
Reduce cloud communication cost by 50%
Tag Data Decoding with NiFi
ExecuteScript Processor
Raw Data out of sensor : ff1401aa
Decoded Data out of sensor : [{"data":{"battery":3.6007,"temperature":20}}]
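The decoding step can be pictured with a short sketch. The slides do not give the payload layout, so the field position below is an assumption, chosen only because it agrees with the sample: the second byte (0x14) read as an unsigned temperature in °C gives 20. The battery formula cannot be recovered from the slide and is left out rather than guessed. `decode_tag` is a hypothetical name; in NiFi, similar logic would sit inside the ExecuteScript processor.

```python
import json


def decode_tag(raw_hex: str) -> dict:
    """Decode a hex payload into a small dict (the layout is an assumption)."""
    payload = bytes.fromhex(raw_hex)
    if len(payload) < 2:
        raise ValueError("payload too short")
    # Assumption: the second byte is the temperature in degrees Celsius.
    temperature = payload[1]
    # The battery scaling is not documented in the slides, so the original
    # hex string is kept alongside instead of guessing a formula.
    return {"data": {"temperature": temperature, "raw": raw_hex}}


print(json.dumps([decode_tag("ff1401aa")]))
```

Decoding at the edge of the flow means only the small JSON document travels further, which fits the cost-reduction goal stated on the slide.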
19
Architecture + Cloud ingestion
Data Flow Platform
Node 1 …. Node 3
HDF Admin team HDP Admin team
DataLake Renault
Node 1 …. Node 40
Data Flow Platform
Node 1 Node 2
S2S outbound
20
Connected plants
21
Connected plants
MiNiFi
Source 1
Source 2
FTP
MQTT
Source 3
OPCUA
Aggregation server
Collect, processing,
ingestion
OPCUA JMS, Kafka, MQTT,
WebSocket, *
S2S (http, security,
HA)
HDFS, Hive, Solr,
Hbase, etc
JMS
Other sources /
protocols
D A T A I N M O T I O N D A T A A T R E S T
Facility Level Plant Level Corporate Level
Governance
&Integration
Security
Operations
Data Access
Data
Management
22
Overview of Plant Assembly Factory UC
23
Sensors
Full HDF use case (NiFi, Kafka and Storm)
24
Connected plants
Site
To
Site
Demo with MQTTool: from
hand to Data Lake!
Mobile phone, MQTT broker, NiFi
ConsumeMQTT to Kafka,
NiFi consumes Kafka to
Elasticsearch and Hive…
25
Screwing tools data analysis (Valladolid)
26
Data Flow Platform
Node 1 …. Node 3
HDF Admin team HDP Admin team
DataLake Renault
Node 1 …. Node 40
Data Flow Platform
Node 1 Node 2
S2S outbound
DFP Plant - ----------
HDF Flow Mgmt – X cores
Node 1 …. Node X
DFP Plant - ----------
HDF Flow Mgmt – X cores
Node 1 …. Node X
Data Flow Platform
Node 1 Node 2
S2S
Connected plants
Architecture + Connected plants
27
Best practices for
running NiFi in
production
28
Architecture & sizing
29
Typical NiFi Production Cluster Logical View
30
Sizing considerations
 Baselines exist for NiFi sizing, but NiFi is like a “programming language”
 NiFi resource usage depends on the processors used, but it is always IO intensive
– Use different disks / volumes for the three repositories: flow file, content & provenance
– For the content repository, multiple mount points are recommended
– For the content and provenance repositories, SSDs can provide the best performance
– Heap sizing depends on the use case and the processors used
– Depending on the workload, scale vertically or horizontally
 Default settings are only for getting started. Tune based on your use case.
– Thread pool size: Maximum Timer Driven Thread Count
– Timeouts: nifi.cluster.node.connection.timeout, nifi.cluster.node.read.timeout, nifi.zookeeper.connect.timeout, nifi.zookeeper.session.timeout
– Pluggable modules: WAL (write-ahead log) Provenance Repository
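The repository advice can be checked mechanically. The sketch below is a minimal illustration, not a sizing tool: it parses a nifi.properties-style text and reports any directory configured for more than one repository. The mount points in `SAMPLE` are hypothetical, and the check compares configured paths only; it does not verify that the paths sit on different physical disks.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


# Property prefixes for the flow file, content and provenance repositories.
REPO_PREFIXES = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.",
    "nifi.provenance.repository.directory.",
)


def shared_dirs(props: dict) -> list:
    """Return directories that are configured for more than one repository."""
    seen = {}
    for key, path in props.items():
        if key.startswith(REPO_PREFIXES):
            seen.setdefault(path.rstrip("/"), []).append(key)
    return sorted(path for path, keys in seen.items() if len(keys) > 1)


# Hypothetical layout: one volume per repository, two for content.
SAMPLE = """
nifi.flowfile.repository.directory=/data/flowfile
nifi.content.repository.directory.content1=/data/content1
nifi.content.repository.directory.content2=/data/content2
nifi.provenance.repository.directory.default=/data/provenance
"""

print(shared_dirs(parse_properties(SAMPLE)))
```

An empty list means no two repositories point at the same configured directory.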
31
Development &
deployment
32
Let’s consider a simple data flow
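import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class Project {
    // Placeholder transformation: "do something with x -> y".
    private static String foo(String x) {
        String y = x; // replace with the real transformation
        return y;
    }

    public static void main(String[] args) {
        // Assumption: input comes from standard input (the slide elides the reader's source).
        try (BufferedReader bufferRead = new BufferedReader(new InputStreamReader(System.in))) {
            String data = bufferRead.readLine();
            data = foo(data);
            System.out.println(data);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}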
33
Flow development = software development
Programming language
 Integrated Development Environment
 Algorithm, code, instructions
 Functions (arguments, results)
 Libraries
 …
Apache NiFi
 Apache NiFi UI
 Flows, processors, funnels
 Process groups (input/output ports)
 Templates
 …
Design Code Test Deploy Monitor Evolve
34
General guideline
 Principle: Use separate environments
– Benefit: Performance, security and SLA
 Principle: No flows at root level: use a PG per department, BL, project, etc
– Benefit: Easy to secure, everything in NiFi is attached to a PG, with inheritance
 Principle: Break your flow into process groups
– Benefit: Easy to test, update & version
 Principle: Use a naming convention, use comments (labels, comments)
– Benefit: Very useful for development/monitoring. Ideally use unique names
 Principle: Use variables when possible
– Benefit: Promotion (dev > test > prod), update
 Principle: Organize your projects into three PGs: ingestion, test & monitoring
– Benefit: NiFi can generate data for TDD (Test Driven Dev), can collect/parse logs for BAM (Business App Monitoring)
35
NiFi organization example
36
NiFi organization example
37
Know & use NiFi Design Patterns
Fan IN/OUT: List & Fetch
Attributes promotion: Extract, Update Attribute
Throttling: ControlRate, expiration
Funneling: RPG, RouteOnAttribute
Error loops: Relations, Counters
etc
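The error-loop pattern can be sketched outside NiFi to show the counting logic. This is a minimal illustration under assumptions: flow file attributes are modelled as a plain dict, `retry.count` is a hypothetical attribute name, and the limit of 3 is arbitrary. In a real flow, one common way to build this is to increment the counter with UpdateAttribute and decide with RouteOnAttribute.

```python
MAX_RETRIES = 3  # hypothetical limit


def route_failure(attributes: dict, max_retries: int = MAX_RETRIES):
    """Return (relationship, updated attributes) for a failed flow file."""
    updated = dict(attributes)
    # NiFi attributes are strings, so the counter is stored as text.
    count = int(updated.get("retry.count", "0")) + 1
    updated["retry.count"] = str(count)
    relationship = "retry" if count <= max_retries else "failure"
    return relationship, updated


attrs = {"filename": "sample.csv"}
history = []
for _ in range(4):
    relationship, attrs = route_failure(attrs)
    history.append(relationship)
print(history)
```

Bounding the loop matters because an unbounded retry loop keeps failed flow files circulating and eventually fills the queues.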
38
FDLC: Flow Development LifeCycle
Dev cluster: NiFi Cluster (Node 1 … Node x) + Registry
Tests cluster: NiFi Cluster (Node 1 … Node x) + Registry
Production cluster: NiFi Cluster (Node 1 … Node x) + Registry
DevOps chain: NiFi CLI, NipyAPI, Jenkins
1 - Push new flow version
2 - Get new version of the flow
3 & 6 - Automated testing / promotion (naming convention, scheduling, variable replacement)
4 - Push new flow version
5 - Get new version of the flow
7 - Push new flow version
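The automated promotion step (naming convention, scheduling, variable replacement) can be illustrated with a small sketch. It makes no NiFi or Registry API calls; a real chain would drive NiFi CLI or NipyAPI from Jenkins. The dict shape of the flow, the naming pattern and the variable names are all hypothetical.

```python
import copy
import re

# Hypothetical convention: DEPARTMENT_flow_name, e.g. "SALES_ingest_orders".
NAME_PATTERN = re.compile(r"^[A-Z]+_[a-z0-9_]+$")


def promote(flow: dict, target_variables: dict) -> dict:
    """Return a copy of the flow prepared for the target environment."""
    if not NAME_PATTERN.match(flow["name"]):
        raise ValueError(f"flow name breaks the naming convention: {flow['name']}")
    missing = set(flow["variables"]) - set(target_variables)
    if missing:
        raise ValueError(f"no target value for variables: {sorted(missing)}")
    promoted = copy.deepcopy(flow)
    # Variable replacement: only variables the flow declares are overridden.
    for key in promoted["variables"]:
        promoted["variables"][key] = target_variables[key]
    # Scheduling: deliver the flow stopped so it is started deliberately.
    promoted["state"] = "STOPPED"
    return promoted


dev_flow = {
    "name": "SALES_ingest_orders",
    "variables": {"hdfs.dir": "/tmp/dev/orders"},
    "state": "RUNNING",
}
prod_flow = promote(dev_flow, {"hdfs.dir": "/data/raw/orders", "unused": "x"})
print(prod_flow["variables"], prod_flow["state"])
```

Failing fast on a missing variable or a badly named flow keeps the check inside the pipeline instead of leaving it to a manual review.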
39
Monitoring
40
NiFi Monitoring
Service monitoring
 Is the NiFi service running correctly?
 Monitor global system metrics such as threads, JVM, disk, etc
 Monitor global flow metrics such as number of flow files sent, received or queued, processors stopped, etc
 Solutions
– NiFi UI
– Reporting tasks
– Ambari
– Grafana
Applications (Flow) monitoring
 Is a particular flow running correctly?
 Monitor per application (flow, PG, processor) metrics such as number of flow files, data size, queues, back pressure, etc
 Solutions
– S2S reporting tasks
– Custom flow developments (integrate monitoring and reporting in the application logic)
41
NiFi UI for service monitoring
42
Tools available for service monitoring
 Bootstrap notifier: sends a notification when NiFi starts, stops or dies unexpectedly
– Email/HTTP notification services
 Use reporting tasks: export metrics to your monitoring solution
– AmbariReportingTask (global, process group)
– MonitorDiskUsage (FlowFile, content, provenance repositories)
– MonitorMemory
 Also, monitor inactivity
– NiFi has a built-in MonitorActivity processor
– To be used with the SiteToSiteBulletinReportingTask
– You can use InvokeHTTP to call the reporting REST API
43
How to achieve granular monitoring?
Monitoring Driven Development (MDD)
 Integrate your monitoring logic in the flow design
– Count data (lines, tables, events, etc)
– Extract business metadata from data (table names, project name, source directory, etc)
 Handle different types of errors (connection, format, schema, etc)
 Send extracted KPIs to brokers, dashboards, API, files, etc
S2S reporting tasks
 NiFInception: NiFi can export metrics to another cluster or to itself
 Bulletins provide information on errors
 Status reports provide metrics on usage
 These reports become data and we can use the power of NiFi to extract our KPIs
 Naming convention is key
44
PutHDFS UpdateAttribute
Monitoring Driven Development
PutHiveQL
PutElasticSearch
UpdateAttribute
…
…
45
NiFInception architecture
NiFi Cluster
Node 1 Node y
Production cluster
S2S Bulletins
S2S Status
Other applications
Monitoring
Alerting
• Generate status and bulletins
• Send them through S2S to a local
input port
• Ingest bulletin and status reports
• Parse reports (JSON) and extract
useful KPI
• Send KPI to a dashboard tool
(AMS, Elastic, etc)
• Use a broker for alerting (Kafka)
• Can ingest logs from other
systems or applications
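The "parse reports (JSON) and extract useful KPI" step can be sketched as follows. This is an illustration under assumptions: the bulletin report is taken to be a JSON array whose entries carry `bulletinSourceName` and `bulletinLevel` fields (check these names against the output of your own reporting task), and the sample records are invented for the example.

```python
import json
from collections import Counter


def errors_per_source(report_json: str) -> dict:
    """Count ERROR-level bulletins per source component name."""
    bulletins = json.loads(report_json)
    counts = Counter(
        b.get("bulletinSourceName", "unknown")
        for b in bulletins
        if b.get("bulletinLevel") == "ERROR"
    )
    return dict(counts)


# Invented sample records, not real NiFi output.
SAMPLE_REPORT = json.dumps([
    {"bulletinSourceName": "PutHDFS_orders", "bulletinLevel": "ERROR", "bulletinMessage": "sample"},
    {"bulletinSourceName": "PutHDFS_orders", "bulletinLevel": "ERROR", "bulletinMessage": "sample"},
    {"bulletinSourceName": "PutHiveQL_orders", "bulletinLevel": "WARNING", "bulletinMessage": "sample"},
])

print(errors_per_source(SAMPLE_REPORT))
```

Grouping by component name only gives a usable KPI when names are unique and follow a convention, which is why the naming convention matters for monitoring.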
46
NiFInception architecture 2
NiFi Cluster
Node 1 Node x
Production cluster
NiFi Cluster
Node 1 Node y
Monitoring cluster
S2S Bulletins
S2S Status
Other applications
Monitoring
Alerting
• Ingest data
• Generate status and bulletins
• Send them through S2S to a
remote cluster
• Ingest bulletin and status reports
• Parse reports (JSON) and extract
useful KPI
• Send KPI to a dashboard tool
(AMS, Elastic, etc)
• Use a broker for alerting (Kafka)
• Can ingest logs from other
systems or applications
47
Lessons learnt at
Renault
48
Recommendations from day to day life
 High availability means several weeks of validation before go-live
 Use Ranger authorizations instead of NiFi's built-in ACL management
 Avoid too much « if then do else » logic: keep NiFi for the data flow and do not offload work from other processing solutions onto it
 S2S: minimize the number of RPGs and self-RPGs. Use attributes and routing.
 Put to Hive via two redundant processors
 Discuss with DBAs to better handle impacts (views vs tables, open sockets, etc.)
 Back up flow.xml.gz (plus users.xml and authorizations.xml) using NiFi itself
49
Recommendations from day to day life
 The success flag can be implemented through counters instead of InvokeHTTP
 In case of CIFS, do not forget to use AUTO MOUNT on the NFS client side on NiFi servers
 Check for zero-size files before transmitting into HDFS (especially for IoT use cases)
 Configure a timer when using FAILURE redirections, to avoid a back-pressure scenario
 Build separate clusters for separate use cases (real time with SSD, batch, etc.)
 An internal CA / PKI is very useful for internal communications: no need to wait for answers from security teams, but it requires security (SSL) skills
50
Questions

Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Best practices and lessons learnt from Running Apache NiFi at Renault

  • 1. Best practices and lessons learnt from running NiFi at Renault. Kamelia Benchekroun, Big Data Architect, Renault; Abdelkrim Hadjidj, Solution Engineer, Hortonworks
  • 2. Agenda
      • The NiFi journey at Renault
      • Best practices for running NiFi in production
      • Lessons learnt at Renault
      • Questions & answers
  • 3. The NiFi journey at Renault
  • 4. About Us
      • Datalake Squad activities: Platform Design; Deploy and Automate; Capture and Analyze Raw Logs and Machine Data; Improve Reliability; Speed Operations; Monitor and React
      • The Datalake Squad: a passionate team who work really hard to deliver solutions and services and empower data initiatives at Renault
      • Datalake platform metrics: 450 connected users; 30 data initiatives and use cases; 8000 daily queries; 3500 service requests; 300 TB efficient storage
  • 5. Data Lake at Renault: the beginning of the story
      • Diagram labels: CFT; NFS-GW; Exports; HDP Admin team; DataLake Renault (Node 1 … Node 16)
      • June 2016: 10 users; requests by email, 1 admin; manual provisioning
  • 7. HDF came in (Q1 2017)
      • Diagram labels: Data Flow Platform (Node 1 … Node 5), HDF Admin team; DataLake Renault (Node 1 … Node 40), HDP Admin team
      • 2016: 10 users; requests by email, 1 admin; manual provisioning
      • 2017: 200 users; Jira portal, admins/devops; NiFi, automation (Jenkins for HDP)
      • Today: 300–500 active users; 7000 queries per day; 40 data sources in production
  • 8. Data Ingestion Process
      • Diagram labels: Data Sources; NiFi; Transit Zone; RAW Zone; Other Zones; Data Lake; Data Engineers (DL); Data + Triggers; HDFS / Kafka
      • Mail to admin if critical issues
      • Technical metrics to AMS
  • 10. Dev to prod testing. Diagram labels: Dev; S2S; Prod; /tmp/devp/${DESTINATION}
  • 11. NiFi Value for Renault
      • 90% of data ingestion since 2016
      • One platform for all data sources: files, DBs, APIs, brokers, etc.
      • Offload CFT
      • 100 active data flows, 2000+ processors in production
      • Accelerate time to insights and use case development, and improve monitoring and governance
      • Also used to export data from the Data Lake to other systems/sites/cloud
      • Enables new use cases: connected plants, package tracking, IoT
  • 13. Packaging Traceability
      • POC: 2500 packages running in the loop
          – 1 package lost = 800 € (= iPhone)
      • If the solution is generalized: 600k packages
      • 2016 status: 400 K€ packaging re-investment due to packaging losses
      • 2017 status: 100 K€ cardboard in January
      • Test expectations:
          – Reduce cardboard costs and packaging losses
          – Test LoRa technology in an industrial context and Renault activities
          – Validate the operational added value of LoRa technology
      • Sites (map labels): UMCD Pitesti Meca; UVD Pitesti (Dacia); Bursa; Nissan Sunderland NMUK; Maubeuge (MCA); Cross dock (Hungary); Nissan Barcelona NMISA; Douai
  • 14. Example flow
      • Pitesti -> Barcelona (full)
          – 3 transports/week
          – 1 truck each time
          – Lead time: 6 days
      • NMUK -> Pitesti (empty)
          – Once a week
          – 1 truck each time
          – Lead time: 6 days
          – Cross dock in Strančice (Czech Rep.)
  • 15. Use case scope NETWORK & SECURITY SENSORS IOT PLATFORM OBJECT VISUALISATION BUSINESS INTELLIGENCE IOT PLATFORM DATA VISUALISATION Connect Collect Innovate Drive Analyse Objenious Manual Treatment 2017 Objenious Manual Treatment H1 2018 Renault IT (DLK + DFP) Objenious H2 2018 Renault IT (DLK + DFP + IS integration + Analytics) Telecom WTB Renault IT (DLK + DFP + IS integration + Analytics) Sensor Not in Scope of Objenious offer
  • 16. Architecture 1. Diagram labels: Renault Data Centers; Access Control List or Certificates; Remote Process Group over Proxy; POST HTTP; LISTEN HTTP; OUTPUT PORT; Gateway
  • 17. Architecture 2. Diagram labels: Renault Data Centers; Remote Process Group over Proxy; OUTPUT PORT; MQTT, AMQP, WebSocket…; POST HTTPS; LISTEN HTTPS; or any other operator (worldwide); Gateway; IoT Platform (device management/provisioning, network & security)
  • 18. Reduce cloud communication cost by 50%
      • Tag data decoding with the NiFi ExecuteScript processor
      • Raw data out of sensor: ff1401aa
      • Decoded data out of sensor: [{"data":{"battery":3.6007,"temperature":20}}]
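
To make the decoding step on slide 18 more concrete, here is a minimal sketch of the kind of logic an ExecuteScript processor could run on each payload. The byte layout is an assumption, not something the slides specify: byte 0 is treated as a header, byte 1 as the temperature in °C, and bytes 2-3 as a big-endian raw battery reading. Only the temperature can be checked against the slide (0x14 = 20); the conversion from the raw battery reading to the 3.6007 V shown on the slide is vendor-specific and is deliberately not attempted. The function name `decode_tag_payload` and the `battery_raw` key are hypothetical, and the NiFi session handling that a real script needs is left out.

```python
import json


def decode_tag_payload(hex_payload: str) -> list:
    """Decode a 4-byte hex tag payload under a HYPOTHETICAL layout.

    Assumed layout (not from the slides):
      byte 0    : header / frame type (ignored here)
      byte 1    : temperature in degrees Celsius
      bytes 2-3 : raw battery reading, big-endian (unit unknown)
    """
    raw = bytes.fromhex(hex_payload)
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    temperature = raw[1]
    battery_raw = int.from_bytes(raw[2:4], "big")
    # Same outer shape as the decoded record on the slide, but the battery
    # field stays raw because the scaling to volts is not documented there.
    return [{"data": {"battery_raw": battery_raw, "temperature": temperature}}]


if __name__ == "__main__":
    print(json.dumps(decode_tag_payload("ff1401aa")))
```

Decoding at the edge like this lets the flow forward only the fields it needs, which is one plausible way to get the traffic reduction the slide mentions.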
  • 19. Architecture + Cloud ingestion. Diagram labels: Data Flow Platform (Node 1 … Node 3), HDF Admin team; DataLake Renault (Node 1 … Node 40), HDP Admin team; Data Flow Platform (Node 1, Node 2), S2S outbound
  • 21. Connected plants. Diagram labels: MiNiFi; Source 1, Source 2, Source 3; FTP, MQTT, OPCUA, JMS, other sources/protocols; aggregation server; collect, processing, ingestion; OPCUA, JMS, Kafka, MQTT, WebSocket, *; S2S (HTTP, security, HA); HDFS, Hive, Solr, HBase, etc.; Data in motion; Data at rest; Facility Level, Plant Level, Corporate Level; Governance & Integration, Security, Operations, Data Access, Data Management
  • 22. Overview of Plant Assembly Factory UC
  • 23. Sensors: full HDF use case (NiFi, Kafka and Storm)
  • 24. Connected plants: Site-to-Site. Demo with MQTTool, from hand to Data Lake! Mobile phone, MQTT broker, NiFi ConsumeMQTT to Kafka, NiFi ConsumeKafka to Elasticsearch and Hive…
  • 26. Architecture + Connected plants. Diagram labels: Data Flow Platform (Node 1 … Node 3), HDF Admin team; DataLake Renault (Node 1 … Node 40), HDP Admin team; Data Flow Platform (Node 1, Node 2), S2S outbound; two plant DFPs (HDF Flow Mgmt, X cores, Node 1 … Node X); Data Flow Platform (Node 1, Node 2), S2S, connected plants
  • 27. Best practices for running NiFi in production
  • 29. Typical NiFi Production Cluster: Logical View
  • 30. Sizing considerations
      • There are baselines for NiFi sizing, but NiFi is like a "programming language"!
      • NiFi resource usage depends on the processors used, but it is always I/O intensive
          – Use different disks/volumes for the three repositories: FlowFile, content and provenance
          – For the content repository, multiple mount points are recommended
          – For the content and provenance repositories, SSDs can provide the best performance
          – Heap sizing depends on the use case and the processors used
          – Depending on the workload, we can scale vertically or horizontally
      • Default settings are only for getting started. Tune based on your use case.
          – Thread pool size: Maximum Timer Driven Thread Count
          – Timeouts: nifi.cluster.node.connection.timeout, nifi.cluster.node.read.timeout, nifi.zookeeper.connect.timeout, nifi.zookeeper.session.timeout
          – Pluggable modules: WAL Provenance Repository
  • 32. Let's consider a simple data flow
        public class Project {
            private int foo(int x) {
                // do something with x -> y
                return y;
            }
            public static void main(String[] args) {
                // some code
                try {
                    BufferedReader bufferRead = new BufferedReader ...;
                    String data = bufferRead.readLine();
                    data = foo(data);
                    System.out.println(data);
                } catch ...
            }
        }
  • 33. Flow development = software development (programming language concept: Apache NiFi equivalent)
      • Integrated Development Environment: Apache NiFi UI
      • Algorithm, code, instructions: flows, processors, funnels
      • Functions (arguments, results): process groups (input/output ports)
      • Libraries: templates
      • …
      • Lifecycle: Design, Code, Test, Deploy, Monitor, Evolve
  • 34. General guideline (principle, then benefit)
      • Use separate environments
          – Benefit: performance, security and SLA
      • No flows at root level: use a PG per department, BL, project, etc.
          – Benefit: easy to secure; everything in NiFi is attached to a PG, with inheritance
      • Break your flow into process groups
          – Benefit: easy to test, update & version
      • Use a naming convention, use comments (labels, comments)
          – Benefit: very useful for development/monitoring. Ideally use unique names
      • Use variables when possible
          – Benefit: promotion (dev > test > prod), update
      • Organize your projects into three PGs: ingestion, test & monitoring
          – Benefit: NiFi can generate data for TDD (Test Driven Development) and can collect/parse logs for BAM (Business Application Monitoring)
  • 37. Know & use NiFi design patterns
      • Fan in/out: List & Fetch
      • Attribute promotion: Extract, UpdateAttribute
      • Throttling: ControlRate, expiration
      • Funneling: RPG, RouteOnAttribute
      • Error loops: relationships, counters
      • etc.
  • 38. FDLC: Flow Development LifeCycle
      • Diagram labels: Dev cluster, Tests cluster and Production cluster (each a NiFi cluster, Node 1 … Node x, with its own Registry); DevOps chain: NiFi CLI, NiPyAPI, Jenkins
      • 1: push new flow version
      • 2: get new version of the flow
      • 3 & 6: automated testing / promotion (naming convention, scheduling, variable replacement)
      • 4: push new flow version
      • 5: get new version of the flow
      • 7: push new flow version
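
The automated promotion step on slide 38 (naming convention check and variable replacement) can be sketched in plain Python. This is only an illustration under stated assumptions: the flow is represented as a simple dictionary rather than a real NiFi Registry snapshot, the naming convention `<project>_<source>_<step>` is hypothetical, and `promote_flow` is an invented helper, not part of NiFi CLI or NiPyAPI. Scheduling changes, which the slide also lists, are left out.

```python
import re

# HYPOTHETICAL naming convention: <project>_<source>_<step>, lowercase alphanumerics.
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_[a-z0-9]+$")


def promote_flow(flow: dict, target_variables: dict) -> dict:
    """Return a shallow copy of `flow` prepared for the target environment.

    `flow` is an illustrative structure, not a real Registry snapshot:
      {"processors": [{"name": ...}, ...], "variables": {name: value, ...}}
    """
    # Gate 1: every processor name must follow the naming convention.
    bad_names = [p["name"] for p in flow["processors"]
                 if not NAME_PATTERN.match(p["name"])]
    if bad_names:
        raise ValueError(f"naming convention violated: {bad_names}")

    # Gate 2: every variable used by the flow needs a value in the target env.
    missing = sorted(set(flow["variables"]) - set(target_variables))
    if missing:
        raise ValueError(f"no target value for variables: {missing}")

    # Variable replacement: keep the flow's variable names, swap in target values.
    promoted = dict(flow)
    promoted["variables"] = {name: target_variables[name]
                             for name in flow["variables"]}
    return promoted


if __name__ == "__main__":
    dev_flow = {
        "processors": [{"name": "pkg_mqtt_consume"}, {"name": "pkg_hdfs_put"}],
        "variables": {"DESTINATION": "/tmp/devp/pkg"},  # example path, hypothetical
    }
    print(promote_flow(dev_flow, {"DESTINATION": "/data/raw/pkg"}))
```

In a Jenkins job, a check of this kind would sit between "get new version of the flow" and "push new flow version", so that a flow failing either gate never reaches the next registry.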
  • 40. NiFi Monitoring
      • Service monitoring: is the NiFi service running correctly?
          – Monitor global system metrics such as threads, JVM, disk, etc.
          – Monitor global flow metrics such as number of flow files sent, received or queued, processors stopped, etc.
          – Solutions: NiFi UI, reporting tasks, Ambari, Grafana
      • Application (flow) monitoring: is a particular flow running correctly?
          – Monitor per-application (flow, PG, processor) metrics such as number of flow files, data size, queues, back pressure, etc.
          – Solutions: S2S reporting tasks; custom flow developments (integrate monitoring and reporting in the application logic)
  • 41. NiFi UI for service monitoring
  • 42. Tools available for service monitoring
      • Bootstrap notifier: sends a notification when NiFi starts, stops or dies unexpectedly
          – Email/HTTP notification services
      • Use reporting tasks: export metrics to your monitoring solution
          – AmbariReportingTask (global, process group)
          – MonitorDiskUsage (FlowFile, content, provenance repositories)
          – MonitorMemory
      • Also, monitor inactivity
          – NiFi has a built-in MonitorActivity processor
          – To be used with the S2SBulletinReportingTask
          – You can use InvokeHTTP to call the reporting REST API
  • 43. How to achieve granular monitoring?
      • Monitoring Driven Development (MDD)
          – Integrate your monitoring logic in the flow design: count data (lines, tables, events, etc.) and extract business metadata from data (table names, project name, source directory, etc.)
          – Handle different types of errors (connection, format, schema, etc.)
          – Send extracted KPIs to brokers, dashboards, APIs, files, etc.
      • S2S reporting tasks
          – NiFInception: NiFi can export metrics to another cluster or to itself
          – Bulletins provide information on errors
          – Status reports provide metrics on usage
          – These reports become data, and we can use the power of NiFi to extract our KPIs
          – Naming convention is key
  • 44. Monitoring Driven Development. Diagram labels: PutHDFS; UpdateAttribute; PutHiveQL; PutElasticSearch; UpdateAttribute; …
  • 45. NiFInception architecture
      • Diagram labels: Production cluster (NiFi cluster, Node 1 … Node y); S2S Bulletins; S2S Status; other applications; monitoring; alerting
      • Generate status and bulletins
      • Send them through S2S to a local input port
      • Ingest bulletin and status reports
      • Parse reports (JSON) and extract useful KPIs
      • Send KPIs to a dashboard tool (AMS, Elastic, etc.)
      • Use a broker for alerting (Kafka)
      • Can ingest logs from other systems or applications
  • 46. NiFInception architecture 2
      • Diagram labels: Production cluster (NiFi cluster, Node 1 … Node x); Monitoring cluster (NiFi cluster, Node 1 … Node y); S2S Bulletins; S2S Status; other applications; monitoring; alerting
      • Production cluster: ingest data; generate status and bulletins; send them through S2S to a remote cluster
      • Monitoring cluster: ingest bulletin and status reports; parse reports (JSON) and extract useful KPIs; send KPIs to a dashboard tool (AMS, Elastic, etc.); use a broker for alerting (Kafka); can ingest logs from other systems or applications
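
The "parse reports (JSON) and extract useful KPIs" step of the NiFInception architectures can be illustrated outside NiFi with a short sketch. It assumes bulletins arrive as a JSON array of objects with `bulletinLevel` and `bulletinSourceName` fields; those field names, the `bulletin_kpis` helper and the `<project>_<source>_<step>` processor naming convention are assumptions made for the example, so check them against the records your reporting task actually emits. The sketch also shows why the slides call the naming convention "key": the project can only be derived from the processor name if names are structured.

```python
import json
from collections import Counter


def bulletin_kpis(report_json: str) -> dict:
    """Extract two simple KPIs from a JSON array of bulletin records.

    Field names (bulletinLevel, bulletinSourceName) are ASSUMED here;
    verify them against the actual reporting task output.
    """
    bulletins = json.loads(report_json)

    # KPI 1: number of bulletins per severity level.
    by_level = Counter(b.get("bulletinLevel", "UNKNOWN") for b in bulletins)

    # KPI 2: number of ERROR bulletins per project, where the project is the
    # first token of a processor name like "<project>_<source>_<step>"
    # (hypothetical naming convention).
    errors_by_project = Counter(
        b.get("bulletinSourceName", "").split("_")[0] or "unknown"
        for b in bulletins
        if b.get("bulletinLevel") == "ERROR"
    )
    return {"by_level": dict(by_level),
            "errors_by_project": dict(errors_by_project)}


if __name__ == "__main__":
    sample = json.dumps([
        {"bulletinLevel": "ERROR", "bulletinSourceName": "pkg_hdfs_put"},
        {"bulletinLevel": "WARNING", "bulletinSourceName": "pkg_mqtt_consume"},
    ])
    print(bulletin_kpis(sample))
```

Inside NiFi the same extraction would more likely be done with record or JSON processors; the Python version is only meant to show the shape of the KPI logic before it is sent to a dashboard or a Kafka topic for alerting.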
  • 48. Recommendations from day-to-day life
      • High availability means several weeks of validation before go-live
      • Use Ranger authorizations instead of NiFi's built-in ACL management
      • Avoid too much "if/then/else" logic: do the data flow in NiFi and do not offload work that belongs to other processing solutions
      • S2S: minimize the number of RPGs and self-RPGs. Use attributes and routing.
      • Put to Hive via two redundant processors
      • Discuss with the DBAs to better handle impacts (views vs. tables, open sockets, etc.)
      • Back up flow.xml.gz (plus users.xml and authorizations.xml) using NiFi itself
  • 49. Recommendations from day-to-day life
      • Success flags can be implemented through counters instead of InvokeHTTP
      • With CIFS, do not forget to use automount for the NFS client side on the NiFi servers
      • Check for zero-size files before transmitting into HDFS (especially for IoT use cases)
      • Configure a timer when using FAILURE redirections, to avoid back-pressure scenarios
      • Build separate clusters for separate use cases (real time with SSD, batch, etc.)
      • A CA/PKI is very useful for internal communications: there is no need to wait for answers from the security teams, but security (SSL) skills are needed

Editor's notes

  1. Only files and databases, via Sqoop and CFT.
  2. Message and API ingestion. Use dedicated servers and bandwidth, separate from the compute nodes: simplified firewall rules, better credentials administration, one point of monitoring…
  3. Explain the Transit Zone. Most common case: ORC file.
  4. ${Destination} is set in the UpdateAttribute processor placed before the S2S transfer. Our developers rarely use it today, though, since the capture flows are built by the Squad, which also handles operations.
  5. CDC is active with NiFi for one MySQL source; a study is under way for Oracle and DB2.
  6. Sizing and tuning notes:
      • Heap: if millions of flow files are expected to be processed by NiFi at one point in time, the heap size should be adjusted accordingly. A common value is 8 GB, and it is usually not recommended to exceed 16 GB unless very specific processors requiring a lot of memory are used.
      • Thread pool (Maximum Timer Driven Thread Count): since there is no resource isolation in NiFi, it is important not to set this value too high, so that sufficient resources remain available for the operating system and the other running processes. It is usually recommended to set it to 2 or 3 times the number of available cores on the NiFi host. Increase it further only if there is a need (the NiFi UI shows that the maximum number of threads is constantly in use) and if the host is strictly monitored to ensure correct CPU/load metrics.
      • Cluster timeouts: by default, the "nifi.cluster.node.connection.timeout" and "nifi.cluster.node.read.timeout" properties in nifi.properties are set to 5 seconds. This is fine for a "getting started" type of cluster. However, when processing large numbers of FlowFiles, the JVM's garbage collection can sometimes cause fairly lengthy pauses, which could cause timeouts. It is recommended to increase these values to 10-15 seconds or more.
      • ZooKeeper timeouts: "nifi.zookeeper.connect.timeout" and "nifi.zookeeper.session.timeout" default to 3 seconds, as this is the ZooKeeper default. However, if ZooKeeper is very busy because it is used by multiple components, timeouts can occur, which would cause the Primary Node and Cluster Coordinator to change frequently. Changing the value to 10 seconds is recommended.
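
The rules of thumb in note 6 can be collected into a small helper that produces a starting point for a given host. This is a sketch, not an official sizing tool: the function name `suggested_settings` and the layout of the returned dictionary are invented for illustration, the 15-second cluster timeout is simply one pick from the 10-15 second range in the note, and the values should be treated as a first guess to validate by monitoring.

```python
def suggested_settings(cores: int, threads_per_core: int = 2) -> dict:
    """Return starting values derived from the rules of thumb in note 6.

    The structure of the result is illustrative only; nothing here writes
    to nifi.properties or talks to a NiFi instance.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if threads_per_core not in (2, 3):
        raise ValueError("the note recommends 2 or 3 threads per core")

    return {
        # Set in the NiFi UI (controller settings), not in nifi.properties.
        "Maximum Timer Driven Thread Count": cores * threads_per_core,
        # Common heap value per the note; going beyond 16 GB is discouraged.
        "heap_gb": 8,
        "nifi.properties": {
            # Defaults are 5 seconds; the note recommends 10-15 seconds or more.
            "nifi.cluster.node.connection.timeout": "15 sec",
            "nifi.cluster.node.read.timeout": "15 sec",
            # Defaults are 3 seconds; the note recommends 10 seconds.
            "nifi.zookeeper.connect.timeout": "10 sec",
            "nifi.zookeeper.session.timeout": "10 sec",
        },
    }


if __name__ == "__main__":
    for key, value in suggested_settings(cores=16).items():
        print(key, "=", value)
```

For a 16-core host this yields a thread count of 32 with the default factor of 2; raising the factor to 3 is only sensible if the UI shows the pool constantly exhausted and CPU load stays healthy.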
  7. To succeed in your NiFi projects, you really need to treat NiFi flow development like software development. Your code or your algorithm is a set of instructions, just as your flows are a set of processors. You can have if/else statements with routing processors, for or while loops with UpdateAttribute, and catch statements with error relationships. When you build your flow, you divide it into process groups, just as you use functions in your code. This makes your applications easier to understand, maintain and debug. You can have templates for repetitive things, like libraries that you build and use across your projects.
  8. Controller services should be local to the process group, not in the root or parent process group. All flows should be contained within a child process group. For larger flows, develop several smaller flows that can be tested discretely, then combine them into composites using input/output ports or S2S ports. Use the Variable Registry for attributes that vary between environments. Updating only part of a flow is useful for real-time systems where data flows continuously.
  9. Variables are missing.