NIFI DEVELOPER GUIDE
Presenter Deon Huang
2017/7/7
Agenda
• NiFi REST API
• NiFi In Depth
• NiFi Developer Guide
• Custom Processor
• Contribution Sharing
NiFi REST API
• The REST API provides programmatic access to command and control a
NiFi instance in real time.
• Start and stop processors, monitor queues, query provenance data, and
more.
NiFi REST API
What happened?
NiFi REST API
We sent a REST request to the NiFi instance
NiFi REST API
Request URL
Component ID
Request body we actually sent
NiFi REST API
• Every component in NiFi has a unique ID.
• Every operation on a component is a REST request to the NiFi instance.
• Most operations need to specify a component ID.
• https:// /nifi-api/process-groups/015d1045-0b88-1db2-da38-cb71ac006792/process-groups
NiFi Instance URL
REST API Usage
REST Path
Unique Component ID
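To make the URL structure concrete, here is a minimal sketch that assembles the request URL from the parts labelled above. The host `nifi.example.com:8443` and the class and method names are hypothetical placeholders; only the `/nifi-api/process-groups/<component ID>/process-groups` path and the component ID come from the slide.

```java
import java.net.URI;

// Sketch only: builds the URL shown on the slide from its labelled parts.
final class NiFiUrlSketch {

    // Returns <instance URL>/nifi-api/process-groups/<component ID>/process-groups
    static URI childProcessGroupsUrl(String instanceUrl, String componentId) {
        // Drop a trailing slash so the path is not doubled up.
        String base = instanceUrl.endsWith("/")
                ? instanceUrl.substring(0, instanceUrl.length() - 1)
                : instanceUrl;
        return URI.create(base + "/nifi-api/process-groups/" + componentId + "/process-groups");
    }

    public static void main(String[] args) {
        // Hypothetical host; the component ID is the one from the slide.
        URI url = childProcessGroupsUrl("https://nifi.example.com:8443",
                "015d1045-0b88-1db2-da38-cb71ac006792");
        System.out.println(url);
    }
}
```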
NiFi REST API
• RevisionDTO
NiFi REST API
• RevisionDTO – identifies the version of the component as viewed by the client
ProcessGroupDTO – component body of a ProcessGroup
PositionDTO – position on the canvas
• All DTO and Entity classes are provided by this dependency:
<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-client-dto</artifactId>
    <version>1.1.2</version>
</dependency>
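As a rough illustration of how the three DTOs nest, the sketch below hand-builds the kind of JSON body sent when creating a process group: a `revision` with a `version`, and a `component` with a `name` and a `position`. The class name, method name, and values are made up for this example, and the string formatting is only a stand-in; real code would more likely populate the classes from `nifi-client-dto` and serialize them with a JSON library.

```java
import java.util.Locale;

// Sketch only: shows the shape of the request body, not a real client.
final class ProcessGroupBodySketch {

    // Assumes 'name' needs no JSON escaping (plain letters, digits, spaces).
    static String createBody(long revisionVersion, String name, double x, double y) {
        return String.format(Locale.ROOT,
                "{\"revision\":{\"version\":%d},"
              + "\"component\":{\"name\":\"%s\",\"position\":{\"x\":%.1f,\"y\":%.1f}}}",
                revisionVersion, name, x, y);
    }

    public static void main(String[] args) {
        // A brand-new component is typically created with revision version 0.
        System.out.println(createBody(0, "My Group", 100.0, 200.0));
    }
}
```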
REST API Recap
• Every component in NiFi has a unique ID.
• Every operation on a component is a REST request to the NiFi instance.
• Most operations need to specify a component ID.
NiFi in Depth
• Repositories
• Life of FlowFile
FlowFile Mechanism in Depth
NiFi Architecture
Attributes
1. HashMap in JVM
2. WAL in FlowFile Repository
Content
Immutable on disk
NiFi in Depth
• FlowFiles are the heart of NiFi and its flow-based design.
• A FlowFile is a data record consisting of a pointer to its content and its
attributes, and it is associated with provenance events.
• Attributes are key/value pairs that act as metadata for the FlowFile.
• Content is the actual data of the file.
• Provenance is a record of what has happened to the FlowFile.
NiFi in Depth
• Repositories are immutable.
• The benefits of this are many: a substantial reduction in the storage
space required for typical complex processing graphs, natural replay
capability, use of OS caching, fewer random read/write performance
hits, and a design that is easy to reason about.
• All three repositories are directories on local storage used to persist data.
NiFi in Depth
• The FlowFile repository contains metadata for all current FlowFiles in the
flow
• The Content Repository holds the content for current and past FlowFiles
• The Provenance Repository holds the history of FlowFiles
NiFi in Depth
• FlowFiles are held in a Map in JVM memory.
• FlowFile metadata includes:
- Attributes
- A pointer to the actual content of the FlowFile
- State (which Connection/Queue the FlowFile belongs in)
• The FlowFile Repository acts as NiFi’s “Write-Ahead Log”.
• Each change happens as a transactional unit of work.
NiFi in Depth
• NiFi recovers a FlowFile by restoring a snapshot of the FlowFile and then
replaying the changes logged since that snapshot.
• A snapshot is taken automatically and periodically by the system.
• A new base checkpoint is computed by serializing the FlowFile map to disk
under the filename ‘.partial’.
• Step by step: WAL in NiFi
https://cwiki.apache.org/confluence/display/NIFI/NiFi%27s+Write-Ahead+Log+Implementation
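The snapshot-plus-replay idea can be sketched with a toy in-memory model. This is not NiFi's implementation: the class and method names are invented for illustration, FlowFile state is reduced to a single string, and nothing is written to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy write-ahead log: illustrates checkpoint + replay, nothing more.
final class ToyWriteAheadLog {
    private Map<String, String> snapshot = new HashMap<>();    // last checkpoint
    private final List<String[]> journal = new ArrayList<>();  // changes logged since the checkpoint
    private final Map<String, String> live = new HashMap<>();  // in-memory FlowFile map

    // Log the change first, then apply it in memory. A null state means "dropped".
    void update(String flowFileId, String state) {
        journal.add(new String[] {flowFileId, state});
        apply(live, flowFileId, state);
    }

    // Checkpoint: copy the whole map as the new baseline and discard old journal entries.
    void checkpoint() {
        snapshot = new HashMap<>(live);
        journal.clear();
    }

    // Recovery: start from the snapshot, then replay every logged change in order.
    Map<String, String> recover() {
        Map<String, String> restored = new HashMap<>(snapshot);
        for (String[] delta : journal) {
            apply(restored, delta[0], delta[1]);
        }
        return restored;
    }

    private static void apply(Map<String, String> map, String id, String state) {
        if (state == null) {
            map.remove(id);
        } else {
            map.put(id, state);
        }
    }

    public static void main(String[] args) {
        ToyWriteAheadLog wal = new ToyWriteAheadLog();
        wal.update("ff-1", "queue-A");
        wal.checkpoint();
        wal.update("ff-1", "queue-B"); // logged after the checkpoint
        System.out.println(wal.recover());
    }
}
```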
Content Repository
• The largest of the repositories; it uses immutability and copy-on-write to
maximize speed and thread safety.
• Resource Claims are Java objects that point to specific files on disk.
• The FlowFile has a “Content Claim” object containing:
- a reference to the Resource Claim
- the offset of the content within the file
- the length of the content
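A small sketch may help picture how a Content Claim locates one FlowFile's bytes inside a larger file. The classes below are simplified stand-ins written for this example (a byte array plays the role of the file on disk); they are not NiFi's own `ResourceClaim` and `ContentClaim` types.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: a claim is a (file, offset, length) triple.
final class ContentClaimSketch {

    // Stand-in for a file on disk holding the content of several FlowFiles.
    static final class ResourceClaim {
        final byte[] fileBytes;

        ResourceClaim(byte[] fileBytes) {
            this.fileBytes = fileBytes;
        }
    }

    // Points at one slice of the resource claim.
    static final class ContentClaim {
        final ResourceClaim resourceClaim;
        final int offset;
        final int length;

        ContentClaim(ResourceClaim resourceClaim, int offset, int length) {
            this.resourceClaim = resourceClaim;
            this.offset = offset;
            this.length = length;
        }

        // Reads only the bytes that belong to this claim.
        byte[] read() {
            return Arrays.copyOfRange(resourceClaim.fileBytes, offset, offset + length);
        }
    }

    public static void main(String[] args) {
        ResourceClaim file = new ResourceClaim("hello,world".getBytes(StandardCharsets.UTF_8));
        ContentClaim first = new ContentClaim(file, 0, 5);
        ContentClaim second = new ContentClaim(file, 6, 5);
        System.out.println(new String(first.read(), StandardCharsets.UTF_8));
        System.out.println(new String(second.read(), StandardCharsets.UTF_8));
    }
}
```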
Provenance Repository
• Holds the history of each FlowFile and provides data lineage (chain of custody).
• When a provenance event is created, it copies all of the FlowFile’s
attributes, its content pointer, and its state to one location in the
Provenance Repo.
• Provenance Repository design decisions
https://cwiki.apache.org/confluence/display/NIFI/Persistent+Provenance+Repository+Design
Provenance Repository
• Provenance Event
-CLONE
-ATTRIBUTES_MODIFIED
-CONTENT_MODIFIED
-CREATE
-DROP
-EXPIRE
-FORK
-JOIN
-ROUTE
…
Repositories Recap
• The FlowFile repository contains metadata for all current FlowFiles in the
flow
• The Content Repository holds the content for current and past FlowFiles
• The Provenance Repository holds the history of FlowFiles
• Best practice
- Analyze the contents of a FlowFile as few times as possible
- Extract key information into attributes
- Updating the FlowFile Repository is much faster than updating the Content Repository
Life of FlowFile
• Data Ingress → Pass by Reference → Copy-On-Write → Data Egress
• An important aspect of flow-based programming is the resource-constrained
relationships between the black boxes.
• FlowFiles are routed from one processor to another simply by passing a
reference to the FlowFile.
Pass by Reference
Funnels
Copy On Write
Update Attribute
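Pass-by-reference and copy-on-write can be sketched with a toy immutable record. The `ToyFlowFile` class is hypothetical and keeps content in memory for brevity. The point is only that an attribute update reuses the same content reference, while a content update produces new bytes and leaves the original untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch only: immutable toy FlowFile, not NiFi's FlowFile interface.
final class CopyOnWriteSketch {

    static final class ToyFlowFile {
        final Map<String, String> attributes; // unmodifiable metadata
        final byte[] content;                 // treated as immutable, never edited in place

        ToyFlowFile(Map<String, String> attributes, byte[] content) {
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
            this.content = content;
        }

        // Attribute update: new metadata, same content reference (no content copy).
        ToyFlowFile withAttribute(String key, String value) {
            Map<String, String> updated = new HashMap<>(attributes);
            updated.put(key, value);
            return new ToyFlowFile(updated, content);
        }

        // Content update: the new bytes live elsewhere; the old bytes stay as they were.
        ToyFlowFile withContent(byte[] newContent) {
            return new ToyFlowFile(attributes, newContent.clone());
        }
    }

    public static void main(String[] args) {
        ToyFlowFile original = new ToyFlowFile(
                Collections.<String, String>emptyMap(),
                "abc".getBytes(StandardCharsets.UTF_8));
        ToyFlowFile tagged = original.withAttribute("filename", "a.txt");
        ToyFlowFile modified = tagged.withContent("ABC".getBytes(StandardCharsets.UTF_8));
        System.out.println(tagged.content == original.content);   // same content reference
        System.out.println(modified.content == original.content); // different content
    }
}
```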
Data Egress
• Eventually the FlowFile is “DROPPED”: it is no longer being processed and
is available for deletion.
• It remains in the FlowFile Repository until the next repository checkpoint,
after which its old content claims can be released.
• The Provenance Repository keeps the provenance events for a configurable
period after deletion (24 hours by default).
• Periodically, the Content Repo asks the Resource Claim Manager which
Resource Claims can be cleaned up.
Developer Guide
• Processor
• Reporting Task
• ControllerService
• FlowFilePrioritizer
• AuthorityProvider
Supporting API
• ProcessSession
• ProcessContext
• PropertyDescriptor
• Validator
• ValidationContext
• PropertyValue
• Relationship
• StateManager
• ComponentLog
Processor Life Cycle
• Processor Initialization →
• Exposing Processor’s Relationships →
• Exposing Processor Properties →
• Validating Processor Properties →
• Triggered and Performing the Work →
• ProcessSession finishes
Component Life Cycle
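• @OnAdded: the component is created (added to the flow)
• @OnEnabled: a Controller Service is enabled
• @OnRemoved: the component is removed from the flow
• @OnScheduled: the component is scheduled to run
• @OnUnscheduled: the component is no longer scheduled to run
• @OnStopped: the component is no longer scheduled and all threads have returned from onTrigger
• @OnShutdown: NiFi shuts down successfully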
Common Processor Patterns
• Data Ingress
• Data Egress
• Route Based on Content
• Route Based on Attribute
• Split Content
• Update Attributes Based on Content
• Enrich/Modify Content
Error Handling
• A ProcessException signals a known failure; any other Exception is
treated as unexpected. In both cases the framework rolls back the session.
• Don’t catch general Exception or Throwable.
• Penalization vs. Yielding
Session rollback
• ProcessSession provides transactionality.
• Call commit() or rollback() to end the session.
• Best practice is to keep it simple.
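The commit/rollback behaviour can be pictured with a toy session. This is a deliberately simplified model written for this example, not NiFi's `ProcessSession`: FlowFiles are plain strings, and the only guarantee shown is that nothing leaves the input queue or reaches the output queue until `commit()` is called.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of session transactionality; the real ProcessSession does far more.
final class ToySession {
    private final Deque<String> inputQueue;
    private final List<String> outputQueue;
    private final List<String> taken = new ArrayList<>();   // pulled from input, not yet committed
    private final List<String> pending = new ArrayList<>(); // transfers waiting for commit

    ToySession(Deque<String> inputQueue, List<String> outputQueue) {
        this.inputQueue = inputQueue;
        this.outputQueue = outputQueue;
    }

    // Takes the next item, remembering it so that rollback can put it back.
    String get() {
        String flowFile = inputQueue.poll();
        if (flowFile != null) {
            taken.add(flowFile);
        }
        return flowFile;
    }

    // Records a transfer; nothing is visible downstream until commit().
    void transfer(String flowFile) {
        pending.add(flowFile);
    }

    // Commit: publish all pending transfers at once.
    void commit() {
        outputQueue.addAll(pending);
        taken.clear();
        pending.clear();
    }

    // Rollback: discard pending transfers and restore taken items in their original order.
    void rollback() {
        for (int i = taken.size() - 1; i >= 0; i--) {
            inputQueue.addFirst(taken.get(i));
        }
        taken.clear();
        pending.clear();
    }

    public static void main(String[] args) {
        Deque<String> in = new ArrayDeque<>();
        in.add("a");
        in.add("b");
        List<String> out = new ArrayList<>();

        ToySession session = new ToySession(in, out);
        session.transfer(session.get().toUpperCase());
        session.rollback(); // "a" is back at the head of the input queue
        System.out.println(in + " " + out);

        session.transfer(session.get().toUpperCase());
        session.commit();   // "A" is now visible downstream
        System.out.println(in + " " + out);
    }
}
```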
Testing
• NiFi provides a mock framework for Processor testing.
Use the TestRunner interface
• 1-Add a ControllerService if needed
runner.addControllerService()
• 2-Set property values
runner.setProperty("property name", "property value");
• 3-Enqueue FlowFiles (optionally with attributes)
Map<String, String> attributes = new HashMap<>();
attributes.put("attribute name", "attribute value");
runner.enqueue("Select ….".getBytes(), attributes);
• 4-Run the processor
runner.run();
runner.assertAllFlowFilesTransferred(Success, 1);
Recap Developer guide
• Understand the life cycle of a Processor
• Understand the supporting component APIs
• Understand the common processor patterns
• Understand how to handle processing failures
• Understand how to test a processor
Contribution preparation
• NiFi Contributor Guide
https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide
• Git Feature Branch Workflow
https://www.atlassian.com/git/tutorials/comparing-workflows
• How to Write a Git Commit Message
https://chris.beams.io/posts/git-commit/
Contribution feedback
• Don’t produce trailing whitespace.
• Follow the GitHub pull request procedure.
• Start the commit title with the JIRA issue ID, e.g. NIFI-2829.
• Open-source CI fails frequently; don’t panic.
• Stay patient and humble when receiving reviewers’ feedback.
Contribution feedback
• When dealing with time zone problems, consider building in a different
time zone.
• Since Java 1.8, the standard library provides great support for dealing
with date and time issues.
https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html
https://magiclen.org/java-8-date-time-api/
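As a small illustration of why time zone assumptions break builds, the sketch below formats one fixed instant in several explicit zones using `java.time`. The class name, the chosen instant, and the zone IDs are arbitrary examples. The general idea is to pass the zone explicitly rather than rely on the JVM default, which differs between build machines.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Sketch only: the same instant rendered in different, explicitly chosen zones.
final class TimeZoneSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm", Locale.ROOT);

    // Formats an instant in the given zone instead of the JVM default zone.
    static String format(Instant instant, String zoneId) {
        return ZonedDateTime.ofInstant(instant, ZoneId.of(zoneId)).format(FORMAT);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2017-07-07T00:30:00Z");
        System.out.println(format(instant, "UTC"));                 // 2017-07-07 00:30
        System.out.println(format(instant, "Asia/Taipei"));         // 2017-07-07 08:30
        System.out.println(format(instant, "America/Los_Angeles")); // 2017-07-06 17:30
    }
}
```

Note that the local date itself differs between zones here, which is exactly the kind of discrepancy that makes a date-based test pass on one machine and fail on another.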
Reference
• Official Apache NiFi
https://nifi.apache.org/
• All Micron nifi instance
http://nifi.micron.com/
• Hortonworks forum
