Connecting the Drops with Apache NiFi & Apache MiNiFi
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Apache NiFi Fundamentals
Expanding the Reach of NiFi with Apache NiFi - MiNiFi
Evolving the NiFi Ecosystem
Apache NiFi Registry
Community
Empower users to manage the
collection and flow of data
The Problem at Hand
Producers, a.k.a. Things
Anything
AND
Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
Moving data effectively is hard
Standards: http://xkcd.com/927/
Apache NiFi: A Primer
Key Features and Principles
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording: a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
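The buffering, backpressure, and prioritized-queuing bullets above can be illustrated with a small sketch. This is not how NiFi implements its queues; the class `BoundedPriorityQueue`, its `max_items` threshold, and the numeric `priority` argument are hypothetical names chosen for illustration. The idea shown is that a full queue tells the upstream component to stop, rather than dropping data, and that higher-priority items leave the queue first.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Toy connection queue: prioritized, and refuses input when full."""

    def __init__(self, max_items):
        self.max_items = max_items          # hypothetical backpressure threshold
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps FIFO order per priority

    def is_full(self):
        return len(self._heap) >= self.max_items

    def offer(self, item, priority):
        """Return False (signal backpressure) instead of dropping or blocking."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def demo():
    queue = BoundedPriorityQueue(max_items=2)
    accepted = [
        queue.offer("bulk-log", priority=5),
        queue.offer("alert", priority=1),
        queue.offer("metrics", priority=3),   # refused: the queue is already full
    ]
    drained = [queue.poll(), queue.poll(), queue.poll()]
    return accepted, drained
```

In this sketch the producer decides what to do when `offer` returns False (wait, retry, or route elsewhere), which is one way to think about the latency vs. throughput and loss-tolerance trade-offs listed above.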
NiFi is based on Flow-Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composing other components.
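To make the FBP vocabulary concrete, here is a minimal sketch of how the terms relate. It is a teaching model only, not NiFi's Java API: the class names mirror the NiFi terms, but `on_trigger`, `put`, `take`, and `run_flow` are hypothetical simplifications. A FlowFile carries content plus attributes, a Connection queues FlowFiles between processors, and a loop stands in for the scheduler.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Buffer between two processors (a simple unbounded FIFO in this sketch)."""

    def __init__(self):
        self._queue = deque()

    def put(self, flowfile):
        self._queue.append(flowfile)

    def take(self):
        return self._queue.popleft() if self._queue else None


class UppercaseProcessor:
    """Black box: transforms content and knows nothing about its neighbours."""

    def on_trigger(self, incoming, outgoing):
        flowfile = incoming.take()
        if flowfile is None:
            return False
        outgoing.put(FlowFile(flowfile.content.upper(), dict(flowfile.attributes)))
        return True


def run_flow(payloads):
    """Scheduler stand-in: trigger the processor until its input is drained."""
    source, sink = Connection(), Connection()
    for index, payload in enumerate(payloads):
        source.put(FlowFile(payload, {"index": str(index)}))
    processor = UppercaseProcessor()
    while processor.on_trigger(source, sink):
        pass
    results = []
    while (flowfile := sink.take()) is not None:
        results.append((flowfile.content, flowfile.attributes["index"]))
    return results
```

Because the processor only sees its incoming and outgoing connections, it can be rewired into a different flow without changes, which is the property the "Black Box" row describes.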
NiFi & Data Agnosticism
NiFi is data agnostic!
But, NiFi was designed understanding that users
can care about specifics and provides tooling
to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle:
"Be conservative in what you do, be liberal in what you accept from others."
Apache NiFi - MiNiFi
Let me get the key parts of NiFi close to where data begins
Bidirectional data transfer
Better illuminate the data's journey with provenance
NiFi lives in the data center. Give it an enterprise server or a cluster of them.
MiNiFi lives as close as possible to where data is born and is a guest on that device or system.
Apache NiFi - MiNiFi
Limited computing capability
Limited power/network
Restricted software library/platform availability
No UI
Physically inaccessible
Not frequently updated
Competing standards/protocols
Scalability
Privacy & Security
Realities of computing outside the cozy datacenter
Apache NiFi - MiNiFi: Scoping
Go small: Java – Write once, run anywhere*
– Feature parity and reuse of core NiFi libraries
Go smaller: C++ – Write once**, run anywhere
Go smallest: Write n-many times, embed, run anywhere
Language libraries to support tagging, FlowFile format, Site-to-Site protocol, and provenance generation without a full processing framework
– Language SDKs, Mobile Platforms
Provide all the key principles of NiFi in varying, smaller footprints
Apache NiFi - MiNiFi: The Differences
No UI / Declarative configuration
– Supports YAML
– Extensible interface to ingest other formats
Reduced set of bundled components
Minimize initial size
Departures from NiFi
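Since MiNiFi has no UI, a flow is described declaratively rather than drawn. The sketch below shows the general idea of validating such a description before handing it to an agent. The dictionary keys, the flow name, and the `RemoteSend` type are hypothetical and are not the actual MiNiFi YAML schema; a Python dict stands in for parsed YAML so the example needs only the standard library.

```python
# Hypothetical flow description. The keys below are illustrative only and
# are NOT the real MiNiFi configuration schema.
FLOW = {
    "name": "tail-and-forward",
    "processors": [
        {"name": "TailAppLog", "type": "TailFile"},
        {"name": "SendToCore", "type": "RemoteSend"},  # made-up type name
    ],
    "connections": [
        {"source": "TailAppLog", "destination": "SendToCore"},
    ],
}


def validate_flow(flow):
    """Return a list of problems found in a declarative flow description."""
    problems = []
    names = [p["name"] for p in flow.get("processors", [])]
    if len(names) != len(set(names)):
        problems.append("duplicate processor names")
    known = set(names)
    for conn in flow.get("connections", []):
        for end in ("source", "destination"):
            if conn.get(end) not in known:
                problems.append(
                    f"connection {end} {conn.get(end)!r} is not a declared processor"
                )
    return problems
```

Checking a description up front matters more without a UI, because nobody is watching a canvas on the device to notice a dangling connection.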
Apache NiFi - MiNiFi: Centralized Command & Control (C2)
Provide flow updates, information and assets to instances where they live
Act as a gateway to/from network enclaves
Provide a user interface/experience for design & deploy and monitoring
Extend the reach of user experience and operations
Connecting the Drops
[Diagram tiers: Sources, Regional Infrastructure, Core Infrastructure]
Managing data flow for a courier service
[Diagram: data flow for a courier service]
Locations: Physical Store, On Delivery Routes, Distribution Center, Core Data Center at HQ
Elements: Registers, Mobile Devices, Gateway Server, Trucks, Deliverers, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others
Software shown: Client Libraries, MiNiFi, NiFi
Icon credits:
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Listening to our community
How can I … How do I ... What about ...
Version my flows?
Drive CI/CD processes?
Migrate flows between environments?
Provision distributions of NiFi with a set of components?
Make reference datasets/extensions available to the entirety of my data flow?
Certify / Audit / Sign-off on flows as compliant per regulations?
Capturing the essence of a flow in your organization
The n-dimensions of data flow
Consider a FlowFile to be a singular event at a given juncture in its processing
A flow is the directed graph of processing at a given point in time
With each component’s:
Configuration
Version
Referenced Assets
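One way to picture "a flow at a given point in time" is as a snapshot of the graph plus each component's configuration, version, and referenced assets, reduced to a single fingerprint. The sketch below is an illustration under that assumption and does not describe how NiFi or the NiFi Registry stores or versions flows; `ComponentSnapshot`, `flow_fingerprint`, and the sample component names are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ComponentSnapshot:
    name: str
    configuration: tuple      # (key, value) pairs
    version: str
    referenced_assets: tuple  # e.g. names of lookup files


def flow_fingerprint(components, edges):
    """Hash a flow (components plus directed edges) into a stable identifier."""
    document = {
        # Sort so the fingerprint does not depend on listing order.
        "components": sorted((asdict(c) for c in components), key=lambda c: c["name"]),
        "edges": sorted(edges),
    }
    encoded = json.dumps(document, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def demo():
    tail_v1 = ComponentSnapshot("TailAppLog", (("path", "/var/log/app.log"),), "1.0", ())
    tail_v2 = ComponentSnapshot("TailAppLog", (("path", "/var/log/app.log"),), "1.1", ())
    send = ComponentSnapshot("SendToCore", (), "1.0", ("routes.csv",))
    edges = [("TailAppLog", "SendToCore")]
    same = flow_fingerprint([tail_v1, send], edges) == flow_fingerprint([send, tail_v1], edges)
    changed = flow_fingerprint([tail_v1, send], edges) != flow_fingerprint([tail_v2, send], edges)
    return same, changed
```

With a fingerprint like this, a change to any component's version, configuration, or assets yields a different identifier, which is the kind of property the versioning and audit questions from the community depend on.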
Registry is an enabler
SDLC
Manage variables, sensitive properties for environments
Extension Registry
Association/tagging of data with the flow that created it
Enhanced Command and Control of MiNiFi instances
The Evolution of Apache NiFi
Our core substrate for data flow is NiFi & MiNiFi
Command and Control facilitates operations and management of components
Registry for common tasks with disparate resources across the NiFi ecosystem
Why the Apache NiFi Ecosystem?
Moving data is multifaceted in its challenges, and these are present in different contexts at varying scopes
Provide components and a platform with common tooling and extensions that are commonly needed, but be flexible for extension in all aspects
– Allow organizations to integrate with their existing infrastructure
Empower folks managing your infrastructure to make changes and reason about issues that are occurring
– Data Provenance to show context and data’s journey
– User Interface/Experience a key component
Apache NiFi Crash Course
Wednesday, 14 June
11:00 AM – 1:30 PM, Room LL21A
• Learn more about NiFi, the community, and work through a hands-on lab
• Seats available on a first come, first served basis
• Make sure you are in possession of the latest version of VirtualBox
Learn, Share at Birds of a Feather
IOT, STREAMING & DATA FLOW
Thursday, June 15
5:50 pm, Ballroom C
Learn more and join us!
Project Sites:
NiFi: https://nifi.apache.org
Subproject MiNiFi: https://nifi.apache.org/minifi/
Subproject Registry: http://nifi.apache.org/registry.html
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
https://issues.apache.org/jira/browse/MINIFI
Follow us on Twitter
@apachenifi