Using Apache Flink with Amazon Kinesis (ANT395) - AWS re:Invent 2018
2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Using Apache Flink with Amazon Kinesis
ANT395
Greg Finch
Senior Product Manager
John Deere
Ryan Nienhuis
Senior Technical Product Manager
AWS, Amazon Kinesis
3. Agenda
Streaming and Amazon Kinesis overview
New Capability: Amazon Kinesis Data Analytics for Java
Streaming data at John Deere
Architectural choices in streaming data
4. What is streaming data?
Low-latency
Continuous
Ordered, incremental
High volume
5. Streaming with Amazon Kinesis
Easily collect, process, and analyze video and data streams in real time
Capture, process, and store video streams
Load data streams into AWS data stores
Analyze data streams in real time
Capture, process, and store data streams
6. Amazon Kinesis Data Streams overview
7. Data processing from a variety of consumers
Fully managed service for real-time processing of streaming data
Cost-effective: $0.014 per 1,000,000 PUT Payload Units
Millions of sources producing hundreds of terabytes per hour
Amazon Web Services front end: authentication and authorization
Durable, highly consistent storage replicates data across three data centers (Availability Zones)
Ordered stream of events supports multiple readers
Consumers: Amazon Kinesis Client Library on EC2, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, AWS Lambda
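To make the quoted price concrete, here is a small worked estimate in plain Java. It is a sketch under stated assumptions: the only figure taken from the slide is $0.014 per 1,000,000 PUT Payload Units; the 25 KB unit size, the class name `PutPayloadCost`, and the workload in `main` are assumptions for illustration, and shard-hour charges are left out.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Rough PUT Payload Unit cost estimate; shard-hour charges are not included. */
class PutPayloadCost {
    // Price quoted on the slide: $0.014 per 1,000,000 PUT Payload Units.
    static final BigDecimal PRICE_PER_MILLION_UNITS = new BigDecimal("0.014");
    // Assumption: one PUT Payload Unit covers up to 25 KB of a record.
    static final long UNIT_BYTES = 25L * 1024;

    /** Number of payload units billed for one record of the given size. */
    static long unitsForRecord(long recordBytes) {
        if (recordBytes <= 0) {
            throw new IllegalArgumentException("recordBytes must be positive");
        }
        return (recordBytes + UNIT_BYTES - 1) / UNIT_BYTES; // ceiling division
    }

    /** Cost in US dollars for putting `records` records of `recordBytes` each. */
    static BigDecimal costUsd(long records, long recordBytes) {
        BigDecimal units = BigDecimal.valueOf(records)
                .multiply(BigDecimal.valueOf(unitsForRecord(recordBytes)));
        return units.multiply(PRICE_PER_MILLION_UNITS)
                .divide(BigDecimal.valueOf(1_000_000), 6, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 1,000 records per second of 2 KB each, for one day.
        long recordsPerDay = 1_000L * 60 * 60 * 24;
        System.out.println("Units per 2 KB record: " + unitsForRecord(2 * 1024));
        System.out.println("Daily PUT cost (USD): " + costUsd(recordsPerDay, 2 * 1024));
    }
}
```

Because billing is per unit rather than per byte, a record slightly over one unit costs as much as a record that fills two, which is why the sketch rounds up.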
8. Apache Flink
Framework and distributed engine for stateful processing of data streams.
Simple programming: Easy to use and flexible APIs make building apps fast
High performance: In-memory computing provides low latency & high throughput
Stateful processing: Durable application state saves
Strong data integrity: Exactly-once processing and consistent state
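The idea behind durable state and exactly-once state updates can be illustrated with a toy checkpoint-and-replay model in plain Java. This is not Flink code: Flink's actual mechanism is distributed, asynchronous snapshotting across operators, and the class `CheckpointedCounter` and its methods are hypothetical names used only for this sketch. The point it shows is that saving operator state together with the source position lets a restarted job replay input without counting any event twice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of checkpointed state: per-key counts plus the source offset they reflect. */
class CheckpointedCounter {
    private Map<String, Integer> counts = new HashMap<>();
    private int offset = 0; // next input position to read

    // Last completed checkpoint: state and offset captured together.
    private Map<String, Integer> savedCounts = new HashMap<>();
    private int savedOffset = 0;

    /** Processes events up to (not including) `until`, advancing the offset. */
    void processUntil(List<String> log, int until) {
        while (offset < until && offset < log.size()) {
            counts.merge(log.get(offset), 1, Integer::sum);
            offset++;
        }
    }

    /** Takes a consistent snapshot of state and source position. */
    void checkpoint() {
        savedCounts = new HashMap<>(counts);
        savedOffset = offset;
    }

    /** Simulates recovery: discard in-flight state and roll back to the last checkpoint. */
    void restore() {
        counts = new HashMap<>(savedCounts);
        offset = savedOffset;
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "a", "c", "a", "b");
        CheckpointedCounter op = new CheckpointedCounter();
        op.processUntil(log, 3);
        op.checkpoint();                  // snapshot after 3 events
        op.processUntil(log, 5);          // two more events, not yet checkpointed
        op.restore();                     // failure: roll back to offset 3
        op.processUntil(log, log.size()); // replay from the checkpointed offset
        System.out.println(op.counts());  // each event is reflected exactly once
    }
}
```

If the state were restored without also rewinding the offset, the two events processed after the checkpoint would be lost; if the offset were rewound without restoring the state, they would be counted twice.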
9. How do you build a Flink application?
Streaming operators are applied to data streams in a pipeline
[Pipeline diagram. Streams: Source, DataStream, KeyedDataStream, DataStream, two Sinks. Operators: keyBy, window, filter, apply]
10. What does your code look like?
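// KinesisStreamSource and KinesisStreamSink are the slide's shorthand;
// the Flink Kinesis connector provides FlinkKinesisConsumer and FlinkKinesisProducer.
DataStream<GameEvent> rawEvents = env.addSource(
    new KinesisStreamSource("input_events"));
// Project each raw event onto the fields needed downstream.
DataStream<UserPerLevel> gameStream =
    rawEvents.map(event -> new UserPerLevel(event.gameMetadata.gameId,
        event.gameMetadata.levelId, event.userId));
// Key by game, then apply a window function to each one-minute tumbling window.
gameStream.keyBy(event -> event.gameId)
    .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
    .apply((...) -> {...});
gameStream.addSink(new KinesisStreamSink("myGameStateStream"));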
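For readers new to keyed windows, the following plain-Java sketch shows what a keyBy followed by a one-minute tumbling window amounts to: every event is assigned to exactly one (key, window start) bucket, and an aggregate is computed per bucket. It is not Flink API and does not run on a cluster; the `GameEvent` fields, the count aggregate, and the use of event timestamps (the slide uses processing time) are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of keyBy + tumbling window + count; not Flink API. */
class TumblingWindowCount {
    /** Hypothetical event: which game it belongs to and when it occurred. */
    static final class GameEvent {
        final String gameId;
        final long timestampMillis;

        GameEvent(String gameId, long timestampMillis) {
            this.gameId = gameId;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Groups events by (gameId, window start) and counts them.
     * Windows are aligned to multiples of windowMillis, so each event
     * falls into exactly one window.
     */
    static Map<String, Map<Long, Integer>> countPerWindow(List<GameEvent> events, long windowMillis) {
        Map<String, Map<Long, Integer>> result = new TreeMap<>();
        for (GameEvent e : events) {
            long windowStart = e.timestampMillis - Math.floorMod(e.timestampMillis, windowMillis);
            result.computeIfAbsent(e.gameId, k -> new TreeMap<>())
                  .merge(windowStart, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<GameEvent> events = List.of(
            new GameEvent("chess", 5_000),
            new GameEvent("chess", 59_999),
            new GameEvent("chess", 60_000), // first event of the second one-minute window
            new GameEvent("pong", 10_000));
        // prints {chess={0=2, 60000=1}, pong={0=1}}
        System.out.println(countPerWindow(events, 60_000));
    }
}
```

A streaming engine does the same bucketing incrementally and emits each window's result when the window closes, rather than holding the whole input in a list.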
12. © 2018 Deere & Company, All rights reserved.
About John Deere
13. Sophisticated machines produce massive data streams
14. Machine Sync - real-time multi-machine coordination
15. Remote monitoring and adjustments
16. Operations Center
17. John Deere Data Platform
Processing millions of sensor measurements per second.
Serving more than one billion field maps.
Supports monitoring, tracking, dashboarding, and deep analysis applications.
18. A simple solution for many applications
19. Managing state can get complicated
20. Keeping up: Over-sharding the stream
21. Keeping up: Fan-out
22. System complexity increases with use case complexity
23. Shifting to Apache Flink
24. An example
Source: Sessions Stream
Source: Sensors Stream
Map: Decode Sessions
Map: Decode Sensors
Join: Session & Sensors
Window: Aggregate Totals
Flat Map: Compute Tile Keys
Window: Rasterize
Sink: Tiles to S3
Sink: Totals to DynamoDB
Key by (appears twice in the diagram)
High Parallelism
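The slide does not say how tile keys are computed, so the following is only a guess at what a "Compute Tile Keys" flat map could look like. It assumes the common Web Mercator (slippy map) tiling scheme and emits one key per zoom level for each point; the class `TileKeys`, the "zoom/x/y" key format, and the zoom range are all hypothetical. What it illustrates is the flat-map shape itself: one input record producing several keyed outputs that a later key-by and window step can group.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical "compute tile keys" flat map: one point in, one tile key per zoom level out. */
class TileKeys {
    /** Tile key in "zoom/x/y" form under the Web Mercator (slippy map) scheme. */
    static String tileKey(double latDeg, double lonDeg, int zoom) {
        int n = 1 << zoom; // tiles per axis at this zoom level
        double latRad = Math.toRadians(latDeg);
        int x = (int) Math.floor((lonDeg + 180.0) / 360.0 * n);
        int y = (int) Math.floor(
            (1.0 - Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad)) / Math.PI) / 2.0 * n);
        // Clamp so points on the east or south edge stay inside the grid.
        x = Math.min(Math.max(x, 0), n - 1);
        y = Math.min(Math.max(y, 0), n - 1);
        return zoom + "/" + x + "/" + y;
    }

    /** Flat map: emits one key per zoom level in [minZoom, maxZoom]. */
    static List<String> tileKeys(double latDeg, double lonDeg, int minZoom, int maxZoom) {
        List<String> keys = new ArrayList<>();
        for (int z = minZoom; z <= maxZoom; z++) {
            keys.add(tileKey(latDeg, lonDeg, z));
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [0/0/0, 1/1/1, 2/2/2]
        System.out.println(tileKeys(0.0, 0.0, 0, 2));
    }
}
```

Keying by tile spreads the rasterization work across many independent keys, which fits the "High Parallelism" label on that part of the diagram.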
25. Powerful solution for sophisticated applications
26. Thank you!
Ryan Nienhuis
Greg Finch