Financial market prediction has always been one of the hottest topics in data science and machine learning. However, the prediction algorithm is just a small piece of the puzzle. Building a data stream pipeline that constantly combines the latest price information with high-volume historical data is extremely challenging on traditional platforms, requiring a lot of code and careful thought about how to scale or move to the cloud. This session walks through the architecture and implementation details of an application built on top of open-source tools, demonstrating how to build a stock prediction solution with almost no source code: just a few lines of R and a web interface that consumes data in real time through a RESTful endpoint. The solution leverages in-memory data grid technology for high-speed ingestion, combining real-time data streaming with distributed processing for stock indicator algorithms.
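As a concrete illustration of the "stock indicator algorithms" the abstract mentions, here is a minimal sketch (not the demo code from the talk) of one common indicator, an exponential moving average, updated incrementally as each price tick streams in:

```python
# Illustrative only: a streaming exponential moving average (EMA),
# one kind of stock indicator a pipeline like this might compute.
class StreamingEMA:
    def __init__(self, period):
        # Standard EMA smoothing factor for the given period.
        self.alpha = 2.0 / (period + 1)
        self.value = None

    def update(self, price):
        # Seed with the first observation, then blend each new tick in.
        if self.value is None:
            self.value = price
        else:
            self.value = self.alpha * price + (1 - self.alpha) * self.value
        return self.value

ema = StreamingEMA(period=3)   # alpha = 0.5
for p in [10.0, 12.0, 11.0]:
    latest = ema.update(p)
```

Because each update touches only one stored value, an indicator like this distributes naturally: every node in a grid can maintain EMAs for its own partition of symbols.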
9. Why so hard?
• Hard to add new data sources
• Hard to scale
• Hard to make it real-time
10. HDFS Data Lake
Store / Analytics
• Hard to change
• Labor intensive
• Inefficient
• No real-time information
• ETL-based
• Data-source specific
Traditional models are reactive and static
11. HDFS Data Lake
Expert System / Machine Learning
In-Memory Real-Time Data
Continuous Learning
Continuous Improvement
Continuous Adapting
Data Stream Pipeline
Multiple Data Sources
Real-Time Processing
Store Everything
Stream-based, real-time closed-loop analytics are needed
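The "continuous learning / continuous adapting" loop above can be made concrete with a toy example. Here the "model" is a single weight fit by online gradient descent; this is only an illustration of the closed-loop idea (predict, observe the real outcome, adapt immediately), not the deck's actual algorithm:

```python
# A toy closed loop: predict, observe the real outcome, update the model.
# The model (one weight, online gradient descent) is made up for
# illustration; the point is that learning happens inside the stream.
def closed_loop(pairs, lr=0.1):
    w = 0.0
    for x, y in pairs:
        pred = w * x          # score the current input
        err = pred - y        # compare with what actually happened
        w -= lr * err * x     # adapt immediately, then keep streaming
    return w

w = closed_loop([(1.0, 2.0)] * 50)   # converges toward y = 2x
```

A static, ETL-based model would instead be retrained offline and redeployed; the closed-loop version improves with every event it processes.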
12. Info
Analysis
Look at past trends
(for similar input)
Evaluate current input
Score / Predict
Neural Network
How can it be addressed?
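The flow on this slide — look at past trends for similar input, evaluate the current input, then score — can be sketched without any ML framework. The deck uses a neural network; in this self-contained sketch a simple nearest-neighbor lookup stands in for the learned model:

```python
# A minimal sketch of "find similar past input, then score/predict".
# Nearest-neighbor matching stands in for the deck's neural network.
def predict_next(history, window):
    """Find past windows most similar to `window` and average what followed."""
    n = len(window)
    candidates = []
    for i in range(len(history) - n):
        past = history[i:i + n]
        dist = sum((a - b) ** 2 for a, b in zip(past, window))
        candidates.append((dist, history[i + n]))
    # Average the outcomes of the 2 closest matches.
    candidates.sort(key=lambda c: c[0])
    best = candidates[:2]
    return sum(next_val for _, next_val in best) / len(best)

history = [1.0, 2.0, 3.0, 1.0, 2.0, 3.5, 1.0, 2.0]
prediction = predict_next(history, [1.0, 2.0])
# the closest past windows were followed by 3.0 and 3.5
```

A neural network replaces the explicit similarity search with learned weights, but the input/score contract is the same, which is what makes the model swappable inside the pipeline.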
20. Ingest Transform Sink
SpringXD
Store / Analyze
Fast Data
Distributed Computing
Predict / Machine Learning
Other Sources and Destinations
JMS
Streaming real-time analytics architecture
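The Ingest → Transform → Sink chain in this architecture is expressed in Spring XD as a stream definition. A hypothetical example (the stream name, port, and SpEL expressions are made up for illustration; `http`, `filter`, `transform`, and `log` are built-in Spring XD modules):

```
xd:> stream create --name tickdemo --definition "http --port=9000 | filter --expression=payload.contains('AAPL') | transform --expression=payload.toUpperCase() | log" --deploy
```

Swapping `log` for an HDFS or Kafka sink changes the destination without touching the rest of the pipeline, which is the "little or no coding" point the next slide makes.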
23. SpringXD
INGEST / SINK
• Little or no coding required
• Dozens of built-in connectors
• Seamless integration with Kafka, Sqoop
• Create new connectors easily using Spring
PROCESS
• Call Spark, Reactor or RxJava
• Built-in configurable filtering, splitting and transformation
• Out-of-box configurable jobs for batch processing
ANALYZE
• Import and invoke PMML jobs easily
• Call Python, R, MADlib and other tools
• Built-in configurable counters and gauges
Data Stream Pipelining
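The pipelining pattern itself — independent stages composed into a stream — can be modeled with plain Python generators (an illustration of the Ingest → Transform → Sink shape, not of Spring XD's internals):

```python
# Illustrative only: the ingest | transform | sink pattern from the
# slides, modeled with plain Python generators.
def ingest(ticks):
    """Source: emit raw tick strings, e.g. 'AAPL,101.5'."""
    for tick in ticks:
        yield tick

def parse(stream):
    """Processor: split 'SYMBOL,price' into (symbol, float)."""
    for tick in stream:
        symbol, price = tick.split(",")
        yield symbol, float(price)

def sink(stream):
    """Sink: collect results (a real sink would write to HDFS, Kafka, ...)."""
    return list(stream)

result = sink(parse(ingest(["AAPL,101.5", "GOOG,530.0"])))
```

Each stage only knows its input and output, so stages can be added, replaced, or distributed independently — the same property the pipe (`|`) gives Spring XD stream definitions.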
40. Follow-up: In-Memory Unconference
"A place for all things in-memory: projects, people, ideas, roadmaps, discussions."
Location: Hill Country A/B
Weds 4:15pm - 6pm. (after this talk)
The demo code is on GitHub!
@fredmelo_br
@william_markito