http://www.bigdataspain.org/2014/conference/analytics-to-the-masses
In this talk, we will discuss our approach to bringing large-scale deep analytics to the masses. R is an extremely popular numerical computing environment, but scientific data processing frequently hits its memory limits.
On the other hand, systems for executing data-intensive tasks, such as Hadoop or Stratosphere, are not popular among R users because writing programs using these paradigms is cumbersome.
https://www.youtube.com/watch?v=o5ulDWr7zWg
Analytics to the masses by Jose Luis Lopez at Big Data Spain 2014
1. BIG DATA ANALYTICS TO THE MASSES
JOSE LUIS LÓPEZ PINO
DATA ENGINEER GETYOURGUIDE
2. Big Data Analytics
to the masses
Why it has failed and how we can fix it
Jose Luis Lopez Pino
3. Who am I?
BI Consultant
Large-Scale & Distributed
Founding
Data Engineer
4. Big Data is like Tourism
It seems easy to do
But if you aren’t an expert, you can’t make the most of it
5. Struggle to analyze Big Data
Harlan Harris, Sean Murphy, and Marck Vaisman. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O’Reilly Media, Inc., 2013
Also: Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. Enterprise data analysis and visualization: An interview study. IEEE Transactions on Visualization and Computer Graphics
6. Tools
Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014
7. Tools (October 2014)
Original: Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014
11. Say it with memes!
When you do deep analytics in small data using R and CRAN packages
When you do deep analytics in BIG data using R and CRAN packages
12. When you try to program it using MapReduce
When you try to program it using Apache Spark / Apache Flink
When you try to use a library scalable to large data sets
13. Can’t we do it better?
- Make it similar to normal R programs.
- Hide complexity.
- Make file manipulation easier.
- Run part of the computation in the cluster and part of it in the client.
21. Some relevant findings
- Transmission time was not significant.
- Stratosphere/Flink was competitive in highly iterative programs.
- We were not able to do it while keeping the code 100% the same.
- Ensemble scenarios are the most exciting ones.
22. 4 Takeaways from this talk
- We still need to bring Big Data to the right people in the right place.
- We need comprehensive libraries.
- We need to move data back and forth.
- Use a syntax that the users are familiar with.
23. That’s all!
- Have you found this talk interesting?
- Follow me: @jllopezpino
- Interested in a job as SEM Data Analyst (Berlin)?
- Ask me for the details:
- Are you interested in Data + Energy?
- Keep in touch: