Scaling to Infinity - Open Source meets Big Data


John Hammink's talk at Great Wide Open 2016. We discuss (1) the need for data analytics infrastructure that can scale exponentially, (2) what such an infrastructure must contain, and (3) the need for that infrastructure to handle un- and semi-structured data.


Dealing with Unstructured Data
Scaling to Infinity
Image: Boykung/Shutterstock

Copyright ©2014 Treasure Data. All Rights Reserved.
Big Data Simplified: One Approach
[Architecture diagram]
• Sources: app servers, mobile SDKs, web SDK, embedded SDKs, and server-side agents
• Multi-structured events (register, login, start_event, purchase, etc.): app log data, mobile event data, sensor data, telemetry
• Infinite & economical cloud data store
• Familiar & table-oriented outputs: SQL-based ad-hoc queries, SQL-based dashboards, and results push to DBs & data marts and other apps
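To make the "multi-structured events in, table-oriented SQL out" idea concrete, here is a minimal Python sketch, assuming events arrive as JSON lines whose fields vary by event type. It flattens nested fields into dotted column names and uses the union of all keys as the column set, filling gaps with None. The sample events and the `flatten`/`to_table` helpers are hypothetical illustrations, not Treasure Data's ingestion code.

```python
import json

# Hypothetical raw events: each JSON line may carry a different set of fields,
# like the register/login/start_event/purchase events on the slide.
RAW_EVENTS = """\
{"event": "register", "user": "u1", "ts": 1000}
{"event": "login", "user": "u1", "ts": 1005, "device": {"os": "ios"}}
{"event": "purchase", "user": "u1", "ts": 1010, "item": "sku-42", "price": 4.99}
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. device.os."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_table(lines):
    """Return (columns, rows); fields an event lacks become None."""
    records = [flatten(json.loads(line)) for line in lines if line.strip()]
    columns = sorted({col for rec in records for col in rec})
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


if __name__ == "__main__":
    cols, rows = to_table(RAW_EVENTS.splitlines())
    print(cols)  # ['device.os', 'event', 'item', 'price', 'ts', 'user']
    for row in rows:
        print(row)
```

The design choice here is schema-on-read: nothing is rejected at collection time, and the table shape is derived afterwards, which keeps data collection easy at the cost of sparse columns.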

What is the point of all this data?
BI: Business Intelligence using very large sets of data

Copyright ©2015 Treasure Data. All Rights Reserved.
[Chart: Data Records Stored in the Treasure Data Cloud Service, Aug-12 to Dec-14; y-axis 0 to 14 trillion records]
Milestones: Service Launched, Series A Funding, 100 Customers, Selected by Gartner as Cool Vendor in Big Data, 5 Trillion Records, 10 Trillion Records, 13 Trillion Records, Series B Funding
Treasure Data By the Numbers (Jan-2015):
• 13T+ records of data imported since launch
• 500K+ records imported each second
• 1.5 trillion+ records imported each month
• 12B records sent per day by one customer
(Last 2 years)
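A quick sanity check on how these rates relate to each other, as a Python sketch assuming a constant ingest rate and a 30-day month (both simplifications; the slide's figures are lower bounds):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_period(rate_per_second, days):
    """Records ingested over `days` days at a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY * days


rate = 500_000  # records per second, from "500K+ records imported each second"
per_day = records_per_period(rate, 1)
per_30_days = records_per_period(rate, 30)  # assumes a 30-day month

# Average rate implied by one customer's 12B records per day.
one_customer_rate = 12_000_000_000 / SECONDS_PER_DAY

print(f"{per_day:,} records/day")                  # 43,200,000,000 records/day
print(f"{per_30_days:,} records per 30 days")      # 1,296,000,000,000 records per 30 days
print(f"{one_customer_rate:,.0f} records/second")  # 138,889 records/second
```

So a sustained 500K records per second is about 1.3 trillion records per 30 days, consistent with the "1.5 trillion+ per month" figure once the rate runs above that floor, and the single customer's 12B records per day alone averages roughly 139K records per second.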

Statistics
• Total records stored: 25 trillion
• Managed & supported: 24 * 7 * 365
• Uptime: 99.99%
• New records per second: 1 million
• Daily Twitter volume: 100x
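The uptime and throughput figures translate into concrete budgets. A minimal Python sketch, assuming a non-leap 365-day year and a constant ingest rate:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 in a non-leap year
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400


def downtime_budget_minutes(uptime_percent):
    """Maximum downtime per year permitted by an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def records_per_day(rate_per_second):
    """Daily volume at a constant per-second ingest rate."""
    return rate_per_second * SECONDS_PER_DAY


print(f"{downtime_budget_minutes(99.99):.1f} minutes/year")  # 52.6 minutes/year
print(f"{records_per_day(1_000_000):,} records/day")         # 86,400,000,000 records/day
```

In other words, 99.99% uptime leaves under an hour of downtime per year, and 1 million new records per second is 86.4 billion records per day.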

A solution?
• There are trade-offs to consider
• Whatever the trade-off, it should make data easy to collect
• Easy does it! Un- and semi-structured data (multi-structured data)
• Open source means it’s free, but it also means you need someone on hand to implement and maintain it
• Cloud storage means you don’t have to scale or shard yourself; the trade-off is a performance hit compared with bare metal
Image: John Hammink

Images: Lightspring/Shutterstock, John Hammink, Treasure Data
There are a few introductory Data Science posts at blog.treasuredata.com!

Open vs. Closed source
Image: Heather Craig/Shutterstock

Images: PC World, Data-Hive, Wallpapersmela
[image] or [image] or [image]?

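# Fluentd v0.12 syntax (v1 uses @type instead of type)
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
# (requires fluent-plugin-elasticsearch and fluent-plugin-webhdfs)
<match *.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    # hypothetical NameNode host, port, and path; adjust for your cluster
    type webhdfs
    host namenode.example.com
    port 50070
    path /log/access.%Y%m%d_%H.log
  </store>
</match>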
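The `format apache2` line in the config tells the tail input to turn each Apache access-log line into a structured record tagged `web.access`. The Python sketch below illustrates that step under stated assumptions: the regular expression approximates the Apache combined log format with the field names host, user, time, method, path, code, size, referer, and agent, and the `parse_apache2`/`to_int` helpers are hypothetical, not Fluentd's own code.

```python
import re

# Approximation of the Apache combined-log pattern that `format apache2`
# parses, written with Python named groups.
APACHE2 = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def to_int(value):
    """Convert a numeric log field; "-" (no value) becomes None."""
    return None if value in (None, "-") else int(value)


def parse_apache2(line, tag="web.access"):
    """Return a (tag, record) pair for one access-log line, or None if it doesn't match."""
    match = APACHE2.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = to_int(record["code"])
    record["size"] = to_int(record["size"])
    return tag, record


if __name__ == "__main__":
    sample = ('192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] '
              '"GET / HTTP/1.1" 200 777 "-" "Opera/12.0"')
    print(parse_apache2(sample))
```

The referer/agent pair is optional in the pattern, so lines in the older common log format still parse, with those two fields set to None.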

Multi-structured data
• Un-structured data: better for data whose ultimate use is statistics

Embulk: an open-source bulk data loader that helps transfer data between various databases, storage systems, file formats, and cloud services
embulk.org/docs
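Embulk itself is configured declaratively (input, parser, and output plugins in a config file). The Python sketch below is not Embulk's API; it only illustrates the same bulk-load idea under stated assumptions: read a CSV file, apply a fixed, typed column schema, and bulk-insert the rows into a database, here an in-memory SQLite table. The column names, sample data, and `load_csv_into_sqlite` helper are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical typed column schema, loosely in the spirit of the column list
# a CSV parser is configured with; names and types are illustrative only.
COLUMNS = [("id", int), ("account", int), ("comment", str)]
SQL_TYPES = {int: "INTEGER", str: "TEXT"}

CSV_INPUT = """id,account,comment
1,32864,embulk
2,14824,jruby
3,27559,
"""


def load_csv_into_sqlite(text, conn, table="events"):
    """Parse CSV text with a fixed schema, bulk-insert it, and return the row count."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    rows = []
    for raw in reader:
        if not raw:
            continue  # ignore blank lines
        values = []
        for (_, convert), field in zip(COLUMNS, raw):
            # Empty fields become NULL; others are converted to the declared type.
            values.append(convert(field) if field != "" else None)
        rows.append(tuple(values))
    col_defs = ", ".join(f"{name} {SQL_TYPES[kind]}" for name, kind in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_csv_into_sqlite(CSV_INPUT, conn))  # 3
    print(conn.execute("SELECT * FROM events").fetchall())
```

Inserting with `executemany` batches the whole file in one call, which is the "bulk" part; a real loader would also stream large files in chunks rather than holding every row in memory.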

Hivemall
Hivemall is a scalable machine learning library that
runs on Apache Hive.
Hivemall is designed to be scalable to the number
of training instances as well as the number of
training features.
• Classification
• Regression
• Recommendation
• k-nearest neighbor
• Anomaly Detection
• Feature Engineering
https://github.com/myui/hivemall
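Hivemall exposes its algorithms as Hive functions called from SQL; the Python sketch below does not use Hivemall's API. It illustrates, under stated assumptions, one common way to get the two scalability properties named above: feature hashing (a Feature Engineering technique) keeps the model size fixed no matter how many distinct features appear, and online logistic regression (a Classification technique) trained with stochastic gradient descent touches one training instance at a time. The table size, learning rate, sample data, and helper names are hypothetical.

```python
import math
import zlib

NUM_FEATURES = 2 ** 16  # hashed feature space (illustrative size)


def hash_features(tokens):
    """Map string features to indices in a fixed-size space (feature hashing)."""
    return [zlib.crc32(t.encode("utf-8")) % NUM_FEATURES for t in tokens]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(examples, epochs=10, lr=0.5):
    """Online logistic regression: one SGD update per (tokens, label) example."""
    weights = {}  # sparse weight vector: only indices seen in training are stored
    for _ in range(epochs):
        for tokens, label in examples:
            idx = hash_features(tokens)
            p = sigmoid(sum(weights.get(i, 0.0) for i in idx))
            grad = label - p  # gradient of the log-likelihood w.r.t. the margin
            for i in idx:
                weights[i] = weights.get(i, 0.0) + lr * grad
    return weights


def predict(weights, tokens):
    """Probability that the example with these features is positive."""
    return sigmoid(sum(weights.get(i, 0.0) for i in hash_features(tokens)))


if __name__ == "__main__":
    data = [(["word:great", "word:fun"], 1), (["word:boring", "word:slow"], 0),
            (["word:great"], 1), (["word:slow"], 0)]
    model = train_logistic_sgd(data)
    print(round(predict(model, ["word:great"]), 3), round(predict(model, ["word:slow"]), 3))
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, so a stable checksum keeps feature indices reproducible across runs and machines.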

The Hadoop Story on MongoDB
Image courtesy of Steven Francia @ Docker


Fluentd and Docker - running fluentd within a docker container

Augmenting Mongo DB with Treasure Data

Fluentd and Docker - running fluentd within a docker container

Fluentd - Unified logging layer

Insight Data Engineering: Open source data ingestion

Partner webinar presentation aws pebble_treasure_data

Introduction to Hivemall

Scaling to Infinity - Open Source meets Big Data

1. Dealing with Unstructured Data Scaling to Infinity Image: Boykung/Shutterstock

2. Image: John Hammink

4. There are many sources of information

5. Copyright ©2014 Treasure Data. All Rights Reserved.
Big Data Simplified: One Approach
• Sources: AppServers, Mobile SDKs, Web SDK, Embedded SDKs, Server-side Agents
• Multi-structured events: register, login, start_event, purchase, etc. (app log data, mobile event data, sensor data, telemetry)
• Storage: Infinite & Economical Cloud Data Store
• Access: Familiar & Table-oriented SQL (SQL-based ad-hoc queries, SQL-based dashboards)
• Results Push: to DBs & Data Marts and other apps
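The slide's core idea is that events with different shapes (register, login, purchase) land in one store and can still be queried as a table. A minimal sketch of that schema-on-read step, assuming events arrive as plain Python dicts; the field names are hypothetical and this is not Treasure Data's implementation:

```python
from collections import Counter

# Hypothetical multi-structured events: each event type carries different fields.
events = [
    {"event": "register", "user": "u1", "plan": "free"},
    {"event": "login", "user": "u1"},
    {"event": "purchase", "user": "u1", "item": "sku-42", "amount": 9.99},
    {"event": "login", "user": "u2", "device": "ios"},
]

def to_table(records):
    """Infer a schema as the union of all keys (in first-seen order) and
    return (columns, rows), filling fields a record lacks with None."""
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    rows = [tuple(rec.get(col) for col in columns) for rec in records]
    return columns, rows

columns, rows = to_table(events)

# Rough equivalent of: SELECT event, COUNT(*) FROM events GROUP BY event
event_idx = columns.index("event")
counts = Counter(row[event_idx] for row in rows)

if __name__ == "__main__":
    print(columns)
    print(counts)
```

New event types simply add columns; older rows read those columns as None, which is what lets collection stay easy while querying stays table-oriented.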

8. Copyright ©2015 Treasure Data. All Rights Reserved.
Treasure Data By the Numbers (Jan-2015):
• 13T+ records of data imported since launch
• 500K+ records imported each second
• 1.5 Trillion+ records imported each month
• 12B records sent per day by one customer
[Chart: Data Records Stored in the Treasure Data Cloud Service, Aug-12 to Dec-14 (last 2 years), y-axis 0 to 14 trillion records. Milestones marked: Service Launched, Series A Funding, 100 Customers, Selected by Gartner as Cool Vendor in Big Data, 5 Trillion Records, 10 Trillion Records, 13 Trillion Records, Series B Funding.]
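As a sanity check on the January 2015 figures, the monthly and daily totals can be converted to per-second rates. This sketch assumes a 30-day month; the input numbers come from the slide:

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # assumes a 30-day month

monthly_records = 1.5e12                  # "1.5 Trillion+ records imported each month"
per_second = monthly_records / SECONDS_PER_MONTH

one_customer_daily = 12e9                 # "12B records sent per day by one customer"
one_customer_per_second = one_customer_daily / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"{per_second:,.0f} records/second overall")
    print(f"{one_customer_per_second:,.0f} records/second from one customer")
```

That works out to about 579,000 records per second, consistent with "500K+", and the single customer accounts for roughly 139,000 records per second, about a quarter of the average.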

9. Statistics
• Total Records Stored: 25 Trillion
• Managed & Supported: 24 × 7 × 365
• Uptime: 99.99%
• New Records / second: 1 Million
• Daily Twitter volume: 100x
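Two of these statistics translate into concrete budgets. A short sketch, ignoring leap years, of what 1 million new records per second means per day and how much downtime a 99.99% uptime target allows per year:

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60      # ignores leap years

records_per_second = 1_000_000        # "New Records / second: 1 Million"
records_per_day = records_per_second * SECONDS_PER_DAY

uptime = 0.9999                       # "Uptime: 99.99%"
downtime_minutes_per_year = (1 - uptime) * MINUTES_PER_YEAR

if __name__ == "__main__":
    print(f"{records_per_day:,} records/day")
    print(f"{downtime_minutes_per_year:.2f} minutes of downtime allowed per year")
```

At that rate the service ingests 86.4 billion records a day, and "four nines" leaves roughly 52.6 minutes of downtime per year for a system running 24 × 7 × 365.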

10. A solution?
• There are trade-offs to consider.
• Any trade-off should still make it easy to collect data.
• Easy does it! Un- and semi-structured data (multi-structured data).
• Open source means it's free, but it also means you need someone on hand to implement and maintain it.
• Cloud storage means you don't have to scale or shard yourself; the trade-off is a performance hit compared with bare metal.
Image: John Hammink
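The last bullet says managed cloud storage spares you from sharding. For contrast, here is roughly what you take on when you shard yourself: a minimal hash-partitioning sketch, assuming a fixed shard count and a string user ID as the shard key (both hypothetical). It uses `zlib.crc32` because Python's built-in `hash()` for strings is randomized per process and so would not route keys consistently across machines:

```python
import zlib

NUM_SHARDS = 4  # hypothetical fixed shard count

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable, process-independent hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def partition(records, key_field="user"):
    """Group records into per-shard lists by hashing record[key_field]."""
    shards = {i: [] for i in range(NUM_SHARDS)}
    for rec in records:
        shards[shard_for(rec[key_field])].append(rec)
    return shards

if __name__ == "__main__":
    demo = [{"user": f"u{i}"} for i in range(10)]
    for shard, recs in partition(demo).items():
        print(shard, [r["user"] for r in recs])
```

The catch with plain modulo hashing is that changing `NUM_SHARDS` remaps most keys, so growing the cluster means moving most of the data. Schemes such as consistent hashing reduce that, and keeping track of all this is part of the operational burden the slide is pointing at.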

11. Image: Dreamstime

12. Images: Lightspring/Shutterstock, John Hammink, Treasure Data. There are a few introductory data science posts at blog.treasuredata.com!

13. What does a pipeline need?

14. Open vs. Closed source Image: Heather Craig/Shutterstock

15. Images: PC World, Data-Hive, Wallpapersmela or or ?

16. LAMBDA ARCHITECTURE

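17.
    # logs from a file
    <source>
      type tail
      path /var/log/httpd.log
      format apache2
      tag web.access
    </source>

    # logs from client libraries
    <source>
      type forward
      port 24224
    </source>

    # store logs to ES and HDFS
    # (needs the fluent-plugin-elasticsearch and fluent-plugin-webhdfs plugins)
    <match *.*>
      type copy
      <store>
        type elasticsearch
        logstash_format true
      </store>
      # the slide is cut off here; this HDFS store is an assumed completion,
      # and host, port, and path are placeholders
      <store>
        type webhdfs
        host namenode.example.com
        port 50070
        path /log/access.log
      </store>
    </match>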
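In the Fluentd configuration on this slide, `<match *.*>` decides which events reach the copy output. In Fluentd match patterns, `*` matches exactly one dot-separated tag part and `**` matches zero or more parts, so `*.*` matches `web.access` but not `web` or `web.access.err`. The sketch below only imitates that matching rule and the copy fan-out to illustrate routing; it is not Fluentd's code, and the store callables are hypothetical stand-ins for the Elasticsearch and HDFS outputs:

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Match a dot-separated tag against a pattern in which '*' matches one
    tag part and '**' matches zero or more parts (simplified Fluentd rules)."""
    return _match(pattern.split("."), tag.split("."))

def _match(pat, parts):
    if not pat:
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Let '**' consume 0, 1, 2, ... tag parts.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False

def route(tag, record, pattern, stores):
    """Imitate '<match pattern> type copy': hand the record to every store."""
    if tag_matches(pattern, tag):
        for store in stores:
            store(tag, record)
        return True
    return False

if __name__ == "__main__":
    es_docs, hdfs_lines = [], []
    stores = [lambda t, r: es_docs.append(r), lambda t, r: hdfs_lines.append(r)]
    route("web.access", {"code": 200}, "*.*", stores)
    print(es_docs, hdfs_lines)
```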

18. LESS SIMPLE FORWARDING

19. Before fluentd

20. Multi-structured data • Un-structured data is better for data whose ultimate use is in statistics
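Before semi-structured events can feed statistics, nested fields usually have to be flattened into columns. A minimal sketch, assuming JSON input and dotted column names (a naming convention chosen here for illustration); lists are kept as JSON strings rather than exploded into extra rows:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys.
    Lists are serialized back to JSON strings instead of being expanded."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

raw = '{"event": "purchase", "user": {"id": "u1", "geo": {"country": "US"}}, "items": ["a", "b"]}'
row = flatten(json.loads(raw))

if __name__ == "__main__":
    print(row)
```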

21. fluentd! http://www.fluentd.org/

22. http://msgpack.org/
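MessagePack is a binary serialization format that works like JSON but is more compact; Fluentd uses it for its internal data representation and in its forward protocol. In practice you would use the official `msgpack` package; to show what the bytes look like, here is a hand-rolled encoder for a small subset of the spec only (nil, booleans, small integers, short strings, small arrays and maps). It is an illustration, and it raises an error for anything outside that subset:

```python
def pack(obj) -> bytes:
    """Encode a small, fixed-size subset of MessagePack."""
    if obj is None:
        return b"\xc0"                            # nil
    if isinstance(obj, bool):                     # check bool before int
        return b"\xc3" if obj else b"\xc2"        # true / false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                   # positive fixint
        if -32 <= obj < 0:
            return bytes([obj & 0xFF])            # negative fixint (0xe0-0xff)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data           # fixstr
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body                # fixmap
    raise ValueError(f"outside the fixed-size subset: {obj!r}")

if __name__ == "__main__":
    print(pack({"compact": True, "schema": 0}).hex())
```

The example map from msgpack.org, `{"compact":true,"schema":0}`, takes 27 bytes as compact JSON and 18 bytes encoded this way.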

23. An open-source bulk data loader that helps transfer data between various databases, storages, file formats, and cloud services. embulk.org/docs
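Embulk itself is driven by a YAML configuration and plugins for inputs, parsers, and outputs. The code below is not Embulk; it only illustrates the same bulk-load shape (parse a file, then write to a target in batches rather than row by row) with the standard library's `csv` and `sqlite3` modules. The table name, columns, and batch size are hypothetical:

```python
import csv
import io
import sqlite3
from itertools import islice

BATCH_SIZE = 2  # tiny on purpose; real loaders batch thousands of rows

def bulk_load(csv_text: str, conn: sqlite3.Connection) -> int:
    """Parse CSV text and insert it into an 'events' table in batches."""
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id TEXT, event TEXT, ts INTEGER)"
    )
    loaded = 0
    while True:
        batch = list(islice(reader, BATCH_SIZE))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO events (user_id, event, ts) VALUES (:user_id, :event, :ts)",
            batch,
        )
        conn.commit()  # one commit per batch, not per row
        loaded += len(batch)
    return loaded

sample = (
    "user_id,event,ts\n"
    "u1,login,1420070400\n"
    "u2,purchase,1420070460\n"
    "u1,logout,1420070520\n"
)
conn = sqlite3.connect(":memory:")
loaded = bulk_load(sample, conn)
```

SQLite's INTEGER column affinity converts the numeric `ts` strings from the CSV into integers on insert.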

24.

25.

26. Hivemall Hivemall is a scalable machine learning library that runs on Apache Hive. Hivemall is designed to be scalable to the number of training instances as well as the number of training features. • Classification • Regression • Recommendation • k-nearest neighbor • Anomaly Detection • Feature Engineering https://github.com/myui/hivemall
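Hivemall exposes its algorithms as Hive UDFs called from SQL, so the code below is not Hivemall's API. It is a plain-Python sketch of one of the listed techniques, k-nearest-neighbor classification (Euclidean distance, majority vote), on a hypothetical toy dataset, to show what such a function computes for each query row:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature dataset with two classes.
train = [
    ((1.0, 1.0), "low_spend"),
    ((1.2, 0.8), "low_spend"),
    ((0.9, 1.1), "low_spend"),
    ((5.0, 5.0), "high_spend"),
    ((5.5, 4.5), "high_spend"),
    ((4.8, 5.2), "high_spend"),
]

if __name__ == "__main__":
    print(knn_predict(train, (1.1, 1.0)))
```

With an even k, ties fall to whichever label appears first among the nearest points, which is why an odd k is the usual choice for two classes.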

27. The Hadoop Story on MongoDB Image courtesy of Steven Francia @ Docker

28. Questions?
