This presentation covers the common challenges in building an analytics platform (an audience platform is the chosen use case) and offers guidelines and recommendations on how to address them. It starts by motivating the need for such a platform and describing the components that make it up, then walks through common design options for these components and suggests alternatives. The presentation concludes with a design proposal being evaluated for the audience platform at InMobi.
2. Motivation
➔Audience Analytics platform is extremely critical
➔Segmentation
➔Rule Based
➔Inferred based on Sciences Modeling
➔Third Party
➔Targeting
➔Maximize CTR and CVR
3. Challenges
➔Scale
➔Billions of Ad requests/day, Peak 25K rps, 800M Users
➔ Multiple Input Sources and Types
➔Fact Data, Dimension Data
➔ Multiple Consumers
➔Reporting, Segmentation and Targeting, Inferences
4. Challenges
➔ Data Curation
➔Define and Measure Data Quality
➔Track sources and possibly assign confidence
➔Governance and Licensing restrictions
➔ Consistent Querying Interface
6. Activity Data
➔Records actual activity
➔Time-series data
➔ Immutable, actual facts
➔Comprises Dimensions and Measures
➔Measures
➔Ad requests, Impressions, Clicks, Conversion, ...
7. Dimension Data
➔Domain-specific metadata (user, location, app, etc.)
➔Each domain will have its own schema
➔User (uid, age, gender, interests etc)
➔Location (Lat/Long – zip/city/country, etc)
➔Device (Handset model, OS, version etc)
➔Mutable (but possibly slowly changing)
8. ETL
➔Need to ingest data from different sources
➔Transform the data into a format optimized for storage and easy querying
➔Query interface for different consumers
9. ETL - Ingestion
➔Naive -- Have custom ingestion flows
➔Quick to develop
➔Could be highly optimized
➔Not scalable
➔Have a generic framework
➔Streamlined and scalable
➔Might need more processing
10. ETL - Storage
➔Naive -- Storage schema closely coupled with ingestion schema
➔Multiple representations of the same data: age could be DOB or years
➔Consistent representation is a must
➔Would require transformation from the input schema to the storage schema
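As a rough illustration of the input-to-storage transformation above, consider the age/DOB case. This is a hypothetical adapter (the field names `dob` and `age_years` and the reference date are assumptions, not InMobi's actual schema) that maps either representation onto one canonical column:

```python
from datetime import date

def normalize_age(record, as_of=date(2014, 1, 1)):
    """Hypothetical adapter: map either a 'dob' or an 'age_years'
    input field onto a single canonical 'age' column (in years)."""
    if "dob" in record:
        dob = record["dob"]
        # Subtract one if the birthday has not yet occurred in the as_of year.
        age = as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))
    elif "age_years" in record:
        age = int(record["age_years"])
    else:
        age = None  # unknown; downstream curation may drop or flag the record
    return {"age": age}
```

The same pattern generalizes: one small adapter per input feed, all converging on the published storage schema.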
11. ETL - Storage
➔Location – Lat/Long, Zip, City, Country
➔Need to store at the lowest possible granularity (Lat/Long)
➔GPS readings come with an accuracy that needs to be recorded
➔Queries are almost always nearness queries, not exact matches
12. ETL - Storage
➔Quadtile representation
➔Use leading bits for the tile id, remaining bits for storing accuracy
➔Transform all location information to such ids
➔Nearness with Lat/Long distance is a cross-product join
➔With tiles, we can translate this into equi-joins (of course, with some loss of accuracy)
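A minimal sketch of the quadtile encoding described above, following the standard web-map quadkey scheme (bit-interleaved x/y tile coordinates at a fixed zoom level); the trailing accuracy bits mentioned in the slide are omitted for brevity, and the zoom level is an assumed parameter:

```python
import math

def quadtile_id(lat, lon, zoom=16):
    """Map a lat/long to a quadtile id at the given zoom level.
    Nearby points share tile ids (and id prefixes across zoom levels),
    so nearness queries can become equi-joins on the id."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web-Mercator limits
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.log(math.tan(math.radians(lat)) +
             1.0 / math.cos(math.radians(lat))) / math.pi) / 2.0 * n)
    x, y = min(x, n - 1), min(y, n - 1)
    tile = 0
    for i in range(zoom - 1, -1, -1):  # interleave y/x bits, MSB first
        tile = (tile << 2) | (((y >> i) & 1) << 1) | ((x >> i) & 1)
    return tile
```

Two records landing in the same (or an adjacent) tile can then be matched with a plain equality join on `quadtile_id`, instead of a cross-product distance computation.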
13. ETL - Querying
➔Naive -- Users are aware of the multiple feeds and schemas, and query each appropriately
➔Extremely difficult as schemas change and new feeds get added
➔Closely coupled with the internal representation -- not good
14. ETL - Querying
➔Having a consistent, published schema
➔Enables exploration and discovery
➔Well-defined querying interfaces that abstract out the internal representation
➔Provide primitives (for example, UDFs for nearness calculations) for easier querying
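The nearness primitive mentioned above could be as simple as a great-circle distance function, exposed to consumers (e.g. as a Hive UDF) so they never touch raw Lat/Long math. A sketch of the underlying calculation, using the haversine formula:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two Lat/Long points,
    via the haversine formula; the kind of primitive a query layer
    could wrap as a UDF for nearness predicates."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

A consumer query then reads as `haversine_km(u.lat, u.lon, poi.lat, poi.lon) < 5` rather than re-deriving the trigonometry in every pipeline.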
16. Ingestion Server
● Curation to filter out dubious records
● Adapters for transformation
● REST based ingestion server
– Support multiple compression types
– Support multiple serialization formats
– Handle rate-limiting/throttling
– Bulk/Streaming inputs
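As a rough illustration of the compression and serialization handling listed above, here is a hypothetical payload decoder for such a REST endpoint (the magic-byte check and the JSON-only branch are simplifying assumptions, not the actual implementation):

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # leading bytes of any gzip stream

def decode_payload(raw, content_type="application/json"):
    """Hypothetical ingestion-server helper: transparently decompress
    gzip payloads, then deserialize by the declared content type."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    if content_type == "application/json":
        return json.loads(raw)
    # A real server would also dispatch on Avro, Thrift, CSV, etc.
    raise ValueError("unsupported serialization format: %s" % content_type)
```

Rate limiting and bulk/streaming handling would sit above this layer, at the HTTP framework level.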
17. Storage and Querying
● Possibly a different schema than the ingestion schema
● Columnar storage format (Parquet/ORC)
● Predominantly Hive-friendly
● No direct access to internal storage; access only through an HQL-like query layer
● Export option for other use cases (e.g. an online store)
18. Tech Stack
● Pig for most pipeline tasks
● Grill for analytics interface
● Hive as the primary execution engine
● Tez as the runtime environment
● ORC/Parquet for the storage format