0629venmoplus


Qingpeng "Q.P." Zhang

venmoplus.com


Pipeline (diagram): historical transactions, real-time transactions

2013
Biggest Challenge:
● Calculate/Query graph distance in real time

Solutions
● Two databases
● Graph algorithm optimizations
● Query/search optimizations
● S3 ⇔ Redis and S3 ⇔ Elasticsearch, distributed with Spark
● ...

A Tale of Two Databases (diagram): historical transactions, real-time transactions

Two Databases

User ID    Name
420890     Graham Hadley
1630476    Leon Tang
810029     Harminder Toor
1371353    Ephraim Park
562884     Paul Min

User ID    Friend IDs
420890     set(14935158, 562884)
1630476    set(1371353)
810029     set(190230, 14935158)
1371353    set(810029, 971156)
562884     set(196371, 1371353)
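
As a rough sketch of how the two mappings above fit together, the snippet below models them as two in-memory Python dicts: one from user ID to name, one from user ID to a set of friend IDs. The dict names and the `friend_names` helper are hypothetical, the slides do not say which backend holds which mapping, and a real deployment would query the databases rather than dicts.

```python
# In-memory stand-ins for the two stores, filled with the sample rows
# from the slide. Which database holds which mapping is an assumption.
names = {
    420890: "Graham Hadley",
    1630476: "Leon Tang",
    810029: "Harminder Toor",
    1371353: "Ephraim Park",
    562884: "Paul Min",
}
friends = {
    420890: {14935158, 562884},
    1630476: {1371353},
    810029: {190230, 14935158},
    1371353: {810029, 971156},
    562884: {196371, 1371353},
}


def friend_names(user_id):
    """Resolve a user's friend IDs to names.

    IDs with no name in the sample are returned as their string form.
    """
    return sorted(names.get(f, str(f)) for f in friends.get(user_id, set()))


print(friend_names(1630476))  # ['Ephraim Park']
```

Keeping the friend lists as sets is what makes the intersection-based distance check cheap: membership tests and intersections do not need any join against the name table.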

VenmoPlus.com
Instances: m4.xlarge, m4.large, m4.xlarge, m4.large, t2.micro
Cost: $29.11/day

About Me
● Postdoc at Lawrence Berkeley National Lab
● PhD in Computer Science, Michigan State
Certified volunteer:
● Software Carpentry
● Data Carpentry
● American Red Cross
Christmas Eve 2014, ice storm, Michigan

Algorithm 1
Shortest distance -> intersection of sets (friend lists)
● Is (1st-degree friends of A) ∩ (1st-degree friends of B) empty? If not, the distance is at most 2.
● Is (2nd-degree friends of A) ∩ (1st-degree friends of B) empty? If not, the distance is at most 3.
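
The set-intersection idea can be sketched in Python as follows. This is a minimal illustration, not the deck's actual code: it assumes an undirected graph stored as a dict of friend sets (every edge listed on both sides), and the function names and the cut-off at distance 3 are choices made for the example.

```python
def second_degree(friends, user):
    """Friends of friends of `user` (may include first-degree friends)."""
    result = set()
    for f in friends.get(user, set()):
        result |= friends.get(f, set())
    result.discard(user)
    return result


def distance(friends, a, b):
    """Return the hop distance between a and b if it is at most 3, else None.

    Assumes `friends` is symmetric: b in friends[a] implies a in friends[b].
    """
    if a == b:
        return 0
    fa = friends.get(a, set())
    fb = friends.get(b, set())
    if b in fa:
        return 1
    if fa & fb:  # a shared friend means a path of length 2
        return 2
    if second_degree(friends, a) & fb:  # a friend-of-friend of A knows B
        return 3
    return None


# Toy chain 1 - 2 - 3 - 4 - 5 (hypothetical data).
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(distance(chain, 1, 3))  # 2
print(distance(chain, 1, 4))  # 3
print(distance(chain, 1, 5))  # None
```

Checking the smaller distances first is what makes each answer the shortest one: by the time the second-degree test runs, distances 1 and 2 have already been ruled out.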

Algorithm 2
Query the distance between two vertices at a historic moment in a constantly changing graph (because we don't pre-calculate the distance)
● A recent transaction for a user is already history and has already changed the graph
● Query the distance between the two users at that moment
○ Do not count that specific transaction
○ Temporarily remove the influence of that specific transaction, then restore it
■ Test whether that transaction is the first between the pair of users
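
One way to read the steps above is sketched below. It is an assumption-laden illustration: `pair_counts` (a count of transactions per user pair), `bfs_distance`, and `distance_before` are hypothetical names, the graph is an in-memory dict of symmetric friend sets, and a plain breadth-first search stands in for whatever distance query the real system uses.

```python
from collections import deque


def bfs_distance(friends, a, b, max_depth=3):
    """Hop distance from a to b, or None if it exceeds max_depth."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in friends.get(node, set()):
            if nxt == b:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def distance_before(friends, pair_counts, a, b):
    """Distance between a and b just before their latest transaction."""
    if pair_counts.get(frozenset((a, b)), 0) > 1:
        # Not the first transaction: the pair was already directly connected.
        return 1
    had_edge = b in friends.get(a, set())
    if had_edge:
        # First transaction between the pair: hide the edge it created.
        friends[a].discard(b)
        friends.get(b, set()).discard(a)
    try:
        return bfs_distance(friends, a, b)
    finally:
        if had_edge:
            # Restore the graph to its current state.
            friends[a].add(b)
            friends.setdefault(b, set()).add(a)


# Hypothetical data: users 1 and 3 just transacted for the first time.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
counts = {frozenset((1, 2)): 2, frozenset((2, 3)): 1, frozenset((1, 3)): 1}
print(distance_before(graph, counts, 1, 3))  # 2
```

The `try`/`finally` guarantees the edge is restored even if the distance query raises, which matters when the same graph is serving other queries.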

Query/Search Optimizations
1. Remove aggregation for better performance (a trade-off)
2. Friend recommender:
   a. Use Counter to get only the 5 users with the most common friends
3. Search messages in the friend circle
   a. Combine Elasticsearch and Redis queries
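
The friend recommender in item 2 might look roughly like this with Python's `collections.Counter`. The `recommend` function, its parameters, and the toy data are hypothetical; only the use of `Counter` and the limit of 5 come from the slide.

```python
from collections import Counter


def recommend(friends, user, top_n=5):
    """Suggest up to top_n non-friends ranked by number of common friends."""
    direct = friends.get(user, set())
    common = Counter()
    for f in direct:
        for candidate in friends.get(f, set()):
            # Skip the user and anyone who is already a friend.
            if candidate != user and candidate not in direct:
                common[candidate] += 1
    return common.most_common(top_n)


# Toy graph (hypothetical): user 4 shares two friends with user 1, user 5 one.
toy = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3}, 5: {2}}
print(recommend(toy, 1))  # [(4, 2), (5, 1)]
```

`most_common(top_n)` returns only the top entries, so the full ranking of second-degree contacts never has to be sorted or returned to the caller.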

● Cache of 2nd-degree friend lists
● Partitioned GraphDB
● Good for LinkedIn (hundreds of millions of users, with higher degree)
● 5 million vertices (users)
● 32 million distinct edges (transactions)
● 88 million total edges (transactions)


1. Introducing VenmoPlus.com - Explore your Venmo network! Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow


12. Pipeline: raw data, processed in a distributed way

13. This, or that? - to build graph

14. This, or that? - for fast searching


17. Lesson learned


21. No cache (precalculation)? No GraphDB?
