2013
Technical Anatomy of a Caller ID Application
Kristine Delossantos
Oct 3rd, 2013
#GHC13
Outline
• Overview of WhitePages Current Caller ID
• Technical Architecture
• Key Problems and Solutions
Sweet Call Alert
Know who’s calling and what is happening with them in real time
One List, One Touch
• Consolidated call/text log
• One-tap access to top contacts
Make It Visual
• Shareable insights into communication style: who, when, and how.
2013
How It Works
Meet Spongebob…
He just got a new phone, installed Current, and wired it to Facebook!
Spongebob’s Friends Are Excited!
They want to celebrate and get Krabby Patties together, so they text him about it.
Technical Architecture
• ActiveMQ
• Contact Graph Store
• WhitePages Mobile Service Front Ends
• Entity Resolution System
• Data Collection Services
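The speaker notes for this slide describe a two-hop flow: front ends store new contacts and notify data collection, which enriches them and notifies entity resolution. A minimal in-memory sketch of that flow follows. It is an illustration only: Python's `queue.Queue` stands in for ActiveMQ, a dict stands in for the contact graph store, and every function name and the placeholder "resolution" step are assumptions, not the team's implementation.

```python
import queue

# Assumption: in-memory stand-ins for the two ActiveMQ queues and the store.
collection_queue = queue.Queue()
resolution_queue = queue.Queue()
contact_graph_store = {}

def front_end_receive(user_id, new_contacts):
    """Front end: store the raw contacts, then notify data collection."""
    contact_graph_store[user_id] = {
        "raw": list(new_contacts), "collected": [], "resolved": [],
    }
    collection_queue.put(user_id)

def data_collection_worker(lookup):
    """Data collection: enrich each raw contact, then notify entity resolution."""
    user_id = collection_queue.get()
    record = contact_graph_store[user_id]
    record["collected"] = [lookup(number) for number in record["raw"]]
    resolution_queue.put(user_id)

def entity_resolution_worker():
    """Entity resolution (placeholder): drop empty lookups and de-duplicate."""
    user_id = resolution_queue.get()
    record = contact_graph_store[user_id]
    record["resolved"] = sorted({c for c in record["collected"] if c is not None})
    return user_id, record["resolved"]

# Hypothetical directory data for the example.
directory = {"555-0101": "Patrick Star", "555-0102": "Sandy Cheeks"}
front_end_receive("spongebob", ["555-0101", "555-0102", "555-0199"])
data_collection_worker(directory.get)
result = entity_resolution_worker()
```

Because each stage only reads from its own queue, stages can be scaled independently by adding workers, which is the property the notes attribute to the queue-plus-workers design.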
Keeping Data Fresh
Network variance
Data connections
Usage Plans
Push/Pull protocols
Our solution:
• We periodically update the data in the background, in batches, on a schedule.
• ActiveMQ and worker machines
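The speaker notes for this slide say the client pulls on a schedule and polls more often right after a local change. One way such a schedule could look is sketched below. The class name, the interval values, and the multiplicative back-off toward the idle interval are all assumptions made for illustration; the talk does not say how the polling frequency returns to normal.

```python
class PollScheduler:
    """Chooses how long the client waits before its next pull from the server."""

    def __init__(self, idle_interval_s=3600.0, fast_interval_s=15.0, backoff=2.0):
        self.idle_interval_s = idle_interval_s  # assumed background schedule
        self.fast_interval_s = fast_interval_s  # assumed interval after a change
        self.backoff = backoff                  # assumed growth factor
        self.interval_s = idle_interval_s

    def on_local_change(self):
        """Call/text log or address book changed: poll quickly for new associations."""
        self.interval_s = self.fast_interval_s

    def on_poll_result(self, got_updates):
        """Stay fast while updates keep arriving; otherwise back off toward idle."""
        if got_updates:
            self.interval_s = self.fast_interval_s
        else:
            self.interval_s = min(self.interval_s * self.backoff, self.idle_interval_s)
        return self.interval_s

scheduler = PollScheduler(idle_interval_s=120.0, fast_interval_s=15.0)
scheduler.on_local_change()
intervals = [scheduler.on_poll_result(False) for _ in range(4)]
```

Backing off when polls come back empty keeps radio and data-plan usage low, which is the stated reason for choosing pull over push.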
Data Transfer
Our solution:
Thrift over HTTP, delivering only the objects that changed since the last successful request.
[Chart: Serialized Contact List Size Comparison, Thrift vs. JSON]
[Diagram labels: GZip, Thrift, HTTP, Updates]
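The "only objects since the last successful request" idea can be sketched with the standard library alone. Note the substitution: this example serializes with JSON rather than Thrift's compact binary protocol, so it shows the delta selection and Gzip steps but not the serialization format the team chose. The `version` field and all function names are hypothetical.

```python
import gzip
import json

def changes_since(contacts, last_sync_version):
    """Server side: select only objects modified after the client's last sync."""
    return [c for c in contacts if c["version"] > last_sync_version]

def encode_payload(objects):
    """Serialize compactly, then Gzip (JSON is a stand-in for Thrift here)."""
    raw = json.dumps(objects, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)

def decode_payload(payload):
    """Client side: reverse of encode_payload."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))

# Hypothetical server-side state.
contacts = [
    {"name": "Patrick Star", "version": 1},
    {"name": "Squidward Tentacles", "version": 3},
    {"name": "Sandy Cheeks", "version": 4},
]

last_sync_version = 2
payload = encode_payload(changes_since(contacts, last_sync_version))
received = decode_payload(payload)
# Advance the marker only after the response was decoded successfully.
last_sync_version = max((c["version"] for c in received), default=last_sync_version)
```

Advancing the marker only after a successful decode means a dropped connection simply causes the same delta to be requested again on the next poll.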
Storage Solution
Engineering costs
Operational costs
Postgres
Our solution:
We settled on Postgres and treat it as a NoSQL key-value store. This saved engineering time as well as costs.
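A relational database can serve as a key-value store by keeping one two-column table per partition and routing each key by hash. The sketch below illustrates that pattern under loud assumptions: in-memory `sqlite3` databases stand in for separate Postgres databases so the example runs anywhere, and the table layout, class name, and hash-modulo routing are hypothetical rather than taken from the talk.

```python
import hashlib
import json
import sqlite3

class PartitionedKeyValueStore:
    """Key-value access over several relational databases (sqlite3 stand-ins)."""

    def __init__(self, partition_count):
        self.partitions = [sqlite3.connect(":memory:") for _ in range(partition_count)]
        for db in self.partitions:
            db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

    def _partition_for(self, key):
        # Stable hash, so the same key always maps to the same partition.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.partitions[int.from_bytes(digest[:8], "big") % len(self.partitions)]

    def put(self, key, value):
        # SQLite upsert syntax; Postgres would use INSERT ... ON CONFLICT ... DO UPDATE.
        self._partition_for(key).execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )

    def get(self, key):
        row = self._partition_for(key).execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

store = PartitionedKeyValueStore(partition_count=4)
store.put("contact:patrick", {"name": "Patrick Star"})
store.put("contact:patrick", {"name": "Patrick Star", "source": "facebook"})
```

With modulo routing, changing the partition count remaps most keys, which is consistent with the drawback raised in the speaker notes that adding partitions is expensive.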
Entity Resolution System
Machine learning
Tunable
Performance
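The speaker notes give one concrete matching rule: two contacts resolve together when their last names match across two different sources and the first name is either a complete match or a nickname. A minimal sketch of that single rule follows. The nickname table, the record fields, and the reading of "two different sources" as "the records must come from different sources" are assumptions.

```python
# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"bob": "robert", "bobby": "robert", "pat": "patrick", "sandy": "sandra"}

def canonical_first_name(name):
    """Map a nickname to its full form; unknown names map to themselves."""
    name = name.strip().lower()
    return NICKNAMES.get(name, name)

def same_entity(a, b):
    """Apply the rule: different sources, equal last names, compatible first names."""
    if a["source"] == b["source"]:
        return False
    if a["last"].strip().lower() != b["last"].strip().lower():
        return False
    return canonical_first_name(a["first"]) == canonical_first_name(b["first"])

address_book = {"first": "Pat", "last": "Star", "source": "address_book"}
facebook = {"first": "Patrick", "last": "Star", "source": "facebook"}
other = {"first": "Patrick", "last": "Tentacles", "source": "facebook"}

matched = same_entity(address_book, facebook)
unmatched = same_entity(address_book, other)
```

Keeping each rule as a small pure function makes it easy to replay sample data through the rules and compare against expected results, similar to the tooling the notes describe.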
Developing Mobile Applications
Carrier variance
Test matrix
Device variance
Platform solutions
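A device test matrix can be enumerated up front as the cross product of the dimensions that vary. The sketch below is illustrative only: the carrier and model names are placeholders, the Android versions are examples, and the `needs_call_block_check` flag simply encodes the HTC call-blocking quirk mentioned in the speaker notes.

```python
from itertools import product

# Placeholder values; a real matrix would come from research on the target users.
CARRIERS = ["carrier_a", "carrier_b"]
DEVICES = [("HTC", "model_x"), ("Samsung", "model_y")]
ANDROID_VERSIONS = ["4.1", "4.3"]

def build_test_matrix(carriers, devices, versions):
    """Return one test configuration per carrier/device/OS combination."""
    matrix = []
    for carrier, (manufacturer, model), version in product(carriers, devices, versions):
        matrix.append({
            "carrier": carrier,
            "manufacturer": manufacturer,
            "model": model,
            "android": version,
            # The notes mention call blocking needed extra handling on HTC.
            "needs_call_block_check": manufacturer == "HTC",
        })
    return matrix

test_matrix = build_test_matrix(CARRIERS, DEVICES, ANDROID_VERSIONS)
htc_rows = [row for row in test_matrix if row["needs_call_block_check"]]
```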
Got Feedback?
Rate and Review the session using the
GHC Mobile App
To download visit www.gracehopper.org

Technical Anatomy of a Caller ID Android App

Editor's notes

  1. My name is Kristine Delossantos and I am a Software Engineer on the Mobile Team at Whitepages. I wanted to take this time to talk about the technical workings of an Android application we released last August called Current Caller ID and leave you with some key takeaways from our development experience.
  2. I’ll start off with a quick overview of our app. Then I’ll show an architectural diagram. Afterwards I’ll get to the key problems and our current solutions. First: what is Current Caller ID?
  3. We have a sweet call alert. Not only will it tell you who’s calling, but it’ll also integrate Facebook, LinkedIn, and Twitter data to show what is happening with them in real-time.
  4. In the app, you can access a consolidated call/text log, with one-tap access to your top contacts.
  5. Then, we make it visual. We have shareable infographics that show insights into your communication style: who you communicate with, when you communicate, and how. Now I’ll go into more detail about the technical side of things with this scenario.
  6. Meet Spongebob. He just got a new cell phone, installed Current, and wired it up to his Facebook account. He posted a status to Facebook with his new number, telling his friends to text him.
  7. Patrick, Squidward, Mr. Krabs, and Sandy are so excited that Spongebob finally has a phone that they all text him to get Krabby Patties to celebrate. Current recognizes these as new contacts and gets to work.
  8. Current sends the data to our servers and we store it for further processing. Our front ends deliver a message to an asynchronous message queue system, alerting the data collection services of the new contacts. Our data collection services pick that up and reach out to our WhitePages data and social networks to collect more information about the contacts. Then we store it. Our data collection services deliver another message to our ActiveMQ pipeline, alerting the entity resolution system that we’ve collected information that needs to be resolved together. The entity resolution system picks that up and fetches data from our contact graph store. (I’ll get into more detail about the entity resolution system in a bit.) It resolves the data, stores it, and sends it back to the client. Now I’ll dive into the key takeaways we learned while trying to make all of this work.
  9. When dealing with large data sets, you want to keep them fresh, and do it efficiently. You don’t want to violate your customers’ trust by using up their data plan. We first needed to decide between a push or a pull protocol. Since the client triggers updates from the server, we didn’t need real-time updates every step of the way. Whenever a change in your call/text log or the address book happens, the client sends the changes over to the server and then increases the polling frequency to fetch the updates to any new associations that have been created on the server, and then the client refreshes the UI. Doing real-time lookups is not fast enough to present a rich call alert in a timely fashion. Additionally, when we first started, CDMA was more prevalent, so simultaneous voice and data communication wasn’t possible. So we chose to pull, to minimize our customers’ data usage while still responding to updates quickly. The system was designed to perform these jobs on its own by using ActiveMQ, a popular open source message queue system, and a scalable pool of worker machines that process messages delivered to the queue and update our databases. The key takeaway here is that it was best to deliver data as it becomes available, asynchronously, and to deliver only new data, so that the user experience doesn’t suffer from long wait times and loading screens.
  10. When transferring large sets of data, you want to pay close attention to using compact serialization schemes. Keep in mind that the mobile device may not always be connected, and make sure your app can handle that. When choosing our transfer protocol, we realized that HTTP was easiest to plug into our infrastructure. Then we compared Thrift and JSON for the format of our data. JSON can be compressed and is easy to debug, but ideally we wanted to keep payloads as small as possible, and Thrift was best for the job in its compact binary form. Compact binary Thrift, compared to JSON on the same data set, cut payloads roughly in half. We used Gzip because HTTP supports Gzip compression, which made it a widely available compression scheme; it gave us an average 30% savings on the Thrift payloads. We also make sure we only deliver data that has changed, in batches, so the client only receives data that is necessary. Make sure to be cognizant of payload size from a serialization format perspective, a compression perspective, and an overall structural perspective (the choice of delivering only deltas). When you’re dealing with large sets of data, you will probably need to store that data somewhere.
  11. It is important for your storage solution to be fault tolerant, maintain consistency, and scale horizontally. You might want to consider data partitioning to increase I/O throughput and maintain scalability. We use Postgres and treat it as a NoSQL key-value store. We use partitions to spread our data across multiple databases. A drawback of our solution is that it’s difficult to add more partitions without high engineering costs. We are currently exploring tools that can scale automatically, so that adding capacity is a simpler task. One thing that helps this effort is that, early in our development, we deliberately separated our API and model code from the underlying storage. It’s important to choose a data model that is efficient and flexible, and a storage engine that can easily adapt to unexpected events. Make sure it meets the customer requirements, I/O requirements, and processing requirements. Keep in mind operational requirements and growth. Make sure it still works at 20x your projections. If you’re developing an application that’s data centric, you might need to detect separate records that refer to the same entities.
  12. … which means you’ll need an entity resolution system. The Infolab at Stanford University defines entity resolution as “locating and merging records that refer to the same real-world entities”. In our case, we needed one to match names. If an entity resolution system is required for your application, you want to make sure it is tunable and performs well. The obvious choice when it comes to developing large-scale entity resolution systems is machine learning. We originally opted not to use machine learning because we had a predefined set of rules we thought were correct. As we tried to implement our system, we learned that it wasn’t as simple as we thought. You might want to consider machine learning up front; in hindsight, we could have explored it more. The first step in building our entity resolution system was defining the rules that would resolve two entities together. For example, to resolve two contacts together, they have to have a last name match from two different sources, while the first name could be a nickname or a complete match. We started with a decision tree to support the rules we had outlined. One drawback of the decision tree is that it scales very well vertically, if you were to add additional rules, but doesn’t scale very well horizontally; in our example, if we were to add a few more social networks to match the contacts against, it wouldn’t be easy. We wrote tools to process sets of data, which we could run user data samples against to see if we got the expected results based on the defined rules. This significantly sped up iteration on improving our match rate and the resolution engine.
  13. I’d like to close out our talk with what to keep in mind when developing mobile applications. The mobile team at WhitePages has developed several applications in the past, but this one was particularly interesting and we came out of it with several takeaways. When you are conceptualizing an idea, the first step is to evaluate the feasibility of the product by exploring various platforms and evaluating the capabilities available to you. For instance, in our case, iOS doesn’t provide access to call history or any kind of call/text communication data. The platform we targeted for Current was Android, as it gives us the most access to enable caller ID functionality. We also noticed during development that, since we were working with private APIs, there was a lot of variation between implementations on different carriers and manufacturers. For example: 1) Annotation of call type and notifications of incoming/outgoing calls differ among devices and carriers. 2) Current allows blocking calls and texts, and for blocking calls on HTC, we had to set additional state so it would respond to pick-up and hang-up API calls. To avoid surprises, we’d highly recommend defining your device matrix for testing well ahead of time. Note that this can be very different from the top devices published for the platform, depending on the demographics you are targeting and the nature of your product, so do your research well ahead of time.