Being a Software Engineer at
Facebook
13 November 2013
Food
Perks
Culture
Bootcamp
Scale
In total:
∗ 1.2 billion users
∗ 150 billion friendships
∗ 250 billion photos
∗ 1 trillion likes
∗ 16% of all time spent on Internet
∗ 1 million users per engineer

Every day:
∗ 400,000 net new users
∗ 350 million photos
∗ 5 billion shares
∗ 10 billion messages
UIs
This is your fault
Impact
Impact
Scope
∗ Machine learning
∗ Big Data
∗ Search and information retrieval
∗ Performance
∗ Hardware
∗ Network
∗ Human-computer interaction
∗ Web UI
∗ Mobile UI
∗ Static analysis
∗ Compilers
∗ Virtual machines
∗ Image processing
∗ Video processing
∗ Datacentre design
Fixing a bug

tl;dr: We've found the source of the file descriptor leaks, and we have potential fixes to address the problem.
Taking a deeper look at the problem: to verify what the real fixes could be, we need to determine what the open file descriptors actually point to. In adb shell you can do so by cd-ing into /proc/<pid>/fd and then running ls -l to show the actual destination...
A quick glance at the result set reveals that pipes and /dev/ashmem occupy the majority of the open fds. Since both items take up almost 50% of all the open fds, they are ideal candidates for tracking down the fd leak.
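The tally described above can be sketched in a few lines of Python. This is only an illustration: the sample listing is invented (it is not output from FB4A), and the helper names `parse_ls_targets`, `classify_target` and `summarize` are hypothetical. It assumes the `ls -l` output shows each fd as `<fd> -> <target>`, as Linux does for the symlinks under /proc/<pid>/fd.

```python
from collections import Counter


def parse_ls_targets(ls_output):
    """Return the symlink targets from `ls -l` output (lines containing ' -> ')."""
    targets = []
    for line in ls_output.splitlines():
        if " -> " in line:
            targets.append(line.split(" -> ", 1)[1].strip())
    return targets


def classify_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("/dev/ashmem"):
        return "ashmem"
    return "other"


def summarize(ls_output):
    """Count open fds per category."""
    return Counter(classify_target(t) for t in parse_ls_targets(ls_output))


# Invented sample listing, for illustration only.
SAMPLE = """\
lrwx------ root root 2013-11-01 10:00 10 -> /dev/ashmem
l-wx------ root root 2013-11-01 10:00 11 -> pipe:[4242]
lr-x------ root root 2013-11-01 10:00 12 -> pipe:[4242]
lrwx------ root root 2013-11-01 10:00 13 -> socket:[5151]
lr-x------ root root 2013-11-01 10:00 14 -> /system/framework/framework-res.apk
"""

if __name__ == "__main__":
    for kind, count in summarize(SAMPLE).most_common():
        print(kind, count)
```

Running the same summary before and after an action such as scrolling shows which category is growing.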
In short, ashmem stands for Android Shared Memory, and the Android system uses it to facilitate memory sharing across processes. Each ashmem registers a shrinker, and the shrinker reclaims the memory when the device is in a low-memory state, just like the JVM, but in native space. As for a pipe in the Unix world, it is an interprocess channel that holds two file descriptors, one for reading and one for writing.
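To make the "two file descriptors per pipe" point concrete, here is a minimal Python sketch using the standard `os.pipe()` call. The helper name `pipe_roundtrip` is hypothetical, and the payload is assumed to be small enough to fit in the pipe buffer so that the write does not block.

```python
import os


def pipe_roundtrip(payload):
    """Create a pipe, push a small payload through it, and close both ends."""
    # os.pipe() returns two descriptors: one for reading, one for writing.
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, payload)
        return os.read(read_fd, len(payload))
    finally:
        # Each pipe costs two fds; skipping either close leaks one of them.
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(pipe_roundtrip(b"ping"))
```

Every pipe that is created and never closed therefore adds two entries to the process's fd table, which is why a growing pipe count is a useful leak signal.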
My first task was to see why the number of pipe fds builds up when scrolling in newsfeed. When FB4A first starts, we have around xx open pipes. Scrolling through a couple of pages will grow the number… To isolate the problem from FB4A, I built the fbsimple app, which contains only the newsfeed module, and I observed the same behavior. To isolate the problem further, I then turned off image fetching/prefetching to see whether the problem is correlated with the image fetching pipeline. Surprisingly, without image fetching I could still see the problem, which convinced me that it affects more than just the image pipeline. My next experiment disabled newsfeed database caching, and the same problem persisted, which rules out db access as the main cause of the problem.
The only thing left to do was to play around with the network executor. By default, FB4A uses the HttpClient from Apache to execute all network requests. Earlier last month, we introduced the SPDY library OkHttp as an experiment to replace Apache HttpClient. A quick test revealed that Apache HttpClient is indeed the culprit for the leaking pipes: with the same configuration, OkHttp keeps the number of open pipe fds to around 20, versus 90 with Apache HttpClient. Not only is OkHttp better at reusing network connections, it also has better fd management. A sanity check with OkHttp enabled in FB4A showed the same result.
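The comparison between the two HTTP clients amounts to diffing two fd summaries taken under the same configuration. A minimal Python sketch, assuming each summary is a `collections.Counter` keyed by fd category; the function name `fd_delta` is hypothetical, and the counts are the approximate figures quoted above (around 90 pipes with Apache HttpClient, around 20 with OkHttp):

```python
from collections import Counter


def fd_delta(baseline, candidate):
    """Return per-category fd differences (candidate minus baseline).

    Both arguments must be Counters, so a missing category counts as 0.
    Categories whose count did not change are omitted.
    """
    kinds = sorted(set(baseline) | set(candidate))
    return {
        kind: candidate[kind] - baseline[kind]
        for kind in kinds
        if candidate[kind] != baseline[kind]
    }


if __name__ == "__main__":
    apache = Counter(pipe=90)   # approximate figure quoted in the text
    okhttp = Counter(pipe=20)   # approximate figure quoted in the text
    print(fd_delta(apache, okhttp))
```

A negative value means the candidate configuration holds fewer fds of that kind than the baseline.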
Ashmem debugging is rather straightforward: ashmem is allocated when image fetching is enabled. A deep dive revealed that the fd is only allocated after bitmap decoding has been called, and I suspected that ashmem has to do with purgeability. To verify, I disabled the image cache and, instead of relying on a disk file to decode the image, passed in the HTTP content InputStream directly and used BitmapFactory.decodeStream to decode the image. With this I confirmed that we are no longer allocating ashmem for decoding, because the images are no longer purgeable and live in the Java heap space. However, we ran into the same memory problem with the byte decoding experiment, and big images would be black or partially decoded on FB4A. So instead of decoding every image with the stream-based approach, I made a quick prototype that renders big images (images from single-photo stories and multi-photo collages) with our existing solution and renders small images such as profile pictures with the stream, and the result looks promising. Scrolling through the list of 1000 people in the flyout no longer grows the number of open fds. I think this hybrid approach would work.
With the combined approach described above, FB4A now stays at around half the open fds.
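The routing decision behind the hybrid approach can be sketched as follows. This is not the FB4A implementation: the text does not say how "big" and "small" images are told apart, so the pixel-count threshold, the `DecodePath` names and the `choose_decode_path` function are all assumptions made for illustration.

```python
from enum import Enum


class DecodePath(Enum):
    STREAM = "stream"      # decode straight from the input stream onto the Java heap
    EXISTING = "existing"  # the existing file-backed, purgeable (ashmem) path


# Hypothetical cut-off; the source gives no actual threshold.
SMALL_IMAGE_MAX_PIXELS = 200 * 200


def choose_decode_path(width, height, max_pixels=SMALL_IMAGE_MAX_PIXELS):
    """Route small images to stream decoding and big images to the existing path."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width * height <= max_pixels:
        return DecodePath.STREAM
    return DecodePath.EXISTING


if __name__ == "__main__":
    print(choose_decode_path(100, 100))   # e.g. a profile picture
    print(choose_decode_path(1024, 768))  # e.g. a single-photo story
```

Keeping the decision in one small function makes the threshold easy to tune while measuring the open fd count and memory use on each path.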
Values
Questions?
∗ Engineer: tnicholas@fb.com
∗ Recruiter (£££): ruth@fb.com
