Architecture of PBS.org
DCPython - June 7, 2011
PBS is…
• PBS is a national federation of independently owned and
operated public television stations and producers
– Each with their own management and development resources
• 1500+ highly trafficked websites:
– http://www.pbs.org/
– http://www.pbs.org/nova/
– http://pbskids.org/
– http://pbskids.org/sesame/
– http://video.pbs.org/
• Enterprise services/APIs
PBS is not!
• We do television dammit!
• Or any of the other ~200 local stations.
What we do
• Technology leadership within public
broadcasting community
• Distribution of national programming content
• Services to local stations
• Core application development. Yeah!!!
A few of our sites
History of PBS.org
Early 1990s: Hand-rolled static HTML
Late 1990s: Hand-crafted static HTML + CGI!
Most of the 2000s: Zope/Plone CMS-generated static HTML
2008-10: Django-generated static HTML
Launched Oct 2010: Django all the way
COVE API
• Contains the metadata for all PBS videos online
including pointers to streaming video
• Needed to be:
– Secure
– Fast
– Scalable
COVE API – Technology Stack
• Amazon Elastic Compute Cloud (EC2)
• Amazon Relational Database Service (RDS)
• Linux
• Python
• Django
• Piston for REST API
COVE API - Architecture
[Architecture diagram. Components shown: Internet; Elastic Load Balancer; Auto Scale Array (App Server 1 … App Server N); HA Proxy; RDS Master; RDS Slaves; App Sync Server; S3 Backups]
COVE API – Management Tools
• Amazon Web Service Console
• RightScale
• Splunk
COVE API – Interesting Stuff
• Easy to load test
– Duplicate environment for several days
• Easy to scale
– Autoscale array grows automatically
• Easy to upgrade
– Each server built from vanilla base
COVE API – Lessons learned
• Use normalized data for administration and
denormalized data for API
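One way to read this lesson, as a minimal sketch: related records stay in separate normalized tables for the admin side, and each video is flattened into one self-contained document for the API. All names and sample values below (`programs`, `videos`, `denormalize_video`) are hypothetical and are not taken from the COVE code.

```python
# Hypothetical normalized records, as an admin tool might store them.
programs = {1: {"title": "Example Program", "station": "EXAMPLE"}}
videos = [
    {"id": 10, "program_id": 1, "title": "Episode A",
     "stream_url": "http://example.org/a"},
    {"id": 11, "program_id": 1, "title": "Episode B",
     "stream_url": "http://example.org/b"},
]


def denormalize_video(video, programs):
    """Flatten one video and its program into a single API-ready dict."""
    program = programs[video["program_id"]]
    return {
        "id": video["id"],
        "title": video["title"],
        "stream_url": video["stream_url"],
        # Program fields are copied in so the API needs no join at read time.
        "program_title": program["title"],
        "program_station": program["station"],
    }


api_documents = [denormalize_video(v, programs) for v in videos]
```

The trade-off is that the flattened documents must be rebuilt whenever a program record changes.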
COVE API – Lessons learned
• Piston is fine, but lacks flexibility without
significant customization
– TastyPie?
• JSON is probably good enough
• Don’t get fancy with your endpoints
• Stick to REST principles
• Don’t get fancy with your authentication
– Use OAuth2 or simple token
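A "simple token" check can be sketched in a few lines. This is an illustration under assumptions, not the COVE implementation: the token store `API_TOKENS`, the client id, and the function `is_authorized` are all hypothetical.

```python
import hmac

# Hypothetical token store; a real service would keep these in a database.
API_TOKENS = {"station-123": "s3cr3t-token"}


def is_authorized(client_id, presented_token):
    """Return True when the presented token matches the stored one."""
    expected = API_TOKENS.get(client_id)
    if expected is None:
        return False
    # compare_digest does a constant-time comparison to limit timing leaks.
    return hmac.compare_digest(expected.encode(), presented_token.encode())
```

For example, `is_authorized("station-123", "s3cr3t-token")` returns `True`, while an unknown client or a wrong token returns `False`.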
PBS.org and Merlin API
• PBS.org
– Slim, fast layer
– Pulls data from Merlin API
– Uses memcache extensively
– Currently Django, but could be anything (Flask?)
• Merlin API
– Aggregate content from distributed CMSes
– Expose via standardized API
– Power PBS.org and more
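The "slim layer plus cache" idea for PBS.org can be sketched as a cache-aside lookup. The code below is an assumption-laden illustration: `TTLCache` is an in-process stand-in for memcache, and `get_program` and `fetch_from_api` are hypothetical names, not part of PBS.org or the Merlin API.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcache (not the real client)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry has expired; drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def get_program(slug, cache, fetch_from_api, ttl_seconds=300):
    """Return cached data when present, otherwise fetch and cache it."""
    key = "program:" + slug
    cached = cache.get(key)
    if cached is not None:
        return cached
    data = fetch_from_api(slug)  # hypothetical call to the backing API
    cache.set(key, data, ttl_seconds)
    return data
```

With this shape, repeated requests for the same slug within the TTL reach the backing API only once.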
Merlin API – Technology stack
• Python
• Django
• MySQL
• Piston
• Solr
• Celery
• RabbitMQ
• Amazon Web Services (“cloud”)
– EC2
– RDS - Relational Database Service
– ELB - Elastic Load Balancing
– CloudFront CDN
– S3 Storage
Data flow
[Diagram: RSS Feed → Ingestor → Standardized API]
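The ingestion step in this flow can be sketched with the standard library alone: parse an RSS document and emit uniform records. The sample feed, the record fields, and the `ingest` function are hypothetical; the real Merlin ingestor is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Made-up feed used only for illustration.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example Station</title>
<item><title>First story</title><link>http://example.org/1</link></item>
<item><title>Second story</title><link>http://example.org/2</link></item>
</channel></rss>"""


def ingest(rss_text):
    """Convert RSS items into a list of uniform dicts."""
    root = ET.fromstring(rss_text)
    source = root.findtext("channel/title", default="")
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


records = ingest(SAMPLE_RSS)
```

Each record has the same keys regardless of which feed it came from, which is what lets a single API serve content from many CMSes.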
Merlin API architecture
• API endpoint: Django Piston
• Search service: django-haystack
• Indexing service: Solr
• Data layer: MySQL (RDS)
• Administration: Django admin
• Feed ingestion: Celery
Merlin API server topology
[Topology diagram. Components shown: Internet; Elastic Load Balancer; autoscaling array of app servers (App #N); Celery; Master DB (RDS); Solr index; S3 backups]
Merlin API – Management Tools
• Amazon Web Service Console
• RightScale
• Splunk
API - Piston/Haystack/Solr
from haystack.query import SearchQuerySet
from piston.handler import BaseHandler


class WebObjectIndexHandler(BaseHandler):
    ...
    def get_queryset(self):
        ...
        # `models` is built in the elided lines above.
        return PistonSearchQuerySet().models(*models)


class PistonSearchQuerySet(SearchQuerySet):
    ...
    def __getitem__(self, k):
        ...
        # Wrap each search result in IndexSerializer (defined elsewhere).
        return [IndexSerializer(i) for i in
                super(PistonSearchQuerySet, self).__getitem__(k)]
Feed ingestor - Celery
from datetime import timedelta

from celery.decorators import task, periodic_task


# Runs every 300 seconds (five minutes).
@periodic_task(run_every=timedelta(seconds=300))
def update_webobject_states():
    ...
    # Start from the objects currently marked visible.
    solr_visible = WebObject.children.filter(visible=True)
    solr_visible = solr_visible.exclude(
        flag__api_visible=True, available__isnull=True)
    ...
    # Hide the remaining objects and mark them as not indexed.
    updated = solr_visible.update(visible=False,
                                  is_indexed=False)
    ...
    signals.bulk_update.send('tasks.update_webobject_states')
Merlin API - Lessons learned
• Memcached was not necessary
• Denormalized search data via Solr index is much faster
than querying database
• Asynchronous task delegation is awesome
• Celery prone to memory leaks
• App server array for easy horizontal scaling
– Even if not autoscaling, increase min servers
• Never trust data you don’t control (validate!)
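The last lesson, validating data you do not control, might look like this in a minimal form. The field names and the `validate_item` function are hypothetical; they only illustrate rejecting bad feed items before they reach the database or the index.

```python
from urllib.parse import urlparse


def validate_item(item):
    """Return a list of problems; an empty list means the item is usable."""
    problems = []
    title = item.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("missing title")
    url = item.get("url")
    parsed = urlparse(url) if isinstance(url, str) else None
    # Accept only absolute http(s) URLs with a host part.
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("invalid url")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with an upstream feed item at once.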
Resources
• http://lucene.apache.org/solr/
• http://haystacksearch.org/
• http://celeryproject.org/
• http://celeryproject.org/docs/django-celery/
• http://aws.amazon.com/
PBS Developer Community
• Dedicated to making open.PBS the industry
standard in open development communities.
http://open.pbs.org/
https://github.com/pbs
open@pbs.org
Questions?
Drew Engelson
drew@engelson.net
http://tomatohater.com
Edgar Roman
emroman@pbs.org
