Josh Bloom (PI), Justin Higgins, Adam Morgan
[Figure: “Object” datastream → Transients Classification Pipeline → Classify → Database → Broadcast]
Data sources:
• SDSS stripe-82 archived data
• PTF / LBL subtraction pipeline
• SASIR (future)
• LSST (future)
• Survey X (real-time survey telescope)
• Survey Y (static survey repository)

[Figure: data sources → Transients Classification Pipeline → Classify → Database, Broadcast]

Database containing “sources”:
• features for a source
• data epochs associated with a source

Broadcast “sources”:
• interesting or transient source
• include classifications
• include features, context
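One way to picture the “source” records that the pipeline stores and broadcasts is as a small data structure. The sketch below is illustrative only: the class names (`Epoch`, `Source`), the field names, and the `broadcast_payload` helper are hypothetical and are not taken from the TCP code base.

```python
from dataclasses import dataclass, field


@dataclass
class Epoch:
    """One observation epoch (e.g. a difference object) tied to a source."""
    t: float        # observation time, e.g. MJD
    mag: float      # measured magnitude
    mag_err: float  # magnitude uncertainty
    band: str       # filter name


@dataclass
class Source:
    """Hypothetical database record for one source."""
    source_id: int
    ra: float   # degrees
    dec: float  # degrees
    epochs: list[Epoch] = field(default_factory=list)
    features: dict[str, float] = field(default_factory=dict)
    classifications: dict[str, float] = field(default_factory=dict)

    def broadcast_payload(self) -> dict:
        """Message for an interesting source: its classifications plus
        features and positional context (copies, so later edits to the
        record do not alter an already-built message)."""
        return {
            "source_id": self.source_id,
            "ra": self.ra,
            "dec": self.dec,
            "classifications": dict(self.classifications),
            "features": dict(self.features),
            "n_epochs": len(self.epochs),
        }
```

Keeping epochs, features, and classifications on one record mirrors the split on the slide: the database holds features and epochs, while the broadcast adds classifications and context.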
SDSS Stripe 82
• A deep field from the Sloan Digital Sky Survey
• 750 million observation epochs
• ~20 million “sources” clustered from epochs
• 5 colors / filters, 4 years of observations
• We used Stripe-82 for testing and development

[Figure: SDSS stripe-82 archived data → Transients Classification Pipeline → Database containing “sources” (features for a source; data epochs associated with a source)]
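The slide does not say how epochs were clustered into sources. A minimal sketch of one common approach, positional association within a match radius, follows. The 1 arcsec default radius, the greedy nearest-match rule, the running-mean position update, and the function names are all assumptions. The brute-force loop is O(N × M) and only shows the logic; at Stripe 82 scale (750 million epochs) a spatial index or a database-side spatial query would be needed.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two (RA, Dec) points given in degrees,
    returned in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def cluster_epochs(detections, radius_arcsec=1.0):
    """Greedily associate (ra, dec) detections into sources.

    Each detection joins the nearest existing source within radius_arcsec,
    otherwise it seeds a new source. Source positions are the running mean
    of their members (a simplification that breaks near RA = 0/360 and the
    poles). Returns a list of {"ra", "dec", "members"} dicts, where
    "members" holds indices into `detections`.
    """
    sources = []
    for i, (ra, dec) in enumerate(detections):
        best, best_sep = None, radius_arcsec
        for src in sources:
            sep = angular_sep_arcsec(ra, dec, src["ra"], src["dec"])
            if sep <= best_sep:
                best, best_sep = src, sep
        if best is None:
            sources.append({"ra": ra, "dec": dec, "members": [i]})
        else:
            n = len(best["members"])
            best["ra"] = (best["ra"] * n + ra) / (n + 1)
            best["dec"] = (best["dec"] * n + dec) / (n + 1)
            best["members"].append(i)
    return sources
```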
Palomar Transient Factory
• Palomar 48” telescope
• 100 Mpix, 7.8 sq-deg detector
• ~120 s cadence : ~200 MB : <100 GB/night
• Post subtraction: ~1M difference objects / night
• Post filtering: ~10k difference objects / night (~100s of transients and variable stars)

[Figure: LBL subtraction pipeline → TCP, connected to the PTF consortium and to follow-up telescopes PAIRITEL 1.3m, Palomar 60”, MDM 1.3m & 2.4m]
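The PTF data-rate figures can be cross-checked with a little arithmetic. This sketch assumes 16-bit (2-byte) raw pixels and decimal units; neither assumption is stated on the slide.

```python
# Back-of-envelope check of the PTF numbers quoted above.
# Assumptions (not on the slide): 16-bit pixels (2 bytes each) and
# decimal units (1 MB = 1e6 bytes, 1 GB = 1e9 bytes).
PIXELS = 100e6          # 100 Mpix detector
BYTES_PER_PIXEL = 2     # assumed 16-bit raw pixels
CADENCE_S = 120         # ~120 s per exposure
NIGHTLY_CAP_GB = 100    # "<100 GB/night"

mb_per_exposure = PIXELS * BYTES_PER_PIXEL / 1e6
max_exposures = NIGHTLY_CAP_GB * 1e3 / mb_per_exposure
hours_at_cadence = max_exposures * CADENCE_S / 3600

print(f"{mb_per_exposure:.0f} MB per exposure")
print(f"at most {max_exposures:.0f} exposures per night")
print(f"{hours_at_cadence:.1f} h of back-to-back exposures at ~120 s cadence")
```

Under these assumptions each exposure is about 200 MB, consistent with the slide, and the 100 GB/night ceiling corresponds to about 500 exposures, or more than 16 hours of continuous 120 s exposures, so it acts as an upper bound rather than a typical night's volume.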
Next Generation Survey: LSST
Large Synoptic Survey Telescope (LSST):
• 1 Gb every 2 seconds
• 10⁶ supernovae/yr
• 10⁵ eclipsing systems
• 10⁷ asteroids...
• light curves of 800 million sources every 3 days
Transients Classification Pipeline

[Figure: “Object” datastream → TCP: source generation → feature generation → source classification, with outputs to the Database, Broadcast, and follow-up telescope observations]
Parallelized source correlation and classification
• Retrieve difference objects
• Each difference object is passed to an IPython client
• Each parallel IPython client performs:
     • Source creation or correlation with existing sources
     • “Feature” generation (or re-generation) for that source
     • Classification of that source
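The per-object work each client does (correlate, regenerate features, classify) can be sketched as follows. This is not the TCP implementation: it swaps Python's standard `concurrent.futures.ThreadPoolExecutor` in for the IPython clients, uses an in-memory dict instead of the source database, and relies on hypothetical helpers (`correlate`, `generate_features`, `classify`) with a placeholder grid-based correlation and a toy amplitude-threshold classifier. A per-source lock keeps two epochs of the same source from being processed at once, while different sources proceed concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev

SOURCES = {}                      # in-memory stand-in for the source database
REGISTRY_LOCK = threading.Lock()  # guards creation of new source entries


def correlate(obj, cell_deg=0.001):
    """Hypothetical correlation: key sources by position rounded to a
    cell_deg grid; an unseen key creates a new source."""
    key = (round(obj["ra"] / cell_deg), round(obj["dec"] / cell_deg))
    with REGISTRY_LOCK:
        src = SOURCES.get(key)
        if src is None:
            src = {"lock": threading.Lock(), "mags": [],
                   "features": {}, "label": None}
            SOURCES[key] = src
    return key, src


def generate_features(mags):
    """Recompute simple illustrative features from all epochs of a source."""
    return {"n": len(mags), "mean_mag": mean(mags),
            "amplitude": max(mags) - min(mags), "std": pstdev(mags)}


def classify(features, amp_threshold=0.3):
    """Toy rule standing in for the trained classifiers."""
    return "variable" if features["amplitude"] > amp_threshold else "constant"


def process(obj):
    """Correlate one difference object, then (re)generate features and
    classify its source. Updates to one source are serialized."""
    key, src = correlate(obj)
    with src["lock"]:
        src["mags"].append(obj["mag"])
        src["features"] = generate_features(src["mags"])
        src["label"] = classify(src["features"])
        return key, src["label"]


def run(diff_objects, workers=4):
    """Farm difference objects out to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, diff_objects))
```

Threads keep the sketch self-contained. Because CPython threads do not execute Python bytecode in parallel, CPU-bound feature generation would in practice be spread over processes or a cluster of engines, as the slide describes, with the shared state held in the database.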
Parallelized source correlation and classification
• Real-time TCP runs on 22 dedicated cores
• LCOGT’s 96-core Beowulf cluster:
     • Non-run-time tasks
     • Classifier generation
• Additional resources (for future classification work):
     • Yahoo! M45 cluster
     • Amazon EC2 cluster
Warehouse of light-curves

•   Need representative light-curves for all science classes

•   With these we can model each science class

•   We’ve built a warehouse of example light-curves

[Figure: TCP-TUTOR (internal interface) and DotAstro.org (public interface)]
“Noisifying to the Survey”

•   Well sampled light-curves
     •   Can make good classifiers for well-sampled data.

     •   Don’t immediately make good classifiers for noisy, sparse data.


•   We need classifiers which are trained using:
     •   sampling cadence of our survey

     •   sparseness of our survey data

     •   noise and sensitivity limitations of our instrument


•   We need “Noisification” software which:
     •   Resamples well-sampled light-curves

     •   Outputs noisified sources which are used for generating classifiers
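A minimal noisification sketch, assuming NumPy, is shown here. It linearly interpolates a densely sampled template light curve onto survey observation times, adds Gaussian photometric noise from a toy magnitude-dependent error model, and drops epochs fainter than a limiting magnitude. The function name `noisify`, the error model, and every default value are hypothetical; a real implementation would take the observation times from the survey's pointing and observing plans and the noise model from the instrument.

```python
import numpy as np


def noisify(t_model, mag_model, t_survey, limiting_mag=20.5,
            sigma_floor=0.02, sigma_scale=0.1, seed=None):
    """Resample a well-sampled light curve onto survey epochs and degrade it.

    t_model, mag_model : densely sampled template (t_model sorted ascending).
    t_survey           : observation times implied by the survey cadence.
    Toy error model (an assumption, not PTF's measured noise):
        sigma = sigma_floor + sigma_scale * 10**(0.4 * (mag - limiting_mag))
    Epochs whose noisy magnitude is fainter than limiting_mag are dropped.
    """
    rng = np.random.default_rng(seed)
    t_survey = np.asarray(t_survey, dtype=float)
    true_mag = np.interp(t_survey, t_model, mag_model)   # resample to cadence
    sigma = sigma_floor + sigma_scale * 10 ** (0.4 * (true_mag - limiting_mag))
    obs_mag = true_mag + rng.normal(0.0, sigma)           # photometric noise
    detected = obs_mag < limiting_mag                     # sensitivity limit
    return t_survey[detected], obs_mag[detected], sigma[detected]


# Example: a 0.5-day sinusoidal variable observed at 20 random epochs over 30 days.
t_model = np.linspace(0.0, 30.0, 30001)
mag_model = 19.0 + 0.4 * np.sin(2 * np.pi * t_model / 0.5)
t_survey = np.sort(np.random.default_rng(0).uniform(0.0, 30.0, 20))
t_obs, mag_obs, mag_err = noisify(t_model, mag_model, t_survey, seed=1)
```

Running the same template through several cadences and epoch counts yields the differently noisified training sets from which separate classifiers can be built.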
“Noisifying to the Survey”

•   For PTF:
     •   Code uses PTF pointing and survey observing plans

     •   Occasionally PTF observes using a faster cadence:

           •    7.5 minutes between revisiting an RA, Dec

           •    Faster cadence requires a separate set of noisified light-curves
                and classifiers.


•   Other surveys:
     •   Other pointing and observing plans could be used.

     •   Can generate noisified light-curves for other surveys.

     •   Then we can generate science classifiers for these surveys.
Classifiers
•    General Classifier
     •    Identify:
           •    well-sampled (periodic & nonperiodic) sources
           •    interesting sources near known galaxies
           •    periodic variable science class when confidence is high
     •    Filter out:
           •    poorly subtracted sources
           •    minor planets / rocks
           •    cosmic rays
           •    detector defects

•    Timeseries Classifiers
     •    Weighted combination of WEKA classifiers
           •    bagged Random Forest classifier using a cost matrix
           •    each classifier trained on differently cadenced noisified data
     •    Astronomer-crafted classifiers for specific science types
           •    Microlens, Supernova
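The weighted combination of classifiers can be illustrated as a weighted average of class probabilities. This sketch is an assumption about the scheme, not a description of the WEKA-based code: the function and model names are hypothetical, and the weights are supplied by hand, whereas in practice they might reflect each model's performance on noisified data with a matching cadence.

```python
from collections import defaultdict


def combine_weighted(model_probs, weights):
    """Weighted linear combination of per-model class probabilities.

    model_probs : {model_name: {class_name: probability}}; a class missing
                  from a model's output counts as probability 0 for it.
    weights     : {model_name: weight}; models with weight <= 0 are ignored.
    Returns {class_name: probability}, renormalized to sum to 1.
    """
    combined = defaultdict(float)
    for name, probs in model_probs.items():
        w = weights.get(name, 0.0)
        if w <= 0:
            continue  # model does not vote
        for cls, p in probs.items():
            combined[cls] += w * p
    norm = sum(combined.values())
    if norm <= 0:
        raise ValueError("no positively weighted model assigned any probability")
    return {cls: v / norm for cls, v in combined.items()}


# Example with two hypothetical models trained on different noisification cadences.
probs = {
    "rf_10epoch": {"RR Lyrae": 0.6, "Cepheid": 0.3, "Eclipsing": 0.1},
    "rf_20epoch": {"RR Lyrae": 0.8, "Eclipsing": 0.2},
}
combined = combine_weighted(probs, {"rf_10epoch": 1.0, "rf_20epoch": 3.0})
```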
Interesting near-galaxy PTF sources

 • Identified by TCP at the end of Aug ’09
 • Classification triggered by the latest epoch
    added to the source
Periodic variable classifiers
•     Currently, science classes for a source are determined by combining
      the weighted probabilities generated by different classification models.
•     Each machine-learned classification model is trained using “noisified”
      light-curves which were generated using different parameters.

[Figure: Clicking on a class for one of dozens of ML models shows the highest classification probability sources for that model::class. Panels: ~0.14 day period RR Lyrae using 20 epoch noisification; ~0.4 day period RR Lyrae using 10 epoch noisification; 0.1 - 0.17 day period RR Lyrae using 15 epoch noisification. Annotations: “Overplotting of period-folded model still needs work”; “period-fold plotting probably failed here”.]
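The RR Lyrae panels rely on period folding, which maps each observation time t to a phase ((t − t0) / P) mod 1 for a trial period P. The sketch below folds a light curve and fits a truncated Fourier series to the folded points so that a smooth model can be overplotted. The Fourier-series model, the two-harmonic default, and the function names are assumptions, not the TCP plotting code. The least-squares fit needs at least 2 × `n_harmonics` + 1 epochs with distinct phases, which a 10-epoch noisified light curve satisfies for two harmonics.

```python
import numpy as np


def fold(t, period, t0=0.0):
    """Phase-fold observation times: phase = ((t - t0) / period) mod 1."""
    return np.mod((np.asarray(t, dtype=float) - t0) / period, 1.0)


def folded_model(t, mag, period, n_harmonics=2, t0=0.0, n_grid=200):
    """Least-squares Fourier-series fit to the folded light curve, evaluated
    on a regular phase grid in [0, 1) for overplotting on the folded data."""
    phase = fold(t, period, t0)

    def design(ph):
        # Columns: constant, then sin/cos pairs for each harmonic.
        cols = [np.ones_like(ph)]
        for k in range(1, n_harmonics + 1):
            cols += [np.sin(2 * np.pi * k * ph), np.cos(2 * np.pi * k * ph)]
        return np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(design(phase), np.asarray(mag, dtype=float),
                               rcond=None)
    grid = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    return grid, design(grid) @ coef


# Example: a 0.4-day variable observed at 10 epochs over 20 days.
t = np.sort(np.random.default_rng(2).uniform(0.0, 20.0, 10))
mag = 15.0 + 0.3 * np.sin(2 * np.pi * t / 0.4)
phase = fold(t, 0.4)
grid, model = folded_model(t, mag, 0.4)
```

If the trial period is wrong, the folded points scatter and the fitted curve no longer traces them, which is one quick way to spot a failed fold.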
Evaluating and Combining Classifiers


•   Issues when using multiple classifiers:
      •    How to combine classifiers when using:
            •    weighted classifiers
            •    tree-hierarchy of sub-classifiers
      •    How to generate final classification “probabilities” when using:
            •    widely varying types of classifiers
            •    classifiers which contain sub-classifications & probabilities

•   Evaluate the final combination of classifiers
      •    Classify PTF09xxx user-classified sources, determine efficiencies
      •    Classify noisified sources, determine efficiencies
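Determining efficiencies for user-classified or noisified sources amounts to comparing predicted labels with known ones. A minimal sketch follows; it takes “efficiency” to mean per-class completeness (the fraction of sources of a class that were assigned that class) and also reports purity. Both definitions and the function name `efficiencies` are assumptions about which figures of merit were intended.

```python
from collections import Counter


def efficiencies(true_labels, predicted_labels):
    """Per-class efficiency (completeness) and purity.

    efficiency[c] = (# true class c predicted as c) / (# true class c)
    purity[c]     = (# true class c predicted as c) / (# predicted as c)
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    n_true = Counter(true_labels)
    n_pred = Counter(predicted_labels)
    n_hit = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    eff = {c: n_hit[c] / n for c, n in n_true.items()}
    pur = {c: n_hit[c] / n for c, n in n_pred.items()}
    return eff, pur


# Example with hypothetical labels for five known sources.
truth = ["RR Lyrae", "RR Lyrae", "SN", "SN", "SN"]
pred = ["RR Lyrae", "SN", "SN", "SN", "RR Lyrae"]
eff, pur = efficiencies(truth, pred)
```

Running the same function on noisified sources with known classes and on user-classified PTF sources gives directly comparable numbers for the two evaluations listed on the slide.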
Talk about T.C.P. for CDI inter-departmental workshop at UC Berkeley. 20090911.

More Related Content

Viewers also liked

Education Powerpoint
Education PowerpointEducation Powerpoint
Education PowerpointCasandraAdams
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leadersguest970121
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leadersguest970121
 
Caltech 20090903 Talk on T.C.P. for LSST/PTF workshop
Caltech 20090903 Talk on T.C.P. for LSST/PTF workshopCaltech 20090903 Talk on T.C.P. for LSST/PTF workshop
Caltech 20090903 Talk on T.C.P. for LSST/PTF workshopDan Starr
 
Education Powerpoint
Education PowerpointEducation Powerpoint
Education PowerpointCasandraAdams
 
authenticity digital records term essay
authenticity digital records term essayauthenticity digital records term essay
authenticity digital records term essayapogarl
 
Industry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solutionIndustry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solutionAnalog Devices, Inc.
 
Education Powerpoint
Education PowerpointEducation Powerpoint
Education PowerpointCasandraAdams
 
Education Powerpoint
Education PowerpointEducation Powerpoint
Education PowerpointCasandraAdams
 
Current Educational Issue Powerpoint
Current Educational Issue PowerpointCurrent Educational Issue Powerpoint
Current Educational Issue PowerpointCasandraAdams
 
What would nature do? Natural ecosystems vs. design/business ecosystem
What would nature do? Natural ecosystems vs. design/business ecosystemWhat would nature do? Natural ecosystems vs. design/business ecosystem
What would nature do? Natural ecosystems vs. design/business ecosystemJoshua Sin
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leadersguest970121
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leadersguest970121
 
Introduction to Twitter (HUOC SM101 Spring 2012)
Introduction to Twitter (HUOC SM101 Spring 2012)Introduction to Twitter (HUOC SM101 Spring 2012)
Introduction to Twitter (HUOC SM101 Spring 2012)Mohammad Hijazi
 

Viewers also liked (19)

Education Powerpoint
Education PowerpointEducation Powerpoint
Education Powerpoint
 
S E V E N W O N D E R S
S E V E N W O N D E R SS E V E N W O N D E R S
S E V E N W O N D E R S
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leaders
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leaders
 
Caltech 20090903 Talk on T.C.P. for LSST/PTF workshop
Caltech 20090903 Talk on T.C.P. for LSST/PTF workshopCaltech 20090903 Talk on T.C.P. for LSST/PTF workshop
Caltech 20090903 Talk on T.C.P. for LSST/PTF workshop
 
Education Powerpoint
Education PowerpointEducation Powerpoint
Education Powerpoint
 
authenticity digital records term essay
authenticity digital records term essayauthenticity digital records term essay
authenticity digital records term essay
 
Industry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solutionIndustry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solution
 
Education Powerpoint
Education PowerpointEducation Powerpoint
Education Powerpoint
 
Education Powerpoint
Education PowerpointEducation Powerpoint
Education Powerpoint
 
Current Educational Issue Powerpoint
Current Educational Issue PowerpointCurrent Educational Issue Powerpoint
Current Educational Issue Powerpoint
 
What would nature do? Natural ecosystems vs. design/business ecosystem
What would nature do? Natural ecosystems vs. design/business ecosystemWhat would nature do? Natural ecosystems vs. design/business ecosystem
What would nature do? Natural ecosystems vs. design/business ecosystem
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leaders
 
Culture Of Great India
Culture Of  Great IndiaCulture Of  Great India
Culture Of Great India
 
Authentic Leaders
Authentic LeadersAuthentic Leaders
Authentic Leaders
 
Exacqvision2
Exacqvision2Exacqvision2
Exacqvision2
 
Introduction to Twitter (HUOC SM101 Spring 2012)
Introduction to Twitter (HUOC SM101 Spring 2012)Introduction to Twitter (HUOC SM101 Spring 2012)
Introduction to Twitter (HUOC SM101 Spring 2012)
 
Proxy & CGLIB
Proxy & CGLIBProxy & CGLIB
Proxy & CGLIB
 
News Corp
News CorpNews Corp
News Corp
 

Similar to Talk about T.C.P. for CDI inter-departmental workshop at UC Berkeley. 20090911.

Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkDatabricks
 
ApacheCon NA 2013 VFASTR
ApacheCon NA 2013 VFASTRApacheCon NA 2013 VFASTR
ApacheCon NA 2013 VFASTRLucaCinquini
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Paul Brebner
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Apache Airavata ApacheCon2013
Apache Airavata ApacheCon2013Apache Airavata ApacheCon2013
Apache Airavata ApacheCon2013smarru
 
Round Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogsRound Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogsMario Juric
 
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...NAVER Engineering
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSujit Pal
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
Accelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkAccelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkDatabricks
 
An Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in JavaAn Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in JavaAbhishek Asthana
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Databricks
 
Information Extraction on Noisy Texts for Historical Research
Information Extraction on Noisy Texts for Historical ResearchInformation Extraction on Noisy Texts for Historical Research
Information Extraction on Noisy Texts for Historical ResearchKepa J. Rodriguez
 
Wrapper induction construct wrappers automatically to extract information f...
Wrapper induction   construct wrappers automatically to extract information f...Wrapper induction   construct wrappers automatically to extract information f...
Wrapper induction construct wrappers automatically to extract information f...George Ang
 
Private Range Query by Perturbation and Matrix Based Encryption
Private Range Query by Perturbation and Matrix Based EncryptionPrivate Range Query by Perturbation and Matrix Based Encryption
Private Range Query by Perturbation and Matrix Based EncryptionJunpei Kawamoto
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...Paul Brebner
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Prof. Wim Van Criekinge
 

Similar to Talk about T.C.P. for CDI inter-departmental workshop at UC Berkeley. 20090911. (20)

Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache Spark
 
ApacheCon NA 2013 VFASTR
ApacheCon NA 2013 VFASTRApacheCon NA 2013 VFASTR
ApacheCon NA 2013 VFASTR
 
Far cry 3
Far cry 3Far cry 3
Far cry 3
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Apache Airavata ApacheCon2013
Apache Airavata ApacheCon2013Apache Airavata ApacheCon2013
Apache Airavata ApacheCon2013
 
Round Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogsRound Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogs
 
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
Accelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkAccelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache Spark
 
An Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in JavaAn Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in Java
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
 
Information Extraction on Noisy Texts for Historical Research
Information Extraction on Noisy Texts for Historical ResearchInformation Extraction on Noisy Texts for Historical Research
Information Extraction on Noisy Texts for Historical Research
 
Wrapper induction construct wrappers automatically to extract information f...
Wrapper induction   construct wrappers automatically to extract information f...Wrapper induction   construct wrappers automatically to extract information f...
Wrapper induction construct wrappers automatically to extract information f...
 
Private Range Query by Perturbation and Matrix Based Encryption
Private Range Query by Perturbation and Matrix Based EncryptionPrivate Range Query by Perturbation and Matrix Based Encryption
Private Range Query by Perturbation and Matrix Based Encryption
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 

Recently uploaded

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Recently uploaded (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Talk about the Transients Classification Pipeline (TCP) for the CDI inter-departmental workshop at UC Berkeley, September 11, 2009.

  • 1. Josh Bloom (PI), Justin Higgins, Adam Morgan
  • 3. Surveys feeding the Transients Classification Pipeline
      • Data sources: SDSS Stripe-82 archived data; PTF / LBL subtraction pipeline; SASIR (future); LSST (future); Survey X (real-time survey telescope); Survey Y (static survey repository)
      • Transients Classification Pipeline: Classify → Database → Broadcast
      • Database containing “sources”: features for a source; data epochs associated with a source
      • Broadcast “sources”: interesting or transient sources, including classifications, features, and context
  • 4. SDSS Stripe 82
      • A deep field from the Sloan Digital Sky Survey
      • 750 million observation epochs
      • ~20 million “sources” clustered from epochs
      • 5 colors / filters, 4 years of observations
      • We used Stripe 82 for testing and development
      • Diagram: SDSS Stripe-82 archived data → Transients Classification Pipeline → database containing “sources” (features for a source; data epochs associated with a source)
  • 5. Palomar Transient Factory
      • Palomar 48” telescope
      • 100 Mpix, 7.8 sq-deg detector
      • ~120 s cadence : ~200 MB : <100 GB/night
      • Post subtraction: ~1M difference objects / night
      • Post filtering: ~10k difference objects / night
      • ~100s of transients and variable stars
      • Diagram: TCP connected to the LBL subtraction pipeline, the PTF consortium, and follow-up telescopes (PAIRITEL 1.3m, Palomar 60”, MDM 1.3m & 2.4m)
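As a rough consistency check on the slide 5 data rates, the sketch below multiplies them out. It assumes each ~120 s exposure produces one raw 100 Mpix frame at 2 bytes per pixel (which matches the quoted ~200 MB) and assumes a 10-hour observing night. Neither of those two values is stated on the slide.

```python
# Back-of-envelope PTF data volume. Pixel count, cadence and the ~1M / ~10k object
# counts come from the slide; BYTES_PER_PIXEL and NIGHT_HOURS are assumptions.
PIXELS = 100e6                 # 100 Mpix detector
BYTES_PER_PIXEL = 2            # assumed 16-bit raw pixels
SECONDS_PER_EXPOSURE = 120     # ~120 s cadence
NIGHT_HOURS = 10               # assumed usable hours per night

mb_per_frame = PIXELS * BYTES_PER_PIXEL / 1e6                   # ~200 MB per frame
exposures_per_night = NIGHT_HOURS * 3600 // SECONDS_PER_EXPOSURE
gb_per_night = exposures_per_night * mb_per_frame / 1e3

# Fraction of difference objects that survive filtering (~10k of ~1M per night).
pass_fraction = 10_000 / 1_000_000

print(f"{mb_per_frame:.0f} MB/frame, {exposures_per_night} frames, "
      f"{gb_per_night:.0f} GB/night, {pass_fraction:.0%} pass filtering")
# prints: 200 MB/frame, 300 frames, 60 GB/night, 1% pass filtering
```

Under these assumptions, about 300 frames of ~200 MB each amount to ~60 GB per night, which is consistent with the quoted <100 GB/night. Filtering keeps roughly 1% of the difference objects.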
  • 6. Next Generation Survey: LSST
      • Large Synoptic Survey Telescope (LSST)
      • 1 Gb every 2 seconds
      • 10^6 supernovae/yr
      • 10^5 eclipsing systems
      • 10^7 asteroids...
      • Light curves of 800 million sources every 3 days
  • 7. Transients Classification Pipeline
      • “Object” datastream → source generation → feature generation → source classification → database → broadcast
      • Follow-up telescope observations
  • 8. Parallelized source correlation and classification
      • Retrieve difference objects
      • Each difference object is passed to an IPython client
      • Each parallel IPython client performs:
          • Source creation, or correlation with existing sources
          • “Feature” generation (or re-generation) for that source
          • Classification of that source
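The per-object work that slide 8 assigns to each IPython client can be sketched as follows. In this sketch the objects are processed sequentially, with no parallel dispatch. Correlation is done with a plain positional match within a fixed radius, using a linear scan over the known sources. Several pieces are hypothetical: the 1.5 arcsec radius, the two-feature set, and the threshold "classifier", which only stands in for TCP's real classifiers.

```python
import math
from dataclasses import dataclass, field

MATCH_RADIUS_ARCSEC = 1.5  # hypothetical correlation radius


@dataclass
class Source:
    ra: float                                   # degrees
    dec: float                                  # degrees
    epochs: list = field(default_factory=list)  # (time, mag) pairs
    features: dict = field(default_factory=dict)
    label: str = "unclassified"


def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); degrees in, arcseconds out."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600


def generate_features(src):
    """Hypothetical minimal feature set, recomputed from all epochs of the source."""
    mags = [m for _, m in src.epochs]
    return {"n_epochs": len(mags), "amplitude": max(mags) - min(mags)}


def classify(features):
    """Toy threshold rule standing in for the real classifiers."""
    if features["n_epochs"] < 3:
        return "insufficient data"
    return "variable" if features["amplitude"] > 0.5 else "non-variable"


def process_difference_object(obj, sources):
    """Correlate obj with an existing source (or create one), then regenerate
    that source's features and reclassify it."""
    src = next((s for s in sources
                if ang_sep_arcsec(obj["ra"], obj["dec"], s.ra, s.dec)
                <= MATCH_RADIUS_ARCSEC), None)
    if src is None:
        src = Source(obj["ra"], obj["dec"])
        sources.append(src)
    src.epochs.append((obj["t"], obj["mag"]))
    src.features = generate_features(src)
    src.label = classify(src.features)
    return src


# Usage: three detections within ~0.4 arcsec of each other collapse into one source.
sources = []
for obj in [{"ra": 10.0, "dec": 20.0, "t": 0.0, "mag": 18.0},
            {"ra": 10.0001, "dec": 20.0, "t": 1.0, "mag": 18.9},
            {"ra": 10.0, "dec": 20.0001, "t": 2.0, "mag": 18.1}]:
    process_difference_object(obj, sources)
print(len(sources), sources[0].label)  # prints: 1 variable
```

Each new epoch triggers re-generation of features and re-classification of the whole source, which mirrors the slide's "(or re-generation)". A real deployment would replace the linear scan with a spatial index or database query.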
  • 9. Parallelized source correlation and classification
      • Real-time TCP runs on 22 dedicated cores
      • LCOGT’s 96-core Beowulf cluster handles non-run-time tasks, e.g. classifier generation
      • Additional resources (for future classification work):
          • Yahoo! M45 cluster
          • Amazon EC2 cluster
  • 10. Warehouse of light curves
      • We need representative light curves for all science classes
      • With these we can model each science class
      • We’ve built a warehouse of example light curves
      • TCP-TUTOR (internal interface); DotAstro.org (public interface)
  • 13. “Noisifying to the Survey”
      • Well-sampled light curves:
          • Can make good classifiers for well-sampled data
          • Don’t immediately make good classifiers for noisy, sparse data
      • We need classifiers which are trained using:
          • the sampling cadence of our survey
          • the sparseness of our survey data
          • the noise and sensitivity limitations of our instrument
      • We need “noisification” software which:
          • Resamples well-sampled light curves
          • Outputs noisified sources, which are used for generating classifiers
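Slide 13 asks for noisification software that resamples well-sampled light curves at the survey's cadence and degrades them to the instrument's noise and sensitivity limits. The sketch below is a minimal version of that idea. It linearly interpolates the well-sampled curve at each survey epoch, drops epochs fainter than a limiting magnitude, and adds Gaussian noise from a simple magnitude-dependent error model. The error model, the 20.5 mag limit, and the 0.02 mag floor are hypothetical placeholders, not PTF's actual values.

```python
import bisect
import random


def interp(t, ts, ms):
    """Linear interpolation of magnitude at time t; ts must be strictly increasing."""
    if t <= ts[0]:
        return ms[0]
    if t >= ts[-1]:
        return ms[-1]
    i = bisect.bisect_right(ts, t)
    frac = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
    return ms[i - 1] + frac * (ms[i] - ms[i - 1])


def noisify(ts, ms, survey_times, limiting_mag=20.5, floor_sigma=0.02, rng=None):
    """Resample a well-sampled light curve (ts, ms) at survey_times and add noise.

    Hypothetical error model: sigma = floor_sigma + 0.2 * 10**(0.4 * (m - limiting_mag)),
    so errors grow toward the limiting magnitude; epochs fainter than
    limiting_mag are dropped as non-detections. Returns (t, noisy_mag, sigma) tuples.
    """
    rng = rng or random.Random()
    out = []
    for t in survey_times:
        m = interp(t, ts, ms)
        if m > limiting_mag:
            continue  # below the instrument's sensitivity: no detection
        sigma = floor_sigma + 0.2 * 10 ** (0.4 * (m - limiting_mag))
        out.append((t, rng.gauss(m, sigma), sigma))
    return out


# Usage: a toy curve fading from mag 18 to 22 over 10 days, observed 3 times.
noisy = noisify([0.0, 10.0], [18.0, 22.0], survey_times=[0.0, 5.0, 10.0],
                rng=random.Random(42))
print(len(noisy))  # 2: the t=10 epoch (mag 22) is fainter than the 20.5 limit
```

Passing a seeded `random.Random` makes a noisified training set reproducible. Calling `noisify` many times with different seeds on the same archival light curve yields many sparse, noisy training examples for a single science class.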
  • 14. “Noisifying to the Survey”
  • 15. “Noisifying to the Survey”
      • For PTF:
          • Code uses PTF pointing and survey observing plans
          • Occasionally PTF observes using a faster cadence: 7.5 minutes between revisits of an RA, Dec
          • The faster cadence requires a separate set of noisified light curves and classifiers
      • Other surveys:
          • Other pointing and observing plans could be used
          • We can generate noisified light curves for other surveys
          • Then we can generate science classifiers for these surveys
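Slide 15 says each observing cadence needs its own noisified light curves and classifiers. The sketch below illustrates that bookkeeping by generating one epoch schedule per cadence; each schedule would then be used to resample ("noisify") the archival light curves. The schedule model is a deliberately simplified, hypothetical one: a fixed number of visits per night, spaced `revisit_minutes` apart, on consecutive nights. Only the 7.5-minute revisit interval comes from the slide. The visit counts and the "nominal" 60-minute revisit are made-up values, and a real version would read PTF's pointing and observing plans.

```python
def survey_epochs(n_nights, visits_per_night, revisit_minutes, first_night=0.0):
    """Observation times (days) of one RA, Dec under a simplified, hypothetical plan:
    visits_per_night revisits spaced revisit_minutes apart on each of n_nights
    consecutive nights."""
    step_days = revisit_minutes / (24 * 60)
    return [first_night + night + k * step_days
            for night in range(n_nights)
            for k in range(visits_per_night)]


# One schedule per cadence; each would drive its own noisified training set
# and hence its own classifier. Visit counts and the nominal revisit are made up.
schedules = {
    "fast (7.5 min)": survey_epochs(n_nights=3, visits_per_night=4, revisit_minutes=7.5),
    "nominal": survey_epochs(n_nights=10, visits_per_night=2, revisit_minutes=60.0),
}
for name, times in schedules.items():
    print(name, len(times), "epochs")
```

Keying everything by cadence name keeps the training sets and classifiers separate, so a source observed in fast-cadence mode is scored only by classifiers trained on matching sampling. Supporting another survey would mean adding schedules built from that survey's own plans.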
  • 16. Classifiers
      • General classifier
          • Identify: well-sampled sources (periodic & nonperiodic); interesting sources near known galaxies; the periodic-variable science class when confidence is high
          • Filter out: poorly subtracted sources; minor planets / rocks; cosmic rays; detector defects
      • Time-series classifiers
          • Weighted combination of WEKA classifiers
          • Bagged Random Forest classifier using a cost matrix
          • Each classifier trained on differently cadenced noisified data
      • Astronomer-crafted classifiers for specific science types
          • Microlensing, supernovae
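Slide 16 mentions a classifier "using a cost matrix". One common way a cost matrix enters classification is minimum-expected-cost prediction: given class probabilities, predict the class whose expected misclassification cost is lowest rather than the most probable class. The slide does not say whether TCP's WEKA setup used this rule or instead applied the costs during training, so treat this as an illustration of the general technique. The Random Forest itself is not shown here; its probabilities are assumed to be given. The two classes and their costs are hypothetical.

```python
def min_expected_cost_class(probs, cost, classes):
    """Return the class with the lowest expected misclassification cost.

    probs[i]   -- classifier probability that the true class is classes[i]
    cost[i][j] -- cost of predicting classes[j] when the truth is classes[i]
    """
    n = len(classes)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=lambda j: expected[j])
    return classes[best], expected


# Hypothetical costs: missing a real transient costs 5x a false alarm.
classes = ["real transient", "bogus"]
cost = [[0.0, 5.0],   # truth = real transient
        [1.0, 0.0]]   # truth = bogus
label, expected = min_expected_cost_class([0.3, 0.7], cost, classes)
print(label)  # real transient, even though P(bogus) = 0.7
```

With these costs, predicting "bogus" has an expected cost of 0.3 × 5 = 1.5, while predicting "real transient" costs 0.7 × 1 = 0.7. The rule therefore keeps the candidate, which trades extra false alarms for fewer missed transients.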
  • 17. Interesting near-galaxy PTF sources
      • Identified by TCP at the end of Aug ’09
      • Classification triggered by the latest epoch added to the source
  • 18. Periodic variable classifiers
      • Currently, science classes are determined by combining the weighted probabilities generated by different classification models for a source.
      • Each machine-learned classification model is trained using “noisified” light curves which were generated using different parameters.
      • Screenshot annotations: ~0.4 day period RR Lyrae using 10-epoch noisification; ~0.14 day period RR Lyrae using 20-epoch noisification; 0.1-0.17 day period RR Lyrae using 15-epoch noisification; clicking on a class for one of dozens of ML models shows the sources with the highest classification probability for that model::class; overplotting of the period-folded model probably failed here; period-fold plotting still needs work.
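Slide 18 says the science class comes from combining the weighted probabilities that several models produce for one source. The weighting scheme is not given, so the sketch below assumes the simplest version: a weighted average of each model's class probabilities, renormalized to sum to 1. A class absent from a model's output counts as probability 0 for that model. The model names mirror the 10- and 20-epoch noisified models on the slide, but the names, probabilities, and weights are all hypothetical.

```python
def combine_weighted(model_probs, weights):
    """Weighted average of per-model class probabilities, renormalized to sum to 1.

    model_probs -- {model_name: {class_name: probability}}
    weights     -- {model_name: weight}; models missing from weights get weight 0
    """
    combined = {}
    for model, probs in model_probs.items():
        w = weights.get(model, 0.0)
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + w * p
    total = sum(combined.values())
    if total == 0:
        raise ValueError("all model weights are zero")
    return {cls: v / total for cls, v in combined.items()}


# Hypothetical models trained on 10- and 20-epoch noisified light curves.
model_probs = {
    "noisified_10_epoch": {"RR Lyrae": 0.6, "eclipsing binary": 0.4},
    "noisified_20_epoch": {"RR Lyrae": 0.9, "eclipsing binary": 0.1},
}
weights = {"noisified_10_epoch": 1.0, "noisified_20_epoch": 2.0}
final = combine_weighted(model_probs, weights)
best = max(final, key=final.get)
print(best, final)
```

Here the 20-epoch model counts twice as much, giving RR Lyrae (0.6 + 2 × 0.9) / 3 = 0.8. A natural choice for the weights would be each model's measured efficiency on data like the source's, but that choice is an assumption here.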
  • 19. Evaluating and Combining Classifiers
      • Issues when using multiple classifiers:
          • How to combine classifiers when using:
              • weighted classifiers
              • a tree hierarchy of sub-classifiers
          • How to generate final classification “probabilities” when using:
              • widely varying types of classifiers
              • classifiers which contain sub-classifications & probabilities
      • Evaluate the final combination of classifiers:
          • Classify PTF09xxx user-classified sources, determine efficiencies
          • Classify noisified sources, determine efficiencies
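Classifying noisified sources gives an evaluation set with known answers, because each one was derived from a light curve whose true class is known. The sketch below computes a per-class efficiency from such a set. It assumes "efficiency" means per-class recall, i.e. the fraction of sources of a given true class that the classifier assigned to that class; the slide does not define the term. The labels are hypothetical.

```python
from collections import Counter


def per_class_efficiency(true_labels, predicted_labels):
    """Fraction of sources of each true class that were assigned that class.

    "Efficiency" is taken here to mean per-class recall (an assumption).
    """
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


# Hypothetical noisified test sources with known true classes.
truth = ["RR Lyrae"] * 4 + ["supernova"] * 2
predicted = ["RR Lyrae", "RR Lyrae", "RR Lyrae", "eclipsing binary",
             "supernova", "RR Lyrae"]
print(per_class_efficiency(truth, predicted))  # {'RR Lyrae': 0.75, 'supernova': 0.5}
```

The same function applies to the user-classified PTF09xxx sources, with the users' classifications serving as the true labels.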