searching for KDD in MDS standards…
        …the DAME experience
 Marianna Annunziatella, Massimo Brescia, Stefano Cavuoti, Raffaele D’Abrusco, George
     S. Djorgovski, Ciro Donalek, Mauro Garofalo, Marisa Guglielmo, Omar Laurino,
 Giuseppe Longo, Ashish Mahabal, Ettore Mancini, Francesco Manna, Amata Mercurio,
  Alfonso Nocella, Maurizio Paolillo, Luca Pellecchia, Sandro Riccardi, Giovanni Vebber,
                                      Civita Vellucci.

                      Department of Physics – University Federico II – Napoli
     INAF – National Institute of Astrophysics – Capodimonte Astronomical Observatory – Napoli
                       CALTECH – California Institute of Technology - Pasadena
Data Mining (KDD) as the Fourth
      Paradigm Of Science




Definition
DM is the exploration and analysis of large quantities of data in order
to discover meaningful patterns and rules
                 M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
The BoK’s Problem
Only a limited number of problems can be tackled, owing to the limited number of reliable BoKs

 So far
 • Limited number of BoKs (and of limited scope) available
 • Painstaking work for each application (e.g. spectroscopic redshifts for training
    photometric redshifts)
 • Fine tuning on specific data sets needed (e.g., if you add a band you need to re-train the
    methods)
 • Data and DM applications need to be standardized and made interoperable


 The community believes AI/DM methods are black boxes:
 you feed in something and obtain patterns, trends, i.e. knowledge…




What DAME is
The DAME Program is a joint effort between University Federico II, INAF-OACN, and Caltech aimed at
implementing (as web applications and services) a scientific gateway for massive data analysis,
exploration and mining, on top of a virtualized distributed computing environment.

                                                                    http://dame.dsf.unina.it/
                                                                    Technical and management info
                                                                    Documents
                                                                    Science cases
                                                                    Newsletters




  http://dame.dsf.unina.it/beta_info.html
  DAMEWARE Web application Beta Version
DM 4-rule virtuous cycle
•  Finding patterns is not enough
•  Science business must:
   – Respond to patterns by taking action
   – Turning:
         • Data into Information
         • Information into Action
         • Action into Value
•  Hence, the Virtuous Cycle of DM:
     1.    Identify the problem
     2.    Mining data to transform it into actionable information
     3.    Acting on the information
     4.    Measuring the results
•  Virtuous cycle implementation steps:
   – Transforming data into information via:
         • Hypothesis testing
         • Profiling
         • Predictive modeling
   – Taking action
         • Model deployment
         • Scoring
   – Measurement
         • Assessing a model’s stability & effectiveness before it is used


DM: 11-step Methodology
 The four rules expand into an 11-step strategy, which underlies the DAME concept


1.    Translate any opportunity (science case) into DM opportunity (problem)
2.    Select appropriate data
3.    Get to know the data
4.    Create a model set
5.    Fix problems with the data
6.    Transform data to bring information
7.    Build models
8.    Assess models
9.    Deploy models
10.   Assess results
11.   Begin again (GOTO 1)
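
As an illustration only, the eleven steps can be sketched as a loop in Python. The synthetic catalogue, the one-feature threshold model, the 0.9 accuracy target and every function name below are assumptions made for this sketch; none of them come from DAME.

```python
import random
import statistics


def make_catalogue(n, seed):
    """Steps 2-3: select data and get to know it (synthetic stand-in)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.random() < 0.5
        value = rng.gauss(4.0 if label else 0.0, 1.0)
        # Roughly 5% of the measurements are missing, to exercise step 5.
        rows.append((None if rng.random() < 0.05 else value, label))
    return rows


def run_cycle(target_accuracy=0.9, max_rounds=5):
    """Step 1: the problem is to separate two classes using one feature."""
    n = 200
    for round_no in range(1, max_rounds + 1):            # step 11: begin again
        rows = make_catalogue(n, seed=round_no)           # steps 2-3
        split = int(0.7 * len(rows))
        train, test = rows[:split], rows[split:]          # step 4: model set
        train = [r for r in train if r[0] is not None]    # step 5: fix problems
        test = [r for r in test if r[0] is not None]
        mu = statistics.mean(v for v, _ in train)         # step 6: transform
        sigma = statistics.pstdev(v for v, _ in train)
        scale = lambda v: (v - mu) / sigma
        pos = [scale(v) for v, lab in train if lab]
        neg = [scale(v) for v, lab in train if not lab]
        # Step 7: build the model (midpoint between the two class means).
        threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
        predict = lambda v: scale(v) > threshold          # step 9: deploy
        hits = sum(predict(v) == lab for v, lab in test)
        accuracy = hits / len(test)                       # steps 8 and 10
        if accuracy >= target_accuracy:
            break
        n *= 2                                            # retry with more data
    return round_no, accuracy, predict


rounds, accuracy, predict = run_cycle()
print(f"stopped after {rounds} round(s), test accuracy {accuracy:.2f}")
```

The loop stops as soon as the assessment meets the target; otherwise it begins again with a larger catalogue, which is one possible reading of "GOTO 1".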
Effective DM process break-down




The Black box Infrastructure
In this scenario, the DAME (Data Mining & Exploration) project, starting from the
requirements of the astrophysics domain, has investigated the exploration of Massive
Data Sets (MDS) by producing a taxonomy of data mining applications (hereinafter called
functionalities) and collecting a set of machine learning algorithms (hereinafter called models).

Each functionality-model association forms what we define as a "use case", easily
configurable by the user through specific tutorials. At low level, any experiment
launched on the DAME framework, externally configurable through dynamic
interactive web pages, is treated in a standard way, making the specific computing
infrastructure used and the specific input data format completely transparent to
the user.

So the user doesn’t need to know anything about the computing infrastructure and
almost nothing about the internal mechanisms of the chosen machine learning
model.
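
The functionality-model pairing described above can be pictured with a small Python sketch. The `UseCase` and `Gateway` classes and the two toy models are hypothetical names invented for this illustration; they are not the DAME API, and the toy models stand in for real algorithms such as those listed in the services slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A functionality paired with a model (hashable, so usable as a dict key)."""
    functionality: str
    model: str


class Gateway:
    """Hypothetical front end that hides how and where a use case runs."""

    def __init__(self):
        self._runners = {}

    def register(self, functionality, model, runner):
        self._runners[UseCase(functionality, model)] = runner

    def use_cases(self):
        return sorted((u.functionality, u.model) for u in self._runners)

    def launch(self, functionality, model, dataset, **params):
        key = UseCase(functionality, model)
        if key not in self._runners:
            raise KeyError(f"no use case for {functionality} with {model}")
        # The caller never sees the runner or the infrastructure behind it.
        return self._runners[key](dataset, **params)


def toy_mean(dataset):
    """Toy 'regression' model: predicts the mean of the data."""
    return sum(dataset) / len(dataset)


def toy_threshold(dataset, cut=0.0):
    """Toy 'classification' model: flags the values above a cut."""
    return [value > cut for value in dataset]


gateway = Gateway()
gateway.register("Regression", "toy-mean", toy_mean)
gateway.register("Classification", "toy-threshold", toy_threshold)

print(gateway.use_cases())
print(gateway.launch("Regression", "toy-mean", [1.0, 2.0, 3.0]))
print(gateway.launch("Classification", "toy-threshold", [-1.0, 0.5], cut=0.0))
```

The user only names a functionality, a model and a dataset; everything behind `launch` could be swapped (local process, GRID job, cloud job) without changing the call.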




DAME Infrastructure
DR Storage / DR Execution

• GRID SE: User & Data Archives (300 TB dedicated)
• GRID UI
• GRID CE: DM Models Job Execution (300 multi-core processors)
• Cloud facilities: 16 TB, 15 processors

DAME SW Architecture




The Available Services
DAMEWARE Web Application Resource
Main service, offering through the browser a list of algorithms and tools to configure and launch
experiments as complete workflows (dataset creation, model setup and run, graphical/text
output):
• Functionalities: Regression, Classification, Image Segmentation, Multi-layer Clustering;
• Models: MLP+BP, MLP+GA, SVM, MLP+QNA, K-Means (through KNIME), PPS, SOM, NEXT-II;

VOGCLUSTERS
Web Application for data and text mining on globular clusters;

STraDiWA (Sky Transient Discovery Web Application)
Web application to detect variable objects in real or simulated images (under R&D);

WFXT (Wide Field X-Ray Telescope) Transient Calculator
Web service to estimate the number of transient and variable sources that can be detected by
WFXT within the 3 main planned extragalactic surveys, with a given significance threshold;

SDSS (Sloan Digital Sky Survey)
Local mirror website hosting a complete SDSS Data Archive and Exploration System;
K-Means (through KNIME)




[Diagram labels: KNIME workflow, DM plug-in component and DMM API component (offline creation); execution; output; cloud exe/storage environment]
Web 2.0 Features in DAME
Web 2.0? "It is a system that breaks with the old model of centralized Web sites
and moves the power of the Web/Internet to the desktop." [J. Robb]
"The Web becomes a universal, standards-based integration platform." [S. Dietzen]




VO Interoperability scenarios
Scenario 1: DA1, SAMP, DA2
• Full interoperability between DAs (Desktop Applications)
• Local user desktop fully involved (requires computing power)

Scenario 2: DA, WSC, WA (massive data sets)
• Full WA → DA interoperability
• Partial DA → WA interoperability (such as remote file storing)
• MDS must be moved between local and remote apps
• Local user desktop partially involved (requires minor computing and storage power)

Scenario 3: WA1, URI?, WA2 (massive data sets)
• Except for URI exchange, no standard interoperability
• Different accounting policy
• MDS must be moved between remote apps (but larger bandwidth)
• No local computing power required
Our vision: improving aspects
[Diagram: WA1 and WA2 exchanging plugins]

• DAs have to become WAs
• Unique accounting policy (Google/Microsoft-like)
• To overcome the MDS flow, apps must be plug&play (e.g. any WAx feature should be
  pluggable into WAy on demand)
• No local computing power required: even smartphones can run VO apps


                            Requirements
• Standard accounting system;
• No more moving of MDS over the web, just moving of apps, structured as plugin
repositories and execution environments;
• Standard modeling of WAs and components to obtain the maximum level of granularity;
• Evolution of the SAMP architecture to extend web interoperability (in particular for the
migration of plugins).



Our vision: plugin granularity flow

WAx plugins: Px-1, Px-2, Px-3, Px-…, Px-n
WAy plugins: Py-1, Py-2, Py-…, Py-n, Px-3

3. WAy executes Px-3

This scheme could be iterated and extended, involving all standardized web apps
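
The idea of moving a plugin to the data, rather than the data to the application, can be sketched in Python as follows. Only the final step (WAy executes Px-3) survives in the slide text, so the export and import calls below are assumed stand-ins for the missing earlier steps, and the `WebApp` class and the example plugin are hypothetical.

```python
class WebApp:
    """Hypothetical web application holding plugins and a local data set."""

    def __init__(self, name, plugins=None, dataset=None):
        self.name = name
        self.plugins = dict(plugins or {})
        self.dataset = list(dataset or [])

    def export_plugin(self, plugin_name):
        # Assumed step: the owning app hands over the plugin, not any data.
        return self.plugins[plugin_name]

    def import_plugin(self, other, plugin_name):
        # Assumed step: the requesting app installs the plugin locally.
        self.plugins[plugin_name] = other.export_plugin(plugin_name)

    def execute(self, plugin_name):
        # Step 3 in the slide: the plugin runs where the data already lives.
        return self.plugins[plugin_name](self.dataset)


def value_range(data):
    """Example plugin body (illustrative only): spread of the data."""
    return max(data) - min(data)


wax = WebApp("WAx", plugins={"Px-3": value_range})
way = WebApp("WAy", plugins={"Py-1": len}, dataset=[3.0, 7.5, 1.5])

way.import_plugin(wax, "Px-3")
print(way.execute("Px-3"))    # the data set never left WAy
print(sorted(way.plugins))
```

In this sketch the only thing that crosses between applications is the plugin itself; the massive data set stays where it is stored.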



The Lernaean Hydra VO KDD App
After a certain number of such iterations, the VO KDD App scenario will become:

• No different WAs, but simply one WA with several sites (possibly with different GUIs
  and computing environments)
• All WA sites can become a mirror site of all the others
• The synchronization of plugin releases between WAs is performed at request time
• Minimization of data exchange flow (just a few plugins in case of synchronization
  between mirrors)

WAx plugins: Px-1, Px-2, Px-3, Px-…, Px-n, Py-1, Py-2, Py-…, Py-n
WAy plugins: Py-1, Py-2, Py-…, Py-n, Px-1, Px-2, Px-3, Px-…, Px-n
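
Synchronizing plugin releases at request time could look roughly like the Python sketch below. The `Plugin` and `Site` classes, the integer release numbers and the transfer counter are all assumptions introduced for illustration; the slides do not specify a synchronization protocol.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Plugin:
    name: str
    release: int                    # assumed: higher number means newer
    run: Callable[[list], object]


class Site:
    """Hypothetical mirror site of a single, distributed web application."""

    def __init__(self, name):
        self.name = name
        self.plugins = {}
        self.peers = []
        self.transfers = 0          # counts plugins pulled from peers

    def publish(self, plugin):
        self.plugins[plugin.name] = plugin

    def _latest_among_peers(self, plugin_name):
        found = [p.plugins[plugin_name] for p in self.peers
                 if plugin_name in p.plugins]
        return max(found, key=lambda plugin: plugin.release, default=None)

    def request(self, plugin_name, data):
        # Synchronize only the requested plugin, and only when it is needed.
        local = self.plugins.get(plugin_name)
        remote = self._latest_among_peers(plugin_name)
        if remote is not None and (local is None or remote.release > local.release):
            self.plugins[plugin_name] = remote
            self.transfers += 1
            local = remote
        if local is None:
            raise KeyError(plugin_name)
        return local.run(data)


site_x, site_y = Site("WAx"), Site("WAy")
site_y.peers.append(site_x)

site_x.publish(Plugin("Px-1", 1, sum))
print(site_y.request("Px-1", [1, 2, 3]), site_y.transfers)

site_x.publish(Plugin("Px-1", 2, lambda data: sum(data) / len(data)))
print(site_y.request("Px-1", [1, 2, 3]), site_y.transfers)
```

A repeated request for an unchanged plugin triggers no transfer, which matches the stated goal of exchanging just a few plugins between mirrors.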
Conclusions
DAME was not originally conceived to be interoperable with the VO (owing to
the lack of suitable standards), but it offers a good benchmark for planning
the future development of KDD on MDS in a VO environment.

1. DAME is just one example of what new ICT (Web 2.0) can do for A&A
KDD problems.
2. A new vision of the KDD App approach, suitable for the VO, must be based
on the minimization of data transfer and the maximization of
interoperability within the VO community.
3. If implemented, the new scheme can reach a wider science
community by giving the opportunity to share data and apps worldwide,
without any particular infrastructure requirements (e.g. by using a
simple smartphone with a low-bandwidth connection).
 The DAME group is currently involved in the definition of standards and rules and is working to
 modify and adapt the present infrastructure to make it compliant with the VO.
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Dame ivoa interop_brescia_naples2011

  • 1. Searching for KDD in MDS standards… …the DAME experience. Marianna Annunziatella, Massimo Brescia, Stefano Cavuoti, Raffaele D’Abrusco, George S. Djorgovski, Ciro Donalek, Mauro Garofalo, Marisa Guglielmo, Omar Laurino, Giuseppe Longo, Ashish Mahabal, Ettore Mancini, Francesco Manna, Amata Mercurio, Alfonso Nocella, Maurizio Paolillo, Luca Pellecchia, Sandro Riccardi, Giovanni Vebber, Civita Vellucci. Department of Physics – University Federico II – Napoli; INAF – National Institute of Astrophysics – Capodimonte Astronomical Observatory – Napoli; CALTECH – California Institute of Technology – Pasadena
  • 2. Data Mining (KDD) as the Fourth Paradigm of Science. Definition: DM is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 3. The BoK’s Problem. Limited number of problems due to limited number of reliable BoKs. So far: • Limited number of BoKs (and of limited scope) available • Painstaking work for each application (e.g. spectroscopic redshifts for photometric redshift training) • Fine tuning on specific data sets needed (e.g., if you add a band you need to re-train the methods) • There is a need for standardization and interoperability between data and DM applications. The community believes AI/DM methods are black boxes: you feed in something and obtain patterns, trends, i.e. knowledge…
  • 4. What DAME is. The DAME Program is a joint effort between University Federico II, INAF-OACN, and Caltech aimed at implementing (as web applications and services) a scientific gateway for massive data analysis, exploration and mining, on top of a virtualized distributed computing environment. http://dame.dsf.unina.it/ (technical and management info, documents, science cases, newsletters). http://dame.dsf.unina.it/beta_info.html (DAMEWARE web application, beta version).
  • 5. DM 4-rule virtuous cycle. Finding patterns is not enough. Science business must: respond to patterns by taking action; turn data into information, information into action, and action into value. Hence, the Virtuous Cycle of DM: 1. Identify the problem; 2. Mine data to transform it into actionable information; 3. Act on the information; 4. Measure the results. Virtuous cycle implementation steps: transforming data into information (via hypothesis testing, profiling, predictive modeling); taking action (model deployment, scoring); measurement (assessing a model’s stability & effectiveness before it is used).
  • 6. DM: 11-step Methodology. The four rules expand into an 11-step strategy, which underlies the DAME concept: 1. Translate any opportunity (science case) into a DM opportunity (problem); 2. Select appropriate data; 3. Get to know the data; 4. Create a model set; 5. Fix problems with the data; 6. Transform data to bring out information; 7. Build models; 8. Assess models; 9. Deploy models; 10. Assess results; 11. Begin again (GOTO 1).
  • 7. Effective DM process break-down
  • 8. The Black Box Infrastructure. In this scenario the DAME (Data Mining & Exploration) project, starting from the requirements of the astrophysics domain, has investigated the exploration of Massive Data Sets (MDS) by producing a taxonomy of data mining applications (hereinafter called functionalities) and collecting a set of machine learning algorithms (hereinafter called models). Each functionality-model association forms what we define as a "use case", easily configurable by the user through specific tutorials. At a low level, any experiment launched on the DAME framework, externally configurable through dynamic interactive web pages, is treated in a standard way, making the specific computing infrastructure used and the specific input data format completely transparent to the user. So the user doesn’t need to know anything about the computing infrastructure and almost nothing about the internal mechanisms of the chosen machine learning model.
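The functionality-model association described in slide 8 can be pictured with a small data structure. The following is a minimal sketch, not DAME's implementation: the functionality and model names are taken from the service list on slide 11, but the pairing table `SUPPORTED`, the `UseCase` class, and the `make_use_case` and `launch` functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical pairing table: which models may serve which functionality.
# The names appear in the DAMEWARE service list; the pairings themselves are
# illustrative assumptions, not DAME's actual configuration.
SUPPORTED = {
    "Regression": {"MLP+BP", "MLP+GA", "SVM"},
    "Classification": {"MLP+BP", "MLP+GA", "SVM"},
    "Multi-layer Clustering": {"K-Means", "SOM"},
}


@dataclass(frozen=True)
class UseCase:
    """A functionality-model association chosen by the user."""
    functionality: str
    model: str


def make_use_case(functionality: str, model: str) -> UseCase:
    """Validate the pairing and return the resulting use case."""
    models = SUPPORTED.get(functionality)
    if models is None:
        raise ValueError(f"unknown functionality: {functionality!r}")
    if model not in models:
        raise ValueError(f"{model!r} is not available for {functionality!r}")
    return UseCase(functionality, model)


def launch(use_case: UseCase, dataset: str) -> dict:
    """Describe an experiment in one uniform shape, whatever backend would run it."""
    return {
        "functionality": use_case.functionality,
        "model": use_case.model,
        "dataset": dataset,
        "status": "queued",
    }
```

For example, `launch(make_use_case("Regression", "MLP+BP"), "train.csv")` returns a job description without the caller naming any computing resource, which mirrors the "black box" idea: the user picks a use case and a dataset, and nothing else.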
  • 9. DAME Infrastructure. DR Storage; DR Execution. GRID SE: User & Data Archives (300 TB dedicated). GRID UI. GRID CE: DM Models Job Execution (300 multi-core processors). Cloud facilities: 16 TB, 15 processors.
  • 10. DAME SW Architecture
  • 11. The Available Services. DAMEWARE Web Application Resource: main service providing, via browser, a list of algorithms and tools to configure and launch experiments as complete workflows (dataset creation, model setup and run, graphical/text output): • Functionalities: Regression, Classification, Image Segmentation, Multi-layer Clustering; • Models: MLP+BP, MLP+GA, SVM, MLP+QNA, K-Means (through KNIME), PPS, SOM, NEXT-II. VOGCLUSTERS: web application for data and text mining on globular clusters. STraDiWA (Sky Transient Discovery Web Application): detects variable objects from real or simulated images (under R&D). WFXT (Wide Field X-Ray Telescope) Transient Calculator: web service to estimate the number of transient and variable sources that can be detected by WFXT within the 3 main planned extragalactic surveys, with a given significance threshold. SDSS (Sloan Digital Sky Survey): local mirror website hosting a complete SDSS Data Archive and Exploration System.
  • 12. K-Means (through KNIME). [Diagram labels: KNIME workflow, DM plug-in component and DMM API component (each marked "offline creation"); execution in the cloud exe/storage environment; output.]
  • 13. Web 2.0 Features in DAME. Web 2.0? "It is a system that breaks with the old model of centralized Web sites and moves the power of the Web/Internet to the desktop." [J. Robb] "The Web becomes a universal, standards-based integration platform." [S. Dietzen]
  • 14. VO Interoperability scenarios. Scenario DA1 / DA2 (SAMP): full interoperability between DAs (Desktop Applications); local user desktop fully involved (requires computing power). Scenario DA / WA (WSC): full WA → DA interoperability; partial DA → WA interoperability (such as remote file storing); MDS (massive data sets) must be moved between local and remote apps; local user desktop partially involved (requires minor computing and storage power). Scenario WA1 / WA2 (URI?): except for URI exchange, no standard interoperability; different accounting policy; MDS must be moved between remote apps (but larger bandwidth); no local computing power required.
  • 15. Our vision: improving aspects. DAs have to become WAs. Unique accounting policy (Google/Microsoft like). To overcome the MDS flow, apps must be plug&play (e.g. any WAx feature should be pluggable into WAy on demand). No local computing power required; even smartphones can run VO apps. [Diagram labels: WA1, plugins, WA2.] Requirements: • Standard accounting system; • No more moving MDS over the web, just moving apps, structured as plugin repositories and execution environments; • Standard modeling of WAs and components to obtain the maximum level of granularity; • Evolution of the SAMP architecture to extend web interoperability (in particular for the migration of the plugins).
  • 16. Our vision: plugin granularity flow. [Diagram: WAx with plugins Px-1, Px-2, Px-3, Px-…, Px-n; WAy with plugins Py-1, Py-2, Py-…, Py-n; plugin Px-3 shown next to WAy.] 3. WAy executes Px-3. This scheme could be iterated and extended to involve all standardized web apps.
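The plugin flow of slide 16 can be sketched as follows. This is a minimal illustration under stated assumptions: the transcript preserves only step 3 ("WAy executes Px-3"), so the lookup and transfer steps shown here are guesses at the missing steps, and the `WebApp` class with its `export_plugin` and `run` methods is hypothetical. The point it demonstrates is that the plugin travels to the data rather than the data to the plugin.

```python
from typing import Callable, Dict, List

# A plugin is modelled as a plain function over a list of numbers (assumption).
Plugin = Callable[[List[float]], float]


class WebApp:
    """Hypothetical web app acting as plugin repository and execution environment."""

    def __init__(self, name: str, plugins: Dict[str, Plugin], data: List[float]):
        self.name = name
        self.plugins = dict(plugins)
        self.data = data  # stands in for a massive data set that never moves

    def export_plugin(self, plugin_id: str) -> Plugin:
        """Hand a plugin over to a requesting peer."""
        return self.plugins[plugin_id]

    def run(self, plugin_id: str, peers: List["WebApp"]) -> float:
        """Run a plugin on local data, importing it from a peer if missing."""
        if plugin_id not in self.plugins:
            for peer in peers:
                if plugin_id in peer.plugins:
                    # Only the plugin crosses the network, not the data.
                    self.plugins[plugin_id] = peer.export_plugin(plugin_id)
                    break
            else:
                raise KeyError(f"no peer offers plugin {plugin_id!r}")
        return self.plugins[plugin_id](self.data)


# Usage: WAy holds the data, WAx holds Px-3 (here, a simple mean).
wax = WebApp("WAx", {"Px-3": lambda xs: sum(xs) / len(xs)}, data=[])
way = WebApp("WAy", {"Py-1": max}, data=[1.0, 2.0, 3.0])
result = way.run("Px-3", peers=[wax])  # 2.0
```

After the call, WAy keeps Px-3 in its own repository, so a second request needs no transfer at all.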
  • 17. The Lernaean Hydra VO KDD App
  • 18. The Lernaean Hydra VO KDD App. After a certain number of such iterations… the VO KDD App scenario will become: no different WAs, but simply one WA with several sites (possibly with different GUIs and computing environments). All WA sites can become a mirror site of all the others. The synchronization of plugin releases between WAs is performed at request time. Minimization of data exchange flow (just a few plugins in case of synchronization between mirrors). [Diagram: WAx and WAy each holding both plugin sets, Px-1 … Px-n and Py-1 … Py-n.]
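The request-time synchronization mentioned in slide 18 could look like the sketch below. It is only an assumption about how such a scheme might work: the slides do not say how releases are compared, so the integer `version` field, the `Release` and `Site` classes, and the `transfers` counter are all hypothetical. The sketch shows a site checking its mirrors when a plugin is requested and pulling that one plugin only if a newer release exists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Release:
    """One release of a plugin; the integer version is an assumed convention."""
    version: int
    run: Callable[[List[float]], float]


@dataclass
class Site:
    """Hypothetical mirror site of the single VO KDD web application."""
    name: str
    releases: Dict[str, Release] = field(default_factory=dict)
    transfers: int = 0  # number of plugin releases pulled from mirrors

    def sync(self, plugin_id: str, mirrors: List["Site"]) -> None:
        """Pull the newest release of one plugin if a mirror has a newer one."""
        local: Optional[Release] = self.releases.get(plugin_id)
        best = local
        for mirror in mirrors:
            remote = mirror.releases.get(plugin_id)
            if remote is not None and (best is None or remote.version > best.version):
                best = remote
        if best is not None and best is not local:
            self.releases[plugin_id] = best
            self.transfers += 1

    def request(self, plugin_id: str, data: List[float], mirrors: List["Site"]) -> float:
        """Synchronize the requested plugin, then run it on the given data."""
        self.sync(plugin_id, mirrors)
        if plugin_id not in self.releases:
            raise KeyError(f"plugin {plugin_id!r} not found on any mirror")
        return self.releases[plugin_id].run(data)
```

Because synchronization is triggered per request and per plugin, a site that is already up to date exchanges nothing, which is one way to read the slide's "just a few plugins" remark.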
  • 19. Conclusions. DAME was not originally conceived (for lack of suitable standards) to be interoperable with the VO, but it offers a good benchmark for planning future developments of KDD on MDS in a VO environment. 1. DAME is just an example of what new ICT (Web 2.0) can do for A&A KDD problems. 2. A new vision of the KDD App approach, suitable for the VO, must be based on the minimization of data transfer and the maximization of interoperability within the VO community. 3. If implemented, the new scheme can reach a wider science community by giving the opportunity to share data and apps worldwide, without any particular infrastructure requirements (e.g. by using a simple smartphone with a low-bandwidth connection). The DAME group is currently involved in the definition of standards and rules and is working to modify and adapt the present infrastructure to become compliant with the VO.