Project group knowAAN
   Final presentation


 Computer Science Education Group
     University of Paderborn


     October 20th 2011
Overview






    Introduction
    System components & Work flow
    Demonstration
    Development process
    Summary & Outlook
    Time for further detailed questions




                   PG knowAAN                    2
Overview



Overview: First part



    Goals
    Extraction & Storage (of data)
    Exploration (of data)
    System components & Work flow
    Analysis & Visualization (of data)




Goals




    Explore research networks
    Based on: Artifacts (scientific publications) and metadata
    Combination and analysis of data
    Computation of full-text similarities
    Support for the conference management system Ginkgo
    Data visualization
    Recommendations

              (Source: PG knowAAN project description)
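The goal "computation of full-text similarities" can be illustrated with a minimal sketch. The slides do not say which similarity measure knowAAN used; this example assumes plain cosine similarity over word-frequency vectors, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag of lower-cased word tokens with their frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term vectors of two texts (0.0 to 1.0)."""
    va, vb = term_vector(a), term_vector(b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # An empty text has no direction; treat it as dissimilar to everything.
    return dot / norm if norm else 0.0
```

A weighting such as TF-IDF would usually be applied before comparing full texts, so that frequent function words do not dominate; the raw counts here keep the sketch short.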



Goals


Imagine you are interested in a conference.
You have downloaded its papers from 2 or 3 years.
Now you have nearly 100 publications.
How do you explore them?




100 publications. Do you know any tools?
Extraction & Storage







           First step: Extract data and store it.




Extraction & Storage




Exploration







               Second step: Explore data.




Exploration



Exploring a conference




Exploration







      Which extracted data is available for a publication?
                     → Database schema




Database schema

Core tables
    publication: id GUID, lucuid VARCHAR(512), title VARCHAR(512), booktitle VARCHAR(512), normtitle VARCHAR(512), date VARCHAR(512), editor VARCHAR(512), journal VARCHAR(512), note VARCHAR(512), pages VARCHAR(512), publisher VARCHAR(512), tech VARCHAR(512), volume VARCHAR(512), number VARCHAR(512), rawstring VARCHAR(4096), xmlfile VARCHAR(512), pdffile VARCHAR(512), topicfile VARCHAR(512), created BIGINT, modified BIGINT
    author: id GUID, text VARCHAR(512), normtext VARCHAR(512), firstname VARCHAR(512), lastname VARCHAR(512), created BIGINT, modified BIGINT
    affiliation: id GUID, text VARCHAR(512), location_id GUID
    address: id GUID, text VARCHAR(512), location_id GUID
    location: id GUID, latitude DOUBLE, longitude DOUBLE, text VARCHAR(512)
    keyword: id GUID, text VARCHAR(512)
    concept: id GUID, text VARCHAR(512)
    category: id GUID, text VARCHAR(512)
    discipline: id GUID, text VARCHAR(512), parent_id GUID
    event: id GUID, text VARCHAR(512), filepath VARCHAR(512), predecessor_id GUID, successor_id GUID
    eventseries: id GUID, text VARCHAR(512), filepath VARCHAR(512)

Relationship tables
    pub_aut: publication_id GUID, author_id GUID
    pub_aff: publication_id GUID, affiliation_id GUID
    pub_add: publication_id GUID, address_id GUID
    pub_dis: publication_id GUID, discipline_id GUID
    pub_evt: publication_id GUID, event_id GUID
    pub_key: publication_id GUID, keyword_id GUID, score DOUBLE, source VARCHAR(512)
    pub_con: publication_id GUID, concept_id GUID, score DOUBLE, source VARCHAR(512)
    pub_cat: publication_id GUID, category_id GUID, score DOUBLE, source VARCHAR(512)
    aut_aff: author_id GUID, affiliation_id GUID
    aut_add: author_id GUID, address_id GUID
    evt_evs: event_id GUID, eventseries_id GUID
    citation: publication1_id GUID, publication2_id GUID

Further tables (columns not shown)
    bib_coupling, co_author, co_citation, category_count, concept_count, discipline_count, keyword_count, evt_pub_aut_count
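A minimal sketch of how a few of these tables might be created and queried to answer "which extracted data is available for a publication?", using Python's built-in sqlite3 module. Assumptions: only publication, author, keyword, pub_aut and pub_key are modelled; GUID columns are stored as TEXT because SQLite has no GUID type (the deck names the real database only as "DataBase", accessed via JDBC); the helper functions and the sample rows are hypothetical.

```python
import sqlite3
import uuid

# Subset of the schema above; GUID columns become TEXT in SQLite.
SCHEMA = """
CREATE TABLE publication (id TEXT PRIMARY KEY, title VARCHAR(512));
CREATE TABLE author (id TEXT PRIMARY KEY, text VARCHAR(512));
CREATE TABLE keyword (id TEXT PRIMARY KEY, text VARCHAR(512));
CREATE TABLE pub_aut (
    publication_id TEXT REFERENCES publication(id),
    author_id TEXT REFERENCES author(id)
);
CREATE TABLE pub_key (
    publication_id TEXT REFERENCES publication(id),
    keyword_id TEXT REFERENCES keyword(id),
    score DOUBLE,
    source VARCHAR(512)
);
"""


def new_id() -> str:
    """Generate a GUID as a string."""
    return str(uuid.uuid4())


def authors_of(conn: sqlite3.Connection, publication_id: str) -> list[str]:
    """Author names linked to a publication via pub_aut, sorted by name."""
    rows = conn.execute(
        "SELECT a.text FROM author AS a "
        "JOIN pub_aut AS pa ON pa.author_id = a.id "
        "WHERE pa.publication_id = ? ORDER BY a.text",
        (publication_id,),
    )
    return [name for (name,) in rows]


def keywords_of(conn: sqlite3.Connection, publication_id: str) -> list[tuple[str, float]]:
    """Keywords of a publication with their scores, highest score first."""
    rows = conn.execute(
        "SELECT k.text, pk.score FROM keyword AS k "
        "JOIN pub_key AS pk ON pk.keyword_id = k.id "
        "WHERE pk.publication_id = ? ORDER BY pk.score DESC",
        (publication_id,),
    )
    return rows.fetchall()


def demo() -> tuple[sqlite3.Connection, str]:
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    pub, alice, bob, kw = new_id(), new_id(), new_id(), new_id()
    conn.execute("INSERT INTO publication VALUES (?, ?)", (pub, "Example paper"))
    conn.executemany("INSERT INTO author VALUES (?, ?)", [(alice, "Alice"), (bob, "Bob")])
    conn.executemany("INSERT INTO pub_aut VALUES (?, ?)", [(pub, alice), (pub, bob)])
    conn.execute("INSERT INTO keyword VALUES (?, ?)", (kw, "awareness"))
    conn.execute("INSERT INTO pub_key VALUES (?, ?, ?, ?)", (pub, kw, 0.8, "example"))
    return conn, pub
```

The relationship tables make each question a join from publication through one link table, which is also how the other pub_* tables (concepts, categories, disciplines, events) would be queried.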
System components & Work flow







           How is our system structured?
                  → Some examples.




System components & Work flow



Components

    Backend, containing the components PDFToText, Parscit, Roundtrip, DB, Clustering, TrendDetection, TF-Component, TopicExtraction, Recommendation and xmlBuilder
    ParscitTrainer, connected to the Backend via Model
    FrontendReferenceExtraction, connected to the Backend via WebServices
    DocBrowser, connected to the Backend via WebServices
    DataBase, connected to the Backend via JDBC
    Solr, connected to the Backend via WebServices
    FileStorage, connected to the Backend via FileSystem
Sequence diagram (participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB)

a) Adding a PDF
    1) DocumentBrowser calls RoundTrip .addPDF
    2) .writeToFS returns Path
    3) .createThread and .submitThread hand the work to RoundTripExecutor
    The call then returns to DocumentBrowser.

b) Processing in RoundTripExecutor
    1) .run
    2) PDFToText .getText returns Text
    3) Parscit .ParseFullText returns ParscitXML
    4) .extractBodyAndAstract returns BodyAndAbstract
    5) Languagedetection .getLanguage returns LanguageString
    6) Lemmatizer .lemmatize returns LemmatizedText
    7) NounExtraction .extractNouns returns NounsList
    8) .lemmatizeNounslist returns LemmatizedNouns
    9) .ReduceToTopNouns returns TopNouns
    10) .writeToFiles returns Paths
    11) Solr .addTexts returns Solrid
    12) DB .addPublication
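The processing in part b) can be sketched as a pipeline of pluggable stages that is submitted to a worker thread, mirroring .createThread/.submitThread in part a). This is an illustration under assumptions, not the knowAAN code: the stage keys, function names and the use of ThreadPoolExecutor are hypothetical; Parscit, body/abstract extraction, file output, Solr and the database (steps 3, 4 and 10 to 12) are left out; and ReduceToTopNouns is assumed to keep the k most frequent lemmatized nouns.

```python
from collections import Counter
from collections.abc import Callable
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stage names; each value is a callable standing in for a real component.
Stages = dict[str, Callable]


def reduce_to_top_nouns(nouns: list[str], k: int) -> list[str]:
    """Keep the k most frequent nouns (equal counts keep first-seen order)."""
    return [noun for noun, _ in Counter(nouns).most_common(k)]


def process_pdf(path: str, stages: Stages, k: int = 10) -> list[str]:
    """Run a subset of steps b/2 to b/9 and return the top nouns of one PDF."""
    text = stages["get_text"](path)                        # b/2: PDFToText
    language = stages["get_language"](text)                # b/5: Languagedetection
    lemmatized = stages["lemmatize"](text, language)       # b/6: Lemmatizer
    nouns = stages["extract_nouns"](lemmatized, language)  # b/7: NounExtraction
    nouns = stages["lemmatize_nouns"](nouns, language)     # b/8: lemmatizeNounslist
    return reduce_to_top_nouns(nouns, k)                   # b/9: ReduceToTopNouns


def submit_pdf(executor: ThreadPoolExecutor, path: str, stages: Stages, k: int = 10) -> Future:
    """Steps a/3 and b/1: hand the job to a worker thread and return at once."""
    return executor.submit(process_pdf, path, stages, k)
```

Passing the stages in as callables keeps the sketch runnable without the real PDF, language-detection or lemmatization components; the caller gets a Future immediately, as the DocumentBrowser call in part a) returns before processing finishes.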
System components & Work flow



Work flow




Analysis & Visualization







           Third step: Analyze and visualize data.




Analysis & Visualization



Analysis of authors
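The schema lists a co_author table, but the slides do not show how it is filled. A minimal sketch, assuming the value for an author pair is the number of publications both appear on according to pub_aut rows; the function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_author_counts(pub_aut_rows):
    """Count shared publications per unordered author pair.

    pub_aut_rows: iterable of (publication_id, author_id) tuples, as in pub_aut.
    Returns a Counter keyed by sorted (author_a, author_b) tuples.
    """
    authors_by_pub = defaultdict(set)
    for publication_id, author_id in pub_aut_rows:
        authors_by_pub[publication_id].add(author_id)

    counts = Counter()
    for authors in authors_by_pub.values():
        # Every pair of distinct authors on one publication co-authored it once.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts
```

Sorting each pair makes (a, b) and (b, a) the same key, so the result maps directly onto an undirected co-authorship graph.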




Analysis & Visualization



Analysis of scientific publications
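The schema also contains co_citation and bib_coupling tables. The sketch below uses the standard definitions (two papers are co-cited once for every paper that cites both; two papers are bibliographically coupled once for every reference they share) and assumes that a citation row (publication1_id, publication2_id) means publication1 cites publication2. How knowAAN actually computed these tables is not shown in the deck, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def _pair_counts(groups):
    """For each group of ids, add 1 to every unordered pair inside it."""
    counts = Counter()
    for members in groups:
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts


def co_citation(citations):
    """citations: (citing_id, cited_id) pairs. Each citing paper co-cites its references."""
    cited_by_citing = defaultdict(set)
    for citing, cited in citations:
        cited_by_citing[citing].add(cited)
    return _pair_counts(cited_by_citing.values())


def bibliographic_coupling(citations):
    """Each cited paper couples all papers that cite it."""
    citing_by_cited = defaultdict(set)
    for citing, cited in citations:
        citing_by_cited[cited].add(citing)
    return _pair_counts(citing_by_cited.values())
```

Both measures are the same pair count over the citation relation, grouped once by the citing side and once by the cited side.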




Demonstration







                            Now: Demo.
           Image: http://www.flickr.com/photos/plaisanter/5525977163/


Development process



Technologies




                            Jersey



Development process



Methods of agile software development



     FDD, XP, Scrum




Development process



Methods of agile software development




    Weekly meetings
    Sitting together (as much as possible)
    Automated build system
    Continuous integration
    Issue tracking


Summary and Outlook



Summary and future work

 Summary
     Integrated processing of scientific papers
     Aggregated visualization of authors, publications and events
     Various analyses computed over the data
     Cleaning functionality for automatically processed data

 Future work
     Parallelized clustering
     Additional graphical visualizations
     Improved extraction of metadata from PDF files
Summary and Outlook



Thank you for your attention




                           Questions?



Último

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Final presentation of the project group Knowledge Awareness in Artefact-Actor-Networks (knowAAN)

  • 1. Project group knowAAN: Final presentation. Computer Science Education Group, University of Paderborn, October 20th 2011
  • 2. Overview: Introduction; System components & Work flow; Demonstration; Development process; Summary & Outlook; Time for further detailed questions
  • 3. Overview: First part. Goals; Extraction & Storage (of data); Exploration (of data); System components & Work flow; Analysis & Visualization (of data)
  • 4. Goals: Explore research networks, based on artifacts (scientific publications) and metadata; combination and analysis of data; computation of similarities of full texts; support for the conference management system Ginkgo; data visualization; recommendations. (Source: PG knowAAN project description)
  • 5. Goals: Imagine you are interested in a conference. You have downloaded its papers from 2 or 3 years, so now you have nearly 100 publications. How do you explore them? With 100 publications, do you know of any tools?
  • 6. Extraction & Storage: First step: extract data and store it.
  • 8. Exploration: Second step: explore data.
  • 10. Exploration: Which extracted data is available for a publication? → Database schema
  • 11. Database schema (diagram). Entity tables: publication, author, affiliation, address, location, discipline, keyword, concept, category, event, eventseries. Association tables: pub_aut (publication_id, author_id), pub_aff (publication_id, affiliation_id), pub_dis (publication_id, discipline_id), pub_key (publication_id, keyword_id, score), pub_con (publication_id, concept_id), pub_cat (publication_id, category_id), pub_add (publication_id, address_id), pub_evt (publication_id, event_id), aut_aff (author_id, affiliation_id), aut_add (author_id, address_id), evt_evs (event_id, eventseries_id); citation (publication1_id, publication2_id) links publications to each other. Tables for derived relations and statistics: co_author, co_citation, bib_coupling, category_count, concept_count, discipline_count, keyword_count, evt_pub_aut_count. Identifiers are GUIDs and most text columns are VARCHAR(512). The publication table holds bibliographic fields such as title, booktitle, date, editor, journal, pages, publisher, volume and number, plus the raw reference string (rawstring, VARCHAR(4096)) and file paths (xmlfile, pdffile, topicfile). The location table stores latitude and longitude as DOUBLE.
  • 12. System components & Work flow: How is our system structured? → Some examples.
  • 13. System components & Work flow: Components (component diagram): Model, Backend, Frontend, DocBrowser, Roundtrip, PDFToText, Parscit, ParscitTrainer, ReferenceExtraction, TF-Component, TopicExtraction, Clustering, TrendDetection, Recommendation, xmlBuilder, DB, DataBase, Solr, FileSystem, FileStorage. Interfaces: WebServices, JDBC.
  • 14. Round trip (sequence diagram). Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB. (a) Upload: 1) addPDF; 2) writeToFS → Path; 3) createThread, submitThread (hands the job to the RoundTripExecutor). (b) Processing: 1) run; 2) getText → Text; 3) ParseFullText → ParscitXML; 4) extractBodyAndAstract → BodyAndAbstract; 5) getLanguage → LanguageString; 6) lemmatize → LemmatizedText; 7) extractNouns → NounsList; 8) lemmatizeNounslist → LemmatizedNouns; 9) ReduceToTopNouns → TopNouns; 10) writeToFiles → Paths; 11) addTexts → Solrid; 12) addPublication.
  • 15. System components & Work flow: Work flow
  • 16. Analysis & Visualization: Third step: analyze and visualize data.
  • 17. Analysis & Visualization: Analysis of authors
  • 18. Analysis & Visualization: Analysis of scientific publications
  • 19. Demonstration: Now: Demo. Image: http://www.flickr.com/photos/plaisanter/5525977163/
  • 20. Development process: Technologies: Jersey
  • 21. Development process: Methods of agile software development: FDD, XP, Scrum
  • 22. Development process: Methods of agile software development: weekly meetings; sitting together (as much as possible); automated build system; continuous integration; issue tracking
  • 23. Summary and Outlook: Summary and future work. Summary: integrated processing of scientific papers; aggregated visualization of authors, publications and events; various analyses computed over the data; cleaning functionality for automatically processed data. Future work: parallelized clustering; additional graphical visualizations; improved extraction of metadata from PDF files.
  • 24. Summary and Outlook: Thank you for your attention. Questions?
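Slide 14 describes the round trip as a chain of steps in which each step consumes the previous step's output, run in a separate thread. The following is a minimal sketch of that shape under stated assumptions, not the project's implementation: the function names `run_round_trip`, `submit_round_trip` and `reduce_to_top_nouns` are hypothetical, a standard `ThreadPoolExecutor` stands in for createThread/submitThread, and the toy stages (lower-casing, whitespace splitting, naive plural stripping) merely occupy the positions of the lemmatizer and noun extraction.

```python
from collections import Counter
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List

# A stage takes the previous stage's output and returns its own result.
Stage = Callable[[Any], Any]


def reduce_to_top_nouns(nouns: List[str], k: int = 3) -> List[str]:
    """Keep the k most frequent nouns; ties keep first-seen order."""
    return [noun for noun, _ in Counter(nouns).most_common(k)]


def run_round_trip(document: Any, stages: List[Stage]) -> Any:
    """Feed the document through every stage in order (hypothetical 'run')."""
    result = document
    for stage in stages:
        result = stage(result)
    return result


def submit_round_trip(pool: ThreadPoolExecutor, document: Any,
                      stages: List[Stage]) -> Future:
    """Hand the whole chain to a worker thread (stand-in for submitThread)."""
    return pool.submit(run_round_trip, document, stages)


# Toy stand-ins only; the real chain used PDFToText, Parscit, language
# detection, a lemmatizer and noun extraction at these positions.
toy_stages: List[Stage] = [
    str.lower,                                     # stand-in for lemmatize
    str.split,                                     # stand-in for extractNouns
    lambda nouns: [n.rstrip("s") for n in nouns],  # naive plural stripping
    reduce_to_top_nouns,                           # ReduceToTopNouns
]

with ThreadPoolExecutor(max_workers=2) as pool:
    future = submit_round_trip(
        pool, "Networks papers networks authors papers networks", toy_stages)
    top = future.result()

print(top)  # ['network', 'paper', 'author']
```

Keeping the stages as a plain list makes it easy to swap a toy step for a real extractor without touching the executor code.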
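Slide 4 lists the computation of full-text similarities and recommendations among the goals, and slide 13 names a TF-Component. One common way to compare texts is the cosine similarity of term-frequency vectors; the sketch below assumes that technique (the slides do not say which measure knowAAN used, and TF-IDF weighting would be a natural drop-in variant). The helpers `tf_vector`, `cosine` and `most_similar` and the sample corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_vector(tokens: List[str]) -> Counter:
    """Raw term-frequency vector: token -> number of occurrences."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(corpus: Dict[str, List[str]], target: str, k: int = 3) -> List[str]:
    """Rank the other publications by similarity to `target` (simple recommender)."""
    target_vec = tf_vector(corpus[target])
    scores = {
        pub_id: cosine(target_vec, tf_vector(tokens))
        for pub_id, tokens in corpus.items()
        if pub_id != target
    }
    return sorted(scores, key=lambda pub_id: scores[pub_id], reverse=True)[:k]


# Hypothetical publications, already reduced to (lemmatized) nouns.
corpus = {
    "p1": ["network", "analysis", "network", "author"],
    "p2": ["author", "network", "visualization"],
    "p3": ["pdf", "metadata", "extraction"],
}
print(most_similar(corpus, "p1"))  # ['p2', 'p3']
```

Here p1 and p2 share "network" and "author", giving a cosine of 3 / sqrt(18), while p3 shares no terms with p1 and scores 0.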
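The schema on slide 11 contains tables named co_author, co_citation and bib_coupling next to the association tables pub_aut and citation, and slides 17 and 18 cover the analysis of authors and publications. The sketch below shows one way such relations can be derived from association rows; it is an illustration, not the project's SQL. It assumes that a citation row (publication1_id, publication2_id) means "publication1 cites publication2", and the function names and sample rows are hypothetical. Co-citation counts how many publications cite both members of a pair; bibliographic coupling counts how many references a pair shares.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, str]


def _pair_counts(groups: Iterable[Set[str]]) -> Counter:
    """Count how often each unordered pair appears together in a group."""
    counts: Counter = Counter()
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            counts[(a, b)] += 1
    return counts


def _group(rows: Iterable[Row], key_first: bool = True) -> Dict[str, Set[str]]:
    """Group the second column by the first (or the first by the second)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for left, right in rows:
        if key_first:
            groups[left].add(right)
        else:
            groups[right].add(left)
    return groups


def co_authors(pub_aut: Iterable[Row]) -> Counter:
    """Author pairs weighted by the number of shared publications."""
    return _pair_counts(_group(pub_aut).values())


def co_citations(citation: Iterable[Row]) -> Counter:
    """Cited pairs weighted by how many publications cite both."""
    return _pair_counts(_group(citation).values())


def bibliographic_coupling(citation: Iterable[Row]) -> Counter:
    """Citing pairs weighted by how many references they share."""
    return _pair_counts(_group(citation, key_first=False).values())


# Hypothetical rows: (publication_id, author_id) and (citing_id, cited_id).
pub_aut = [("p1", "alice"), ("p1", "bob"),
           ("p2", "alice"), ("p2", "bob"), ("p2", "carol")]
citation = [("p3", "p1"), ("p3", "p2"), ("p4", "p1"), ("p4", "p2"), ("p4", "p5")]

print(co_authors(pub_aut)[("alice", "bob")])             # 2
print(co_citations(citation)[("p1", "p2")])              # 2
print(bibliographic_coupling(citation)[("p3", "p4")])    # 2
```

Sorting each group before forming pairs keeps the pair keys canonical, so ("alice", "bob") and ("bob", "alice") are never counted separately.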