INFORMATION MANAGEMENT



                    Semantic Document Architecture for Desktop
                    Data Integration And Management


                      November 30, 2010


                      Saša Nešić
                      PhD Dissertation Defense
Motivation



   Semantic Web



  Semantic Desktop

                       Ontologies
                       Resource Description Framework (RDF)
                       SPARQL query language
 Semantic Documents




                                                               2
Semantic Documents


Semantic documents are composite information resources composed of data/information
units that are:


    uniquely identified by globally unique URIs,
    semantically annotated by concepts from domain ontologies,
    interlinked with other data/information units via explicit semantic links.
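
A minimal Python sketch of the three properties listed above, under stated assumptions: the `DocumentUnit` class, the `http://example.org/` URIs, and the `annotatedBy` and `isRelatedTo` predicates are hypothetical names chosen for illustration and are not taken from SDM. It only shows how a unit that has a URI, ontology annotations, and semantic links could be written out as RDF triples in N-Triples syntax.

```python
from dataclasses import dataclass, field

# Hypothetical namespace and predicate; SDM's real vocabulary is not shown in the slides.
EX = "http://example.org/sdm#"
ANNOTATED_BY = EX + "annotatedBy"


@dataclass
class DocumentUnit:
    """One data/information unit of a semantic document."""
    uri: str                                         # globally unique identifier
    annotations: list = field(default_factory=list)  # ontology concept URIs
    links: list = field(default_factory=list)        # (predicate URI, target unit URI)

    def to_ntriples(self):
        """Serialize annotations and semantic links as N-Triples lines."""
        lines = [f"<{self.uri}> <{ANNOTATED_BY}> <{c}> ." for c in self.annotations]
        lines += [f"<{self.uri}> <{p}> <{t}> ." for p, t in self.links]
        return lines


unit = DocumentUnit(
    uri="http://example.org/docs/patterns/unit-1",
    annotations=["http://example.org/onto#DesignPattern"],
    links=[(EX + "isRelatedTo", "http://example.org/docs/patterns/unit-2")],
)
triples = unit.to_ntriples()
for line in triples:
    print(line)
```

Each printed line is one triple: the first states the annotation, the second the link to another unit.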




                                                                                     3
Thesis Statement


  “Semantic documents integrate desktop data into a unified desktop information
space and enable desktop data to be integrated into a unified information space of
                             social communities”




     Improving the Effectiveness and Efficiency of Desktop Users




                                                                                     4
Outline

  Motivation

  Semantic Document Model - SDM

  Semantic Document Architecture - SDArch

  Prototype

  Thesis Validation

  Conclusions
Semantic Document Model
  Core Part
    - document unit types
    - structural relationships
    - identification
    - binary content linking

  Annotation Part
    - annotation types
    - annotation interface

  Semantic-Linking Part
    - semantic linking interface

  Change-Tracking Part
    - types of doc. unit changes
    - change-tracking interface




                                                                                Annals of Information systems’ 09              6
Machine-Processable and Human-Readable instances of SDM


 MP document representation
   Unique and permanent instance
   HTTP dereferenceable URIs
   RDF data format


 HR document representation
   Temporal document instances
   Rendered from the MP instance
   Existing document formats




                                                           7
Outline

  Motivation

  Semantic Document Model - SDM

  Semantic Document Architecture - SDArch

  Prototype

  Thesis Validation

  Conclusions
Semantic Document Architecture - SDArch




                            Annals of Information systems’ 09   9
Semantic Document Authoring, Search, and Navigation


  Concept Exploration Algorithm
    Objective:
      - conceptualization of DU semantics
    Input:
      - document unit
      - domain ontology(ies)
    Output:
      - concept vector
      - concept weight vector
    Features:
      - lexical expansion of concept labels
      - syntactic concept matching
      - semantic concept matching
      - measuring concept relevance

  Search Algorithm
    Objective:
      - search for semantic document units (DUs)
    Input:
      - a free-text keyword query
    Output:
      - a ranked list of semantic DUs
    Features:
      - forming semantic query
      - extracting semantic query
      - executing SCA for each DU against CI
      - measuring similarity between [...] and [...]

  Search Personalization Algorithm
    Objective:
      - personalization of semantic doc. search
    Input:
      - list of retrieved semantic DUs
      - list of user preferences
    Output:
      - re-ranked list of semantic DUs
    Features:
      - weighting schema for each user preference
      - ranking DUs based on calculated weights



                                               Semantic document authoring service
                                               Semantic document search and navigation service
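
The slide names the features of the Concept Exploration Algorithm but not its details. The following Python sketch is therefore only an illustration of two of them, syntactic concept matching and lexical expansion of concept labels, producing a concept weight mapping for one document unit. The concept identifiers, the expansion table, and the weights 1.0 and 0.5 are assumptions made for the example; semantic concept matching and the relevance measure used in SDArch are not covered.

```python
import re

# Hypothetical ontology fragment: concept identifier -> preferred label.
CONCEPTS = {
    "onto:Aluminium": "aluminium",
    "onto:Alloy": "alloy",
    "onto:Steel": "steel",
}

# Hypothetical lexical expansion: alternative spellings and inflected forms.
EXPANSIONS = {
    "onto:Aluminium": ["aluminum"],
    "onto:Alloy": ["alloys"],
}


def concept_weights(text, concepts, expansions, expanded_weight=0.5):
    """Return {concept: weight} for the text of one document unit.

    An exact label match counts 1.0 per occurrence; a match on an expanded
    label counts `expanded_weight`. Both values are illustrative choices.
    Only single-word labels are handled in this sketch.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    weights = {}
    for concept, label in concepts.items():
        score = tokens.count(label)
        score += expanded_weight * sum(
            tokens.count(alt) for alt in expansions.get(concept, [])
        )
        if score:  # keep only concepts that matched at least once
            weights[concept] = score
    return weights


weights = concept_weights(
    "Aluminium alloys resist corrosion; aluminum is light.",
    CONCEPTS,
    EXPANSIONS,
)
print(weights)
```

Concepts with no match (here `onto:Steel`) are left out of the result, so the mapping plays the role of a sparse concept weight vector.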


                                                                                SEKE’ 10         10
Semantic Document Sharing


 SDArch social network
 Publishing only RDFs
 Capturing social-context annotations
 Contributing to:
    Linked Open Data Cloud
    Web of Linked Data
    Semantic Web




                                         ESEC/FSE – SoSEA’09   11
Outline

  Motivation

  Semantic Document Model - SDM

  Semantic Document Architecture - SDArch

  Prototype

  Thesis Validation

  Conclusions
SDArch Prototype

Objectives:
  Validation of SDArch and SDM
  Enabling experimental evaluation
  Enabling usability evaluation

Source Code Organization:
  Number of services          5
  Number of .NET assemblies   15
  Number of .NET namespaces   14

Implementation:
 Semantic Document Repository
      Sesame 2 RDF repository
      SemWeb C# Library
      MySQL DB-backed persistent RDF storage
      SPARQL query support
      Full-Text query support (Lucene)
 Services
    WCF Framework

 Tools
    MS Office Add-Ins

                                                                                 13
SemanticDoc - MS Office Add-Ins




                                  ICWE’08   14
Outline

  Motivation

  Semantic Document Model - SDM

  Semantic Document Architecture - SDArch

  Prototype

  Thesis Validation

  Conclusions
Thesis Validation



Q1: How do semantic documents improve information finding and retrieval in
semantically integrated document collections?

     1. Experimental evaluation of Information Retrieval in Semantic Documents


Q2: How do semantic documents facilitate desktop users in completing tasks that
draw data from both a personal desktop and social communities?

     2. Usability evaluation of SDArch Services and Tools




                                                                                  16
Experimental Evaluation of Information Retrieval in
 Semantic Documents

 Objectives:

     Measuring effectiveness of the semantic document search
     Measuring effectiveness of the semantic document annotation (indexing)



 Compared approaches:

       Concept-Based Indexing and Search – Simple Syntactic Matching
       Concept-Based Indexing and Search – Lexically Expanded Syntactic Matching
       Full Text Indexing and Search (Lucene)
       Semantic Document Indexing and Search




                                                                               SEMAPRO' 10   17
Test Collections

      Mammals of the World                    Metals and Alloys

 MAMO Ontology                      Metals Ontology
   OWL + SKOS                         OWL + SKOS
   Finnish National Museum            Key-To-Metals, Zurich
   ~ 5000 domain concepts             ~ 1800 domain concepts
 Document Set                       Document Set
   Wikipedia – List of Mammals        Key-To-Metals records
   150 articles                       240 Word documents
   2130 semantic document units       3312 semantic document units
 Query Set                          Query Set
    5 queries related to Mammals       5 queries related to Metals and Alloys




                                                                                  18
Measuring Effectiveness of the Semantic Document
(Indexing) Annotation


Test collection 1: Mammals of the World

Approach                                    # of syn.   # of sem.   Weight of syn.   Weight of sem.
                                            matches     matches     matches          matches
CB – simple syntactic matching                1524         -           2.56              -
CB – lexically expand. syntactic matching     3182         -           3.62              -
Semantic document indexing and annotation     3182        2437         3.62             2.96


Test collection 2: Metals and Alloys

Approach                                    # of syn.   # of sem.   Weight of syn.   Weight of sem.
                                            matches     matches     matches          matches
CB – simple syntactic matching                2153         -           1.73              -
CB – lexically expand. syntactic matching     2879         -           2.43              -
Semantic document indexing and annotation     2879        1024         2.43             2.14




                                                                                                  19
Measuring Effectiveness of the Semantic Document Search


Test collection: Mammals of the World   Test collection: Metals and alloys




                                                                             20
Usability Evaluation

Evaluation Hypothesis:

    “Using SDArch results in a more effective, efficient, and satisfactory user
 experience when authoring, exploring (i.e., searching and navigating) and utilizing
                     documents in carrying out daily tasks.”



Usability evaluation criteria:


      User Effectiveness
      User Efficiency
      User Satisfaction




                                                                           ICALT’ 10   21
Case Study: Authoring of Course Material


 Participants – SDArch Social Network
       University of Lugano, Switzerland – 7 participants
       Simon Fraser University, Canada – 7 participants
       Athabasca University, Canada – 2 participants
       University of Belgrade, Serbia – 2 participants



 Document Collection
     “Software Design Patterns” – 70 PowerPoint and Word documents


 Evaluation Session
     Task-Based Usability Test
     Follow-up questionnaires




                                                                      22
Usability Test Use Cases


i. Setting Up the User Profile and the Social Network Properties



ii. Authoring and Publishing Semantic Documents




iii. Searching and Navigating across Semantic Documents

                                        Task   Task objective               Slide

                                         1     Design patterns definition     1
                                         2     Example 1 - definition         2
                                         3     Example 1 - illustration       2
                                         4     Example 2 - definition         3
                                         5     Example 2 - illustration       3


                                                                                    23
Evaluation Methods and Metrics



  Evaluation Criteria     Evaluation Method                  Evaluation Metric

  1. Effectiveness        Objective - Quantitative Measure   • Task Success Rates

  2. Efficiency           Objective - Quantitative Measure   • Task Completion Times
                          Objective - Quantitative Measure   • Number of Mouse Clicks
                          Objective - Quantitative Measure   • Number of Window Switches

  3. Satisfaction         Subjective - Questionnaire         • 5-level Likert scale




                                                                                         24
1. User Effectiveness

 metric: Task success rate




              Conventional System                  SDArch System
    Task   Successful Completions     %       Successful Completions     %
     1              18              100.00             18              100.00
     2              17               94.44             18              100.00
     3              15               83.33             17               94.44
     4              17               94.44             18              100.00
     5              14               77.78             16               88.89
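
The percentages in this table follow from the completion counts and the 18 participants of the case study (7 + 7 + 2 + 2). A short Python check, rounding to two decimals; the names `successes` and `success_rate` are chosen here for illustration only.

```python
PARTICIPANTS = 18  # 7 + 7 + 2 + 2 participants in the case study

# task -> (conventional system, SDArch system) successful completions
successes = {
    1: (18, 18),
    2: (17, 18),
    3: (15, 17),
    4: (17, 18),
    5: (14, 16),
}


def success_rate(completed, total=PARTICIPANTS):
    """Task success rate in percent, rounded to two decimals."""
    return round(100 * completed / total, 2)


rates = {
    task: (success_rate(conv), success_rate(sdarch))
    for task, (conv, sdarch) in successes.items()
}
for task, (conv, sdarch) in rates.items():
    print(f"Task {task}: conventional {conv}%  SDArch {sdarch}%")
```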




                                                                                25
2. User Efficiency


metric: Task execution time   metric: Number of mouse clicks   metric: Number of window switches




T-Test results (three p-values per task):

    Task    p-value        p-value       p-value
     1      1.6×10^-12     0.00071       0.00004
     2      1.22×10^-7     0.00011       0.0041
     3      6.91×10^-8     9.17×10^-6    0.00016
     4      3.67×10^-7     0.00034       0.00009
     5      4.82×10^-10    2.6×10^-6     0.00004

If p < 0.05, results are statistically significant.
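
The slides report only p-values and do not say which t-test variant was used. As a hedged illustration, assuming a paired design (each participant performs a task with both systems), the t statistic is the mean of the per-participant differences divided by its standard error. The timings below are invented for the example and are not data from the study.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired samples."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1


# Hypothetical task completion times in seconds for six participants.
conventional = [120, 135, 150, 110, 140, 125]
sdarch = [90, 100, 115, 95, 105, 100]

t, df = paired_t_statistic(conventional, sdarch)

# 2.571 is the two-tailed 5% critical value of Student's t for 5 degrees of freedom.
significant = abs(t) > 2.571
print(f"t = {t:.2f}, df = {df}, significant at 0.05: {significant}")
```

Turning the statistic into a p-value requires the cumulative distribution of Student's t, which a statistics package would normally supply; the comparison against a tabulated critical value is used here to keep the sketch dependency-free.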




                                                                                                                26
3. User Satisfaction


      metric: 5-level Likert Scale (from "Strongly disagree" to "Strongly agree")

      Internal consistency (reliability) test:

         Dimension               Cronbach’s α
         Usefulness                  0.85
         Ease-of-Use                 0.78
         Ease-of-Learning            0.92
         Overall Satisfaction        0.83

      Recommended α values > 0.75
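
For reference, Cronbach's α for a questionnaire dimension with k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the respondents' total scores). The Python sketch below computes it with the standard library; the response matrix is invented for illustration and is not the study's questionnaire data.

```python
import statistics


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent (same length each)."""
    k = len(responses[0])              # number of questionnaire items
    items = list(zip(*responses))      # scores regrouped per item
    item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 5-level Likert answers: 5 respondents, 3 items.
answers = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(answers)
print(f"alpha = {alpha:.3f}")
```

Sample variances (n - 1 in the denominator) are used for both the items and the totals, which is the usual convention for this coefficient.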




                                                                            27
Outline

  Motivation

  Semantic Document Model - SDM

  Semantic Document Architecture - SDArch

  Prototype

  Thesis Validation

  Conclusions
Conclusions


 Main contributions

      Introducing the Semantic Document Model – SDM
      Designing the Semantic Document Architecture – SDArch
      Providing the SDArch Prototype Implementation
      Experimental and Usability evaluations


 Future directions:

    Document units versioning
    Document units privacy and security
    Decentralized storage of shared semantic documents




                                                               29
Publications
Journals:
 S. Nešić, "Semantic Document Model to Enhance Data and Knowledge Interoperability," Annals of Information Systems - Special Issue on Semantic Web & Web 2.0, Springer US, pp. 135 – 160, 2009.

Conferences:
 S. Nešić, F. Crestani, D. Gašević, M. Jazayeri, "Search and Navigation in Semantically Integrated Document Collections," 4th International Conference on Advances in Semantic Processing - SEMAPRO, pp. 123 – 129, Firenze, Italy, 2010.

 S. Nešić, D. Gašević, M. Jazayeri, "Semantic Document Architecture for Desktop Data Integration and Management," The 22nd International Conference on Software Engineering and Knowledge Engineering - SEKE, pp. 73 – 78, San Francisco, USA, 2010.

 S. Nešić, D. Gašević, M. Jazayeri, M. Landoni, "Using Semantic Documents and Social Networking in Authoring Course Material: An Empirical Study," 10th IEEE International Conference on Advanced Learning Technologies - ICALT, pp. 666 – 670, Sousse, Tunisia, 2010. (Best paper award)

 S. Nešić, F. Crestani, D. Gašević, M. Jazayeri, "Concept-Based Semantic Annotation, Indexing and Retrieval of Office-Like Document Units," 9th RIAO Conference, pp. 234 – 237, Paris, France, 2010.

 S. Nešić, D. Gašević, M. Jazayeri, "Extending MS Office for sharing Document Content Units over the Semantic Web," 8th International Conference on Web Engineering - ICWE, pp. 350 – 353, Yorktown Heights, New York, USA, 2008.

 S. Nešić, D. Gašević, M. Jazayeri, "Semantic Document Management for Collaborative Learning Object Authoring," 8th IEEE International Conference on Advanced Learning Technologies - ICALT, pp. 751 – 755, Santander, Spain, 2008.

 S. Nešić, D. Gašević, M. Jazayeri, "An ontology-based framework for author-learning content interaction," 6th International Conference on Web-based Education - WBE, Chamonix, France, 2007.

 S. Nešić, D. Gašević, M. Jazayeri, "An Ontology-Based Framework for Authoring Assisted by Recommendation," 7th IEEE International Conference on Advanced Learning Technologies - ICALT, pp. 227 – 231, Niigata, Japan, 2007.

 S. Nešić, J. Jovanović, D. Gašević, M. Jazayeri, "Ontology-Based Content Model for Scalable Content Reuse," 4th ACM SIGART International Conference on Knowledge Capture - K-CAP, pp. 195 – 198, Whistler, Canada, 2007.

Workshops:
 S. Nešić, M. Jazayeri, F. Lelli, S. Nešić, "Towards Efficient Document Content Sharing in Social Networks," 2nd Workshop on Social Software Engineering and Applications, co-located with ESEC/FSE, pp. 1 – 8, Amsterdam, Netherlands, 2009.


                                                                                                                                      30

Sasa Nesic - PhD Dissertation Defense

  • 1. INFORMATION MANAGEMENT Semantic Document Architecture for Desktop Data Integration And Management Place image here November 30, 2010 Saša Nešić PhD Dissertation Defense
  • 2. Motivation Semantic Web Semantic Desktop  Ontologies  Resource Description Framework (RDF)  SPARQL query language Semantic Documents 2
  • 3. Semantic Documents Semantic document are composite information resources composed of data/information units that are:  uniquely identified by globally unique URIs,  semantically annotated by concepts from domain ontologies,  interlinked with other data/information units via explicit semantic links . 3
  • 4. Thesis Statement “Semantic documents integrate desktop data into a unified desktop information space and enable desktop data to be integrated into a unified information space of social communities” Improving the Effectiveness and Efficiency of Desktop Users 4
  • 5. Outline Motivation Semantic Document Model - SDM Semantic Document Architecture - SDArch Prototype Thesis Validation Conclusions
  • 6. Semantic Document Model Semantic-Linking Part Change-Tracking Part Annotation Part Core Part Core Part Annotation Part Semantic-Linking Part Change-Tracking Part - document unit types - annotation types - semantic linking interface - types of doc. unit changes - structural relationships - annotation interface - change-tracking interface - identification - binary content linking Annals of Information systems’ 09 6
  • 7. Machine-Processable and Human-Readable instances of SDM  MP document representation  Unique and permanent instance  HTTP de-referencable URIs  RDF data format  HR document representation  Temporal document instances  Rendered from the MP instance  Existing document formats 7
  • 8. Outline Motivation Semantic Document Model - SDM Semantic Document Architecture - SDArch Prototype Thesis Validation Conclusions
  • 9. Semantic Document Architecture - SDArch Annals of Information systems’ 09 9
  • 10. Semantic Document Authoring, Search, and Navigation Concept Exploration Algorithm  Objective: Search Algorithm - conceptualization of DU semantics  Objective:  Input: Search Personalization Algorithm - search for semantic document units (DUs) - document unit:  Objective:  - Input: ontology(ies) domain - personalization of semantic doc. Search  - Output: a free-text keyword query  Input:  - Output: vector: concept - list of retrieved semantic DUs: - a ranked list of semantic DUs - list of user preferences - concept weight vector:  Features:  Output: - forming semantic query: - re-ranked list of semantic DUs  Features:  Features: - lexical expansion of concept labels - - executingSCA for each DUagainst CI: extracting semantic query - syntactic concept matching - weighting schema for each user preference - semantic concept matching - ranking DUs based on calculated weights - measuring concept relevance - measuring similarity between and Semantic document authoring service Semantic document search and navigation service SEKE’ 10 10 10
  • 11. Semantic Document Sharing
      - SDArch social network
      - Publishing only RDFs
      - Capturing social-context annotations
      - Contributing to:
          - Linked Open Data Cloud
          - Web of Linked Data
          - Semantic Web
      (ESEC/FSE – SoSEA ’09)
  • 12. Outline Motivation Semantic Document Model - SDM Semantic Document Architecture - SDArch Prototype Thesis Validation Conclusions
  • 13. SDArch Prototype
      Objectives:
        - Validation of SDArch and SDM
        - Enabling experimental evaluation
        - Enabling usability evaluation
      Source Code Organization:
        - Number of services: 5
        - Number of .NET assemblies: 15
        - Number of .NET namespaces: 14
      Implementation:
        - Semantic Document Repository
            - Sesame 2 RDF repository
            - SemWeb C# Library
            - MySQL DB-backed persistent RDF storage
            - SPARQL query support
            - Full-Text query support (Lucene)
        - Services
            - WCF Framework
        - Tools
            - MS Office Add-Ins
  • 14. SemanticDoc - MS Office Add-Ins (ICWE ’08)
  • 15. Outline Motivation Semantic Document Model - SDM Semantic Document Architecture - SDArch Prototype Thesis Validation Conclusions
  • 16. Thesis Validation
      Q1: How do semantic documents improve information finding and retrieval in semantically integrated document collections?
        1. Experimental evaluation of Information Retrieval in Semantic Documents
      Q2: How do semantic documents help desktop users complete tasks that draw data from both a personal desktop and social communities?
        2. Usability evaluation of SDArch Services and Tools
  • 17. Experimental Evaluation of Information Retrieval in Semantic Documents
      Objectives:
        - Measuring effectiveness of the semantic document search
        - Measuring effectiveness of the semantic document annotation (indexing)
      Compared approaches:
        - Concept-Based Indexing and Search – Simple Syntactic Matching
        - Concept-Based Indexing and Search – Lexically Expanded Syntactic Matching
        - Full Text Indexing and Search (Lucene)
        - Semantic Document Indexing and Search
      (SEMAPRO ’10)
  • 18. Test Collections
      Mammals of the World:
        - MAMO Ontology
            - OWL + SKOS
            - Finnish National Museum
            - ~5000 domain concepts
        - Document Set
            - Wikipedia – List of Mammals
            - 150 articles
            - 2130 semantic document units
        - Query Set
            - 5 queries related to Mammals
      Metals and Alloys:
        - Metals Ontology
            - OWL + SKOS
            - Key-To-Metals, Zurich
            - ~1800 domain concepts
        - Document Set
            - Key-To-Metals records
            - 240 Word documents
            - 3312 semantic document units
        - Query Set
            - 5 queries related to Metals and Alloys
  • 19. Measuring Effectiveness of the Semantic Document Annotation (Indexing)
      Test collection 1: Mammals of the World
        Approach                                      # of syn. matches   # of sem. matches   Weight of syn. matches   Weight of sem. matches
        CB – simple syntactic matching                1524                -                   2.56                     -
        CB – lexically expanded syntactic matching    3182                -                   3.62                     -
        Semantic document indexing and annotation     3182                2437                3.62                     2.96
      Test collection 2: Metals and Alloys
        Approach                                      # of syn. matches   # of sem. matches   Weight of syn. matches   Weight of sem. matches
        CB – simple syntactic matching                2153                -                   1.73                     -
        CB – lexically expanded syntactic matching    2879                -                   2.43                     -
        Semantic document indexing and annotation     2879                1024                2.43                     2.14
  • 20. Measuring Effectiveness of the Semantic Document Search
      [Charts, one per test collection: Mammals of the World; Metals and Alloys]
  • 21. Usability Evaluation
      Evaluation hypothesis: “Using SDArch results in a more effective, efficient, and satisfactory user experience when authoring, exploring (i.e., searching and navigating) and utilizing documents in carrying out daily tasks.”
      Usability evaluation criteria:
        - User Effectiveness
        - User Efficiency
        - User Satisfaction
      (ICALT ’10)
  • 22. Case Study: Authoring of Course Material
      - Participants – SDArch Social Network
          - University of Lugano, Switzerland – 7 participants
          - Simon Fraser University, Canada – 7 participants
          - Athabasca University, Canada – 2 participants
          - University of Belgrade, Serbia – 2 participants
      - Document Collection
          - “Software Design Patterns” – 70 PowerPoint and Word documents
      - Evaluation Session
          - Task-Based Usability Test
          - Follow-up questionnaires
  • 23. Usability Test
      Use cases:
        i. Setting Up the User Profile and the Social Network Properties
        ii. Authoring and Publishing Semantic Documents
        iii. Searching and Navigating across Semantic Documents
      Tasks:
        Task   Task objective               Slide
        1      Design patterns definition   1
        2      Example 1 - definition       2
        3      Example 1 - illustration     2
        4      Example 2 - definition       3
        5      Example 2 - illustration     3
  • 24. Evaluation Methods and Metrics
        Evaluation Criteria   Evaluation Method                  Evaluation Metric
        1. Effectiveness      Objective - Quantitative Measure   Task Success Rates
        2. Efficiency         Objective - Quantitative Measure   Task Completion Times
                              Objective - Quantitative Measure   Number of Mouse Clicks
                              Objective - Quantitative Measure   Number of Window Switches
        3. Satisfaction       Subjective - Questionnaire         5-level Likert scale
  • 25. 1. User Effectiveness
      Metric: Task success rate
               Conventional System             SDArch System
        Task   Successful completions   %      Successful completions   %
        1      18                       100    18                       100
        2      17                       94.44  18                       100
        3      15                       83.33  17                       94.44
        4      17                       94.44  18                       100
        5      14                       77.77  16                       88.88
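The percentages on this slide follow from dividing successful completions by the 18 participants. A short sketch of that arithmetic, using the completion counts from the slide (the variable and function names are illustrative):

```python
PARTICIPANTS = 18  # 18 successful completions correspond to 100% on the slide

# Successful completions per task, copied from the slide.
conventional = {1: 18, 2: 17, 3: 15, 4: 17, 5: 14}
sdarch = {1: 18, 2: 18, 3: 17, 4: 18, 5: 16}


def success_rate(successes, total=PARTICIPANTS):
    """Task success rate as a percentage of all participants."""
    return 100.0 * successes / total


for task in sorted(conventional):
    conv = success_rate(conventional[task])
    sda = success_rate(sdarch[task])
    print(f"Task {task}: conventional {conv:.2f}%, SDArch {sda:.2f}%")
```

Rounding to two decimals gives 77.78 and 88.89 for 14 and 16 completions, so the slide's 77.77 and 88.88 appear to be truncated rather than rounded.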
  • 26. 2. User Efficiency
      Metrics: Task execution time; Number of mouse clicks; Number of window switches
      T-Test results (p-values):
        Task   Task execution time   Number of mouse clicks   Number of window switches
        1      1.6×10^-12            0.00071                  0.00004
        2      1.22×10^-7            0.00011                  0.0041
        3      6.91×10^-8            9.17×10^-6               0.00016
        4      3.67×10^-7            0.00034                  0.00009
        5      4.82×10^-10           2.6×10^-6                0.00004
      If p < 0.05, results are statistically significant.
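The slide does not say which t-test variant was used. As an illustration only, the sketch below computes a paired t statistic, assuming each participant was measured with both systems; the timing data are made up, and the critical value 2.776 is the standard two-sided 5% threshold for 4 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Hypothetical task completion times in seconds for five participants.
conventional_times = [120, 135, 150, 110, 140]
sdarch_times = [80, 90, 100, 85, 95]

t_value, dof = paired_t_statistic(conventional_times, sdarch_times)
T_CRITICAL_DF4 = 2.776  # two-sided, alpha = 0.05, 4 degrees of freedom

print(f"t = {t_value:.3f}, df = {dof}")
print("significant at 0.05" if abs(t_value) > T_CRITICAL_DF4 else "not significant")
```

Converting the statistic to an exact p-value needs the t-distribution CDF, which the Python standard library does not provide, so the sketch compares against a tabulated critical value instead.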
  • 27. 3. User Satisfaction
      Metric: 5-level Likert scale (from "Strongly disagree" to "Strongly agree")
      Internal consistency (reliability) test:
        Dimension              Cronbach’s α
        Usefulness             0.85
        Ease-of-Use            0.78
        Ease-of-Learning       0.92
        Overall Satisfaction   0.83
      Recommended α values > 0.75
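For reference, Cronbach's α for k questionnaire items is α = k/(k−1) · (1 − Σ item variances / variance of total scores). The sketch below applies that standard formula to made-up Likert responses; the data and function name are illustrative and are not the study's questionnaire results.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item (column) across respondents.
    item_variances = [
        statistics.variance([row[i] for row in responses]) for i in range(k)
    ]
    # Variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-level Likert answers: 4 respondents x 3 items.
answers = [
    [5, 4, 5],
    [4, 4, 4],
    [3, 2, 3],
    [2, 2, 1],
]

alpha = cronbach_alpha(answers)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items of a dimension vary together, which is why the slide treats α above 0.75 as acceptable.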
  • 28. Outline Motivation Semantic Document Model - SDM Semantic Document Architecture - SDArch Prototype Thesis Validation Conclusions
  • 29. Conclusions
      Main contributions:
        - Introducing the Semantic Document Model – SDM
        - Designing the Semantic Document Architecture – SDArch
        - Providing the SDArch Prototype Implementation
        - Experimental and Usability evaluations
      Future directions:
        - Document units versioning
        - Document units privacy and security
        - Decentralized storage of shared semantic documents
  • 30. Publications
      Journals:
        - S. Nešić, "Semantic Document Model to Enhance Data and Knowledge Interoperability," Annals of Information Systems - Special Issue on Semantic Web & Web 2.0, Springer US, pp. 135–160, 2009.
      Conferences:
        - S. Nešić, F. Crestani, D. Gašević, M. Jazayeri, "Search and Navigation in Semantically Integrated Document Collections," 4th International Conference on Advances in Semantic Processing - SEMAPRO, pp. 123–129, Firenze, Italy, 2010.
        - S. Nešić, D. Gašević, M. Jazayeri, "Semantic Document Architecture for Desktop Data Integration and Management," The 22nd International Conference on Software Engineering and Knowledge Engineering - SEKE, pp. 73–78, San Francisco, USA, 2010.
        - S. Nešić, D. Gašević, M. Jazayeri, M. Landoni, "Using Semantic Documents and Social Networking in Authoring Course Material: An Empirical Study," 10th IEEE International Conference on Advanced Learning Technologies - ICALT, pp. 666–670, Sousse, Tunisia, 2010. (Best paper award)
        - S. Nešić, F. Crestani, D. Gašević, M. Jazayeri, "Concept-Based Semantic Annotation, Indexing and Retrieval of Office-Like Document Units," 9th RIAO Conference, pp. 234–237, Paris, France, 2010.
        - S. Nešić, D. Gašević, M. Jazayeri, "Extending MS Office for sharing Document Content Units over the Semantic Web," 8th International Conference on Web Engineering - ICWE, pp. 350–353, Yorktown Heights, New York, USA, 2008.
        - S. Nešić, D. Gašević, M. Jazayeri, "Semantic Document Management for Collaborative Learning Object Authoring," 8th IEEE International Conference on Advanced Learning Technologies - ICALT, pp. 751–755, Santander, Spain, 2008.
        - S. Nešić, D. Gašević, M. Jazayeri, "An ontology-based framework for author-learning content interaction," 6th International Conference on Web-based Education - WBE, Chamonix, France, 2007.
        - S. Nešić, D. Gašević, M. Jazayeri, "An Ontology-Based Framework for Authoring Assisted by Recommendation," 7th IEEE International Conference on Advanced Learning Technologies - ICALT, pp. 227–231, Niigata, Japan, 2007.
        - S. Nešić, J. Jovanović, D. Gašević, M. Jazayeri, "Ontology-Based Content Model for Scalable Content Reuse," 4th ACM SIGART International Conference on Knowledge Capture - K-CAP, pp. 195–198, Whistler, Canada, 2007.
      Workshops:
        - S. Nešić, M. Jazayeri, F. Lelli, S. Nešić, "Towards Efficient Document Content Sharing in Social Networks," 2nd Workshop on Social Software Engineering and Applications, co-located with ESEC/FSE, pp. 1–8, Amsterdam, Netherlands, 2009.