This presentation provides an introduction to linked patent data, including:
- An overview of what linked data is and how it can be used in patents by connecting related data from different sources.
- Details on how linked data builds upon existing standards to represent information as graphs of interconnected data rather than isolated tables.
- Examples of linked data implementations in patents by intellectual property offices and the opportunities it provides for integrating and analyzing patent information.
We propose a new area of research on automating data narratives. Data narratives are containers of information about computationally generated research findings. They have three major components: 1) A record of events that describes a new result through a workflow and/or the provenance of all the computations executed; 2) Persistent entries for the key entities involved: data, software versions, and workflows; 3) A set of narrative accounts: automatically generated, human-consumable renderings of the record and entities that can be included in a paper. Different narrative accounts can be used for different audiences, with different content and details based on the reader's level of interest or expertise. Data narratives can make science more transparent and reproducible, because they ensure that the text description of the computational experiment reflects with high fidelity what was actually done. Data narratives can be incorporated in papers, either in the methods section or as supplementary materials. We introduce DANA, a prototype that illustrates how to generate data narratives automatically, and describe the information it uses from the computational records. We also present a formative evaluation of our approach and discuss potential uses of automated data narratives.
In this webinar, Lorenz Bühmann presents the ontology repair and enrichment tool ORE as well as DL-Learner, a machine learning tool that solves supervised learning tasks and supports knowledge engineers in constructing knowledge. These two neighbouring tools in the LOD2 Stack are used for classification and the subsequent quality analysis of Linked Data.
UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz. It was mainly developed at Charles University in Prague as a student project called ODCleanStore (version 2). It builds on the experience SWC obtained with the LOD Management Suite (LODMS) used in WP7 and on ODCleanStore (version 1), developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next release of the LOD2 stack, UnifiedViews will replace LODMS as the stack's ETL tool, and the tool has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present release 3.0 of the LOD2 stack, which contains updates to the following components:
*) Virtuoso 7 [OpenLink]: the original row store of the Virtuoso 6 universal server has been replaced by a column store, significantly increasing the performance of SPARQL queries; the store is now up to three times as fast as the previous major version.
*) Linked Open Data Management Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming Linked Data through the use of its many extensions. It also supports operations for extracting RDF from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that allows the user to use a remote DBpedia spotlight instance to annotate a text with DBpedia concepts.
*) sparqlify [ULEI]: a scalable SPARQL-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary amount of metadata fields.
*) CubeViz [ULEI]: CubeViz allows visualization of the Data Cube linked data representation of statistical data. It supports the more advanced Data Cube features, such as slices. It also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly in the LOD2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
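The CSV-to-RDF idea behind such an extension can be sketched in a few lines of plain Python. This is illustrative only: the `example.org` namespace and the row-based mapping are invented for the example, while the real ontowiki-csvimport extension drives its mapping from the OntoWiki user interface and can emit Data Cube structures.

```python
import csv
import io

# Hypothetical namespaces for the example (not the extension's own).
BASE = "http://example.org/resource/"
PROP = "http://example.org/property/"

def csv_to_ntriples(text):
    """Turn CSV text into a list of N-Triples statements:
    one subject per row, one predicate per column."""
    reader = csv.DictReader(io.StringIO(text))
    triples = []
    for i, row in enumerate(reader):
        subject = f"<{BASE}row{i}>"
        for column, value in row.items():
            triples.append(f'{subject} <{PROP}{column}> "{value}" .')
    return triples

demo = "country,population\nAustria,9000000\nCzechia,10500000\n"
for line in csv_to_ntriples(demo):
    print(line)
```

Each output line is a complete N-Triples statement, so the result can be loaded into any triple store for further processing.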
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
(http://lod2.eu/BlogPost/webinar-series) In this webinar Michael Martin presents CubeViz, a faceted browser for statistical data utilizing the RDF Data Cube vocabulary, the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options that users can select.
This webinar in the course of the LOD2 webinar series will present use cases and live demos of D2R (Free University Berlin) and Sparqlify (University of Leipzig).
D2R Server is a tool for publishing relational databases on the Semantic Web. It enables RDF and HTML browsers to navigate the content of the database, and allows applications to query the database using the SPARQL query language.
Sparqlify is a tool enabling one to define expressive RDF views on relational databases and query them with a subset of the SPARQL query language. By featuring a novel RDF view definition syntax, it aims at simplifying the RDB-RDF mapping process.
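The view-based RDB-to-RDF idea behind both tools can be sketched as follows. This is a toy illustration, not Sparqlify's or D2R's actual mapping syntax, and it materializes the virtual triples for clarity; a real rewriter instead translates the triple pattern directly into SQL.

```python
import sqlite3

# A small relational table standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])

def view_triples():
    """The 'RDF view': expose every person row as two virtual triples."""
    for pid, name in conn.execute("SELECT id, name FROM person"):
        s = f"ex:person{pid}"
        yield (s, "rdf:type", "ex:Person")
        yield (s, "ex:name", name)

def match(pattern):
    """Answer a triple pattern; None plays the role of a SPARQL variable."""
    s, p, o = pattern
    return [t for t in view_triples()
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogous to: SELECT ?s WHERE { ?s ex:name "Ada" }
print(match((None, "ex:name", "Ada")))
```

The point of the sketch is the separation of concerns: the relational data stays where it is, the view definition decides how it appears as RDF, and queries are posed against the virtual graph.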
more to be found at:
Flexible metadata schemes for research data repositories - Clarin Conference... (Vyacheslav Tykhonov)
The development of the Common Framework in Dataverse and the CMDI use case. Building an AI/ML-based workflow for predicting and linking concepts from external controlled vocabularies to CMDI metadata values.
Using the Semantic Web Stack to Make Big Data Smarter (Matheus Mota)
This presentation will discuss how just a few parts of the Semantic Web Cake can already boost your analytics by making your (big) data smarter and even more connected.
The IP LodB project (for more details see iplod.io) capitalizes on LOD database thinking to build bridges between patented information and scientific knowledge, focusing on the individuals who codify new knowledge and their connected organizations, including those who apply patents in new products and services.
As its main outputs, the IP LodB project produced an intellectual property rights (IPR) linked open data (LOD) map (the IP LOD map) and tested the linkability of the European patent (EP) LOD database, while increasing the uniqueness of the data using different harmonization techniques.
These slides were developed for a NIPO workshop.
Controlled vocabularies and ontologies in Dataverse data repository (vty)
Support for external controlled vocabularies is one of the most requested features among research communities. Slides for the Dataverse Community Meeting 2021 at Harvard University.
Linking Open, Big Data Using Semantic Web Technologies - An Introduction (Ronald Ashri)
The Physics Department of the University of Cagliari and the Linkalab Group invited me to talk about the Semantic Web and Linked Data - this is simply an introduction to the technologies involved.
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist (Stratos Kontopoulos)
Presentation for the NexTech Experts Panel II during the NexTech 2021 Congress (https://www.iaria.org/conferences2021/NexTech21.html).
Discusses the emerging and versatile role of the Knowledge Scientist in designing and developing explainable Semantic AI applications.
ROI in Linking Content to CRM by Applying the Linked Data Stack (Martin Voigt)
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are available internally but also externally, in heterogeneous formats. Therefore, intelligent processes are required to build an integrated knowledge base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficient to deploy and easy to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employees could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools comes with a clearly positive Return on Investment.
Towards a Resource Slice Interoperability Hub for IoT (Hong-Linh Truong)
Interoperability for IoT is a challenging problem because it requires us to tackle (i) cross-system interoperability issues at the IoT platform sides as well as relevant network functions and clouds in the edge systems and data centers, and (ii) cross-layer interoperability, e.g., w.r.t. data formats, communication protocols, data delivery mechanisms, and performance. However, existing solutions are quite static w.r.t. software deployment and provisioning for interoperability. Many middleware, services and platforms have been built and deployed as interoperability bridges, but they are not dynamically provisioned and reconfigured for interoperability at runtime. Furthermore, they are often not considered together with other services as a whole in application-specific contexts. In this paper, we focus on dynamic aspects by introducing the concept of Resource Slice Interoperability Hub (rsiHub). Our approach leverages existing software artifacts and services for interoperability to create and provision dynamic resource slices, including IoT, network functions and clouds, for addressing application-specific interoperability requirements. We will present our key concepts, architectures and examples toward the realization of rsiHub.
The global need to securely derive (instant) insights has motivated data architectures from distributed storage to data lakes, data warehouses and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, Data as Products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products for users to discover insights using FAIR Principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio's secure data exchange features for real-world evidence datasets and privacy-centric data products (confidential computing), as well as integration with cloud services.
PlanetData project was presented by Elena Simperl and Barry Norton from Karlsruhe Institute of Technology at the 1st International Symposium on Data-driven Process Discovery and Analysis on June 30, 2011 in Campione d’Italia, Italy
Fluid Network Planes – An overview of Network Refactoring and Offloading Trends. Keynote at IEEE Netsoft'19, Paris, 2019.
Keynote Description
10 years have passed since the term SDN was coined in 2009. Since then, the three letter acronym has kept evolving through broadening definitions until our state of affairs where SDN means little unless adequately technically elaborated.
Among the key aspects of SDN is the refactoring of the network control plane. At the crossroads, NFV introduces new ways to refactor network functions, be they control or data plane related.
Despite the softwarization flag of SDN and NFV, the hardware/software continuum is as relevant as ever, offering new offloading opportunities at node and network-wide scales.
In this talk, we will review the evolving transformations behind network softwarization with a special focus on network refactoring and offloading trends leading to "fluid network planes", where the location and HW/SW embodiments of network functions become blurry: from the edge to the core, from one administrative provider to another, from programmable silicon to portable lightweight virtualized containers.
Stefano Nativi presents the RDA.
Workshop title: Organising high-quality research data management services
Workshop abstract:
Open science needs high-quality data management, where researchers can create, use and share data according to well-defined standards and practices; this is one of the pillars of Open Science. In the data management landscape we find quite a few organisations that aim to achieve this; however, to get it right, a collaboration is called for in which all can play a suitable role and present this in a consistent way to the researcher.
The proposed workshop brings together representatives of a standards organisation (RDA), eInfrastructures (EUDAT) and libraries (LIBER) that together can organise high-quality data management for research.
DAY 1 - PARALLEL SESSION 2
http://opensciencefair.eu/workshops/organising-high-quality-research-data-management-services
This module supported the training on Linked Open Data delivered to the EU Institutions on 30 November 2015 in Brussels. https://joinup.ec.europa.eu/community/ods/news/ods-onsite-training-european-commission
Data Wrangling and Visualization Using Python (MOHITKUMAR1379)
Python is open source and has many libraries for data wrangling and visualization that make the lives of data scientists easier. For data wrangling, pandas is used, as it represents tabular data and provides functions to parse data from different sources, clean data, handle missing values, merge data sets, and so on. To visualize data, the low-level matplotlib library can be used; it is also the base package for higher-level packages such as seaborn, which draws well-customized plots in just one line of code. Python also has the Dash framework, which is used to build interactive web applications in Python without JavaScript and HTML. These Dash applications can be published on any server as well as on clouds such as Google Cloud, or for free on the Heroku cloud.
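A minimal sketch of the wrangling steps mentioned above, using pandas on hypothetical toy data: handling a missing value and merging two data sets on a shared column.

```python
import pandas as pd

# Two small, invented data sets for the example.
sales = pd.DataFrame({"city": ["Vienna", "Graz", "Linz"],
                      "revenue": [120.0, None, 80.0]})
regions = pd.DataFrame({"city": ["Vienna", "Graz"],
                        "region": ["East", "South"]})

# Handle missing values: fill the missing revenue with the column mean.
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Merge the two data sets on the shared 'city' column; the left join keeps
# cities that have no region entry (NaN appears in the 'region' column).
merged = sales.merge(regions, on="city", how="left")
print(merged)
```

The same pattern (parse, clean, merge) scales from this toy example to real CSV files loaded with `pd.read_csv`.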
Logical Data Fabric: Architectural Components (Denodo)
Watch full webinar here: https://bit.ly/39MWm7L
Is the Logical Data Fabric one monolithic technology, or does it comprise various components? If so, what are they? In this presentation, Denodo CTO Alberto Pan will elucidate what components make up the logical data fabric.
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES (ijdpsjournal)
By programming both the data plane and the control plane, network operators can adapt their networks to their needs. Thanks to research over the past decade, this concept has become more formalized and more technologically feasible. However, since control plane programmability came first, it has already been successfully implemented in real networks and is beginning to pay off. Today, data plane programmability is evolving very rapidly to reach this level, attracting the attention of researchers and developers: designing data plane languages, developing applications on top of them, formalizing software switches and architectures that can run data plane code and applications, increasing the performance of software switches, and so on. As the control plane and data plane become more open, many new innovations and technologies are emerging, but some experts warn that consumers may be confused as to which of the many technologies to choose. This is a testament to how much innovation is emerging in the network space. This paper outlines some emerging applications on the data plane and offers opportunities for further improvement and optimization. Our observations show that most of the implementations are done in a test environment and have not been tested well enough in terms of performance, but there are many interesting works; for example, previous control plane solutions are being implemented in the data plane.
This is module 12 in the EDI Data Publishing training course. In this module, you will learn how to correctly cite an EDI dataset and create provenance metadata for a derived dataset.
A Generic Scientific Data Model and Ontology for Representation of Chemical Data (Stuart Chalk)
The current movement toward openness and sharing of data is likely to have a profound effect on the speed of scientific research and the complexity of questions we can answer. However, a fundamental problem with currently available datasets (and their metadata) is heterogeneity in terms of implementation, organization, and representation.
To address this issue we have developed a generic scientific data model (SDM) to organize and annotate raw and processed data and the associated metadata. This paper will present the current status of the SDM, the implementation of the SDM in JSON-LD, and the associated scientific data model ontology (SDMO). Example usage of the SDM to store data from a variety of sources will be discussed, along with future plans for the work.
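To make the idea concrete, here is a hypothetical JSON-LD fragment in the spirit of such a data model: a measured value annotated with its property, units and subject. The `ex:` terms are placeholders invented for the example, not the actual SDM/SDMO vocabulary.

```python
import json

# Hypothetical JSON-LD record: the @context maps short keys to IRIs,
# so the same document is both ordinary JSON and linked data.
record = {
    "@context": {
        "ex": "http://example.org/sdm/",
        "substance": "ex:substance",
        "property": "ex:property",
        "value": "ex:value",
        "unit": "ex:unit",
    },
    "@id": "ex:measurement/1",
    "@type": "ex:Measurement",
    "substance": "benzene",
    "property": "boiling point",
    "value": 80.1,
    "unit": "degC",
}

# Round-trip through serialization, as a repository would store it.
parsed = json.loads(json.dumps(record, indent=2))
print(parsed["substance"], parsed["value"], parsed["unit"])
```

Because the semantics live in the @context, consumers that ignore JSON-LD still read plain JSON, while linked-data tools can expand the keys to full IRIs.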
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
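The kind of time-bucketed aggregation such databases offer (for example QuestDB's SAMPLE BY clause) can be illustrated in plain Python on hypothetical data: average a metric per one-minute bucket.

```python
from datetime import datetime, timedelta

def sample_by(points, bucket):
    """Average (timestamp, value) points per time bucket,
    a plain-Python sketch of a SAMPLE-BY-style aggregation."""
    sums = {}
    for ts, value in points:
        # Truncate the timestamp down to the start of its bucket.
        offset = (ts - datetime.min) % bucket
        key = ts - offset
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + value, count + 1)
    return {k: total / count for k, (total, count) in sorted(sums.items())}

base = datetime(2024, 6, 1, 12, 0)
points = [(base + timedelta(seconds=s), v)
          for s, v in [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]]
print(sample_by(points, timedelta(minutes=1)))
```

A time-series database does this natively over billions of rows; the sketch only shows the semantics of bucketing by truncated timestamp.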
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
Learn SQL from Basic Queries to Advanced Queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfEnterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Alberto Ciaramella: "Linked patent data: opportunities and challenges for patent professional searchers"
1. Linked Patent Data:
opportunities and challenges
for patent professionals
CEPIUG Conference
Milano, 9 - 11 September 2018
Alberto.Ciaramella@intellisemantic.com
2. This presentation and its objectives
- This presentation is:
  - a lean tutorial about linked data
  - a state-of-the-art review of linked data in patents
  - a position paper:
    - to identify how today's technology can cover patent professionals' needs
    - to encourage patent offices to extend the adoption of this paradigm in the patent information they publish, so that a network effect can occur
IntelliSemantic 2
3. Index
- The framework:
  - what are Linked Data?
  - why and when to use them?
- A Linked Data technical introduction
- Linked Data in patents
  - EPO
  - others
- Opportunities and challenges in patents
- Conclusions
5. What are Linked Data?
- Linked Data (LD) is a more powerful and flexible technology for publishing and accessing databases
- Linked Data is backed by a solid background of international standards, developed by the World Wide Web Consortium (W3C)
- "Linked Data" was originally proposed by Berners-Lee (2006) as Linked Open Data (LOD), but it is now well recognized that this is only one specific use case:
  - Linked Data can also be private, of course
- More recently, the focus of Linked Data research has moved to engineering aspects such as privacy, security, efficiency, and data quality
- Linked Data is one of the enabling technologies for Big Data applications, more specifically for addressing the challenge of data variety
6. Who, where, why linked data (e.g. in patents)

Who                           | Where (layer)        | Why
------------------------------|----------------------|---------------------------------------------------------------
Database supplier             | Database             | To facilitate the verification and cleaning of a database
Application developer         | Database integration | To facilitate the integration of different databases (different patent offices, non-patent literature, ...)
Application developer / users | Query expression     | To express more powerful queries
Application developer / users | Results analysis     | To facilitate the inclusion of more intelligent applications
7. Evolution of Linked Data in patents
- At EPOPIC Kilkenny (2011), an invited speech by Nigel Shadbolt presented the results achieved and/or achievable with open data in different fields
- since then, Open Data and Linked Open Data have entered patent conferences as a discussion topic
- some LOD demos were released from 2013 to 2016, typically including a sample patent set
  - notable demos by KIPO, USPTO and EPO
- in April 2018 the EPO launched its product-level LOD solution, regularly (weekly) updated
  - this seems to be a turning point!
8. A step back: database models, from tables to graphs
- A well-established and widely used model in databases is the relational model, defined in the '70s
- in these systems (RDBMSs), data are represented as a set of related tables. As an example:
  - a first table associates each patent with its data, including applicant and inventor
  - other related tables include further details about applicants and inventors
- this table-based paradigm is efficient but, once designed, it is more difficult to evolve and scale
- in any case, it is straightforward to convert tables to graphs (see next slide), and this adds a lot of flexibility: this conversion is the basic concept of Linked Data
9. From tables to (triple) graphs: an example
If we have this table chunk:

Patent id | Application date | Applicant
----------|------------------|----------
XYZ12488  | 2018/09/01       | Company A
...       | ...              | ...

Any cell in the table can be represented as a triple of values:
a) Subject (the heading of a row, such as XYZ12488): represented as a graph node
b) Predicate (the heading of a column, such as application date): represented as a graph edge
c) Object (the cell value, such as Company A): represented as a graph node
Hence these two cells in the table can also be represented by a graph chunk.
If we apply this transformation to every cell, the table is transformed into a labeled graph, which includes nodes (subjects, objects) and connections (predicates).
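As a sketch of the row-to-triples conversion just described, the example row can be turned into (subject, predicate, object) triples in a few lines of Python. The base URI and predicate names are hypothetical, for illustration only; real Linked Data would reuse shared vocabularies.

```python
# Convert a relational row into RDF-style (subject, predicate, object) triples.
# The URIs below are invented for illustration.
BASE = "http://example.org/patent/"

def row_to_triples(row):
    """Each non-key cell becomes one triple: (row key, column name, cell value)."""
    subject = BASE + row["patent_id"]
    return [(subject, BASE + column, value)
            for column, value in row.items() if column != "patent_id"]

row = {"patent_id": "XYZ12488",
       "application_date": "2018/09/01",
       "applicant": "Company A"}

triples = row_to_triples(row)
for s, p, o in triples:
    print(s, p, o)
```

Applying the same function to every row of every table yields the labeled graph of the slide: subjects and objects become nodes, predicates become edges.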
10. From triple graphs to LD on the web
Linked Data (LD) implement the graph representation just described on the net (internet or intranet). To do that, they:
- reuse well-known internet standards
  - to name things (with URIs)
  - to access them (with HTTP)
- define a new kind of information, i.e. triples
  - triples are not required in the "web of documents"
- include external references to easily reuse knowledge already defined in other graphs, such as:
  - "same level" complementary data, e.g. data from scientific papers to integrate patent data
  - "higher level" common vocabularies, e.g. the definition of author
11. SPARQL to query LD: what and why
- what it is
  - SPARQL is the standard query language for LD
  - it stands for "SPARQL Protocol and RDF Query Language"
  - SPARQL has been defined by the W3C
  - the latest version, 1.1, was released in 2013
- why to use it
  - to facilitate the integration of different databases, independently of their structure
  - to facilitate the identification of possible missing or bad data in the original databases
  - to implement advanced applications more efficiently on top of your data
- how to use it
  - an LD source is typically provided through a SPARQL endpoint, i.e. an internet address which accepts SPARQL queries
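As a minimal sketch of how an endpoint is called: the SPARQL Protocol passes the query text as an HTTP parameter of a request to the endpoint address. The endpoint URL below is hypothetical; real endpoints (such as the EPO's) publish their own addresses.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

query = """
SELECT ?patent WHERE { ?patent ?p ?o } LIMIT 10
"""

# The SPARQL Protocol sends the query string as the "query" parameter
# of an HTTP GET (or POST) request to the endpoint.
url = ENDPOINT + "?" + urlencode({"query": query})
print(url)
```

Fetching this URL with any HTTP client returns the result table, typically serialized as XML or JSON.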
12. SPARQL: the query
- The query is represented by a subgraph of the whole graph, with fixed point(s) defined in the whole graph
  - e.g. the patent applicant
- Different kinds of queries can be represented, from "usual" queries to more involved, still interesting ones, such as:
  - example 1: identify all patents of applicant x published by authority y in the years z1 to zn
    - this is a more usual example
  - example 2: identify all patents which are co-cited by at least one member of patent family A and by at least one member of patent family B
    - this is a slightly more advanced example
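Example 2 above boils down to a set intersection over the citation graph: everything cited by family A, intersected with everything cited by family B. A minimal sketch with invented citation data:

```python
# Co-citation: patents cited by at least one member of family A
# AND by at least one member of family B. Toy data for illustration.
cites = {
    "A1": {"P1", "P2"},
    "A2": {"P2", "P3"},
    "B1": {"P2", "P4"},
    "B2": {"P5"},
}

family_a = {"A1", "A2"}
family_b = {"B1", "B2"}

def cited_by_family(family):
    """Union of everything cited by any member of the family."""
    return set().union(*(cites.get(member, set()) for member in family))

co_cited = cited_by_family(family_a) & cited_by_family(family_b)
print(co_cited)  # patents co-cited by both families
```

In SPARQL the same question is expressed declaratively as a graph pattern; the engine performs the equivalent joins internally.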
13. SPARQL: results
- SPARQL results are typically presented as tables
  - e.g. the table of patents satisfying a query
- native SPARQL commands allow you
  - to select the columns of the results table
  - to order the rows of the results table
  - to extract summary values from rows
    - hence to extract statistics
- it is also possible to make inferences from your data
  - in any case under your control
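The select / order / summarize pattern can be mimicked over plain triples. The sketch below uses invented data; in SPARQL itself, the same steps would be SELECT, GROUP BY with COUNT, and ORDER BY.

```python
from collections import Counter

# Toy (subject, predicate, object) triples: which applicant owns which patent.
triples = [
    ("XYZ12488", "applicant", "Company A"),
    ("XYZ12489", "applicant", "Company A"),
    ("XYZ12490", "applicant", "Company B"),
]

# Equivalent of "GROUP BY applicant, COUNT(patent)":
counts = Counter(o for s, p, o in triples if p == "applicant")

# Equivalent of "ORDER BY count DESC":
for applicant, n in counts.most_common():
    print(applicant, n)
```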
14. SPARQL: a simplified example
- the question to solve: identify all patents citing patent X, directly or indirectly (i.e. through a chain of citations)
- the kernel of the SPARQL query is:
  SELECT DISTINCT ?s
  WHERE {
  ?s cites+ patentX . }
- the WHERE clause identifies the subgraph in which:
  - patentX is the object: it is the patent you know
  - the predicate is "cites", which in this case includes the recursion symbol + (a SPARQL 1.1 property path)
  - the ?s are the unknown subjects to identify
- the SELECT DISTINCT clause produces the output
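What the `cites+` property path computes is the transitive closure of the citation relation. A minimal sketch of that closure, with invented citation data:

```python
# All patents citing `target`, directly or through a chain of citations
# (the effect of SPARQL's cites+ property path). Toy data for illustration.
cites = {
    "P1": {"X"},   # P1 cites X directly
    "P2": {"P1"},  # P2 cites X indirectly, via P1
    "P3": {"P4"},  # P3 is unrelated to X
}

def citing_transitively(target):
    found, frontier = set(), {target}
    while frontier:
        # who cites anything in the current frontier?
        newly = {s for s, cited in cites.items()
                 if cited & frontier and s not in found}
        found |= newly
        frontier = newly
    return found

print(citing_transitively("X"))
```

A SPARQL engine evaluates the property path itself; the point of the sketch is only to show the recursion the + symbol stands for.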
15. Linked Data accesses
Linked Data are accessed at specific web addresses (endpoints). Depending on the endpoint, they can be:
- linked public data, used for
  - open and reliable vocabularies and ontologies
  - open and reliable information, e.g. DBpedia, provided that the performance is adequate for you
- linked restricted data
  - data restricted to you and to your business partners
- linked internal data (behind your firewall), to access your internal company data
Whilst Linked Data emerged first as Linked Open Data, it is misleading to assume that Linked Data necessarily have to be open as well.
17. The EPO LOD: an overview
- It is one of the three kinds of services provided by the EPO to application integrators
- It has been available since April 2018, at https://www.epo.org/searching-for-patents/data/linked-open-data.html#tab-1
- It is provided under the Creative Commons Attribution 4.0 License, hence the integrator:
  - is free to share and adapt these data
  - giving credit and without adding restrictions
- A user manual and a support forum are available
- Panel access at https://data.epo.org/linked-data/
  - it provides a SPARQL query suggestion interface
  - it provides access to a SPARQL endpoint
18. The EPO LOD: technical details
- The database is EPO-specific and includes:
  - data provided by the EPO, i.e.
    - EPO application and publication bibliographic information, including text
      - updated weekly
    - the CPC tree
    - a vocabulary of patent concepts defined by the EPO
  - links to other key LOD vocabularies/concepts
    - most of which originated from the W3C
  - text search is also available
    - which is an interesting extension
- The database does not (yet?) include:
  - information about legal status
19. An example: the EPO LOD patent vocabulary
This vocabulary, developed by the EPO, defines some key patent concepts (e.g. application, publication) and can be shared with other patent information provided as linked data. Note that this is only a simplified sketch.
20. The EPO LOD: the business model
- free data and access
  - to be used according to the fair-use policy, hence for casual users and experiments:
    - a best-effort policy is applied to results: one minute after the query, results are terminated in any case
- for production-level applications, the LOD database has to be transferred to the server of the application integrator
  - to the purists of the open web, this is a limitation
  - for real business, this is an example of a realistic compromise, which can enable some more flexibility in the business model
21. Other LOD examples for patents
- KIPO LOD
  - info and data at http://lod.kipo.kr/
  - SPARQL query access at http://lod.kipo.kr/data/sparql
  - the database includes bibliographic and legal KIPO info
  - LOD support is mentioned in the annual KIPO report
  - it is currently a beta
- US LOD
  - info and data at https://old.datahub.io/dataset/linked-uspto-patent-data
  - SPARQL query access at http://us.patents.aksw.org/sparql
  - it still seems to be a demo, since the data content has not been updated recently, and the kind of licence is not mentioned
23. Bulk database providers (patent offices)
- Opportunities
  - technical: the adoption of the LD paradigm facilitates the verification and cleaning of your own database, hence it should also be promoted as an internal best practice
- Challenges
  - business: if a provider delivers LD to its customers, it has to define its license and its commercial conditions
    - beware: the provider is not forced by the technology itself to offer its data for free! This is a misleading objection.
24. Application developers
- Technical opportunities:
  - the integration of patent databases provided by different patent offices will become easier
  - the integration of patent data with other information (e.g. scientific or business) becomes easier
    - this will support the extension of patent platforms to business intelligence platforms
  - more advanced functions can be provided to patent professionals and their environment (i.e. colleagues and managers)
- Business challenges:
  - all the previously mentioned opportunities will materialize more easily if the LD paradigm is suitably adopted by database providers
25. Patent professionals
- Technical opportunities: they will be able to rely on:
  - more powerful and diversified queries, e.g. to suitably explore the citation network
  - more advanced and flexible analytics tools, for deeper analyses of patent sets
  - more transparent and well-agreed definitions of concepts, to support a new generation of semantic solutions in patent informatics
    - I define this as Semantics 3.0 in patent informatics
  - more information-rich tools, which can help them better support a broader set of users
- Challenges:
  - motivation and willingness to improve the profession
26. Semantics in Patent Informatics: a view
Semantics in patent informatics can also be categorized by generations:
- Semantics 1.0 (implicit): solutions and best practices in patent informatics include some semantic concepts, but they are not mentioned under this term. For example, the IPC and CPC are actually ontologies.
- Semantics 2.0 (explicit): natural language semantics is included in solutions, to add new ways of searching and/or to analyze results. The word "semantic" is now explicitly advertised as such, for very diversified solutions indeed. Some concerns arise too, e.g. about which kind of semantics is in place (for searching? for analyzing?).
- Semantics 3.0 (standardized): the inclusion of explicit, agreed and transparent standard knowledge bases will establish semantics as a true added value for the professional, not as a question mark.
27. Conclusions
- The adoption of Linked Data is a win/win opportunity for all players:
  - patent offices
  - application developers
  - professional users
  - their environment (colleagues and managers)
- but awareness of this possible evolution in patent informatics must be sparked now!
- Linked Data in fact:
  - uses well-established international standards
  - facilitates the reuse of existing knowledge bases (e.g. vocabularies)
28. Questions
- technology-related questions?
- application-related questions?
- business-model-related questions?
  - bulk database providers?
  - application developers?
- other questions?
29. Acknowledgment and extensions
- since the topic is definitely too challenging to be summarized in a 20-minute presentation, an open-access paper with the same title, coauthored with Marco Ciaramella, will extend this presentation with:
  - more details
  - patent-specific examples
  - a bibliography
- the paper will be accessible at https://arxiv.org/
  - in the section computers / digital libraries
- I acknowledge Marco for his contribution to this paper as well as to this presentation
30. Contact information
a) For specific questions and information requests:
  - e-mail: alberto.ciaramella@intellisemantic.com
  - tel. +39 011 9550 380
b) For registering to the IntelliSemantic newsletter, including references to new IntelliSemantic papers and webinar announcements:
  - e-mail: newsletter@intellisemantic.com
  - next issue: September 2018
c) For general information and a first-level overview:
  - site: http://www.intellisemantic.com