Companies and organisations of any size, public or private, hold large amounts of data in isolated data silos. These silos are created independently for the specific needs of each organisational unit and mainly contain textual data in multiple formats. To unleash the power of the relevant information in such data sources, it is necessary to collect and organise them in a homogeneous data structure that is easy to access and extend.
The presentation starts by identifying the business needs, then takes the audience through the journey of (i) creating a knowledge graph that represents a single, highly connected source of truth for the entire organisation, (ii) enriching it using multiple external sources of knowledge and machine learning algorithms, and (iii) evolving it according to the changing needs of the company.
Furthermore, this session highlights the role of graphs as a new "access pattern" for textual data, compared with the more classical inverted index approach. It concludes with the presentation of a complete end-to-end infrastructure for an unstructured data processing workflow, in which Neo4j is the core of a complex ecosystem integrated with other tools such as Elasticsearch, Apache Kafka, Stanford NLP, OpenNLP, Apache Spark, and TensorFlow.
***
Talk at GraphTour Washington D.C., April 14, 2018
2. BUSINESS NEEDS
GraphAware®
→ Convert Data into Actionable Knowledge
Data
‣ Organisations store vast amounts of content
‣ Collect and organise past experience or mistakes
‣ Multiple distributed data silos or data sources
Goals
‣ Do you know what your customers are going to need in 12 months’ time?
‣ How are you going to provide it?
‣ Are you making the best use of information you already have in building the next generation of solutions?
4. CHALLENGES
The challenges with knowledge are:
‣ Data and information are not consistent
‣ The amount of data
‣ Sources spread across many systems
‣ Data are generated at high speed
Organisational leadership wants a solution that:
‣ Integrates data at speed
‣ Enables knowledge workers to be more efficient, effective, and consistent
12. SEARCH: INVERTED INDEX
Pros:
‣ Easy to implement, deploy and maintain
‣ Highly scalable approach thanks to its sharding capabilities
‣ Incredibly fast
Cons:
‣ Tuning results is a hard task
‣ Documents are isolated (no explicit connection between them)
‣ No navigation through documents
‣ Difficult to extend
‣ Changing the list of synonyms is problematic
‣ Only textual search available
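
To make the trade-offs above concrete, here is a minimal sketch of an inverted index in plain Python. It is not how Elasticsearch is implemented; the tokenizer, the function names, and the three sample documents are assumptions made purely for illustration. It shows why lookups are fast (one dictionary access per term) and why documents stay isolated (the only link between two documents is a shared term).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately naive)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "Neo4j stores connected data",
    "d2": "Elasticsearch builds an inverted index over text",
    "d3": "Connected data enables graph search",
}
index = build_inverted_index(docs)
print(sorted(search(index, "connected data")))  # ['d1', 'd3']
```

Note that the result is a flat set of ids: nothing in the structure says how d1 and d3 relate to each other beyond the matched terms.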
28. GRAPH APPROACH
Pros:
‣ The documents are not considered isolated
‣ Multiple, flexible and unpredictable access patterns
‣ Can be integrated with other ML approaches (e.g. recommendation)
‣ Easy to integrate with other tools
‣ Can create a Knowledge Graph
‣ Enables AI
Cons:
‣ Textual search performance
‣ No sharding
‣ Difficult to implement
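
As a rough illustration of the "documents are not isolated" point, the sketch below links documents to the entities they mention and then navigates from one document to related ones. It uses plain Python dictionaries rather than Neo4j, and the mention pairs, labels, and function names are hypothetical; in a real pipeline the pairs would come from an entity extraction step.

```python
from collections import defaultdict

# Hypothetical (document, mentioned entity) pairs, for illustration only.
mentions = [
    ("d1", "Neo4j"),
    ("d1", "Knowledge Graph"),
    ("d2", "Elasticsearch"),
    ("d3", "Knowledge Graph"),
    ("d3", "Apache Spark"),
]

def build_graph(mentions):
    """Build an undirected bipartite graph linking documents and entities."""
    graph = defaultdict(set)
    for doc, entity in mentions:
        graph[("Document", doc)].add(("Entity", entity))
        graph[("Entity", entity)].add(("Document", doc))
    return graph

def related_documents(graph, doc_id):
    """Documents reachable in two hops: document -> entity -> document."""
    start = ("Document", doc_id)
    related = set()
    for entity in graph.get(start, set()):
        for _label, other in graph[entity]:
            if other != doc_id:
                related.add(other)
    return related

graph = build_graph(mentions)
print(sorted(related_documents(graph, "d1")))  # ['d3']
```

The same structure supports other traversals (entity to entity, longer paths) without re-indexing, which is the kind of flexible access pattern listed among the pros.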
29. KNOWLEDGE GRAPH?
What is it?
‣ Mainly describes real-world entities and their interrelations, organised in a graph
‣ Defines possible classes and relations of entities in a schema
‣ Allows for potentially interrelating arbitrary entities with each other
‣ Covers various topical domains
Some famous Knowledge Graphs:
‣ Google
‣ NASA
‣ eBay
‣ Facebook
‣ Yahoo
‣ Microsoft
A Knowledge Graph is the only way to manage the whole of enterprise data in full generality
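
The defining properties listed on this slide (entities, their interrelations, and a schema of allowed classes and relations) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the GraphAware or Neo4j data model: the `KnowledgeGraph` class, the schema triples, and the entities `Alice`, `Acme`, and `London` are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    """Toy knowledge graph: typed entities plus schema-checked relations."""
    # Allowed (subject class, relation, object class) combinations.
    schema: set
    classes: dict = field(default_factory=dict)  # entity name -> class
    facts: set = field(default_factory=set)      # (subject, relation, object)

    def add_entity(self, name, cls):
        self.classes[name] = cls

    def add_fact(self, subject, relation, obj):
        """Store a relation only if the schema allows it for these classes."""
        signature = (self.classes[subject], relation, self.classes[obj])
        if signature not in self.schema:
            raise ValueError(f"schema does not allow {signature}")
        self.facts.add((subject, relation, obj))

    def related(self, name):
        """All (relation, other entity) pairs touching the given entity."""
        outgoing = {(r, o) for s, r, o in self.facts if s == name}
        incoming = {(r, s) for s, r, o in self.facts if o == name}
        return outgoing | incoming

# Hypothetical schema and entities, for illustration only.
kg = KnowledgeGraph(schema={
    ("Person", "WORKS_FOR", "Organisation"),
    ("Organisation", "LOCATED_IN", "Place"),
})
kg.add_entity("Alice", "Person")
kg.add_entity("Acme", "Organisation")
kg.add_entity("London", "Place")
kg.add_fact("Alice", "WORKS_FOR", "Acme")
kg.add_fact("Acme", "LOCATED_IN", "London")

print(sorted(kg.related("Acme")))
# [('LOCATED_IN', 'London'), ('WORKS_FOR', 'Alice')]
```

Extending the graph to a new domain means adding triples to the schema rather than redesigning tables, which is one way to read the "covers various topical domains" point.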
33. THE GRAPHAWARE KNOWLEDGE PLATFORM
Features:
‣ Import information from your internal sources into one centralised location
‣ Enrich your data with external or internal sources of knowledge
‣ Analyse information and discover business insights using deep analysis
How it works:
‣ Data Ingestion
‣ Smart Entity Extraction
‣ Augmented Knowledge
‣ Deep Text Analysis
‣ Distributed Processing
‣ Multiple Integration
→ A platform specifically designed for managing textual data
35. THE ROLE OF NEO4J
‣ Knowledge Graph store
‣ Single source of truth
‣ Fast access to connected data
‣ Query
‣ Merging External Data
‣ Existing Data Augmentation
‣ Scalability
37. CONCLUSION
‣ Converting data into actionable knowledge is a complex task
‣ It’s worth it
‣ A knowledge graph approach gives you a lot of advantages
‣ The GraphAware Knowledge Platform simplifies the entire process