Correlation Technology Solutions
Compared to
“Massive Semantic Infrastructure” Solutions
When an enterprise or a government faces difficult, critical problems – intractable problems
that simply must be dealt with – the solutions that are constructed in response to these
problems are often “non-optimal”. Typically, the solutions are expensive, and stunningly
complex. Also typically, these solutions do not perform very well – despite their expense
and complexity. These attributes – expense, complexity, poor outcomes – are especially
prevalent in those cases where computer software is the basis for such solutions. These
“non-optimal” software solutions can be found in many of the 1200 vertical market
sectors for enterprise (identified by NAICS[2007]), and in every sphere of government
operations. Research from Make Sence Florida, Inc. has shown that non-optimal
software solutions are likely to use one or more of three approaches:
“Massive Semantic Infrastructure Solutions” – Systems that require large natural
language databases, ontologies, taxonomies, and concept repositories, and utilize tagging,
threading, entity recognition, and other similar corpus analysis techniques in preparation
for answering user queries.
“Subjective Statistical Model Solutions” – Systems that rely upon statistical models
influenced by subjective human judgments in establishing base or conditional
probabilities of events or outcomes – particularly those which purport to capture *all*
possible events in a complex real-world domain. Such systems typically utilize Bayesian
statistical techniques and include Neural Networks.
“Brute Force Computing Solutions” – Systems that achieve results by using the power of
modern-day computers to perform a relatively simple process at high speed against large
volumes of data. Keyword searches are a typical example.
This document is limited to examining how “Massive Semantic Infrastructure Solutions”
differ from Correlation Technology Solutions. The example used in this discussion is a
large, well-known enterprise software company, referred to below as “Company A”, and
that company’s primary product, which we call “M-Technology”.
Company A is in fact our "poster-child" for what we call "non-optimal, massive semantic
infrastructure solutions". We like to begin with the practical issues, because the practical
aspects of a Company A solution illustrate perfectly why Company A's "M-Technology"
compares so poorly to Correlation Technology.

We often like to recount this true story. At a NYC Search Engine Expo at which we
presented in 2008, a senior staff member of a “Major US Government Financial
Institution” stopped by our booth and, after listening to our explanation of Correlation
Technology, started to complain about Company A - which his organization had
purchased. He said, "for Company A to find a 21-word email I sent (in the past), I had to
remember and enter into the search interface 20 of the words."
Here's why this happens. Before the Company A system can answer a single question, an
enormous set of massive Natural Language databases must be installed and verified.
Then, equally massive dictionaries, thesauri, "concept" repositories, ontologies, lexicons,
and other semantic infrastructure components must be installed and linked. Then, the
corpus (all of the documents) is subjected to indexing, threading, entity recognition, and
other "associative" and "tagging" processes. These require days or weeks of dedicated
server time and huge amounts of memory and data storage. Finally, the system is ready
to do some work. But despite all of this effort, complexity and expense, the Company A
system appears "stupid".
The Company A system appears "stupid" because Company A software is based entirely
on an externally imposed "formal" construct of human language. The "meaning" part of
"M-technology" in fact is constrained to those standard meanings and uses of words
consistent with established academic models. Words are fixed in their allowed use as
only specific parts of speech. The word proximities examined in texts are disregarded if
they do not meet pre-set statistical thresholds of confidence. Syntactically modeled
sentence decomposition is rigidly adhered to, and indexing schemes for "organic"
keyword search are not much improved from their original implementations in the 1990s.
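To make the point about proximity thresholds concrete, here is a minimal sketch. It is not Company A's implementation, which is not described in this document. The function names, the three-token window, and the use of a raw pair count as a stand-in for a "statistical threshold of confidence" are all assumptions made for illustration.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=3):
    """Count unordered word pairs whose members fall inside the same
    sliding window of `window` tokens (window size is an assumption)."""
    counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts

def keep_confident_pairs(counts, min_count=2):
    """Discard every pair seen fewer than `min_count` times.
    A raw count is a hypothetical stand-in for a confidence threshold."""
    return {pair: n for pair, n in counts.items() if n >= min_count}

tokens = "ping me re the deck then ping me re lunch".split()
counts = cooccurrence_counts(tokens)
kept = keep_confident_pairs(counts)
```

In this toy message the pair ("lunch", "re") occurs only once, so the threshold silently drops it, even though "re lunch" may be exactly the informal phrase a reader would later search for.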
All of these formalisms are observed despite the fact that human expression is riotously,
deliriously chaotic and adaptive on a moment-by-moment basis. Writers of even the
shortest communication incorporate cultural memes that no dictionary, no ontology, no
concept map, no semantic infrastructure component could keep current or sort out.
Humans create and utilize idiomatic, vernacular, and colloquial terms and uses for terms
with astounding rapidity and ease, and with astounding confidence in the belief that such
terms and every nuance of meaning carried by such terms will be perfectly understood
and appreciated by the recipients of their expression (and they usually are). The trouble
is that Company A (and its peers) cannot make sense of anything not hardwired into the
software's semantic components.
It is certainly true that a corpus of only very formal documents, such as government
reports and academic papers, will, with the proper lexicons, be well served by a
Company A type approach. It is also true that Company A has been obliged to provide
facilities for users to "make their own lexicons" and to "define their own concepts" (so,
with massive, time-consuming, and costly customization, Company A’s product will work
better). The fact remains that wherever human expression and comprehension are informal
(such as the majority of email in an enterprise, human speech captured in transcripts, and
almost all the other categories of text produced by amateur and professional writers for
any purpose), Company A exposes its users to the kind of frustration described above.
If the original text does not conform to the formal parameters of the academic models
used, Company A can often have a lot of trouble locating that text. As a last resort, a
super-majority of "word matches" was required by Company A to find the employee's
email, because all the "M-technology" was worthless.
The same result could have been achieved with a universally available - and free - Unix
text search utility.
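The last-resort behavior described above, in which a message is found only when the query repeats nearly all of its words, can be sketched as plain word overlap. This is an illustration only, not Company A's algorithm. The function names and the 0.95 threshold are assumptions, chosen so that 20 of 21 distinct words is just enough to match.

```python
import re

def words(text):
    """Lower-case the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def supermajority_matches(query, documents, fraction=0.95):
    """Return ids of documents for which the query supplies at least
    `fraction` of the document's distinct words (hypothetical rule)."""
    query_words = words(query)
    hits = []
    for doc_id, text in documents.items():
        doc_words = words(text)
        if doc_words and len(query_words & doc_words) / len(doc_words) >= fraction:
            hits.append(doc_id)
    return hits

# A made-up 21-word email and an unrelated memo.
email = ("please send the quarterly budget figures to finance before noon "
         "on friday so we can close all open items early today")
documents = {"email": email, "memo": "budget meeting moved to monday"}

twenty_words = " ".join(email.split()[:20])
ten_words = " ".join(email.split()[:10])
found_with_twenty = supermajority_matches(twenty_words, documents)
found_with_ten = supermajority_matches(ten_words, documents)
```

With 20 of the 21 words the email is returned. With only 10 it is not, which is the frustration the staff member described. No semantic infrastructure is involved in either case.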
Correlation Technology, in contrast, "permits" a far more "relaxed" and "natural" model
of human language. Our one-way, exhaustive transform of data into Knowledge
Fragments (which we call "Acquisition") captures all the significant relations between
words - as they are actually expressed in the text. Unlike "M-technology", Correlation
Technology does not coerce the text into conformity with a set of formalisms or analyze
the text using such formalisms. We "allow" every nuance to be captured without concern
for whether some artificial rule is observed.
The Correlation process discovers knowledge from the corpus by constructing chains of
iteratively associated Knowledge Fragments, and then analyzing the "Answer Space"
(like the “result set” for RDBMS/SQL) of Correlations. Associations between words can
be as formal or informal as desired or required for the application. We provide in the
Correlation Technology Platform the ability to "dial in" more than 20 differing levels of
"fuzzy association" that actually capture - without imposing any rules which prevent the
discovery of knowledge - all the types of formalisms "understood" by "M-technology".
Further, any additional "reference" preferred for associating words can be "plugged in".
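The idea of chaining associated Knowledge Fragments into an Answer Space can be pictured with a toy sketch. It is not the Correlation Technology Platform's implementation, which this document does not specify. Modeling a Knowledge Fragment as a (subject, relation, object) triple, the `correlate` function, the `max_links` limit, and the sample data are all assumptions made for illustration.

```python
from typing import NamedTuple

class Fragment(NamedTuple):
    """Hypothetical shape of a Knowledge Fragment: two terms and the
    relation between them as expressed in the text."""
    subject: str
    relation: str
    obj: str

def correlate(fragments, origin, destination, max_links=4):
    """Collect every chain of fragments linking origin to destination,
    where each fragment's subject is the previous fragment's object.
    The returned list of chains plays the role of the Answer Space."""
    answer_space = []

    def extend(chain, term, seen):
        if chain and term == destination:
            answer_space.append(list(chain))
            return
        if len(chain) == max_links:
            return
        for frag in fragments:
            if frag.subject == term and frag.obj not in seen:
                chain.append(frag)
                extend(chain, frag.obj, seen | {frag.obj})
                chain.pop()

    extend([], origin, {origin})
    return answer_space

fragments = [
    Fragment("auditor", "flagged", "invoice"),
    Fragment("invoice", "paid by", "treasury"),
    Fragment("auditor", "emailed", "treasury"),
    Fragment("treasury", "reports to", "cfo"),
    Fragment("invoice", "mentions", "cfo"),
]
answer_space = correlate(fragments, "auditor", "cfo")
```

Here three chains connect "auditor" to "cfo". In this sketch a later analysis step would decide which of them matter. An exact-match association (`frag.subject == term`) is the strictest possible setting; a looser, "fuzzier" association would replace that comparison.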
By means of Correlation, knowledge is "emergent", meaning that the analysis of the
Answer Space (a process we call "Refinement") will reveal the desired solutions - if they
exist in the corpus. When the task is Enterprise Search, our Acquisition, Correlation and
Refinement functions will reveal those emails, memos, or documents that the user wants.
Correlation Technology solutions are possible for every product offered by Company A.
In each of these solutions, we believe the Correlation Technology approach will prove far
more effective, far more flexible, and far more straightforward in implementation.
Although Correlation Technology solutions can be large-scale, every Company A
implementation dwarfs the Correlation Technology implementation for an equivalent
corpus. And although the complexity of the Correlation Technology solution is obvious,
that complexity does not flow from the hopeless attempt to capture in stone the torrent of
human expression and comprehension; in fact, Correlation Technology is intrinsically
"simple".

For Business Inquiries:
Contact: Carl Wimmer
carl@makesence.us
Mobile: (702) 767-7001

For Technical Inquiries:
Contact: Mark Bobick
m.bobick@correlationconcepts.com
Mobile: (702) 882-5664
