LangChain + Docugami Webinar

The Document Engineering Company
LangChain Webinar: Lessons from Deploying LLMs with LangSmith
Jean Paoli @jeanpaoli
Taqi Jaffri @tjaffri
Mike Palmer
Zubin Wadia @zubinwadia
Topics
1. Intros
2. Real-World Challenges using LLMs with Documents
   i. Real documents are more than flat text
   ii. Documents are Knowledge Graphs
   iii. Building Complex Chains with the LangChain Expression Language
   iv. Debugging Complex Chain Failures in Production
3. Summary: End-to-end LLM Ops
   … deploy / run / trace / correct / finetune / repeat
Who we are...
• Generative AI for Business Documents. Founded 2018.
• Leverage open source and our own LLMs trained on millions of business documents to create a full XML data representation of complex documents in their entirety, delivering immediate value to frontline business users.
• Document as data can be used to generate insights and reports, create new documents, or drive line-of-business applications. Free trial at www.docugami.com/trial
• Customers in many sectors, including insurance, real estate, health, professional services.
Jean Paoli
Co-founder, CEO
Pioneer in document engineering; co-creator of XML, .docx, .xlsx, .pptx; started four $1B+ businesses at Microsoft.
Mike Palmer
Co-founder, Head of Technologies
Senior engineering manager at Microsoft; co-created Microsoft InfoPath.
Zubin Wadia
Product Manager
Making documents matter to people and systems – 100+ products over 20 years. MIT, ImageWork, CiviGuard.
Taqi Jaffri
Co-founder, Head of Product
Principal Product Manager at Microsoft; co-created the AI driving Microsoft presence in physical stores.
Challenge 1: Real documents are more than flat text
Real Documents are Structurally Complex: Headings, Lists, Tables, Headers/Footers, Complex Reading Orders, Figures, etc.
Mitigations:
1. Structurally chunk the document to find structural elements.
2. Stitch together reading order flow with language models.
3. Sub-chunk for RAG using document structural elements rather than simple tiling or text splitting.
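The difference between structural sub-chunking and simple tiling can be sketched in a few lines. This is a minimal illustration, not Docugami's implementation: it assumes the structural elements have already been detected, and the `Element` and `Chunk` types, the element kinds, and the sample lease text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str  # hypothetical kinds: "heading", "paragraph", "list_item"
    text: str


@dataclass
class Chunk:
    heading: str
    parts: list = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.parts)


def structural_chunks(elements):
    """Group body elements under their nearest preceding heading.

    Headings with no body are skipped in this sketch.
    """
    chunks = []
    current = Chunk(heading="")
    for el in elements:
        if el.kind == "heading":
            if current.parts:
                chunks.append(current)
            current = Chunk(heading=el.text)
        else:
            current.parts.append(el.text)
    if current.parts:
        chunks.append(current)
    return chunks


def tile(text, size):
    """Naive fixed-width tiling, shown only for contrast."""
    return [text[i:i + size] for i in range(0, len(text), size)]


doc = [
    Element("heading", "1. Term"),
    Element("paragraph", "This lease runs for 12 months."),
    Element("heading", "2. Rent"),
    Element("paragraph", "Rent is due monthly:"),
    Element("list_item", "a) $1,000 in year one"),
    Element("list_item", "b) $1,100 thereafter"),
]

chunks = structural_chunks(doc)
for c in chunks:
    print(repr(c.heading), "->", repr(c.text))
```

Tiling the same text at a fixed width would cut the rent list away from its introductory sentence, while the structural grouping keeps each list with the heading it belongs to.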
Challenge 2: Documents are Knowledge Graphs
Document structures and spatial relationships contain semantics that are often lost in simple text-based retrieval
Mitigations:
1. XML hierarchical Knowledge Graph representation for inherent per-document semantics added to vector store for RAG
2. Schema normalization across sets of similar documents
Example XML hierarchical Knowledge Graph
Represents structural relationships as well as per-chunk semantic labels
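As a rough illustration of how a hierarchical XML representation can carry both structure and per-chunk labels, the sketch below walks a small XML tree and emits one chunk per leaf, tagged with its own label and its ancestor path. The XML and its tag names are invented for this example and are not Docugami's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tag names are made up for illustration only.
SAMPLE = """
<LeaseAgreement>
  <Parties>
    <Landlord>Acme Properties LLC</Landlord>
    <Tenant>Jane Doe</Tenant>
  </Parties>
  <Rent>
    <MonthlyAmount>$1,000</MonthlyAmount>
    <DueDate>first day of each month</DueDate>
  </Rent>
</LeaseAgreement>
"""


def leaf_chunks(xml_text):
    """Return one chunk per leaf element, with its tag and ancestor path."""
    root = ET.fromstring(xml_text.strip())
    chunks = []

    def walk(node, path):
        here = path + [node.tag]
        children = list(node)
        if not children:
            chunks.append({
                "text": (node.text or "").strip(),
                "label": node.tag,          # per-chunk semantic label
                "path": "/".join(here),     # structural relationship
            })
        for child in children:
            walk(child, here)

    walk(root, [])
    return chunks


chunks = leaf_chunks(SAMPLE)
for c in chunks:
    print(c["path"], "=>", c["text"])
```

The `path` value is what a flat text extraction would lose: "$1,000" on its own says little, while `LeaseAgreement/Rent/MonthlyAmount` says what the number means.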
Retrieval Augmented Generation (RAG) w/ Semantic Chunk Metadata from Docugami XML
Step 1: Indexing / Ingestion
• Knowledge Base, e.g., business documents
• Data Loader reads text from Knowledge Base and creates chunks
• Embedding Model converts chunks to embeddings
• Metadata from Docugami XML
• Vector DB offers fast lookup of chunks semantically similar to any given string, e.g. a question
Step 2: Querying
• Question
• Chat History
• Similar Chunks in Knowledge Base (possibly contain answers) -- improved with additional information from Docugami XML Knowledge Graph
• Answer (sourced in Knowledge Base)
LangSmith: Run traces
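The two RAG steps can be sketched end to end with a toy vector store. Everything here is an assumption made for illustration: `embed` is a bag-of-words stand-in for a real embedding model, `TinyVectorStore` is a hypothetical in-memory class rather than any real vector database, and the labels are invented. The point is only to show how a semantic label attached to a chunk can change what gets retrieved.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store; not a real vector DB client."""

    def __init__(self):
        self.rows = []  # (embedding, text, metadata)

    def add(self, text, metadata):
        # Step 1 (indexing): fold the semantic label into the embedded string
        # so that it can influence similarity at query time.
        labeled = f"{metadata.get('label', '')} {text}"
        self.rows.append((embed(labeled), text, metadata))

    def search(self, question, k=1):
        # Step 2 (querying): rank stored chunks by similarity to the question.
        q = embed(question)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


store = TinyVectorStore()
store.add("Acme Properties LLC", {"label": "landlord"})
store.add("Jane Doe", {"label": "tenant"})
store.add("$1,000", {"label": "monthly rent amount"})

top = store.search("Who is the landlord?", k=1)
print(top)
```

Without the label, the chunk "Acme Properties LLC" shares no words with the question and could not be ranked above the others by this toy similarity measure.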
Challenges 1 & 2
CODE WALK-THROUGH
https://rebrand.ly/docugami_semantic_rag
Challenge 3: Building Complex Chains with the LangChain Expression Language
Real-world chains can get complicated with parallel branches, output parsers, few shot examples, conditional sub-chains, etc.
Example chain:
1. Question: “What were the total midmarket gross sales for Mexico in 2014?”
2. SQL DB (read-only): Table schema and sample rows
3. Generate SQL Query:
SELECT SUM("Gross Sales") FROM financial_data WHERE Segment = "Midmarket" AND Country = "Mexico" AND Year = 2014;
4. Run SQL Query
5. Syntax Error? (pysqlite3.dbapi2.OperationalError) no such table: financial_data
6. Attempt Fixup:
SELECT SUM("Gross Sales") FROM "Financial Data" WHERE Segment = "Midmarket" AND Country = "Mexico" AND Year = 2014;
7. Result: "[(451890.0,)]"
8. Explainers (in parallel):
• Result Explainer: The total midmarket gross sales for Mexico in 2014 were $451,890.
• Query Explainer (for non-technical users): Sum of 'Gross Sales' for the 'Midmarket' segment in Mexico for the year 2014.
Sample code: https://rebrand.ly/docugami_complex_chain
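The control flow of this chain (generate, run, fix up on error, then explain in parallel) can be sketched in plain Python. This is deliberately not LangChain Expression Language code and not the linked sample: the LLM calls are replaced by hard-coded stub functions, the table contents are invented apart from the Mexico midmarket figure shown on the slide, and string literals use single quotes, which is standard SQL.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_db():
    """Build a small in-memory table shaped like the slide's example."""
    db = sqlite3.connect(":memory:")
    db.execute('CREATE TABLE "Financial Data" '
               '(Segment TEXT, Country TEXT, Year INTEGER, "Gross Sales" REAL)')
    db.executemany(
        'INSERT INTO "Financial Data" VALUES (?, ?, ?, ?)',
        [("Midmarket", "Mexico", 2014, 451890.0),
         ("Enterprise", "Mexico", 2014, 100.0)],
    )
    return db


# Stubs standing in for LLM calls (hypothetical; a real chain prompts a model).
def generate_sql(question):
    return ('SELECT SUM("Gross Sales") FROM financial_data '
            "WHERE Segment = 'Midmarket' AND Country = 'Mexico' AND Year = 2014;")


def attempt_fixup(sql, error):
    return sql.replace("financial_data", '"Financial Data"')


def explain_result(rows):
    return f"The total was ${rows[0][0]:,.0f}."


def explain_query(sql):
    return "Sums 'Gross Sales' for the filtered rows."


def run_chain(db, question, max_fixups=1):
    sql = generate_sql(question)
    for attempt in range(max_fixups + 1):
        try:
            rows = db.execute(sql).fetchall()
            break
        except sqlite3.OperationalError as exc:
            if attempt == max_fixups:
                raise  # could not be fixed automatically: the chain fails
            sql = attempt_fixup(sql, str(exc))
    # The two explainers do not depend on each other, so run them in parallel.
    with ThreadPoolExecutor() as pool:
        result_future = pool.submit(explain_result, rows)
        query_future = pool.submit(explain_query, sql)
        return {
            "sql": sql,
            "rows": rows,
            "result_explanation": result_future.result(),
            "query_explanation": query_future.result(),
        }


out = run_chain(make_db(),
                "What were the total midmarket gross sales for Mexico in 2014?")
print(out["rows"], out["result_explanation"])
```

Bounding the number of fixup attempts keeps a bad query from looping forever; when the limit is reached the error propagates, which corresponds to the "invalid SQL that could not be automatically fixed" failure case.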
Challenge 3
LANGSMITH WALK-THROUGH
https://smith.langchain.com/public/ab8ef1ec-46d1-4d1f-980a-6dcc8f1943e0/r
Challenge 4: Debugging Complex Chain Failures in Production
Identifying what went wrong, where…
Common Failures:
1. Invalid SQL that could not be automatically fixed, leading to chain failure
2. Token overflow (table schema too large)
3. LLM rate limit reached
4. LLM call failed with GPU OOM (for self-hosted models)
5. Exception in custom Python RunnableLambda
6. Exception in custom output parser
TIP: If you have a link to a detailed call stack for an exception, you can add it to the failed run as metadata for later investigation. You can also add any other information that would help investigate the failure later.
TIP: Name all runnable lambdas and pass config to conditionally-invoked runnables so that their runs are linked and named correctly in traces. Reference: cookbook | example
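The idea behind both tips (explicit step names, plus failure details attached to the failed run) can be illustrated without any tracing library. The sketch below is plain Python and does not use the LangChain or LangSmith APIs: `RunLog` and `named_step` are hypothetical helpers invented for this example.

```python
import traceback


class RunLog:
    """Hypothetical in-memory stand-in for a tracing backend."""

    def __init__(self):
        self.runs = []


def named_step(name, func, log):
    """Wrap func so that every call is recorded under an explicit name."""
    def wrapper(value):
        run = {"name": name, "status": "ok", "metadata": {}}
        log.runs.append(run)
        try:
            return func(value)
        except Exception as exc:
            # Attach details to the failed run for later investigation.
            run["status"] = "error"
            run["metadata"]["error_type"] = type(exc).__name__
            run["metadata"]["call_stack"] = traceback.format_exc()
            raise
    return wrapper


log = RunLog()
strip = named_step("strip_whitespace", str.strip, log)
parse = named_step("parse_count", int, log)


def pipeline(text):
    return parse(strip(text))


pipeline(" 42 ")
try:
    pipeline("n/a")
except ValueError:
    pass

failed = [r for r in log.runs if r["status"] == "error"]
for r in failed:
    print(r["name"], r["metadata"]["error_type"])
```

With a name on every step, the failed run points directly at `parse_count` rather than at an anonymous lambda, and the stored call stack is available without reproducing the failure.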
Summary: Docugami’s End-to-end LLM Ops with LangChain + LangSmith
1. Deploy Model (we self-host a custom LLM)
2. Regularly look at failed and user-disliked runs
3. Add some problematic runs to dataset
   a) Failed runs e.g. with syntax errors
   b) Runs with negative user feedback
4. Fix runs in dataset
   a) Tip: use larger LLM to propose fixes
   b) Tip: validate that fixes are syntactically correct e.g. by checking SQL syntax
5. Add some examples to few shot set for in-context learning
6. Fine tune model on updated dataset
7. Redeploy (go back to #1)
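The tip in step 4b, validating proposed fixes by checking SQL syntax, could look roughly like the following for SQLite. This is a sketch under stated assumptions: the schema string and the proposed fixes are made up, and `sql_error` and `keep_valid_fixes` are hypothetical helper names. It relies on SQLite's `EXPLAIN` prefix, which compiles a statement without executing it, and it handles single statements only.

```python
import sqlite3

# Assumed schema, modeled on the table name from the earlier example.
SCHEMA = ('CREATE TABLE "Financial Data" '
          '(Segment TEXT, Country TEXT, Year INTEGER, "Gross Sales" REAL)')


def sql_error(sql, schema=SCHEMA):
    """Return None if sql compiles against schema, else the error message."""
    db = sqlite3.connect(":memory:")
    try:
        db.execute(schema)
        db.execute("EXPLAIN " + sql)  # compile only; the query is not run
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        db.close()


def keep_valid_fixes(proposed):
    """Filter (question, sql) pairs down to those whose SQL compiles."""
    return [(q, sql) for q, sql in proposed if sql_error(sql) is None]


proposed = [
    ("total midmarket sales",
     'SELECT SUM("Gross Sales") FROM "Financial Data" '
     "WHERE Segment = 'Midmarket'"),
    ("bad table", 'SELECT SUM("Gross Sales") FROM financial_data'),
    ("bad syntax", 'SELEC * FROM "Financial Data"'),
]

valid = keep_valid_fixes(proposed)
print([q for q, _ in valid])
```

Compiling against an empty copy of the schema catches unknown tables and columns as well as pure syntax errors, and it never touches production data. It does not show that a fix answers the question correctly, so it complements review rather than replacing it.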
Q&A
Docugami API
1. Docs: https://help.docugami.com/home/docugami-api
2. Things to try:
   a) Upload docs (DOCX, DOC, digital or scanned PDF)
   b) Download processed XML (with semantic metadata tags)
   c) Build reports to identify key chunks
   d) Use the LlamaHub Docugami loader to load docs for RAG, with report metadata
3. Free trials available!
Example: Extracting Custom Data from a Document Set
Improved significantly via structural chunking and key metadata associated with all chunks
Example: Authoring Assistance Based on Your Documents
Improved significantly via structural chunking and key metadata associated with all chunks
Example: Workflows Triggered by Auto-Extracted Data
Improved significantly via structural chunking and key metadata associated with all chunks