Using RAG to create your own Podcast conversations.pdf

Retrieval Augmented Generation
Making a Podcast “interactive”
Richard Rodger, Voxgig

The Culture’s inhabitants “could
record their mind-states,
effectively taking a reading of the
person’s personality which could be
stored, duplicated, read, transmitted
and installed …”
Iain M Banks, Look to Windward

1. Design
2. Code
3. Practicalities

“Using the podcast audio
recordings, build a chat
interface that responds like a
podcast guest”

- ~130 episodes
- ~30 minutes per episode
- ~7000 words per transcript
- 2 new episodes per week
- Metadata:
  - guest details
  - show notes

1. Ingestion
Get the audio and metadata into
the “AI”
2. Querying
Get the conversational responses
out of the “AI”

Seems like hard work…?
Why not just concatenate all
transcripts, and use that as
the context prompt?
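
A rough back-of-envelope check on that question, using the episode figures above. The tokens-per-word ratio and the context window size are assumptions chosen for illustration, not figures from the talk.

```javascript
// Does the whole archive fit in one prompt?
// Episode count and transcript length come from the figures above.
const episodes = 130
const wordsPerTranscript = 7000

// ASSUMPTION: ~1.3 tokens per English word (a rule of thumb only).
const tokensPerWord = 1.3
// ASSUMPTION: hypothetical model context window, in tokens.
const contextWindow = 200000

const totalWords = episodes * wordsPerTranscript
const totalTokens = Math.round(totalWords * tokensPerWord)
const fits = totalTokens <= contextWindow

console.log({ totalWords, totalTokens, fits })
```

Under these assumptions the archive is several times larger than the window, and it grows by two episodes a week.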

“Vector Embedding”
Text -> Concepts
(using a “Model”)

The vector database does the
embedding for you
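
A minimal sketch of why embeddings are useful: texts about similar concepts end up as nearby vectors, so "search" becomes "find the closest vectors". The three-dimensional vectors and the `nearest` helper below are made up for illustration; a real model produces far larger vectors and a vector database does this lookup at scale.

```javascript
// ASSUMPTION: toy vectors invented for this example, not model output.
const vectors = {
  'developer relations': [0.9, 0.1, 0.2],
  'community building': [0.8, 0.2, 0.3],
  'sourdough baking': [0.1, 0.9, 0.1],
}

// Cosine similarity: close to 1 means the vectors point the same way.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Rank every stored text by similarity to the query vector, best first.
function nearest(queryVector, k) {
  return Object.entries(vectors)
    .map(([text, vector]) => ({ text, score: cosine(queryVector, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

const top = nearest([0.9, 0.1, 0.1], 2)
console.log(top)
```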

Large Language Models
(LLMs)
- Very large neural networks
- Use the Transformer* architecture
- A vector embedding
- Cares about word order
- Cares about word context
* Attention is All You Need, Vaswani et al. 2017

Retrieval Augmented Generation (RAG)

// Microservice messages
aim: ingest: {
// Get transcription
transcribe: episode: {}
// Do “embedding”
ingest: transcript: {}
}
aim: chat: {
// Use prompt context to reply
chat: query: {}
}

// aim:ingest, transcribe:episode
// lambda called when there’s
// a new audio file in s3
const audio = loadAudio(event)
// Use deepgram.com to get the conversation text
const result = await deepgram
  .listen
  .prerecorded
  .transcribeFile(
    audio.content,
    options
  )
// Save transcript to s3
saveTranscript(result)

// aim:ingest, ingest:transcript
// lambda called when there’s
// a new transcript file in s3
const transcript = loadTranscript(event)
// Split into “chunks”, each to be added to
// an OpenSearch vector collection
const chunks = chunkify(transcript)
for (const chunk of chunks) {
  // Call AWS Bedrock, specify model
  const embedding =
    bedrockClient.embed(chunk, model)
  // Store embedding vectors in OpenSearch
  openSearchClient.store(embedding)
}
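
The slide does not show what `chunkify` does. One possible sketch, assuming the transcript is a plain string and that fixed-size, overlapping word windows are good enough; the size and overlap defaults are hypothetical, not the values used in the talk.

```javascript
// Split a transcript into overlapping chunks of words.
// ASSUMPTION: size and overlap defaults are illustrative only.
function chunkify(transcript, size = 200, overlap = 40) {
  if (overlap >= size) throw new Error('overlap must be smaller than size')
  const words = transcript.split(/\s+/).filter(Boolean)
  const chunks = []
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(' '))
    if (start + size >= words.length) break // this chunk reached the end
  }
  return chunks
}

const demo = chunkify('a b c d e f g h i j', 4, 1)
console.log(demo)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.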

// aim:chat, chat:query
const query = event.body.query // HTTP POST
// Call AWS Bedrock, specify model
const embedding =
bedrockClient.embed(query, model)
// Use embedding to get context chunk text
const context = openSearchClient
.search(embedding)
// Do prompt engineering here!
const prompt = "Answer question with Context: "
  + context + "\nQuestion: " + query
// Get answer!
const answer = bedrockClient.invoke(prompt)
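
The "prompt engineering" step can be sketched as a small template function. This assumes the search returns an array of chunk texts; `buildPrompt` and the instruction wording are hypothetical, not the talk's actual prompt.

```javascript
// ASSUMPTION: `chunks` is an array of chunk texts from the vector search;
// the template wording is illustrative only.
function buildPrompt(chunks, query) {
  // Number each chunk so the model can tell the passages apart.
  const context = chunks
    .map((text, i) => `[${i + 1}] ${text}`)
    .join('\n')
  return [
    'Answer the question using only the context below.',
    'If the context does not contain the answer, say so.',
    '',
    'Context:',
    context,
    '',
    'Question: ' + query,
  ].join('\n')
}

const examplePrompt = buildPrompt(
  ['DevRel builds relationships with developers.', 'Good docs reduce friction.'],
  'what is developer relations?'
)
console.log(examplePrompt)
```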

// Pro Tip: use a REPL!
wovs/pdm-local> aim:chat,chat:query,query:"what is developer relations?"
{
ok: true,
why: '',
answer: "Developer Relations is the practice of
building and maintaining relationships between
companies and developers...The goal of Developer
Relations is to make the company's products as
easy to use, understand, and integrate into a
developer's workflow as possible."
}

// Open Source
// Reference implementation
github.com/mikaelvesavuori/bedrock-rag-demo
// Voxgig microservice implementation
github.com/voxgig/podmind
// Blog post (next week)
richardrodger.com

Practicalities
…, you are face to face with the
champion privy builder of
Sangamon County

It’s the Wild West
“I’ve got my stuff rigged to hit mixtral-8x7, and dolphin locally, and
3.5-turbo, and the 4-series preview all with easy comparison in
emacs and stuff, and in fairness the 4.5-preview is starting to show
some edge on 8x7 …
Until I realized Perplexity will give you a decent amount of Mistral
Medium for free ….
Who is sama kidding they’re still leading here? Mistral Medium
destroys the 4.5 preview. And Perplexity wouldn’t be giving it away
in any quantity if it had a cost structure like 4.5 …
Mistral is the new “RenTech of AI”, DPO and Alibi and sliding window
and modern mixtures are well-understood so the money is in the lag
between some new edge and TheBloke having it quantized for a Mac
Mini or 4070 Super …”
https://news.ycombinator.com/item?id=38948291

It’s the Wild West
Here's a glossary to understand this post:
- mixtral-8x7 or 8x7: Open source model by Mistral AI.
- Dolphin: An uncensored version of the mistral model
- 3.5-turbo: GPT-3.5 Turbo, the cheapest API from OpenAI
- 4-series preview OR "4.5 preview": GPT-4 Turbo, the most capable API from OpenAI
- mistral-medium: A new model by Mistral AI that they are only serving through their API. It's in
private beta and there's a waiting list to access it.
- Perplexity: A new search engine that is challenging Google by applying LLM to search
- Sama: Sam Altman, CEO of OpenAI
- RenTech: Renaissance Technologies, a secretive hedge fund known for delivering
impressive returns improving on the work of others
- DPO: Direct Preference Optimization. It is a technique that leverages AI feedback to
optimize the performance of smaller, open-source models like Zephyr-7B.
- Alibi: a Python library that provides tools for machine learning model inspection and
interpretation. It can be used to explain the predictions of any black-box model,
including LLMs.
- Sliding window: a type of attention mechanism introduced by Mistral-7B. It is used to
support longer sequences in LLMs.
- Modern mixtures: The process of using multiple models together, like "mixtral" is a
mixture of several mistral models.
- TheBloke: Open source developer that is very quick at quantizing all new models that
come out
- Quantize: Decreasing memory requirements of a new model by decreasing the precision
of weights, typically with just minor performance degradation.
- 4070 Super: NVIDIA 4070 Super, new graphics card announced just a week ago
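
To make the "Quantize" entry concrete, here is a minimal sketch of symmetric 8-bit quantization on a handful of weights. It is a toy: the `quantize` and `dequantize` helpers are hypothetical, and real quantizers work per block or per channel with considerably more care.

```javascript
// Map float weights onto 8-bit integers plus one shared scale factor.
function quantize(weights) {
  const maxAbs = Math.max(...weights.map(Math.abs))
  const scale = maxAbs / 127 || 1 // fall back to 1 for all-zero input
  const q = Int8Array.from(weights, w => Math.round(w / scale))
  return { q, scale }
}

// Recover approximate float weights from the packed form.
function dequantize({ q, scale }) {
  return Array.from(q, v => v * scale)
}

const weights = new Float32Array([0.12, -0.5, 0.33, 0.01])
const packed = quantize(Array.from(weights))
const restored = dequantize(packed)

// 4 bytes per weight before, 1 byte per weight after.
console.log(weights.byteLength, packed.q.byteLength)
console.log(restored)
```

Each restored weight differs from the original by at most half the scale factor, which is the "minor degradation" being traded for the memory saving.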

https://github.com/mikaelvesavuori/bedrock-rag-demo

So you want to deliver a RAG
project…

Do you want high quality
answers?
ann-benchmarks.com

Do you like unrealistic
expectations?

Do you like being unable to solve
fundamental limitations? (maybe)
…my experience over the past few
months suggests that for system
programming, LLMs almost never
provide acceptable solutions…
antirez.com/news/140
(Salvatore Sanfilippo - wrote Redis)

I am not an animal brain, I am
not even some attempt to
produce an AI through
software running on a
computer. I am a Culture Mind.
We are close to gods, … we
are quicker; we live faster and
more completely than you do,
with so many more senses,
such a greater store of
memories and at such a fine
level of detail.
- Look to Windward,
Iain M Banks
