SlideShare una empresa de Scribd logo
1 de 22
Mapping the Pubmed data
under different sub-topics
Email: venkykasprov@gmail.com
Venkatasubramani Karthikeyan
PROBLEM STATEMENT
Analogy Implementation
PROBLEM SOLVING APPROACH
Traditional approach
Data cleaning
Bag of words
Classification and clustering
Pre-Trained Model approach
No data cleaning required
BERT, BART & DEBARTA
ORIGINAL CATEGORIES CATEGORIES CONSIDERED
Traditional
approach
• Bag of words
Traditional
approach
• Bag of words
• After Remove stop words and stemming
• Using count vectorizer
Traditional
approach
• Classification
• Logistic regression
Traditional
approach
• Classification
• Logistic regression (cont)
Traditional
approach
• Classification (cont)
• Decision Tree
Entropy
Information Gain
Traditional
approach
• Classification (cont)
• Decision Tree
Traditional
approach
• Classification (cont)
• Random Forest
Traditional
approach
• Classification (cont)
• Random Forest
Traditional
approach
• Clustering
Traditional
approach
• Clustering
Traditional
approach
• Clustering (cont)
Hierarchical clustering HDBSCAN
Traditional
approach
• Clustering (cont)
Pre-trained
model approach Transformer
Pre-trained
model approach HuggingFace Transformers
Pre-trained
model approach
• BERT (Bidirectional Encoder Representations
from Transformers)
• Developed by Google in 2018.
• Revolutionary for its bidirectional training approach.
• BERT is pre-trained on a large corpus of unlabeled text
data.
id parent_title level_3 labels scores
126 293Big Data 0Bio-IT 0.645831
127 293Big Data 1Big Data 0.612736
128 293Big Data 2
Healthcare
Technology
0.602229
129 293Big Data 3
Disease
Processes
0.521784
• 🎉 40th Anniversary Special: IBM unveils the
eServer zSeries 890 (z890) mainframe, celebrating four
decades of their System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking
tech aimed at simplifying IT environments, tailored especially
for medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in
flexibility, virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with
28 capacity settings, letting businesses align server capacity
with specific needs.
• 📦 Advanced Storage: Introduction of
IBM TotalStorage Enterprise Storage Server 750, bringing
enterprise-grade storage capabilities to mid-sized businesses.
Pre-trained
model approach
• BART (Bidirectional and Auto-Regressive
Transformers)
• Developed by Facebook in 2019.
• BART is a denoising autoencoder for pretraining
sequence-to-sequence models.
• It corrupts the input by masking and then learns to
reconstruct the original data.
• 🎉 40th Anniversary Special: IBM unveils the eServer zSeries
890 (z890) mainframe, celebrating four decades of their
System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking tech
aimed at simplifying IT environments, tailored especially for
medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in flexibility,
virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with 28
capacity settings, letting businesses align server capacity with
specific needs.
• 📦 Advanced Storage: Introduction of IBM TotalStorage
Enterprise Storage Server 750, bringing enterprise-grade
storage capabilities to mid-sized businesses.
id parent_title level_3 labels scores
126 293Big Data 0Big Data 0.677244
127 293Big Data 1Proteomics 0.636867
128 293Big Data 2
Disease
Processes
0.511485
129 293Big Data 3Bio-IT 0.480203
Pre-trained
model approach
• DeBERTa (Decoding-enhanced BERT with
disentangled attention)
• Developed by Microsoft in 2020.
• Improves BERT by disentangling the content and position
information in the self-attention mechanism.
• 🎉 40th Anniversary Special: IBM unveils the
eServer zSeries 890 (z890) mainframe, celebrating four decades
of their System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking
tech aimed at simplifying IT environments, tailored especially
for medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in
flexibility, virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with
28 capacity settings, letting businesses align server capacity
with specific needs.
• 📦 Advanced Storage: Introduction of
IBM TotalStorage Enterprise Storage Server 750, bringing
enterprise-grade storage capabilities to mid-sized businesses.
id parent_title
level_
3
labels scores
126 293Big Data 0Big Data 0.808621
127 293Big Data 1Cell Biology 0.764249
128 293Big Data 2
Food
Bioscience
0.754545
129 293Big Data 3Green Biology 0.700146
if questions==True:
Ask()
else:
Thank_you()

Más contenido relacionado

Similar a Mapping the pubmed data under different suptopics using NLP.pptx

Webinar: Sizing Up Object Storage for the Enterprise
Webinar: Sizing Up Object Storage for the EnterpriseWebinar: Sizing Up Object Storage for the Enterprise
Webinar: Sizing Up Object Storage for the EnterpriseStorage Switzerland
 
A scalable server environment for your applications
A scalable server environment for your applicationsA scalable server environment for your applications
A scalable server environment for your applicationsGigaSpaces
 
Effective use of cloud resources for Data Engineering - Johnson Darkwah
Effective use of cloud resources for Data Engineering - Johnson DarkwahEffective use of cloud resources for Data Engineering - Johnson Darkwah
Effective use of cloud resources for Data Engineering - Johnson DarkwahMatěj Jakimov
 
IBM FlashSystems A9000/R presentation
IBM FlashSystems A9000/R presentation IBM FlashSystems A9000/R presentation
IBM FlashSystems A9000/R presentation Joe Krotz
 
Presentation dell™ power vault™ md3
Presentation   dell™ power vault™ md3Presentation   dell™ power vault™ md3
Presentation dell™ power vault™ md3xKinAnx
 
Enterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional DatabasesEnterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional DatabasesAshnikbiz
 
SQL Server 2014 for Developers (Cristian Lefter)
SQL Server 2014 for Developers (Cristian Lefter)SQL Server 2014 for Developers (Cristian Lefter)
SQL Server 2014 for Developers (Cristian Lefter)ITCamp
 
Sirius ibm storage & platform computing solutions 080515 eh
Sirius ibm storage & platform computing solutions 080515 ehSirius ibm storage & platform computing solutions 080515 eh
Sirius ibm storage & platform computing solutions 080515 ehEric Herzog
 
Live Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than MemoryLive Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than MemoryMemVerge
 
Become More Data-driven by Leveraging Your SAP Data
Become More Data-driven by Leveraging Your SAP DataBecome More Data-driven by Leveraging Your SAP Data
Become More Data-driven by Leveraging Your SAP DataDenodo
 
Seize Profits in the Cloud with SolidFire
Seize Profits in the Cloud with SolidFire Seize Profits in the Cloud with SolidFire
Seize Profits in the Cloud with SolidFire NetApp
 
Techgate solution sets 2014
Techgate solution sets 2014Techgate solution sets 2014
Techgate solution sets 2014Techgate plc
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningModusOptimum
 
Software-Defined Storage (SDS)
Software-Defined Storage (SDS)Software-Defined Storage (SDS)
Software-Defined Storage (SDS)Ali Mirfallah
 
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...Brian Boyd
 
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Amazon Web Services
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed_Hat_Storage
 
VSP G1000 Checklist - 7 Q's to ask your storage vendor?
VSP G1000 Checklist - 7 Q's to ask your storage vendor? VSP G1000 Checklist - 7 Q's to ask your storage vendor?
VSP G1000 Checklist - 7 Q's to ask your storage vendor? Hitachi Vantara
 
Storage Cloud and Spectrum deck March 2016
Storage Cloud and Spectrum deck March 2016Storage Cloud and Spectrum deck March 2016
Storage Cloud and Spectrum deck March 2016Joe Krotz
 

Similar a Mapping the pubmed data under different suptopics using NLP.pptx (20)

Webinar: Sizing Up Object Storage for the Enterprise
Webinar: Sizing Up Object Storage for the EnterpriseWebinar: Sizing Up Object Storage for the Enterprise
Webinar: Sizing Up Object Storage for the Enterprise
 
A scalable server environment for your applications
A scalable server environment for your applicationsA scalable server environment for your applications
A scalable server environment for your applications
 
Effective use of cloud resources for Data Engineering - Johnson Darkwah
Effective use of cloud resources for Data Engineering - Johnson DarkwahEffective use of cloud resources for Data Engineering - Johnson Darkwah
Effective use of cloud resources for Data Engineering - Johnson Darkwah
 
IBM FlashSystems A9000/R presentation
IBM FlashSystems A9000/R presentation IBM FlashSystems A9000/R presentation
IBM FlashSystems A9000/R presentation
 
Presentation dell™ power vault™ md3
Presentation   dell™ power vault™ md3Presentation   dell™ power vault™ md3
Presentation dell™ power vault™ md3
 
Enterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional DatabasesEnterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional Databases
 
SQL Server 2014 for Developers (Cristian Lefter)
SQL Server 2014 for Developers (Cristian Lefter)SQL Server 2014 for Developers (Cristian Lefter)
SQL Server 2014 for Developers (Cristian Lefter)
 
Sirius ibm storage & platform computing solutions 080515 eh
Sirius ibm storage & platform computing solutions 080515 ehSirius ibm storage & platform computing solutions 080515 eh
Sirius ibm storage & platform computing solutions 080515 eh
 
Live Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than MemoryLive Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than Memory
 
Become More Data-driven by Leveraging Your SAP Data
Become More Data-driven by Leveraging Your SAP DataBecome More Data-driven by Leveraging Your SAP Data
Become More Data-driven by Leveraging Your SAP Data
 
Seize Profits in the Cloud with SolidFire
Seize Profits in the Cloud with SolidFire Seize Profits in the Cloud with SolidFire
Seize Profits in the Cloud with SolidFire
 
FS900 Data Sheet.PDF
FS900 Data Sheet.PDFFS900 Data Sheet.PDF
FS900 Data Sheet.PDF
 
Techgate solution sets 2014
Techgate solution sets 2014Techgate solution sets 2014
Techgate solution sets 2014
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine Learning
 
Software-Defined Storage (SDS)
Software-Defined Storage (SDS)Software-Defined Storage (SDS)
Software-Defined Storage (SDS)
 
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
 
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
VSP G1000 Checklist - 7 Q's to ask your storage vendor?
VSP G1000 Checklist - 7 Q's to ask your storage vendor? VSP G1000 Checklist - 7 Q's to ask your storage vendor?
VSP G1000 Checklist - 7 Q's to ask your storage vendor?
 
Storage Cloud and Spectrum deck March 2016
Storage Cloud and Spectrum deck March 2016Storage Cloud and Spectrum deck March 2016
Storage Cloud and Spectrum deck March 2016
 

Último

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 

Último (20)

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 

Mapping the pubmed data under different suptopics using NLP.pptx

  • 1. Mapping the Pubmed data under different sub-topics Email: venkykasprov@gmail.com Venkatasubramani Karthikeyan
  • 3. PROBLEM SOLVING APPROACH Traditional approach Data cleaning Bag of words Classification and clustering Pre-Trained Model approach No data cleaning required BERT, BART & DEBARTA
  • 6. Traditional approach • Bag of words • After Remove stop words and stemming • Using count vectorizer
  • 9. Traditional approach • Classification (cont) • Decision Tree Entropy Information Gain
  • 19. Pre-trained model approach • BERT (Bidirectional Encoder Representations from Transformers) • Developed by Google in 2018. • Revolutionary for its bidirectional training approach. • BERT is pre-trained on a large corpus of unlabeled text data. id parent_title level_3 labels scores 126 293Big Data 0Bio-IT 0.645831 127 293Big Data 1Big Data 0.612736 128 293Big Data 2 Healthcare Technology 0.602229 129 293Big Data 3 Disease Processes 0.521784 • 🎉 40th Anniversary Special: IBM unveils the eServer zSeries 890 (z890) mainframe, celebrating four decades of their System/360 mainframe legacy. • 💡 Breakthrough Tech: z890 introduces groundbreaking tech aimed at simplifying IT environments, tailored especially for medium-sized businesses. • 💪 Powerhouse Performance: z890 offers almost double the processing power of the preceding z800 series but starts 30% smaller in capacity. • 🔒 Enhanced Features: Elevated standards in flexibility, virtualization, automation, security, and scalability. • 🔄 Customized Capacity: Available as a single model with 28 capacity settings, letting businesses align server capacity with specific needs. • 📦 Advanced Storage: Introduction of IBM TotalStorage Enterprise Storage Server 750, bringing enterprise-grade storage capabilities to mid-sized businesses.
  • 20. Pre-trained model approach • BART (Bidirectional and Auto-Regressive Transformers) • Developed by Facebook in 2019. • BART is a denoising autoencoder for pretraining sequence-to-sequence models. • It corrupts the input by masking and then learns to reconstruct the original data. • 🎉 40th Anniversary Special: IBM unveils the eServer zSeries 890 (z890) mainframe, celebrating four decades of their System/360 mainframe legacy. • 💡 Breakthrough Tech: z890 introduces groundbreaking tech aimed at simplifying IT environments, tailored especially for medium-sized businesses. • 💪 Powerhouse Performance: z890 offers almost double the processing power of the preceding z800 series but starts 30% smaller in capacity. • 🔒 Enhanced Features: Elevated standards in flexibility, virtualization, automation, security, and scalability. • 🔄 Customized Capacity: Available as a single model with 28 capacity settings, letting businesses align server capacity with specific needs. • 📦 Advanced Storage: Introduction of IBM TotalStorage Enterprise Storage Server 750, bringing enterprise-grade storage capabilities to mid-sized businesses. id parent_title level_3 labels scores 126 293Big Data 0Big Data 0.677244 127 293Big Data 1Proteomics 0.636867 128 293Big Data 2 Disease Processes 0.511485 129 293Big Data 3Bio-IT 0.480203
  • 21. Pre-trained model approach • DeBERTa (Decoding-enhanced BERT with disentangled attention) • Developed by Microsoft in 2020. • Improves BERT by disentangling the content and position information in the self-attention mechanism. • 🎉 40th Anniversary Special: IBM unveils the eServer zSeries 890 (z890) mainframe, celebrating four decades of their System/360 mainframe legacy. • 💡 Breakthrough Tech: z890 introduces groundbreaking tech aimed at simplifying IT environments, tailored especially for medium-sized businesses. • 💪 Powerhouse Performance: z890 offers almost double the processing power of the preceding z800 series but starts 30% smaller in capacity. • 🔒 Enhanced Features: Elevated standards in flexibility, virtualization, automation, security, and scalability. • 🔄 Customized Capacity: Available as a single model with 28 capacity settings, letting businesses align server capacity with specific needs. • 📦 Advanced Storage: Introduction of IBM TotalStorage Enterprise Storage Server 750, bringing enterprise-grade storage capabilities to mid-sized businesses. id parent_title level_ 3 labels scores 126 293Big Data 0Big Data 0.808621 127 293Big Data 1Cell Biology 0.764249 128 293Big Data 2 Food Bioscience 0.754545 129 293Big Data 3Green Biology 0.700146