Mapping the PubMed data under different sub-topics
Venkatasubramani Karthikeyan
Email: venkykasprov@gmail.com
PROBLEM STATEMENT
Analogy Implementation
PROBLEM SOLVING APPROACH
Traditional approach
Data cleaning
Bag of words
Classification and clustering
Pre-Trained Model approach
No data cleaning required
BERT, BART & DeBERTa
ORIGINAL CATEGORIES
CATEGORIES CONSIDERED
Traditional approach
• Bag of words
• After removing stop words and stemming
• Using count vectorizer
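The bag-of-words steps listed above (stop-word removal, stemming, counting) can be sketched in plain Python. This is only an illustration of the idea, not the deck's code: the stop-word list, the `naive_stem` suffix rule, and the two sample sentences are assumptions made up for the example, and a real pipeline would more likely use a library stemmer and a library count vectorizer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}


def naive_stem(token: str) -> str:
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text: str) -> Counter:
    """Lowercase, tokenize, drop stop words, stem, and count the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(t) for t in tokens if t not in STOP_WORDS)


def vectorize(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and one count vector per document."""
    bags = [bag_of_words(d) for d in docs]
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[term] for term in vocab] for bag in bags]


# Hypothetical sample documents.
docs = ["Proteins are binding to the cell", "The cells and the proteins"]
vocab, matrix = vectorize(docs)
print(vocab)
print(matrix)
```

Each row of `matrix` is one document and each column is one vocabulary term, which is the input shape the classifiers and clustering methods on the following slides expect.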
Traditional approach
• Classification
• Logistic regression
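As a rough illustration of logistic regression on count vectors, the sketch below fits a binary classifier by batch gradient descent. The two-word vocabulary, the toy data, the learning rate, and the epoch count are all assumptions for the example; the deck's task has many topics, which would need a multi-class variant such as one-vs-rest.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(y=1 | x) minus the true label.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy count vectors over a hypothetical vocabulary ["genome", "server"].
X = [[2, 0], [3, 0], [0, 2], [0, 3]]
y = [1, 1, 0, 0]  # 1 = biology-related, 0 = not
w, b = train_logistic(X, y)
print(predict_proba(w, b, [2, 0]), predict_proba(w, b, [0, 2]))
```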
Traditional approach
• Classification (cont)
• Decision Tree
  Entropy
  Information Gain
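The two quantities named on the decision tree slide can be computed directly. The sketch below assumes the standard definitions: Shannon entropy in bits, and information gain as parent entropy minus the size-weighted entropy of the child nodes. The labels `"bio"` and `"tech"` are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child splits."""
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted


parent = ["bio", "bio", "tech", "tech"]
perfect_split = [["bio", "bio"], ["tech", "tech"]]
useless_split = [["bio", "tech"], ["bio", "tech"]]
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree-building algorithm would evaluate candidate splits this way and choose the one with the highest gain at each node.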
Traditional approach
• Classification (cont)
• Random Forest
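A random forest combines many decision trees, each trained on a bootstrap sample of the data, and predicts by majority vote. The sketch below shows only those two ingredients; the three lambda "trees" are hand-written stand-ins with hypothetical rules, not trained models, and per-split feature subsampling is left out.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree trains on its own sample."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, x):
    """Ask every tree for a label and combine the answers by majority vote."""
    return majority_vote([tree(x) for tree in trees])


# Three hand-written stand-ins for trained trees (hypothetical rules on count features).
trees = [
    lambda x: "bio" if x["genome"] > 0 else "tech",
    lambda x: "bio" if x["protein"] > 0 else "tech",
    lambda x: "tech" if x["server"] > 0 else "bio",
]
sample = bootstrap_sample([1, 2, 3, 4, 5], random.Random(0))
label = forest_predict(trees, {"genome": 1, "protein": 0, "server": 1})
print(sample, label)
```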
Traditional approach
• Clustering
• Hierarchical clustering
• HDBSCAN
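To make the hierarchical clustering idea concrete, here is a minimal agglomerative sketch with single linkage: every point starts as its own cluster and the two closest clusters are merged until the requested number remains. The 2-D points are invented for the example, and the linkage choice is an assumption since the deck does not state one. HDBSCAN is a density-based method with more machinery than fits here, so it is not sketched.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]  # start with every point in its own cluster

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
clusters = single_linkage(points, n_clusters=2)
print(clusters)
```

This brute-force version rescans all cluster pairs on every merge, which is fine for a handful of points but too slow for a full document collection.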
Pre-trained model approach
• Transformer
• HuggingFace Transformers
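The labels and scores tables on the model slides have the same shape as the output of the Hugging Face `zero-shot-classification` pipeline, so one plausible way to produce them is sketched below. This is an assumption, not something the deck states: `classify`, `to_rows`, and the `model_name` argument are hypothetical names, and the choice of `multi_label=True` is a guess based on the reported scores not summing to 1.

```python
def classify(text, candidate_labels, model_name):
    """Score `text` against each label with a zero-shot classification pipeline."""
    # Imported lazily so the helper below can be used without the library installed.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    # multi_label=True scores every label independently instead of normalizing across labels.
    return classifier(text, candidate_labels, multi_label=True)


def to_rows(result):
    """Turn a result dict with 'labels' and 'scores' into (rank, label, score) rows."""
    return [
        (rank, label, round(score, 6))
        for rank, (label, score) in enumerate(zip(result["labels"], result["scores"]))
    ]


# Shape of a pipeline result, filled with the values shown on the BERT slide.
example_result = {
    "sequence": "...",
    "labels": ["Bio-IT", "Big Data", "Healthcare Technology", "Disease Processes"],
    "scores": [0.645831, 0.612736, 0.602229, 0.521784],
}
rows = to_rows(example_result)
for row in rows:
    print(row)
```

A call such as `classify(text, labels, "facebook/bart-large-mnli")` would download that public NLI checkpoint on first use; which checkpoints the deck actually used is not stated.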
Pre-trained model approach
• BERT (Bidirectional Encoder Representations from Transformers)
• Developed by Google in 2018.
• Trained bidirectionally: each token's representation is conditioned on both its left and right context.
• BERT is pre-trained on a large corpus of unlabeled text data.
      id  parent_title  level_3  labels                 scores
126  293  Big Data      0        Bio-IT                 0.645831
127  293  Big Data      1        Big Data               0.612736
128  293  Big Data      2        Healthcare Technology  0.602229
129  293  Big Data      3        Disease Processes      0.521784
• 🎉 40th Anniversary Special: IBM unveils the
eServer zSeries 890 (z890) mainframe, celebrating four
decades of their System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking
tech aimed at simplifying IT environments, tailored especially
for medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in
flexibility, virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with
28 capacity settings, letting businesses align server capacity
with specific needs.
• 📦 Advanced Storage: Introduction of
IBM TotalStorage Enterprise Storage Server 750, bringing
enterprise-grade storage capabilities to mid-sized businesses.
Pre-trained model approach
• BART (Bidirectional and Auto-Regressive Transformers)
• Developed by Facebook in 2019.
• BART is a denoising autoencoder for pretraining sequence-to-sequence models.
• It corrupts the input text (for example, by masking tokens) and then learns to reconstruct the original.
• 🎉 40th Anniversary Special: IBM unveils the eServer zSeries
890 (z890) mainframe, celebrating four decades of their
System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking tech
aimed at simplifying IT environments, tailored especially for
medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in flexibility,
virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with 28
capacity settings, letting businesses align server capacity with
specific needs.
• 📦 Advanced Storage: Introduction of IBM TotalStorage
Enterprise Storage Server 750, bringing enterprise-grade
storage capabilities to mid-sized businesses.
      id  parent_title  level_3  labels             scores
126  293  Big Data      0        Big Data           0.677244
127  293  Big Data      1        Proteomics         0.636867
128  293  Big Data      2        Disease Processes  0.511485
129  293  Big Data      3        Bio-IT             0.480203
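The corruption step described on the BART slide can be illustrated with simple token masking. This is a simplified sketch: BART's actual noising also includes operations such as span infilling and sentence permutation, and the function name, mask ratio, mask symbol, and seed here are assumptions for the example.

```python
import random


def mask_tokens(tokens, mask_ratio=0.3, mask_token="<mask>", seed=0):
    """Replace a random subset of tokens with a mask symbol.

    Returns (corrupted, original); a denoising model is trained to map the
    corrupted sequence back to the original one.
    """
    rng = random.Random(seed)
    n_masked = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    masked_positions = set(rng.sample(range(len(tokens)), n_masked))
    corrupted = [mask_token if i in masked_positions else tok for i, tok in enumerate(tokens)]
    return corrupted, list(tokens)


# A fragment of the sample text from the slide, split on whitespace.
tokens = "z890 offers almost double the processing power".split()
corrupted, target = mask_tokens(tokens)
print(corrupted)
print(target)
```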
Pre-trained model approach
• DeBERTa (Decoding-enhanced BERT with disentangled attention)
• Developed by Microsoft in 2020.
• Improves BERT by disentangling the content and position information in the self-attention mechanism.
• 🎉 40th Anniversary Special: IBM unveils the
eServer zSeries 890 (z890) mainframe, celebrating four decades
of their System/360 mainframe legacy.
• 💡 Breakthrough Tech: z890 introduces groundbreaking
tech aimed at simplifying IT environments, tailored especially
for medium-sized businesses.
• 💪 Powerhouse Performance: z890 offers almost double the
processing power of the preceding z800 series but starts 30%
smaller in capacity.
• 🔒 Enhanced Features: Elevated standards in
flexibility, virtualization, automation, security, and scalability.
• 🔄 Customized Capacity: Available as a single model with
28 capacity settings, letting businesses align server capacity
with specific needs.
• 📦 Advanced Storage: Introduction of
IBM TotalStorage Enterprise Storage Server 750, bringing
enterprise-grade storage capabilities to mid-sized businesses.
      id  parent_title  level_3  labels           scores
126  293  Big Data      0        Big Data         0.808621
127  293  Big Data      1        Cell Biology     0.764249
128  293  Big Data      2        Food Bioscience  0.754545
129  293  Big Data      3        Green Biology    0.700146
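The disentangling mentioned on the DeBERTa slide can be written out as a sum of attention terms. The notation below is an assumption following the usual presentation of the method: $Q^c, K^c$ are content projections, $Q^r, K^r$ are relative-position projections, and $\delta(i,j)$ is the relative distance from token $i$ to token $j$.

```latex
\tilde{A}_{i,j} =
  \underbrace{Q^{c}_{i}\,{K^{c}_{j}}^{\top}}_{\text{content-to-content}}
  + \underbrace{Q^{c}_{i}\,{K^{r}_{\delta(i,j)}}^{\top}}_{\text{content-to-position}}
  + \underbrace{K^{c}_{j}\,{Q^{r}_{\delta(j,i)}}^{\top}}_{\text{position-to-content}}
```

Standard BERT attention corresponds to a single term computed on embeddings where content and position have already been added together; here each token keeps separate content and position vectors and the score is the sum of the cross terms.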
if questions:
    Ask()
else:
    Thank_you()
