Data Science for Social Good:
#LegalNLP #AlgorithmicBias
https://www.linkedin.com/in/ponguru/
March 23 - 24, 2023
IIIT Una
Ponnurangam Kumaraguru (“PK”)
#ProfGiri CS IIIT Hyderabad
ACM Distinguished Member
TEDx Speaker
https://www.instagram.com/pk.profgiri/
What is Social Computing?
https://en.wikipedia.org/wiki/Social_computing
Legal AI for Indian Context
District courts are usually the first point of contact between the people and the judiciary.
Lower courts in India are burdened with a backlog of cases (~40 million as of 2021).
Documents filed in district courts in India are written in local languages.
Court hierarchy: Supreme Court, High Courts, District Courts
Legal AI / NLP - Data
We collected ~900k district court case documents from Uttar Pradesh
All documents are in Hindi, written in Devanagari
Legal corpora exist for the European Court of Justice and Chinese courts, but none for Indian district courts
Legal AI / NLP - Data
There are around 300 different case types; the table shows the prominent ones
The majority of the case documents correspond to Bail Applications
Fig: Variation in number of case documents per district
Table: Case types in HLDC
Legal AI / NLP - Bail Documents
Fig: District-wise ratio of number of bail applications to total cases
Legal AI / NLP - Bail Prediction Model
In general, the performance is lower in district-wise settings, possibly due to large variation across districts
Overall, summarization models perform better than Doc2Vec and simpler Transformer-based models
Legal AI / NLP for Indian Context
HLDC: Hindi Legal Documents Corpus
Legal AI / NLP for Indian Context - Takeaways
Indian legal documents are a rich source of domain-specific Indic-language corpora, readily available online.
Multiple tasks still need attention, especially for Indian settings:
Legal summarization
Case recommendations
Citation predictions / network
Sleeping beauty
Bias
Are Models Trained on Indian Legal Data Fair?
Overview
An initial investigation of fairness from the Indian perspective in the legal domain
Objective
We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents.
Motivation
Recent LegalNLP research targets judgement prediction and summarization
Deploying such models without evaluating bias can lead to unwarranted outcomes
Learnt biases can perpetuate into unfair decision-making
Evaluating and investigating encoded biases helps to understand historical social disparities and to mitigate potential harms in the future
Data Preparation
Sample 10,000 cases from HLDC
36% bail granted, 63% bail denied
Fig: HLDC Snippet
Data Preparation
Use two features: facts-and-arguments and decision
Basic pre-processing: stop-word removal, cleaning using regex
Each case is represented by 7 features: 5 for keywords of the case and 2 for the category of crime of the case
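The pre-processing step above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list `HINDI_STOP_WORDS`, the two regex rules, and the function name `preprocess` are all assumptions, since the slides only mention stop-word removal and regex cleaning.

```python
import re

# Hypothetical stop-word list; the slides do not say which list was used.
HINDI_STOP_WORDS = {"का", "की", "के", "है", "और", "में", "से", "को", "पर"}

# Assumed rule 1: keep only Devanagari characters (U+0900 to U+097F) and whitespace.
NON_DEVANAGARI = re.compile(r"[^\u0900-\u097F\s]")
# Assumed rule 2: drop danda, double danda, and Devanagari digits (U+0964 to U+096F).
PUNCT_AND_DIGITS = re.compile(r"[\u0964-\u096F]")


def preprocess(text):
    """Clean a Hindi string with regexes and return its non-stop-word tokens."""
    text = NON_DEVANAGARI.sub(" ", text)
    text = PUNCT_AND_DIGITS.sub(" ", text)
    return [token for token in text.split() if token not in HINDI_STOP_WORDS]


tokens = preprocess("अभियुक्त की जमानत 2021 में खारिज है।")
print(tokens)
```

Splitting on whitespace is a simplification; a real pipeline would likely use a tokenizer suited to Hindi legal text.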
Data Preparation
Represent each case using keywords from LDA (topic modelling)
Each case is assigned its top two topics
Each topic is represented by 10 keywords
3 keywords are taken from the dominant topic
2 keywords are taken from the second-dominant topic
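The keyword selection described above (3 keywords from the dominant topic, 2 from the second-dominant topic) can be sketched as below. The sketch assumes an LDA model has already been fitted elsewhere and has produced per-case topic weights and a ranked keyword list per topic; the function name `case_keywords` and the placeholder keywords are hypothetical.

```python
def case_keywords(topic_weights, topic_keywords, n_dominant=3, n_second=2):
    """Return 5 keyword features: 3 from the dominant topic, 2 from the runner-up.

    topic_weights:  per-topic weights for one case (assumed LDA output).
    topic_keywords: for each topic, its keywords ranked by importance.
    """
    # Rank topic indices from highest to lowest weight for this case.
    ranked = sorted(range(len(topic_weights)),
                    key=lambda t: topic_weights[t], reverse=True)
    dominant, second = ranked[0], ranked[1]
    return topic_keywords[dominant][:n_dominant] + topic_keywords[second][:n_second]


# Placeholder topics with 10 keywords each, standing in for real LDA output.
topic_keywords = [["t%d_w%d" % (t, i) for i in range(10)] for t in range(3)]

features = case_keywords([0.1, 0.6, 0.3], topic_keywords)
print(features)
```

If the two topics share a keyword, this sketch keeps both copies; the slides do not say how such overlaps were handled.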
Model Training
Identify a subset of cases from the dataset using a theme (for example, Hatya (Murder) or Dahej (Dowry))
Sample cases having either a Hindu or a Muslim proper noun
Train a Decision Tree classifier
Model Training
For every case, we identify the
True label
Model’s predicted label
Number of times the model’s prediction changes when the proper noun is replaced with another Hindu proper noun
Number of times the model’s prediction changes when the proper noun is replaced with another Muslim proper noun
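The substitution counts listed above can be sketched as follows. This is an illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for the trained classifier, the tokens `A_1`, `B_1`, and so on are placeholders for the two lists of proper nouns, and the proper noun is assumed to sit at a known position in the feature list.

```python
def count_prediction_changes(predict, features, noun_index, replacements):
    """Count how many replacement nouns change the model's prediction for one case."""
    original = predict(features)
    changes = 0
    for noun in replacements:
        if noun == features[noun_index]:
            continue  # skip the noun already present in the case
        swapped = list(features)
        swapped[noun_index] = noun
        if predict(swapped) != original:
            changes += 1
    return changes


def toy_predict(features):
    # Hypothetical classifier: 1 = bail granted, 0 = bail dismissed.
    # It deliberately depends on the noun so the probe has something to detect.
    return 0 if features[0] in {"B_1", "B_2"} else 1


case = ["A_1", "keyword_1", "keyword_2"]
same_group = count_prediction_changes(toy_predict, case, 0, ["A_1", "A_2", "A_3"])
other_group = count_prediction_changes(toy_predict, case, 0, ["B_1", "B_2", "B_3"])
print(same_group, other_group)
```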
Model Training
If the model changes its prediction from 0 (bail dismissed) to 1 (bail granted) more often when Muslim nouns are replaced by Hindu nouns than when Hindu nouns are replaced by Muslim nouns, then there exists a bias against Muslims
This bias may be due to inherent characteristics of the dataset
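One way to express the comparison above in code is to count 0-to-1 flips in each substitution direction and compare the totals. The sketch below is an assumption-laden illustration: `toy_predict` is a hypothetical biased classifier, `A_*` and `B_*` are placeholder tokens for the two groups of proper nouns, and the slides do not specify any normalisation or significance test.

```python
def count_flips_to_granted(predict, cases, noun_index, replacement_nouns):
    """Count substitutions that flip a prediction from 0 (dismissed) to 1 (granted)."""
    flips = 0
    for features in cases:
        if predict(features) != 0:
            continue  # only cases originally predicted as dismissed can flip 0 -> 1
        for noun in replacement_nouns:
            swapped = list(features)
            swapped[noun_index] = noun
            if predict(swapped) == 1:
                flips += 1
    return flips


def toy_predict(features):
    # Hypothetical classifier that dismisses bail for group-B nouns or severe cases.
    return 0 if features[0].startswith("B_") or features[1] == "severe" else 1


cases_b = [["B_1", "minor"], ["B_2", "severe"]]
cases_a = [["A_1", "minor"], ["A_2", "severe"]]

b_to_a = count_flips_to_granted(toy_predict, cases_b, 0, ["A_1", "A_2"])
a_to_b = count_flips_to_granted(toy_predict, cases_a, 0, ["B_1", "B_2"])

# Under the criterion on the slide, more flips in the B -> A direction
# would point to a bias against group B in this toy model.
print(b_to_a, a_to_b, b_to_a > a_to_b)
```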
Evaluating Fairness
Demographic Parity: the outcome of a classifier should be independent of a protected attribute
Fairness Gap: the deviation of a trained classifier from ideal demographic parity
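As a rough sketch of the two definitions above, the gap can be computed as the absolute difference in bail-denial rates between two groups. The slides do not give the exact formula, so the absolute difference used here is an assumption (one common choice), and the function names and toy predictions are hypothetical.

```python
def denial_rate(predictions):
    """Fraction of predictions equal to 0 (bail dismissed)."""
    return sum(1 for p in predictions if p == 0) / len(predictions)


def fairness_gap(predictions_a, predictions_b):
    """Absolute difference in denial rates; 0.0 means demographic parity holds."""
    return abs(denial_rate(predictions_a) - denial_rate(predictions_b))


# Toy model outputs for two groups defined by a protected attribute.
group_a = [0, 0, 1, 1]  # denial rate 0.50
group_b = [0, 0, 0, 1]  # denial rate 0.75

print(fairness_gap(group_a, group_b))  # 0.25
```

A gap of 0 corresponds to ideal demographic parity on this outcome; larger values mean the denial rate depends more strongly on the protected attribute.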
Fig: Fairness Gap on Denial of Bail
Results
Changes in Predictions for Theme: Hatya (Murder)
Changes in Predictions for Theme: Dahej (Dowry)
Ethical considerations
The results in no way indicate a bias in the judicial system of India (small dataset, many more open-ended questions)
HLDC contains only Uttar Pradesh (UP) data
Identifying de-biasing methods
Conclusions
Initial investigation into bias and fairness for Indian legal data
Highlight preferentially encoded stereotypes that models might pick up in downstream tasks like bail prediction
Need for algorithmic approaches to mitigate the bias learned by these models
https://precog.iiit.ac.in/pages/publications.html
https://precog.iiit.ac.in/
Group pic & Selfie :)
Thanks!
Questions? pk.guru@iiit.ac.in
http://precog.iiit.ac.in/
@ponguru
pk.profgiri
linkedin/in/ponguru
