Justice Delayed is Justice Denied
Enabling Legal Artificial Intelligence via Bail
Prediction on Hindi Case Documents
Arnav Kapoor (20171040)
Advisor - Prof. Ponnurangam Kumaraguru
Examiners
Prof. Srijoni Sen
Prof. Saptarshi Ghosh
https://www.ndtv.com/india-news/artificial-intelligence-help-reduce-backlog-of-pending-cases-law-minister-kiren-rijiju-2627342
https://www.livemint.com/opinion/columns/judicial-delays-and-the-need-for-intervention-at-the-district-level-11653237233189.html
https://www.financialexpress.com/opinion/tech-to-the-aid-of-justice-delivery/2242658/
Enabling Legal Artificial Intelligence via Bail
Prediction on Hindi Case Documents
Legal Artificial Intelligence
Legal Artificial Intelligence (LegalAI) is concerned with the use of artificial
intelligence technologies, particularly NLP, for legal tasks.
Bail Prediction
An automated system to predict if the Bail (judicial interim release of a person
suspected of a crime) would be granted or denied.
Hindi Case Documents
First work to focus on Hindi Legal documents
Motivation
The legal system plays a critical role in the functioning of a nation. It is paramount
that cases are handled judiciously and efficiently.
In recent times, the legal system in many populous countries (e.g., India) has been
inundated with a large number of pending cases.
There is a pressing need to build systems that process legal documents and help
augment the legal procedure.
- How many such pending cases are there, and where are most of them concentrated?
- According to India’s National Judicial Data Grid, as of November 2021, there are
approximately 40 million cases pending in District Courts (National Judicial Data
Grid, 2021) as opposed to 5 million cases (1/8th) pending in High Courts.
- There is a need to develop models that address the problems at the grass-root levels
of the Indian legal system.
‘Courts inundated with a large number of pending cases’
Judicial Structure in India
[Figure: tiered court hierarchy with the Supreme Court (SC) at the top, the High Courts below it and the District Courts at the base.]
- Like a few other countries, such as the US and the UK, India has adopted a tiered
court system to improve productivity and streamline the judicial process.
District Court Challenges
Language - Case documents in District Courts use the local language of the
state, as opposed to the High Courts and the Supreme Court, which consistently use English.
‘Article 348(1) of the Constitution of India provides that all proceedings in the
Supreme Court and in every High Court shall be in English language until
Parliament by law otherwise provides.’
Structure - Since each district has its own conventions and quality-control checks,
there is no consistency across districts. Additionally, district case documents are highly
unstructured and noisy (they contain spelling and grammar mistakes, since they are typed).
Prominent Legal Corpora
[Chalkidis et al., 2019] have developed an English corpus of European Court of
Justice documents. (Non-Indian context, English)
[Xiao et al., 2018] have developed a Chinese legal document corpus. (Non-Indian
context, Chinese)
[Malik et al., 2021b] have developed an English corpus of Indian Supreme Court
documents. (Indian context, English)
To the best of our knowledge, there does not exist any legal document corpus for the
Hindi language.
Contribution #1 - HLDC - Hindi Legal Documents Corpus
There are about 40 million pending cases in the district courts of India. Of these,
approximately 20 million are from courts where the official language is Hindi.
No legal corpus exists for Hindi, which motivates HLDC (Hindi Legal Documents Corpus).
Hindi Legal Documents Corpus (HLDC) is a corpus of 912,568 Indian legal case
documents in the Hindi language. The corpus is created by crawling data from the
e-Courts website, a Government of India (GoI) initiative to file, track and analyse case
progress and documents online.
HLDC - Corpus Pipeline
OCR
Document Segmentation (Header / Body)
Final Cleaning
Anonymisation Challenge
Anonymisation is a critical step of document processing. It has a two-fold purpose.
- Unbiased Models
- Privacy Concerns
We used a gazetteer along with regex-based rules for NER to anonymize the data.
Lists of first names, last names, middle names, locations, titles like पंडित (Pandit: title of
a priest) and सरजी (Sir), month names and day names were normalized to नाम (Naam:
<name>). Phone numbers were detected using regex patterns and replaced with a
<फ़ोन नंबर> (<phone-number>) tag; numbers written in both English and Hindi were
considered.
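The anonymisation step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for HLDC: the gazetteer entries, the `<नाम>` tag spelling, the 10-digit phone pattern and the whitespace tokenisation are all assumptions made for the example.

```python
import re

# Hypothetical gazetteer: the real lists of names, locations and titles
# are not given in the slides.
GAZETTEER = {"राम", "कुमार", "कानपुर", "पंडित"}

NAME_TAG = "<नाम>"          # assumed tag form for normalised names
PHONE_TAG = "<फ़ोन नंबर>"   # tag quoted in the slides

# Assumption: a phone number is a run of exactly 10 digits, written with
# ASCII digits or Devanagari digits (U+0966 to U+096F).
DIGIT = r"[0-9\u0966-\u096F]"
PHONE_RE = re.compile(rf"(?<!{DIGIT}){DIGIT}{{10}}(?!{DIGIT})")


def anonymise(text: str) -> str:
    """Replace phone numbers and gazetteer tokens with placeholder tags."""
    text = PHONE_RE.sub(PHONE_TAG, text)
    # Naive whitespace tokenisation: tokens with attached punctuation
    # will not match the gazetteer.
    tokens = text.split(" ")
    return " ".join(NAME_TAG if tok in GAZETTEER else tok for tok in tokens)
```

Exact-match lookup keeps the sketch short; a real system would also need to handle punctuation, inflected forms and multi-word names.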
Standardisation Challenge - Case Type Identification
HLDC documents were processed to obtain different case types (e.g., Bail applications,
Criminal Cases)
However, case-type labels show little uniformity across districts, and even within a
district.
Bail applications were categorised as ‘Bail Appl’ or ‘Bail Application’ in Kanpur Nagar
district. Similarly, there were variations of ‘Civil Revision’ such as ‘Civil Reivision’ and ‘Civill
Revision’ in multiple districts.
This was resolved using a combination of manual verification and edit distance metrics.
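A minimal sketch of how edit distance could map noisy case-type labels to canonical ones. The canonical list, the distance threshold and the function names are assumptions for illustration; the slides do not state which metric variant or threshold was used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical canonical labels, taken from examples in the slides.
CANONICAL = ["Bail Application", "Civil Revision", "Criminal Cases"]


def normalise_case_type(raw: str, max_distance: int = 3):
    """Return the closest canonical label, or None if nothing is close enough."""
    cleaned = " ".join(raw.split()).lower()
    best = min(CANONICAL, key=lambda c: edit_distance(cleaned, c.lower()))
    if edit_distance(cleaned, best.lower()) <= max_distance:
        return best
    return None  # left for manual verification
```

With this threshold, misspellings such as 'Civil Reivision' resolve automatically, while abbreviations such as 'Bail Appl' are too far from any canonical label and fall through to manual review, which is consistent with combining both approaches.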
Case Type Statistics
Case Type                                  % in HLDC
Bail Applications                          31.71
Criminal Cases                             20.41
Original Suits                             6.54
Warrants or Summons in Criminal Cases      5.24
Civil Misc                                 3.4
Final Report                               3.32
Civil Cases                                3.23
Contribution #2 - Bail Prediction
Showcase the utility of the HLDC corpus on a relevant real-world task.
The most prevalent case type in the corpus is bail applications.
Additionally, bail cases are very time-sensitive, as the hearing and decision
need to be completed in a short period. Bail cases are resolved within 7-15 days
on average, whereas most other cases can take years.
Bail Prediction Task
Input – Facts and arguments of the case
System – Bail prediction ML model
Output – Bail granted or bail denied
Bail Corpus Creation Pipeline
Outcome Identification
Binary - Identify if bail is granted or denied
To extract the bail decision, we manually identified keywords in the result section.
Keywords like ख़ारिज (dismissed) and निरस्त (invalidated / cancelled) indicate rejection of
the bail application, and words like स्वीकार (accepted) indicate acceptance of the bail
application.
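A minimal sketch of keyword-based outcome labelling along these lines. The keyword lists contain only the examples quoted in the slides plus two additions that are assumptions of this sketch: the spelling of ख़ारिज without the nukta, and the negated form अस्वीकार (rejected), which contains स्वीकार as a substring and would otherwise be mislabelled as a grant.

```python
import re
import unicodedata

GRANTED_WORD = "स्वीकार"  # accepted
# Denial keywords: the two from the slides, a nukta-less spelling variant,
# and the negated form of GRANTED_WORD (both additions are assumptions).
DENIED_WORDS = ["ख़ारिज", "खारिज", "निरस्त", "अ" + GRANTED_WORD]


def _nfc(text: str) -> str:
    """Normalise so that different encodings of the same letters compare equal."""
    return unicodedata.normalize("NFC", text)


# The lookbehind stops GRANTED_WORD from matching inside its negated form.
GRANTED_RE = re.compile("(?<!अ)" + re.escape(_nfc(GRANTED_WORD)))
DENIED_RE = re.compile("|".join(re.escape(_nfc(w)) for w in DENIED_WORDS))


def bail_outcome(result_text: str):
    """Return 1 (granted), 0 (denied) or None when the keywords are ambiguous."""
    text = _nfc(result_text)
    granted = GRANTED_RE.search(text) is not None
    denied = DENIED_RE.search(text) is not None
    if granted == denied:  # neither keyword found, or both found
        return None
    return 1 if granted else 0
```

Documents that return None would need manual inspection or additional keywords.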
Regex Tokens - Outcome Identification
Judge’s Opinion and Facts Extraction
Judge’s Opinion Extraction - Most of the bail documents have a concluding
paragraph where the judge summarizes their view of the case. To
extract this, we first constructed regex patterns for the phrases that often precede
the judge’s summary.
Facts Extraction - The remainder of the case document, after removal of the judge’s
summary, the final result paragraph and the initial header information, forms the
core facts of the case.
Formalised Problem Statement
Formally, consider a corpus of bail documents $D = b_1, b_2, \cdots, b_i$, where each bail
document is segmented as $b_i = (h_i, f_i, j_i, y_i)$.
Here, $h_i$, $f_i$, $j_i$ and $y_i$ represent the header, facts, judge’s summary and bail decision of
the document respectively. Additionally, the facts of every document contain $k$
sentences, more formally, $f_i = (s_i^1, s_i^2, \cdots, s_i^k)$, where $s_i^k$ represents the $k$-th
sentence of the $i$-th bail document. We formulate the bail prediction task as a binary
classification problem.
We are interested in modelling $p_\theta(y_i \mid f_i)$, which is the probability of the outcome $y_i$
given the facts of a case $f_i$. Here, $y_i \in \{0, 1\}$, i.e., 0 if bail is denied or 1 if bail is
granted.
Models
We explore a multitude of models to solve the bail prediction problem. The proposed
Multi-Task Learning model performs best in the district-wise setting. We progressively
explored the following categories of models:
- Embedding Based Models
- Summarisation Based Models
- Multi Task Learning Model
Embedding Based Models
Doc2Vec
Classical embedding-based model. Doc2Vec embeddings, in our case, are trained on the
train set of our corpus. The embeddings are fed as input to SVM and XGBoost classifiers.
IndicBERT
IndicBERT is a transformer language model trained on 12 major Indian languages.
However, IndicBERT, like other transformer LMs, has a limit on the input
length (max 512 tokens). We experimented with two settings:
- The first 512 tokens of the document.
- The last 512 tokens of the document.
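The two truncation settings can be illustrated on an already tokenised document. This is a sketch only: the function name is hypothetical, and a real pipeline would use the model's own tokenizer and reserve room for its special tokens within the 512-token budget.

```python
def truncate_tokens(tokens, max_len=512, keep="first"):
    """Keep either the first or the last `max_len` tokens of a document."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if keep == "first":
        return tokens[:max_len]
    if keep == "last":
        return tokens[-max_len:]
    raise ValueError("keep must be 'first' or 'last'")
```

Keeping the tail of a document favours text near the conclusion, while keeping the head favours the opening of the facts; which one carries more signal is an empirical question.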
Summarisation Based Models
The models described in the previous section can lose important input signals due to
truncation, which can be detrimental to the bail prediction task.
Lengthy Documents -> Summarisation
Two-fold benefits:
a) An extractive summary of the document is used as input, i.e., the length of the
document is reduced without losing important signals.
b) Summarization helps to identify the salient sentences in a legal document
and is a step towards developing explainable models.
Summarisation Methods
Unsupervised - No need for ground truth summaries.
a) TF-IDF+IndicBERT: We rely on the statistical measures of term frequency
and inverse document frequency to recognise the most salient words and
sentences.
b) TextRank+IndicBERT: We use the graph based method TextRank to extract
the most important sentences from the documents
Semi-Supervised - Use the judge’s summary as a truth indicator to train the model.
- Salience Classifier + IndicBERT
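As an illustration of the unsupervised TF-IDF option listed above, the sketch below scores each sentence by the sum of its TF-IDF term weights and keeps the top-scoring sentences in their original order. Treating each sentence as a "document" for the IDF computation, whitespace tokenisation and the `top_k` parameter are assumptions of this sketch, not details taken from the slides.

```python
import math
from collections import Counter


def tfidf_summary(sentences, top_k=2):
    """Extractive summary: the top_k sentences by summed TF-IDF weight."""
    tokenised = [s.split() for s in sentences]
    n = len(tokenised)

    # Document frequency: in how many sentences each word appears.
    df = Counter()
    for toks in tokenised:
        df.update(set(toks))

    scores = []
    for toks in tokenised:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        scores.append(sum((count / len(toks)) * math.log(n / df[word])
                          for word, count in tf.items()))

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]  # restore document order
```

The selected sentences would then be concatenated and passed to the classifier in place of the full document.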
Semi-Supervised Summarisation - Saliency Classifier
- Saliency Classifier - A binary sentence-level classifier that decides whether a sentence is
important or not.
- We compute embeddings (using a multilingual sentence encoder) for each sentence in
the facts (source text) and for the summarised judgement (silver standard), and assign a
saliency score to each sentence in the facts using cosine similarity.
- Formally: $\mathrm{Salience}(s_i^k) = \mathrm{cos\_sim}(j_i, s_i^k)$
- The cosine similarities provide a ranked list of sentences: the top 50% are labelled salient and
the bottom half non-salient. We use this data to train the saliency classifier.
- The salient sentences are used to train (and fine-tune) an IndicBERT-based classifier.
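The silver-label construction described above can be sketched as follows, assuming sentence embeddings have already been computed by some encoder (here they are plain lists of floats). The function names and the handling of an odd number of sentences (the middle sentence is labelled non-salient) are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def salience_labels(sentence_vecs, summary_vec):
    """Score each fact sentence against the judge's summary and label the top half."""
    scores = [cosine_similarity(summary_vec, v) for v in sentence_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(scores)
    for i in ranked[: len(scores) // 2]:  # top 50% are marked salient
        labels[i] = 1
    return scores, labels
```

The resulting (sentence, label) pairs form the training data for the binary saliency classifier.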
Multi-Task Learning Model
Summarization-based models show an improvement in results. Instead of a two-step
pipeline (summarisation followed by prediction), we use a multi-task framework.
Bail prediction is the main task and sentence salience classification is the
auxiliary task.
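One common way to train such a main-plus-auxiliary setup is to minimise a weighted sum of the two task losses. The slides do not say how the two objectives are combined, so the formulation below, including the `aux_weight` parameter and the use of binary cross-entropy for both tasks, is an assumption made purely for illustration.

```python
import math


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one prediction, clamped for numerical safety."""
    prob = min(max(prob, eps), 1 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def multitask_loss(bail_prob, bail_label, salience_probs, salience_labels,
                   aux_weight=0.5):
    """Main-task loss plus a weighted, sentence-averaged auxiliary loss."""
    main = bce(bail_prob, bail_label)
    if salience_probs:
        aux = sum(bce(p, y) for p, y in zip(salience_probs, salience_labels))
        aux /= len(salience_probs)
    else:
        aux = 0.0
    return main + aux_weight * aux
```

Setting `aux_weight` to 0 recovers a single-task bail classifier, which makes the contribution of the auxiliary signal easy to ablate.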
Experiments and Results
We evaluate the models in two settings:
a) All-district performance - 70:10:20 train, validation and test split. Documents from all
districts are used together.
b) District-wise performance (measures generalizability) - Training is done on documents
from 44 districts, testing on a different set of 17 districts, and validation on another set
of 10 districts. The number of documents in each split corresponds to an approximate
70:10:20 ratio.
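The district-wise setting amounts to splitting by district rather than by document, so that no district contributes to more than one split. A minimal sketch, assuming each document is a dict with a hypothetical `"district"` key; districts are assigned at random here, whereas the actual split was chosen so that document counts approximate 70:10:20.

```python
import random


def district_wise_split(docs, n_train, n_val, seed=0):
    """Assign whole districts to train / val / test so that splits never share one."""
    districts = sorted({d["district"] for d in docs})
    random.Random(seed).shuffle(districts)

    train_d = set(districts[:n_train])
    val_d = set(districts[n_train:n_train + n_val])

    split = {"train": [], "val": [], "test": []}
    for doc in docs:
        if doc["district"] in train_d:
            split["train"].append(doc)
        elif doc["district"] in val_d:
            split["val"].append(doc)
        else:  # all remaining districts go to the test set
            split["test"].append(doc)
    return split
```

With the numbers from the slides, this would be called with `n_train=44` and `n_val=10`, leaving the remaining 17 districts for testing.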
All models are evaluated using the standard accuracy and F1-score metrics.
Results
The performance of models is lower in the district-wise setting.
Possible reason - Lexical variation across districts makes it difficult for the model to
generalize.
In general, summarization-based models perform better than Doc2Vec and
transformer-based models, highlighting the importance of the summarization step.
The multi-task model outperforms all the baselines in the district-wise setting with
78.53% accuracy. The auxiliary task of sentence salience classification helps learn
robust features during training.
However, in the all-district split, the MTL model fails to beat simpler
baselines like TF-IDF+IndicBERT. A possible explanation is that the sentence salience
training data, built with the cosine similarity heuristic, introduces some noise into the
auxiliary task.
Error Analysis
Kernel Density Estimate (KDE) plots of our proposed bail prediction model. The
majority of errors (incorrectly dismissed / granted) are borderline cases with a model
output score around 0.5.
Challenges and Limitations
Domain Specific Knowledge - Court documents contain a very specific lexicon.
Pre-trained models built on other corpora don’t perform well.
Low Resource Language - Hindi, despite being the third most spoken language in
the world, remains a low-resource language in the context of NLP, with limited
high-quality training data and pre-trained models.
Length of Documents - Court cases can be extremely long. The token
length constraint of transformer-based models makes it challenging to process legal
documents.
Summary
We introduced a large corpus of legal documents for the under-resourced
language Hindi: Hindi Legal Documents Corpus (HLDC).
As a use-case for HLDC, we introduced the task of Bail Prediction, a
time-sensitive and impactful problem to tackle.
We experimented with several models and proposed a multi-task
learning based model that predicts salient sentences as an auxiliary task
and bail prediction as the main task.
Future Work
Multiple Languages - India has 22 official languages, but for the majority of
languages, there are no legal corpora. The use of such corpora is not restricted only
to the legal domain. It can also act as a source of unsupervised data in low resource
languages to train transformer models.
Infusing Domain Specific Knowledge: Develop deep models infused with legal
knowledge so that the model is able to capture legal nuances. For Indian legal
documents, we can incorporate information from the sections and acts in the IPC
(Indian Penal Code).
Other LegalNLP tasks: Apart from judgement prediction, a number of other tasks
like legal summarisation and prior case retrieval can help expedite the judicial process.
Publications
Kapoor, A., Dhawan, M., Goel, A., Arjun, T. H., Agrawal, V., Agrawal, A.,
Bhattacharya, A., Kumaraguru, P., and Modi, A.
HLDC: Hindi Legal Documents Corpus. In Findings of the Association for
Computational Linguistics: ACL 2022. Association for Computational
Linguistics.
Acknowledgements
Prof. Ashutosh Modi and Prof. Arnab Bhattacharya - The work was a collaborative
effort between universities of IIIT Hyderabad, IIT Kanpur and IIIT Delhi. Thanks
for the opportunity to work on such an impactful problem.
IHUB-Data IIIT Hyderabad for funding my Masters.
The Precog group for the constant support and encouragement.
Family - For everything.
Questions

Más contenido relacionado

La actualidad más candente

Jones v Saudi Arabia - summary
Jones v Saudi Arabia - summaryJones v Saudi Arabia - summary
Jones v Saudi Arabia - summaryFAROUQ
 
Principle of prevention of corruption and anti corruption regulatory framewor...
Principle of prevention of corruption and anti corruption regulatory framewor...Principle of prevention of corruption and anti corruption regulatory framewor...
Principle of prevention of corruption and anti corruption regulatory framewor...DrShamsulArefin
 
Juvenile Justice in India Policy and Implementation Dilemmas
Juvenile Justice in India Policy and Implementation DilemmasJuvenile Justice in India Policy and Implementation Dilemmas
Juvenile Justice in India Policy and Implementation DilemmasHAQ: Centre for Child Rights
 
Application of internet computer in legal research
Application of internet computer in legal researchApplication of internet computer in legal research
Application of internet computer in legal researchProf. (Dr.) Tabrez Ahmad
 
Probation officer
Probation officerProbation officer
Probation officermbelanger32
 
Jail manual By Fari Aftab
Jail manual By Fari AftabJail manual By Fari Aftab
Jail manual By Fari AftabFariAftab
 
Misuse of Section 498A of IPC
Misuse of Section 498A of IPCMisuse of Section 498A of IPC
Misuse of Section 498A of IPCMansi Agarwal
 
The indian penal code
The indian penal codeThe indian penal code
The indian penal codeIrfan Ahmad
 
Juvenile Justice Act,2000
Juvenile Justice Act,2000Juvenile Justice Act,2000
Juvenile Justice Act,2000mohini vig
 
Anti money laundering laws Pakistan with comparison of International laws
Anti money laundering laws Pakistan with comparison of International lawsAnti money laundering laws Pakistan with comparison of International laws
Anti money laundering laws Pakistan with comparison of International lawsShehroz Adil
 
PAKISTAN POLICE IS INFACT GREAT (must read)
PAKISTAN POLICE IS  INFACT GREAT (must read)PAKISTAN POLICE IS  INFACT GREAT (must read)
PAKISTAN POLICE IS INFACT GREAT (must read)Malik Tariq Sarwar Awan
 
Introduction to Moot Court
Introduction to Moot CourtIntroduction to Moot Court
Introduction to Moot CourtJaMshed AhMed
 
Origin and development of equity
Origin and development of equityOrigin and development of equity
Origin and development of equityA K DAS's | Law
 

La actualidad más candente (20)

Introduction of ICT and legal research :
Introduction of ICT and legal research :Introduction of ICT and legal research :
Introduction of ICT and legal research :
 
Jones v Saudi Arabia - summary
Jones v Saudi Arabia - summaryJones v Saudi Arabia - summary
Jones v Saudi Arabia - summary
 
Area sabha
Area sabhaArea sabha
Area sabha
 
Principle of prevention of corruption and anti corruption regulatory framewor...
Principle of prevention of corruption and anti corruption regulatory framewor...Principle of prevention of corruption and anti corruption regulatory framewor...
Principle of prevention of corruption and anti corruption regulatory framewor...
 
Juvenile Justice in India Policy and Implementation Dilemmas
Juvenile Justice in India Policy and Implementation DilemmasJuvenile Justice in India Policy and Implementation Dilemmas
Juvenile Justice in India Policy and Implementation Dilemmas
 
Application of internet computer in legal research
Application of internet computer in legal researchApplication of internet computer in legal research
Application of internet computer in legal research
 
Probation officer
Probation officerProbation officer
Probation officer
 
Cyber law
Cyber lawCyber law
Cyber law
 
Jail manual By Fari Aftab
Jail manual By Fari AftabJail manual By Fari Aftab
Jail manual By Fari Aftab
 
Misuse of Section 498A of IPC
Misuse of Section 498A of IPCMisuse of Section 498A of IPC
Misuse of Section 498A of IPC
 
Freedom of expression and Bangladesh constitution
Freedom of expression and Bangladesh constitutionFreedom of expression and Bangladesh constitution
Freedom of expression and Bangladesh constitution
 
The indian penal code
The indian penal codeThe indian penal code
The indian penal code
 
Juvenile Justice Act,2000
Juvenile Justice Act,2000Juvenile Justice Act,2000
Juvenile Justice Act,2000
 
Anti money laundering laws Pakistan with comparison of International laws
Anti money laundering laws Pakistan with comparison of International lawsAnti money laundering laws Pakistan with comparison of International laws
Anti money laundering laws Pakistan with comparison of International laws
 
PAKISTAN POLICE IS INFACT GREAT (must read)
PAKISTAN POLICE IS  INFACT GREAT (must read)PAKISTAN POLICE IS  INFACT GREAT (must read)
PAKISTAN POLICE IS INFACT GREAT (must read)
 
Introduction to Moot Court
Introduction to Moot CourtIntroduction to Moot Court
Introduction to Moot Court
 
Qso
QsoQso
Qso
 
Origin and development of equity
Origin and development of equityOrigin and development of equity
Origin and development of equity
 
Will and Testament
Will and TestamentWill and Testament
Will and Testament
 
Child rights
Child rightsChild rights
Child rights
 

Similar a Justice Delayed is Justice Denied: Enabling Legal Artificial Intelligence via Bail Prediction on Hindi Case Documents

NLP / Language Research at Precog
NLP / Language Research at PrecogNLP / Language Research at Precog
NLP / Language Research at PrecogIIIT Hyderabad
 
An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...IJECEIAES
 
Surviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The ParalegalSurviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The ParalegalAubrey Owens
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasIIIT Hyderabad
 
Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...
Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...
Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...IRJET Journal
 
Total Evidence White Paper
Total Evidence White PaperTotal Evidence White Paper
Total Evidence White PaperKevin Featherly
 
i-Lit Paralegals Brochure (email)
i-Lit Paralegals Brochure (email)i-Lit Paralegals Brochure (email)
i-Lit Paralegals Brochure (email)Mike Taylor
 
Judiciary information management system
Judiciary information management systemJudiciary information management system
Judiciary information management systemKIRTIDEV MOHAPATRA
 
Lexhawk headnote writing services 1
Lexhawk headnote writing services 1Lexhawk headnote writing services 1
Lexhawk headnote writing services 1Arulsagai Arulsamy
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyIIIT Hyderabad
 
The Role of AI Technology for Legal Research and Decision Making
The Role of AI Technology for Legal Research and Decision MakingThe Role of AI Technology for Legal Research and Decision Making
The Role of AI Technology for Legal Research and Decision MakingIRJET Journal
 
Malware analysis
Malware analysisMalware analysis
Malware analysisAnne ndolo
 
Legal Research Outsourcing: Our Means and Your Ends
Legal Research Outsourcing: Our Means and Your EndsLegal Research Outsourcing: Our Means and Your Ends
Legal Research Outsourcing: Our Means and Your Endsrajni_minhas
 
CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023Scott Kveton
 
Analysing Predictive Coding Algorithms For Document Review
Analysing Predictive Coding Algorithms For Document ReviewAnalysing Predictive Coding Algorithms For Document Review
Analysing Predictive Coding Algorithms For Document ReviewCynthia King
 
Automated Discovery of Logical Fallacies in Legal Argumentation
Automated Discovery of Logical Fallacies in Legal ArgumentationAutomated Discovery of Logical Fallacies in Legal Argumentation
Automated Discovery of Logical Fallacies in Legal Argumentationgerogepatton
 
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATIONAUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATIONijaia
 
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATIONAUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATIONgerogepatton
 

Similar a Justice Delayed is Justice Denied: Enabling Legal Artificial Intelligence via Bail Prediction on Hindi Case Documents (20)

NLP / Language Research at Precog
NLP / Language Research at PrecogNLP / Language Research at Precog
NLP / Language Research at Precog
 
An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...An overview of information extraction techniques for legal document analysis ...
An overview of information extraction techniques for legal document analysis ...
 
Surviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The ParalegalSurviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The Paralegal
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBias
 
Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...
Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...
Review and Approaches to Develop Legal Assistance for Lawyers and Legal Profe...
 
Total Evidence White Paper
Total Evidence White PaperTotal Evidence White Paper
Total Evidence White Paper
 
i-Lit Paralegals Brochure (email)
i-Lit Paralegals Brochure (email)i-Lit Paralegals Brochure (email)
i-Lit Paralegals Brochure (email)
 
Judiciary information management system
Judiciary information management systemJudiciary information management system
Judiciary information management system
 
Lexhawk headnote writing services 1
Lexhawk headnote writing services 1Lexhawk headnote writing services 1
Lexhawk headnote writing services 1
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
 
ICBAI Paper (1)
ICBAI Paper (1)ICBAI Paper (1)
ICBAI Paper (1)
 
The Role of AI Technology for Legal Research and Decision Making
The Role of AI Technology for Legal Research and Decision MakingThe Role of AI Technology for Legal Research and Decision Making
The Role of AI Technology for Legal Research and Decision Making
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
 
Legal Research Outsourcing: Our Means and Your Ends
Legal Research Outsourcing: Our Means and Your EndsLegal Research Outsourcing: Our Means and Your Ends
Legal Research Outsourcing: Our Means and Your Ends
 
CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023
 
Analysing Predictive Coding Algorithms For Document Review
Analysing Predictive Coding Algorithms For Document ReviewAnalysing Predictive Coding Algorithms For Document Review
Analysing Predictive Coding Algorithms For Document Review
 
Computer forencis
Computer forencisComputer forencis
Computer forencis
 
Automated Discovery of Logical Fallacies in Legal Argumentation
Automated Discovery of Logical Fallacies in Legal ArgumentationAutomated Discovery of Logical Fallacies in Legal Argumentation
Automated Discovery of Logical Fallacies in Legal Argumentation
 
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATIONAUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
 
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATIONAUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION
 

Más de IIIT Hyderabad

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayIIIT Hyderabad
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesIIIT Hyderabad
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasIIIT Hyderabad
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityIIIT Hyderabad
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...IIIT Hyderabad
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper IIIT Hyderabad
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in IndiaIIIT Hyderabad
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in IndiaIIIT Hyderabad
 
Justice Delayed is Justice Denied: Enabling Legal Artificial Intelligence via Bail Prediction on Hindi Case Documents

  • 1. Justice Delayed is Justice Denied Enabling Legal Artificial Intelligence via Bail Prediction on Hindi Case Documents Arnav Kapoor (20171040) Advisor - Prof. Ponnurangam Kumaraguru Examiners Prof. Srijoni Sen Prof. Saptarshi Ghosh
  • 3. Enabling Legal Artificial Intelligence via Bail Prediction on Hindi Case Documents. Legal Artificial Intelligence: Legal Artificial Intelligence (LegalAI) is concerned with the application of artificial intelligence technologies, particularly NLP, to legal tasks. Bail Prediction: An automated system to predict whether bail (the judicial interim release of a person suspected of a crime) would be granted or denied. Hindi Case Documents: The first work to focus on Hindi legal documents.
  • 4. Motivation: The legal system plays a critical role in the functioning of a nation. It is paramount that cases are handled judiciously and efficiently. In recent times, the legal system in many populous countries (e.g., India) has been inundated with a large number of pending cases. There is a pressing need to build systems that process legal documents and help augment the legal procedure.
  • 5. - How many such pending cases are there, and where are the majority of them concentrated? - According to India’s National Judicial Data Grid, as of November 2021 there are approximately 40 million cases pending in District Courts (National Judicial Data Grid, 2021), as opposed to 5 million cases (1/8th) pending in High Courts. - There is a need to develop models that address the problems at the grassroots level of the Indian legal system. ‘Courts inundated with a large number of pending cases’ (Chart: High Courts vs. District Courts.)
  • 6. Judicial Structure in India (diagram of the court tiers: District Courts, High Courts, Supreme Court). Similar to a few other countries such as the US and the UK, India has adopted a tiered court system to improve productivity and streamline the judicial process.
  • 7. District Court Challenges. Language - Case documents in district courts use the local language of the state, as opposed to the High Courts and the Supreme Court, which consistently use English. ‘Article 348(1) of the Constitution of India provides that all proceedings in the Supreme Court and in every High Court shall be in English language until Parliament by law otherwise provides.’ Structure - Since each district has its own conventions and quality-control checks, there is no consistency across districts. Additionally, district case documents are highly unstructured and noisy (they contain spelling and grammar mistakes, since they are typed).
  • 8. Prominent Legal Corpora. [Chalkidis et al., 2019] developed an English corpus of European Court of Human Rights documents (non-Indian context, English). [Xiao et al., 2018] developed a Chinese legal document corpus (non-Indian context, Chinese). [Malik et al., 2021b] developed an English corpus of Indian Supreme Court documents (Indian context, English). To the best of our knowledge, there does not exist any legal document corpus for the Hindi language.
  • 9. Contribution #1 - HLDC - Hindi Legal Document Corpus. There are about 40 million pending cases in the district courts of India, of which approximately 20 million are from courts where the official language is Hindi. No legal corpus exists for Hindi, which motivates HLDC. The Hindi Legal Documents Corpus (HLDC) is a corpus of 912,568 Indian legal case documents in the Hindi language. The corpus is created by crawling data from the e-Courts website, a Government of India (GoI) initiative to file, track and analyse case progress and documents online.
  • 10. HLDC - Corpus Pipeline: OCR, Document Segmentation (Header / Body), Final Cleaning.
  • 12. Anonymisation Challenge. Anonymisation is a critical step of document processing. It serves a two-fold purpose: unbiased models and privacy. We used a gazetteer along with regex-based rules for NER to anonymise the data. Entries from lists of first names, last names, middle names, locations, titles such as पंडित (Pandit: title of a priest) and सरजी (Sir), month names and day names were normalised to नाम (Naam: <name>). Phone numbers were detected using regex patterns and replaced with a <फ़ोन नंबर> (<phone-number>) tag; numbers written in both English and Hindi numerals were considered.
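The gazetteer-plus-regex idea on the slide above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the gazetteer entries, the `<नाम>` tag spelling, the ten-digit phone pattern and the function name `anonymize` are all assumptions made for the example.

```python
import re

# Hypothetical, tiny gazetteer; the real name/location/title lists are not given in the slides.
NAME_GAZETTEER = {"राम", "श्याम", "पंडित", "सरजी"}

NAME_TAG = "<नाम>"          # assumed spelling of the <name> placeholder
PHONE_TAG = "<फ़ोन नंबर>"    # <phone-number> placeholder from the slide

# Assumption: a phone number is exactly ten consecutive digits, written with
# either ASCII (0-9) or Devanagari (०-९) numerals.
PHONE_RE = re.compile(r"(?<![0-9०-९])[0-9०-९]{10}(?![0-9०-९])")


def anonymize(text: str) -> str:
    """Replace phone numbers and gazetteer tokens with placeholder tags."""
    text = PHONE_RE.sub(PHONE_TAG, text)
    tokens = text.split(" ")
    return " ".join(NAME_TAG if tok in NAME_GAZETTEER else tok for tok in tokens)
```

For example, `anonymize("राम का नंबर ९८७६५४३२१० है")` replaces the name and the Devanagari-digit number while leaving the other tokens untouched. Exact token matching is the simplest choice here; inflected or misspelled names would need fuzzier matching.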
  • 13. Standardisation Challenge - Case Type Identification. HLDC documents were processed to obtain the different case types (e.g., Bail Applications, Criminal Cases). However, case-type labels show little uniformity across districts, and even within a district. For example, bail applications were categorised as both ‘Bail Appl’ and ‘Bail Application’ in Kanpur Nagar district. Similarly, ‘Civil Revision’ appeared as ‘Civil Reivision’ and ‘Civill Revision’ in multiple districts. This was resolved using a combination of manual verification and edit-distance metrics.
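A rough sketch of how edit distance can merge misspelled case-type labels is shown below. The canonical label list, the distance threshold of 3 and the function names are assumptions for illustration; the slides do not specify them.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical canonical labels; the real list would come from manual verification.
CANONICAL = ["Bail Application", "Civil Revision", "Criminal Case"]


def normalize_case_type(raw: str, max_distance: int = 3) -> Optional[str]:
    """Map a raw label to the closest canonical label, or None if nothing is close."""
    cleaned = " ".join(raw.split()).lower()
    best = min(CANONICAL, key=lambda c: edit_distance(cleaned, c.lower()))
    if edit_distance(cleaned, best.lower()) <= max_distance:
        return best
    return None  # too far from every canonical label: leave for manual verification
```

With this threshold, `normalize_case_type("Civill Revision")` maps to `"Civil Revision"`, while an abbreviation such as `"Bail Appl"` is too far from `"Bail Application"` and returns `None`, which is where the manual verification mentioned on the slide would come in.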
  • 15. Case Type Statistics (case type: % in HLDC). Bail Applications: 31.71; Criminal Cases: 20.41; Original Suits: 6.54; Warrants or Summons in Criminal Cases: 5.24; Civil Misc: 3.4; Final Report: 3.32; Civil Cases: 3.23.
  • 16. Contribution #2 - Bail Prediction. We showcase the utility of the HLDC corpus on a relevant real-world task. The most prevalent case type in the corpus is bail applications. Additionally, bail cases are very time-sensitive, as the hearing and decision need to be completed in a short period. Bail cases are resolved within 7-15 days on average, whereas most other cases can take years.
  • 17. Bail Prediction Task. Input: facts and arguments of the case. System: bail prediction ML model. Output: bail granted or bail denied.
  • 18. Bail Corpus Creation Pipeline
  • 19. Outcome Identification. Binary: identify whether bail is granted or denied. To extract the bail decision, we manually identified keywords in the result section. Keywords like ख़ारिज (dismissed) and निरस्त (invalidated/cancelled) indicate rejection of the bail application, while words like स्वीकार (accepted) indicate acceptance of the bail application.
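The keyword lookup described above might look like the following sketch. The three slide keywords are taken from the source; the extra negated form अस्वीकार, the Unicode normalisation step and the function name `bail_outcome` are assumptions added for the example. Labels follow the deck's convention of 0 for denied and 1 for granted.

```python
import unicodedata
from typing import Optional


def _nfc(text: str) -> str:
    # Normalise so that nukta letters match regardless of how they were encoded.
    return unicodedata.normalize("NFC", text)


# Keywords from the slide, plus अस्वीकार (rejected) as an assumed extra entry,
# because it contains स्वीकार and would otherwise be misread as acceptance.
DENY_KEYWORDS = [_nfc(k) for k in ("ख़ारिज", "निरस्त", "अस्वीकार")]
GRANT_KEYWORDS = [_nfc(k) for k in ("स्वीकार",)]


def bail_outcome(result_section: str) -> Optional[int]:
    """Return 0 if bail is denied, 1 if granted, None if no keyword is found."""
    text = _nfc(result_section)
    if any(k in text for k in DENY_KEYWORDS):  # denial is checked first on purpose
        return 0
    if any(k in text for k in GRANT_KEYWORDS):
        return 1
    return None
```

Checking the denial keywords before the acceptance keywords is a design choice of this sketch; documents that match neither list return `None` and would need manual inspection.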
  • 20. Regex Tokens - Outcome Identification
  • 21. Judge’s Opinion and Facts Extraction. Judge’s Opinion Extraction - Most bail documents have a concluding paragraph in which the judge summarises their view of the case. To extract it, we first constructed regular expressions for the phrases that often precede the judge’s summary. Facts Extraction - What remains of the case document after removing the judge’s summary, the final result paragraph and the initial header information forms the core facts of the case.
  • 23. Formalised Problem Statement. Formally, consider a corpus of bail documents $D = \{b_1, b_2, \dots, b_i\}$, where each bail document is segmented as $b_i = (h_i, f_i, j_i, y_i)$. Here, $h_i$, $f_i$, $j_i$ and $y_i$ represent the header, facts, judge’s summary and bail decision of the document, respectively. Additionally, the facts of every document contain $k$ sentences; more formally, $f_i = (s_i^1, s_i^2, \dots, s_i^k)$, where $s_i^k$ represents the $k$-th sentence of the $i$-th bail document. We formulate the bail prediction task as a binary classification problem. We are interested in modelling $p_\theta(y_i \mid f_i)$, the probability of the outcome $y_i$ given the facts of a case $f_i$. Here, $y_i \in \{0, 1\}$, i.e., 0 if bail is denied and 1 if bail is granted.
  • 24. Models. We explore a range of models for the bail prediction problem; the proposed Multi-Task Learning model performs best. The categories of models we progressively explored were: - Embedding Based Models - Summarisation Based Models - Multi-Task Learning Model
  • 25. Embedding Based Models. Doc2Vec: a classical embedding-based model. The Doc2Vec embeddings are, in our case, trained on the train set of our corpus and fed as input to SVM and XGBoost classifiers. IndicBERT: a transformer language model trained on 12 major Indian languages. However, IndicBERT, like other transformer LMs, has a limit on the input length (max 512 tokens). We experimented with two settings: the first 512 tokens of the document, and the last 512 tokens of the document.
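The two truncation settings can be illustrated with a small sketch. Whitespace tokens stand in for the model's subword tokens here, and the function names are hypothetical; a real run would apply the same slicing to the tokenizer's output.

```python
from typing import List


def first_tokens(tokens: List[str], max_len: int = 512) -> List[str]:
    """Keep the beginning of the document (setting 1)."""
    return tokens[:max_len]


def last_tokens(tokens: List[str], max_len: int = 512) -> List[str]:
    """Keep the end of the document (setting 2)."""
    if max_len <= 0:
        return []
    return tokens[-max_len:]


# Example: a 1000-token document keeps tokens 0..511 or tokens 488..999.
document = [str(i) for i in range(1000)]
head, tail = first_tokens(document), last_tokens(document)
```

Either setting discards roughly half of this example document, which is the loss of signal that motivates the summarisation-based models.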
  • 26. Summarisation Based Models. The models described in the previous section can lose important input signals due to truncation, which can be detrimental to the bail prediction task. Lengthy Documents -> Summarisation. The benefits are two-fold: a) an extractive summary of the document is used as input, i.e., the length of the document is reduced without losing important signals; b) summarisation helps to identify the salient sentences in a legal document and is a step towards developing explainable models.
  • 27. Summarisation Methods. Unsupervised - no need for ground-truth summaries. a) TF-IDF+IndicBERT: we rely on the statistical measures of term frequency and inverse document frequency to recognise the most salient words and sentences. b) TextRank+IndicBERT: we use the graph-based method TextRank to extract the most important sentences from the documents. Semi-Supervised - uses the judge’s summary as a truth indicator to train the model: Salience Classifier + IndicBERT.
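A minimal sketch of TF-IDF based extractive summarisation is given below. The slides do not specify how sentences are scored, so the choices here (whitespace tokenisation, smoothed IDF over the corpus, mean word weight per sentence, the `top_k` parameter and the function name) are assumptions. The selected sentences would then be passed to the IndicBERT classifier, which is not shown.

```python
import math
from collections import Counter
from typing import List


def tfidf_summary(doc_sentences: List[str],
                  corpus: List[List[str]],
                  top_k: int = 2) -> List[str]:
    """Pick the top_k sentences of one document by mean TF-IDF weight.

    doc_sentences: sentences of the document to summarise.
    corpus: all documents (each a list of sentences), used for document frequency.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update({w for sent in doc for w in sent.split()})

    tf = Counter(w for sent in doc_sentences for w in sent.split())
    total = sum(tf.values())

    def weight(word: str) -> float:
        idf = math.log((1 + n_docs) / (1 + df[word])) + 1  # smoothed IDF
        return (tf[word] / total) * idf

    def score(sentence: str) -> float:
        words = sentence.split()
        return sum(weight(w) for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(doc_sentences)),
                    key=lambda i: score(doc_sentences[i]), reverse=True)
    keep = sorted(ranked[:top_k])  # restore original sentence order
    return [doc_sentences[i] for i in keep]
```

Keeping the selected sentences in their original order preserves the narrative flow of the facts, which seems a reasonable default for input to a downstream classifier.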
  • 28. Semi-Supervised Summarisation - Saliency Classifier. - Saliency Classifier: a binary sentence-level classifier that predicts whether a sentence is important or not. - We compute embeddings (using a multilingual sentence encoder) for each sentence in the facts (source text) and for the judge’s summary (silver standard), and assign a saliency score to each sentence in the facts using cosine similarity. - Formally, $\mathrm{Salience}(s_i^k) = \mathrm{cos\_sim}(j_i, s_i^k)$. - The cosine similarities provide a ranked list of sentences: the top 50% are labelled salient and the bottom half non-salient. We use this data to train the saliency classifier. - The salient sentences are used to train (and fine-tune) the IndicBERT-based classifier.
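The silver-label construction above can be sketched as follows. Plain Python lists stand in for the sentence-encoder embeddings, and the function names and the rounding-down rule for odd sentence counts are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cos_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def salience_labels(judge_vec: Sequence[float],
                    sentence_vecs: List[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Score each fact sentence against the judge's summary; label the top half salient."""
    scores = [cos_sim(judge_vec, s) for s in sentence_vecs]
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    n_salient = len(scores) // 2  # assumption: round down for odd counts
    labels = [0] * len(scores)
    for i in order[:n_salient]:
        labels[i] = 1
    return scores, labels
```

The resulting `(sentence, label)` pairs are what a binary saliency classifier would be trained on; because the labels come from a similarity heuristic rather than human annotation, they are silver rather than gold labels.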
  • 29. Multi-Task Learning Model. Summarisation-based models show an improvement in results. Instead of a two-step pipeline (summarisation followed by prediction), we use a multi-task framework in which bail prediction is the main task and sentence salience classification is the auxiliary task.
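The slides do not state how the two objectives are combined. One common pattern, shown here purely as an assumed illustration, is a weighted sum of the main-task loss and the mean auxiliary loss; `aux_weight` and the function names are hypothetical.

```python
import math
from typing import Sequence


def bce(prob: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single prediction."""
    prob = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def multitask_loss(bail_prob: float, bail_label: int,
                   salience_probs: Sequence[float], salience_labels: Sequence[int],
                   aux_weight: float = 0.5) -> float:
    """Main bail loss plus weighted mean sentence-salience loss (assumed formulation)."""
    main = bce(bail_prob, bail_label)
    if salience_probs:
        aux = sum(bce(p, y) for p, y in zip(salience_probs, salience_labels))
        aux /= len(salience_probs)
    else:
        aux = 0.0
    return main + aux_weight * aux
```

Setting `aux_weight` to 0 recovers a plain bail classifier, which makes it easy to check how much the auxiliary task contributes.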
  • 31. Experiments and Results. We evaluate the models in two settings. a) All-district performance: a 70:10:20 train, validation and test split, with documents from all districts used together. b) District-wise performance (to measure generalisability): training uses documents from 44 districts, testing uses a different set of 17 districts, and validation uses another set of 10 districts. The number of documents in each split corresponds to an approximately 70:10:20 ratio. All models are evaluated using the standard accuracy and F1-score metrics.
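A district-wise split keeps every district in exactly one of the three sets. The sketch below assumes each document is a dict with a `"district"` key and assigns districts at random; the slides do not say how the 44/10/17 districts were chosen, and random assignment does not by itself guarantee the approximate 70:10:20 document ratio.

```python
import random
from typing import Dict, List


def district_split(docs: List[dict], n_train: int = 44, n_val: int = 10,
                   n_test: int = 17, seed: int = 0) -> Dict[str, List[dict]]:
    """Split documents so that no district appears in more than one set."""
    districts = sorted({d["district"] for d in docs})
    if len(districts) < n_train + n_val + n_test:
        raise ValueError("not enough districts for the requested split")

    rng = random.Random(seed)
    rng.shuffle(districts)
    train_d = set(districts[:n_train])
    val_d = set(districts[n_train:n_train + n_val])
    test_d = set(districts[n_train + n_val:n_train + n_val + n_test])

    split: Dict[str, List[dict]] = {"train": [], "val": [], "test": []}
    for doc in docs:
        if doc["district"] in train_d:
            split["train"].append(doc)
        elif doc["district"] in val_d:
            split["val"].append(doc)
        elif doc["district"] in test_d:
            split["test"].append(doc)
    return split
```

Because the test districts are never seen during training, accuracy in this setting reflects how well a model copes with unfamiliar district-specific vocabulary and formatting.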
  • 34. Results. The performance of the models is lower in the district-wise setting. Possible reason: lexical variation across districts makes it difficult for the model to generalise. In general, summarisation-based models perform better than Doc2Vec and transformer-based models, highlighting the importance of the summarisation step. The multi-task model outperforms all the baselines in the district-wise setting, with 78.53% accuracy. The auxiliary task of sentence salience classification helps learn robust features during training. However, in the all-district split, the MTL model fails to beat simpler baselines like TF-IDF+IndicBERT. The sentence salience training data, being based on the cosine similarity heuristic, introduces some noise into the auxiliary task.
  • 35. Error Analysis. Kernel Density Estimate (KDE) plots of the output scores of our proposed bail prediction model. The majority of errors (bail incorrectly dismissed or granted) are borderline cases with a model output score around 0.5.
  • 36. Challenges and Limitations. Domain-Specific Knowledge - Court documents use a very specific lexicon, so pre-trained models built on other corpora do not perform well. Low-Resource Language - Hindi, despite being the third most spoken language in the world, remains a low-resource language in the context of NLP, with limited high-quality training data and pre-trained models. Length of Documents - Court case documents can be extremely long, and the token-length constraint of transformer-based models makes it challenging to process them.
  • 37. Summary. We introduced a large corpus of legal documents for the under-resourced language Hindi: the Hindi Legal Documents Corpus (HLDC). As a use case for HLDC, we introduced the task of bail prediction, a time-sensitive and impactful problem. We experimented with several models and proposed a multi-task learning model that predicts salient sentences as an auxiliary task and bail outcome as the main task.
  • 38. Future Work. Multiple Languages - India has 22 official languages, but for the majority of them there are no legal corpora. The use of such corpora is not restricted to the legal domain; they can also act as a source of unsupervised data for training transformer models in low-resource languages. Infusing Domain-Specific Knowledge - Develop deep models infused with legal knowledge so that the model can capture legal nuances. For Indian legal documents, we can incorporate information from the sections and acts in the IPC (Indian Penal Code). Other LegalNLP Tasks - Apart from judgement prediction, a number of other tasks, such as legal summarisation and prior case retrieval, can help expedite the judicial process.
  • 39. Publications. Kapoor, A., Dhawan, M., Goel, A., Arjun, T.H., Agrawal, V., Agrawal, A., Bhattacharya, A., Kumaraguru, P., and Modi, A. HLDC: Hindi Legal Documents Corpus. In Findings of the Association for Computational Linguistics: ACL 2022, Association for Computational Linguistics.
  • 40. Acknowledgements. Prof. Ashutosh Modi and Prof. Arnab Bhattacharya: the work was a collaborative effort between IIIT Hyderabad, IIT Kanpur and IIIT Delhi. Thanks for the opportunity to work on such an impactful problem. IHUB-Data, IIIT Hyderabad, for funding my Masters. The Precog group, for the constant support and encouragement. Family, for everything.