Methodological innovations to estimate illegal economy
Maria Francesca Romano
Institute of Economics & EMbeDS - Scuola Superiore Sant’Anna
Aims
Starting point
o Research directed by Guido M. Rey resulted in the volume «La mafia come impresa. Analisi del sistema economico criminale e delle politiche di contrasto» (2017).
o The chapter «Dalle parole ai numeri: estrarre dati dalle sentenze della magistratura» presents the results obtained from the analysis of about 5,000 judgments issued by the Corte di Cassazione.
Goals
o Extend the results obtained from text mining of the judgments through the interaction of multiple data sources.
o Evaluate the completeness and reliability of the data.
o Organize database(s) aimed at estimating statistical models.
Exercise:
Integration of data from multiple sources
① Judgments issued by the Corte di Cassazione (www.italgiure.it): Open Data PA
② Orbis: database of economic enterprises, accessible with the resources of EMbeDS (Economics and Management in the era of Data Science), a project selected in the MIUR call for Departments of Excellence 2018-2022 http://embeds.santannapisa.it/
A subset of 308 judgments was extracted from the 4,632 selected judgments (from 2012 to September 2016) containing one or more of the words “corruzione”, “concussione”, “turbativa” and “appalto”. The subset includes judgments:
• Issued in 2014
• Containing references to professional roles held in the Public Administration
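The keyword and year selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the `Judgment` record, its field names and the toy texts are assumptions, and the second criterion (references to Public Administration roles) is not implemented.

```python
import re
from dataclasses import dataclass

# Keywords quoted on the slide; everything else here is illustrative.
KEYWORDS = ("corruzione", "concussione", "turbativa", "appalto")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


@dataclass
class Judgment:
    judgment_id: str   # hypothetical identifier
    year: int          # year of issue
    text: str          # full text of the judgment


def select_judgments(judgments, year=None):
    """Keep judgments containing at least one keyword, optionally from one year."""
    selected = []
    for j in judgments:
        if year is not None and j.year != year:
            continue
        if KEYWORD_RE.search(j.text):
            selected.append(j)
    return selected


# Toy records, not real judgments.
sample = [
    Judgment("J1", 2014, "Il reato di corruzione del pubblico ufficiale ..."),
    Judgment("J2", 2014, "Ricorso in materia di furto ..."),
    Judgment("J3", 2013, "Turbativa d'asta nell'appalto comunale ..."),
]
subset_2014 = select_judgments(sample, year=2014)
print([j.judgment_id for j in subset_2014])
```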
Step 1:
Import texts of judgments and text mining
o Creation of a corpus with the texts of the judgments
o Vocabulary (words and lemmas)
o Grammatical and semantic tagging
o Identification of multiwords and segments
o Text mining
Through the TalTaC2 package
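To give an idea of what the vocabulary and multiword steps produce, here is a rough sketch using only the Python standard library. It is a stand-in for illustration and does not reproduce TalTaC2 or its algorithms; the function names and the toy corpus are assumptions, and lemmatization and tagging are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; accented Italian letters are kept."""
    return re.findall(r"[^\W\d_]+", text.lower())


def vocabulary(corpus):
    """Frequency of each graphic form (word) across the corpus."""
    counts = Counter()
    for text in corpus:
        counts.update(tokenize(text))
    return counts


def repeated_segments(corpus, n=2, min_count=2):
    """Word n-grams occurring at least `min_count` times: multiword candidates."""
    counts = Counter()
    for text in corpus:
        tokens = tokenize(text)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}


# Toy corpus, not real judgments.
corpus = [
    "Il pubblico ufficiale ha ricevuto denaro.",
    "La condotta del pubblico ufficiale integra il reato.",
]
vocab = vocabulary(corpus)
segments = repeated_segments(corpus)
print(vocab.most_common(3))
print(segments)
```

Repeated segments such as "pubblico ufficiale" are the kind of candidate that would then be reviewed and tagged as a multiword.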
Information chart
As the following figure shows, the information centres on the single criminal event, which involves actors (individuals or groups), is detected / sanctioned, takes place in a specific geographical location, on a definite date (or period), in a given manner, and has a definite economic value.
[Figure: information chart centred on the criminal event (WHAT). Labelled links: involves persons (WHO), acting together with other persons; belongs to / works for (criminal association, public body, company); to the detriment of; sanctioned / detected by (court, police); where (place, WHERE); when (period, WHEN); how (HOW); economic value (euro).]
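One possible way to hold the elements of the chart in a record is sketched below. The class and field names are assumptions chosen to mirror the WHO / WHEN / WHERE / WHAT / HOW labels; the slides do not specify a data model, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Actor:
    name: str                        # person, company, public body or criminal association
    kind: str                        # e.g. "person", "company", "public_body"
    member_of: Optional[str] = None  # organisation the actor belongs to / works for


@dataclass
class CriminalEvent:
    what: str                                            # type of offence
    who: List[Actor] = field(default_factory=list)       # actors involved
    victims: List[Actor] = field(default_factory=list)   # "to the detriment of"
    when: Optional[str] = None                           # date or period
    where: Optional[str] = None                          # place
    how: Optional[str] = None                            # modality
    economic_value_eur: Optional[float] = None           # economic value in euros
    detected_by: Optional[str] = None                    # e.g. "court" or "police"


# Invented example, for illustration only.
event = CriminalEvent(
    what="corruzione",
    who=[Actor("Person A", "person", member_of="Company X")],
    victims=[Actor("Municipality Y", "public_body")],
    when="2014",
    where="Region Z",
    economic_value_eur=10000.0,
    detected_by="court",
)
print(event.who[0].member_of)
```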
Guidelines followed for matching with Orbis
-- The matching procedure must be automatic or automatable: repeatable with lists obtained from a larger number of judgments and without "manual" choices
-- The presence of unredacted data / information on natural persons does not pose privacy problems, because this information is not extracted "per se": it is the premise for a correct and reliable match, and the data are still treated statistically (anonymously)
Step 2:
Matching with Orbis (1)
«Batch search» (automatic) in two consecutive steps:
• Companies: list obtained from TalTaC2 by exporting the company name and the identifier of the judgment
• Persons (defendants): list of defendants obtained from TalTaC2 by exporting the graphic forms semantically tagged as «defendants» (multiword graphic forms with name and surname, or surname and name) together with the date of birth
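Preparing the company list for an automatic batch search might look like the sketch below. The column names, the normalization rule and the CSV layout are assumptions made for illustration; the slides do not describe the file format expected by the Orbis batch search.

```python
import csv
import io
import re


def normalize_name(name):
    """Uppercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.upper())
    return re.sub(r"\s+", " ", cleaned).strip()


def company_rows(companies):
    """companies: iterable of (company_name, judgment_id) pairs exported from the corpus."""
    seen = set()
    for name, judgment_id in companies:
        key = (normalize_name(name), judgment_id)
        if key not in seen:  # drop exact repeats within the same judgment
            seen.add(key)
            yield {"company_name": key[0], "judgment_id": judgment_id}


def write_batch_file(rows, fieldnames):
    """Serialize rows to CSV text (column names are hypothetical)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented company names, for illustration only.
exported = [
    ("Alfa Costruzioni S.r.l.", "J1"),
    ("ALFA COSTRUZIONI SRL", "J2"),
    ("Alfa  Costruzioni s.r.l.", "J1"),
]
batch_csv = write_batch_file(company_rows(exported), ["company_name", "judgment_id"])
print(batch_csv)
```

Keeping the judgment identifier next to each name is what lets the matched Orbis records be linked back to the judgments afterwards.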
Step 2:
Matching with Orbis (2)
RESULTS of the «Batch search» (automatic) in two consecutive steps:
• Company input: 400 companies, of which 228 with an A score, corresponding to 186 unique companies
(duplicates arise because a company name appears in several judgments or is written by the judges in several variants)
• Person input (defendants): 408 defendants (unique, no repetitions)
16 validated records (automatic comparison between date of birth and part of the social security number) + 6 individual companies
The automated process produces a matching score for each record. Our quality indicator uses the following scoring criteria:
A Excellent: total score >= 95%
B Good: total score between 85 and 94%
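The two automatic checks mentioned above can be sketched as follows. Several points are assumptions rather than statements from the slides: that the "social security number" is the Italian codice fiscale (whose characters 7 to 11 encode year, month and day of birth), that scores between 94 and 95 count as B, and that scores below 85 are left ungraded. Special cases of the code are not handled.

```python
from datetime import date

# Month letters used in the Italian codice fiscale (January..December).
CF_MONTHS = "ABCDEHLMPRST"


def grade(score):
    """Map a total matching score (percent) to the quality classes A and B."""
    if score >= 95:
        return "A"
    if score >= 85:  # assumption: scores between 94 and 95 fall into B
        return "B"
    return None      # assumption: lower scores are left ungraded


def birth_date_matches(tax_code, birth):
    """Compare a date of birth with characters 7-11 of a codice fiscale."""
    code = tax_code.upper()
    if len(code) != 16 or code[8] not in CF_MONTHS:
        return False
    if not (code[6:8] + code[9:11]).isdigit():
        return False
    year, day = int(code[6:8]), int(code[9:11])
    if day > 40:     # 40 is added to the day of birth for women
        day -= 40
    month = CF_MONTHS.index(code[8]) + 1
    return (year, month, day) == (birth.year % 100, birth.month, birth.day)


# Fictitious code commonly used as a textbook example.
print(grade(96.0), birth_date_matches("RSSMRA85T10A562S", date(1985, 12, 10)))
```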
Step 3:
Information contribution from Orbis:
variables with high information potential
What data do we add to those already available?
• Company status
• Business size
• Statistical classification of activities
• Start year
• Budget data
• ….
BUT ALSO THE NAMES OF THE TOP MANAGEMENT AND OWNERS
Again with a view to anonymous treatment, they can be used to identify a network of companies.
The names are not of interest "per se" (we are not a detective agency), but as holders of other individual and / or family companies (founded after the outcome of the judgment).
NB: the names of the defendants appear unredacted in the Corte di Cassazione source, as it is the court of last instance.
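How shared managers or owners could reveal a network of companies while names stay pseudonymous is sketched below. The hashing scheme, the salt and the input layout are assumptions for illustration, and the names are invented; this is not the method described in the slides.

```python
import hashlib
from collections import defaultdict


def pseudonym(name, salt="demo-salt"):
    """Replace a personal name with a short hash so links survive but names do not."""
    return hashlib.sha256((salt + name.strip().upper()).encode("utf-8")).hexdigest()[:10]


def company_networks(officers):
    """officers: {company: [names of managers/owners]} -> list of sets of linked companies."""
    by_person = defaultdict(set)
    for company, names in officers.items():
        for name in names:
            by_person[pseudonym(name)].add(company)

    # Adjacency between companies sharing at least one person.
    neighbours = {company: set() for company in officers}
    for companies in by_person.values():
        for company in companies:
            neighbours[company] |= companies - {company}

    # Connected components via depth-first search.
    seen, networks = set(), []
    for start in officers:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        networks.append(component)
    return networks


officers = {
    "Company A": ["Person 1", "Person 2"],
    "Company B": ["Person 2"],
    "Company C": ["Person 3"],
}
networks = company_networks(officers)
print(networks)
```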
TalTaC results:
The automatic classification of judgments
How to interpret clusters
First 11 words characterizing the 2 main identified clusters
Cluster 1 (n=119):               Cluster 2 (n=177):
presence of organized crime      concussione (extortion by a public official) / corruption in the PA
cosca                            pubblico ufficiale
associazione mafiosa             concussione
associazione                     privato
Nome1                            costrizione
sodalizio                        corruzione
partecipazione                   induzione
conversazione                    servizio
estorsione                       CP
ndrangheta                       ufficio
clan                             abuso
Nome2                            prescrizione
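A simple way to rank the words that characterize each cluster is to compare a word's share within the cluster with its share in the whole corpus. The sketch below uses that ratio as a stand-in: the slides do not state which measure TalTaC2 applies, and the function name, parameters and toy tokens are assumptions.

```python
from collections import Counter


def characteristic_words(cluster_tokens, top=3, min_count=2):
    """cluster_tokens: {cluster: [tokens]} -> {cluster: top words by over-representation}."""
    per_cluster = {c: Counter(tokens) for c, tokens in cluster_tokens.items()}
    overall = Counter()
    for counts in per_cluster.values():
        overall.update(counts)
    total = sum(overall.values())

    result = {}
    for cluster, counts in per_cluster.items():
        size = sum(counts.values())
        scores = {
            word: (n / size) / (overall[word] / total)  # share in cluster / share in corpus
            for word, n in counts.items()
            if n >= min_count
        }
        # Highest ratio first; ties broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[cluster] = ranked[:top]
    return result


# Toy tokens, not the real corpus.
toy = {
    1: ["cosca", "clan", "cosca", "reato", "clan", "reato"],
    2: ["concussione", "ufficio", "concussione", "reato", "ufficio", "reato"],
}
top_words = characteristic_words(toy, top=2)
print(top_words)
```

A word such as "reato", equally frequent in both toy clusters, gets a ratio of 1 and therefore ranks below the words concentrated in one cluster.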
Not just text mining but help in the interpretation
The novelty of the approach presented here is the interaction between the results of the textual analysis and the new information that can be acquired from other databases (administrative or not).
The questions we would like to answer:
Do companies present in the judgments have characteristics different from those that are not present?
Do the companies present in the judgments differ according to the cluster they belong to?
Example: do they differ by company size, economic sector, geographical location?
Regions and companies by cluster
Provisional and partial data
                  Cluster 1                 Cluster 2                 Total
                  Offences + org. crime     Offences and PA
Region            # judgments  # companies  # judgments  # companies  # judgments  # companies
Abruzzo                     -            -            1            1            1            1
Calabria                   11           33            1            1           12           34
Campania                    6           21            8           13           14           34
Emilia-Romagna              -            -            2            7            2            7
Lazio                       1            1            5            6            6            7
Liguria                     1            1            1            2            2            3
Lombardia                   1           13            6           17            7           30
Marche                      -            -            3            8            3            8
Molise                      -            -            1            1            1            1
Piemonte                    -            -            1           10            1           10
Puglia                      -            -            5           14            5           14
Sardegna                    -            -            1            1            1            1
Sicilia                     5           13            5           11           10           24
Toscana                     1            1            4           10            5           11
Veneto                      -            -            4           17            4           17
Total                      26           83           48          119           74          202
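As a consistency check on the regional figures, the per-cluster columns can be summed and compared with the reported totals. In the sketch below, regions with no entry for Cluster 1 are entered as 0; that placement is an inference from the column totals, which only add up if the regions with a single pair of figures belong to Cluster 2.

```python
# (cluster 1 judgments, cluster 1 companies, cluster 2 judgments, cluster 2 companies)
rows = {
    "Abruzzo": (0, 0, 1, 1),
    "Calabria": (11, 33, 1, 1),
    "Campania": (6, 21, 8, 13),
    "Emilia-Romagna": (0, 0, 2, 7),
    "Lazio": (1, 1, 5, 6),
    "Liguria": (1, 1, 1, 2),
    "Lombardia": (1, 13, 6, 17),
    "Marche": (0, 0, 3, 8),
    "Molise": (0, 0, 1, 1),
    "Piemonte": (0, 0, 1, 10),
    "Puglia": (0, 0, 5, 14),
    "Sardegna": (0, 0, 1, 1),
    "Sicilia": (5, 13, 5, 11),
    "Toscana": (1, 1, 4, 10),
    "Veneto": (0, 0, 4, 17),
}

# Column sums across regions.
totals = tuple(sum(col) for col in zip(*rows.values()))
# Judgments and companies over both clusters.
grand = (totals[0] + totals[2], totals[1] + totals[3])
print(totals, grand)
```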
Companies by national legal form
Provisional and partial data
National legal form                                 Number of companies
Consortium + Consortium with external activity                        4
Cooperative company (SCARL + SCARLPA)                                 4
Joint stock company - SPA                                            25
Limited liability company - SRL                                     121
Limited partnership - SAS                                             2
One-person company with limited liability - SRLU                     21
One-person joint stock company - SPA                                  3
Sole proprietorship                                                   2
n.a.                                                                  4
Total                                                               186
To be added: 22 one-person companies obtained from the list of defendants
Companies by status
Provisional and partial data
Status                               Number of companies
Active                                               135
Active (default of payment)                            1
Bankruptcy                                             1
Dissolved                                              5
Dissolved (bankruptcy)                                16
Dissolved (liquidation)                                5
Dissolved (merger or take-over)                        6
In liquidation                                        11
Status unknown                                         6
Total                                                186
Companies by Geographical Areas and status
Provisional and partial data
Areas                   Active   Others   Status unknown   Total
ITC - Northwest             29       12                1      42
ITH - Northeast             22       12                -      34
ITI - Centre                33        9                -      42
ITF - South                 26        8                4      38
ITG - Insular Italy         15        4                1      20
(blank)                     10        0                -      10
Total                      135       45                6     186
Others:
Active (default of payment)
Bankruptcy
Dissolved
Dissolved (bankruptcy)
Dissolved (liquidation)
Dissolved (merger or take-over)
In liquidation
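The grouping of detailed statuses into Active, Others and Status unknown can be reproduced from the counts in the status table. The short sketch below applies that grouping; the function name is an assumption, the figures are those reported in the slides.

```python
from collections import Counter

# Counts from the "Companies by status" table.
status_counts = {
    "Active": 135,
    "Active (default of payment)": 1,
    "Bankruptcy": 1,
    "Dissolved": 5,
    "Dissolved (bankruptcy)": 16,
    "Dissolved (liquidation)": 5,
    "Dissolved (merger or take-over)": 6,
    "In liquidation": 11,
    "Status unknown": 6,
}


def status_group(status):
    """Everything that is neither plainly active nor unknown counts as 'Others'."""
    if status in ("Active", "Status unknown"):
        return status
    return "Others"


grouped = Counter()
for status, n in status_counts.items():
    grouped[status_group(status)] += n
print(dict(grouped))
```

The result agrees with the Total row of the table by geographical area (135, 45 and 6 out of 186).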
Discussion
The potential sources of data and information are many, and each one is organized according to its own purposes.
Using them for statistical purposes requires taking into account some aspects that are sometimes neglected when talking about Big Data or Open Data:
• The completeness of the information
• The time reference of the information acquired or that could be acquired
Final goal: the «statistical» database
The database thus obtained will allow reconstructions and analyses starting from any element (company, public body, persons, period, place, etc.), provided that it is correctly identified as such within the texts of the judgments.
It is, therefore, necessary to use several tools:
• Text mining, to process the information contained in the texts of the judgments and transform it into data that can be analysed statistically
• Validation and integration of these data with other information and data from other administrative databases / records
The greater the completeness and reliability of the other databases, the greater the information value of the statistical analysis carried out on the statistical database.
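A relational layout is one way such a statistical database could be organized so that analysis can start from any element. The sketch below uses an in-memory SQLite database; the table names, columns and rows are assumptions made for illustration and do not describe the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: judgments, companies, and the mentions linking them.
conn.executescript("""
CREATE TABLE judgment (judgment_id TEXT PRIMARY KEY, year INTEGER, cluster INTEGER);
CREATE TABLE company  (company_id TEXT PRIMARY KEY, legal_form TEXT, status TEXT, region TEXT);
CREATE TABLE mention  (judgment_id TEXT REFERENCES judgment,
                       company_id  TEXT REFERENCES company,
                       match_grade TEXT,
                       PRIMARY KEY (judgment_id, company_id));
""")
# Invented rows, for illustration only.
conn.executemany("INSERT INTO judgment VALUES (?, ?, ?)",
                 [("J1", 2014, 1), ("J2", 2014, 2)])
conn.executemany("INSERT INTO company VALUES (?, ?, ?, ?)",
                 [("C1", "SRL", "Active", "Calabria"), ("C2", "SPA", "Dissolved", "Lazio")])
conn.executemany("INSERT INTO mention VALUES (?, ?, ?)",
                 [("J1", "C1", "A"), ("J2", "C1", "A"), ("J2", "C2", "A")])

# Number of distinct companies mentioned in the judgments of each cluster.
query = """
SELECT j.cluster, COUNT(DISTINCT m.company_id)
FROM mention m JOIN judgment j USING (judgment_id)
GROUP BY j.cluster ORDER BY j.cluster
"""
companies_per_cluster = conn.execute(query).fetchall()
print(companies_per_cluster)
```

The same join can be started from the company side, for example to list the judgments in which a given company appears.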
Credits
Thanks to:
Fabrizio Alboni
Daniela Arlia
Antonella Baldassarini
Lorenzo Bartalini
Pietro Battiston
Sergio Bolasco
Alberto di Martino
Giuseppe Di Vetta
Pasquale Pavone
Guido M. Rey
