DEALING WITH POOR DATA QUALITY OF
OSINT DATA IN FRAUD RISK ANALYSIS
MAURICE VAN KEULEN
Largest part [of money to reclaim] is due to payments to people who were not entitled to it. … earlier, it didn’t pay off to reclaim the money. [Telegraaf Jan 2012]
Capelle a/d IJssel 2011: 164 cases of fraud yielded 1.2 million
25 Feb 2015 Dealing with poor data quality of OSINT data in fraud risk analysis
“If a strong suspicion of fraud arises, social inspectors
start an investigation with the receiver of social security”
HOW DOES DIGITAL FRAUD DETECTION WORK?
(IN CASE OF SOCIAL SECURITY FRAUD)
Application
• Data from applicant
Coupling
• Data from governmental databases
Fraud risk analysis
• Extraction of “indicators”
• Data mining: classification
Investigation
• Selection of cases from risk classes
Municipalities are responsible for fraud detection.
The ISZW inspectorate (a department of the Ministry) assists them with training the classifiers.
Doesn’t work as well as expected
• Estimation of fraud risk not accurate enough
Main cause: the data represents a “paper reality”
Solution: Enrich data with other independent ‘data traces’
→ Independent indicators closer to the real world
→ Discrepancy indicators
Where can data traces from the real world also be found?
• Websites, social media
→ Open Source Intelligence (OSINT)
THE “BLIND SPOT”
Auditing (Unit-4)
• Fraudsters will disguise illegitimate transactions by keeping them “out of the books”
• If you look only in the books, you find nothing missing
• Solution = Find indications of missing transactions (involved people, goods, money) … all these leave data traces somewhere …
Asbestos removal (ISZW)
• Fewer obligatory protection measures for a lower price
• Official price vs. advertised price
• Bad experiences or suspicions mentioned in web forums
OTHER EXAMPLES
Enriched data
• Databases / Knowledge bases
• Information on websites
• Text fragments from social media
INFORMATION COMBINATION AND ENRICHMENT (ICE)
Web harvesting
• Search
• Navigate
• Extract
• Store
Information extraction (NLP / IR)
• Entity extraction
• Entity disambiguation
• Entity relationships
• Fact extraction
• Sentiment / class
Better indicators → Better risk analysis → Better fraud detection
WEB HARVESTING
[Screenshot: search results for “video” on the Sony Japan site (http://www.sony.co.jp), showing 1-10 of about 11,903 hits; the hits are press releases under /SonyInfo/News/Press/ for professional camcorders, cameras and projectors, surrounded by product links and keyword lists]
Computers don’t understand:
• Layout of a page
• Meaning of text fragments
• The entities & facts we’re looking for
Advertising & visual techniques are very confusing!
Errors in extracted data
▪ What is the name of the hotel?
“Essex House Hotel and Suites from $154 USD”
▪ Where is the hotel located? There are more than 60 places named Paris in the world
“This Hilton hotel in Paris looks soooo nice;))”
▪ Informal language
“Cancun is a MUST! Check this... Hotel Ocean Spa Cancun 4d 3N w/2 adults from $199 usd”
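The reference ambiguity in the Paris example can be pictured as keeping every candidate location with a probability instead of committing to one. The sketch below is illustrative only: the gazetteer, its entries and its weights are hypothetical and made up for this example, and the function name `candidate_locations` is not from the talk.

```python
# Hypothetical toy gazetteer: place name -> candidate locations with
# made-up prior weights (illustrative only, not real statistics).
GAZETTEER = {
    "Paris": {
        "Paris, France": 0.90,
        "Paris, Texas, USA": 0.06,
        "Paris, Ontario, Canada": 0.04,
    },
    "Cancun": {"Cancun, Mexico": 1.0},
}


def candidate_locations(text):
    """Return {place name: {candidate location: probability}} for names found in text."""
    found = {}
    lowered = text.lower()
    for name, candidates in GAZETTEER.items():
        if name.lower() in lowered:
            total = sum(candidates.values())
            # Normalise the weights so each name's candidates sum to 1.
            found[name] = {loc: w / total for loc, w in candidates.items()}
    return found


tweet = "This Hilton hotel in Paris looks soooo nice;))"
locations = candidate_locations(tweet)
for place, options in locations.items():
    for location, probability in options.items():
        print(place, "->", location, round(probability, 2))
```

Nothing is decided at extraction time here; the alternatives are simply carried along so that later steps can weigh them.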
INFORMATION EXTRACTION FROM UNSTRUCTURED TEXT
CHALLENGING TASK BECAUSE COMPUTERS CAN’T READ
• Extraction ambiguity
• Structure ambiguity
• Reference ambiguity
Errors in extracted data
COMBINING DATA
van Keulen, M. (2012) Managing Uncertainty: The Road Towards Better Data Interoperability. IT - Information Technology, 54 (3). pp. 138-146. ISSN 1611-2776
Source table 1:
Car brand                 Sales
B.M.W.                    25
Mercedes                  32
Renault                   10
Source table 2:
Car brand                 Sales
BMW                       72
Mercedes-Benz             39
Renault                   20
Source table 3:
Car brand                 Sales
Bayerische Motoren Werke  8
Mercedes                  35
Renault                   15
Combined table (sales added up only where the brand names are identical):
Car brand                 Sales
B.M.W.                    25
Bayerische Motoren Werke  8
BMW                       72
Mercedes                  67
Mercedes-Benz             39
Renault                   45
… AND THE PROBLEM OF SEMANTIC DUPLICATES
Car brand Sales
B.M.W. 25
Bayerische Motoren Werke 8
BMW 72
Mercedes 67
Mercedes-Benz 39
Renault 45
Preferred customers …
SELECT SUM(Sales)
FROM CarSales
WHERE Sales > 100
Result: 0 (‘No preferred customers’)
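The effect can be reproduced in a few lines. This is a minimal sketch using the three source tables from the “Combining data” slide; the variable names are chosen for illustration, and `sum` over an empty selection gives 0, matching the slide.

```python
from collections import Counter

# The three source tables from the "Combining data" slide.
source_1 = {"B.M.W.": 25, "Mercedes": 32, "Renault": 10}
source_2 = {"BMW": 72, "Mercedes-Benz": 39, "Renault": 20}
source_3 = {"Bayerische Motoren Werke": 8, "Mercedes": 35, "Renault": 15}

# Naive combination: sales are added up only where brand names match exactly,
# so the different spellings of one brand stay separate rows.
car_sales = Counter()
for source in (source_1, source_2, source_3):
    car_sales.update(source)

# Equivalent of: SELECT SUM(Sales) FROM CarSales WHERE Sales > 100
preferred_total = sum(sales for sales in car_sales.values() if sales > 100)

for brand, sales in sorted(car_sales.items()):
    print(f"{brand:<26}{sales}")
print("Preferred customers total:", preferred_total)
```

No single row exceeds 100 because the BMW and Mercedes sales are spread over semantic duplicates, so the query finds no preferred customers.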
Finding You on the Internet
Input: name, address(es), phone number(s), email address(es)
→ How to find your on-line accounts (Twitter, eBay, Facebook, RunKeeper, …)
BACK TO FRAUD DETECTION
[Diagram: three harvesting pipelines]
Person Pipeline
• Persons
• ByNameFinder, ByLocationFinder, KnownAccountEnumerator, other
• PersonUpdater
• Person data
Account Pipeline
• Twitter Accounts
• ProfileExtractor, PhotoExtractor, MsgExtractor, other
• AccountPersister
• Account data
Message Pipeline
• EmailExtractor, PhoneExtractor, LanguageExtractor, other
• MsgPersister
• Message data
Diagram labels: attributes, Candidate accounts, Additional info found
Experiment:
• 22 sign up subjects
• 12 with / 10 without
• 15 iterations
→ Avg 200 candidates
→ 11 out of 12 found
• ISZW : 85 subjects
RESEARCH FOCUS: DATA INTEROPERABILITY
• All activities involved in coupling and integration of information systems
Data exchange, conversion, information extraction, integration, analysis, cleaning, evolution, migration, etc.
• Focus: “in an imperfect world”
Structural heterogeneity, data conflicts, semantic duplicates, incompleteness, inexactness, ambiguity, errors, etc.
• Clean correct data is only a special case
• Treat data quality problems as a fact of life, not as something to be repaired afterwards
MOST DATA QUALITY PROBLEMS
CAN BE MODELED AS UNCERTAINTY IN DATA
#  Car brand                 Sales  Condition
1  B.M.W.                    25
2  Bayerische Motoren Werke  8
3  BMW                       72
4  Mercedes                  67     X=0
5  Mercedes-Benz             39     X=0
6  Renault                   45
   Mercedes                  106    X=1 Y=0
   Mercedes-Benz             106    X=1 Y=1

Variable  Meaning                       Probability
X=0       4 and 5 different             0.2
X=1       4 and 5 the same              0.8
Y=0       “Mercedes” correct name       0.5
Y=1       “Mercedes-Benz” correct name  0.5

B.M.W. / BMW / Bayerische Motoren Werke analogously
Example: semantic duplicates
IMPORTANT TOOL: PROBABILISTIC DATABASE
▪ Looks like ordinary database
▪ Several “possible” answers or approximate answers to queries
▪ What I showed is discrete uncertainty only; continuous uncertainty possible
Uncertainty orthogonal to data model
▪ Relational (SQL) / XML (XPath) / RDF (SPARQL) / Reasoning (DataLog)
▪ Important: Scalability (big data!)
INDETERMINISTIC DEDUPLICATION
QUERYING AND RISK ASSESSMENT
Sales of “preferred customers”
➢ SELECT SUM(sales)
  FROM carsales
  WHERE sales >= 100
→ Answer: 106
✓ Analyst only bothered with problems that matter
✓ Risk = Probability * Impact

SUM(sales)  P
0           14%
105         6%
106         56%
211         24%

Second most likely answer at 24% with impact factor 2 in sales (211 vs 106)
Risk of substantially wrong answer
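The answer distribution can be reproduced by enumerating the possible worlds. This is a sketch under stated assumptions: the Mercedes probabilities (0.8 / 0.2) are from the slides, but the 0.3 / 0.7 split for the three BMW spellings is not given there and is assumed here because it reproduces the table. The BMW spellings are also simplified to "all the same" or "all different"; partial merges sum to at most 97, so they cannot change the result of this query. The variable Y (which name is correct) does not affect the sum and is left out.

```python
from collections import defaultdict
from itertools import product

# (merged?, probability) for each deduplication decision.
mercedes_options = [(True, 0.8), (False, 0.2)]   # from the slides
bmw_options = [(True, 0.3), (False, 0.7)]        # assumption, see lead-in


def sales_rows(mercedes_merged, bmw_merged):
    """Sales column of the table in one possible world."""
    rows = [45]  # Renault
    rows += [67 + 39] if mercedes_merged else [67, 39]
    rows += [25 + 8 + 72] if bmw_merged else [25, 8, 72]
    return rows


# Probability of each answer to: SELECT SUM(sales) ... WHERE sales >= 100
distribution = defaultdict(float)
for (m_merged, p_m), (b_merged, p_b) in product(mercedes_options, bmw_options):
    answer = sum(s for s in sales_rows(m_merged, b_merged) if s >= 100)
    distribution[answer] += p_m * p_b

for answer in sorted(distribution):
    print(f"{answer:>4}  {distribution[answer]:.0%}")

# One possible reading of "Risk = Probability * Impact" for the runner-up answer,
# taking impact as the ratio to the most likely answer (an interpretation).
ranked = sorted(distribution, key=distribution.get, reverse=True)
most_likely, runner_up = ranked[0], ranked[1]
risk = distribution[runner_up] * (runner_up / most_likely)
print("most likely:", most_likely, "runner-up:", runner_up, "risk:", round(risk, 2))
```

The analyst sees 106 as the most likely answer, with the 211 alternative flagged because it is both fairly probable and about twice as large.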
PROBABILISTIC DATABASES IN FRAUD DETECTION
▪ Web harvesting: layout/navigation/extraction ambiguity
  ➢ Possible values with probabilities and dependencies
▪ Information extraction: extraction/structure/reference ambiguity
  ➢ Possible values with probabilities and dependencies
▪ Candidate accounts in finding you on the internet
  ➢ Possible (PersonID,AccID) pairs with probabilities
  ➢ Associated extracted data with dependencies
▪ Combining / coupling all this data
  ➢ Just more possibilities and dependencies
▪ Extraction of indicators = querying
  ➢ Probabilistic indicators: Possible values with probabilities
▪ Risk analysis and data mining
  ✓ It’s just statistics; they can easily work with probabilistic data
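To make “extraction of indicators = querying” concrete, the sketch below derives a probabilistic indicator from candidate (PersonID, AccID) pairs. Everything in it is hypothetical: the pairs, their match probabilities, and the boolean attribute are made up, and the candidate matches are assumed to be independent, which a real system would have to model through explicit dependencies.

```python
from math import prod

# Hypothetical candidate (PersonID, AccID) pairs: match probability and a
# made-up boolean attribute extracted from the account (illustrative only).
candidates = [
    ("p1", "acc_a", 0.7, True),
    ("p1", "acc_b", 0.2, False),
    ("p2", "acc_c", 0.5, True),
    ("p2", "acc_d", 0.2, True),
]


def indicator_probability(person_id):
    """P(at least one account that truly belongs to the person has the attribute),
    assuming independent candidate matches (a simplifying assumption)."""
    p_none = prod(
        1 - p_match
        for pid, _acc, p_match, has_attribute in candidates
        if pid == person_id and has_attribute
    )
    return 1 - p_none


indicators = {pid: indicator_probability(pid) for pid in ("p1", "p2", "p3")}
for pid, probability in indicators.items():
    print(pid, round(probability, 2))
```

The resulting probabilities could be passed to risk analysis as numeric features rather than being forced into a yes/no value first.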
PUTTING IT ALL TOGETHER
[Diagram: overall architecture]
Components: Person/Company data; Web / Social media; OSINT harvester; Raw Evidence; Interpretation; Combination; Probabilistic Database; Indicator extraction; Fraud Risk Analysis
Annotations:
• Batch-wise autonomous harvesting / monitoring
• Make data quality and trust issues explicit as uncertainty in data
• Adapted to probabilistic indicators
INTERMEZZO ON ETHICS
▪ Although data is public, one cannot use it for just anything!
▪ Cooperation with ethicist: Aimee van Wynsberghe
  Generic guidelines for working with social network data
  To use or not to use: guidelines for researchers using data from online social networking sites
  van Wynsberghe, A. and Been, H. and van Keulen, M. (2013)
▪ Value trade-off
  ➢ People investigated
  ➢ People whose account is a false positive
  ➢ The ISZW
  ➢ All Dutch citizens
CONCLUSIONS
▪ OSINT: additional data source with traces close to the real world … but hard to extract and produces lower-quality data
▪ OSINT requires more automation, autonomy and robustness
▪ Modeling data quality problems as uncertainty in data
▪ Probabilistic database approach for scalability
▪ In terms of the V’s of Big Data
  ➢ Volume
  ➢ Velocity
  ➢ Variety
  ➢ Veracity
my main object of study (while not forgetting about the other two)

PEO AVRIL POUR LA COMMUNE D'ORGERUS INFO
 
23rd Infopoverty World Conference - Agenda programme
23rd Infopoverty World Conference - Agenda programme23rd Infopoverty World Conference - Agenda programme
23rd Infopoverty World Conference - Agenda programme
 
Item # 4&5 - 415 & 423 Evans Ave. Replat
Item # 4&5 - 415 & 423 Evans Ave. ReplatItem # 4&5 - 415 & 423 Evans Ave. Replat
Item # 4&5 - 415 & 423 Evans Ave. Replat
 
ECOSOC YOUTH FORUM 2024 - Side Events Schedule -17 April.
ECOSOC YOUTH FORUM 2024 - Side Events Schedule -17 April.ECOSOC YOUTH FORUM 2024 - Side Events Schedule -17 April.
ECOSOC YOUTH FORUM 2024 - Side Events Schedule -17 April.
 
UN DESA: Finance for Development 2024 Report
UN DESA: Finance for Development 2024 ReportUN DESA: Finance for Development 2024 Report
UN DESA: Finance for Development 2024 Report
 
Item # 1a --- March 25, 2024 CCM Minutes
Item # 1a --- March 25, 2024 CCM MinutesItem # 1a --- March 25, 2024 CCM Minutes
Item # 1a --- March 25, 2024 CCM Minutes
 
NL-FR Partnership - Water management roundtable 20240403.pdf
NL-FR Partnership - Water management roundtable 20240403.pdfNL-FR Partnership - Water management roundtable 20240403.pdf
NL-FR Partnership - Water management roundtable 20240403.pdf
 
PPT Item # 7 - Demolition & Replacement Structure Processes
PPT Item # 7 - Demolition & Replacement Structure ProcessesPPT Item # 7 - Demolition & Replacement Structure Processes
PPT Item # 7 - Demolition & Replacement Structure Processes
 
Phase 8 Hope For Venezuelan Refugees Soup Meal Program-Periods 4-6.
Phase 8 Hope For Venezuelan Refugees Soup Meal Program-Periods 4-6.Phase 8 Hope For Venezuelan Refugees Soup Meal Program-Periods 4-6.
Phase 8 Hope For Venezuelan Refugees Soup Meal Program-Periods 4-6.
 
Build Tomorrow’s India Today By Making Charity For Poor Students
Build Tomorrow’s India Today By Making Charity For Poor StudentsBuild Tomorrow’s India Today By Making Charity For Poor Students
Build Tomorrow’s India Today By Making Charity For Poor Students
 
ECOSOC YOUTH FORUM 2024 - Side Events Schedule -16 April.
ECOSOC YOUTH FORUM 2024 - Side Events Schedule -16 April.ECOSOC YOUTH FORUM 2024 - Side Events Schedule -16 April.
ECOSOC YOUTH FORUM 2024 - Side Events Schedule -16 April.
 
ISEIDP in Chikkaballapura, Karnataka, India
ISEIDP in Chikkaballapura, Karnataka, IndiaISEIDP in Chikkaballapura, Karnataka, India
ISEIDP in Chikkaballapura, Karnataka, India
 
Professional Conduct and ethics lecture.pptx
Professional Conduct and ethics lecture.pptxProfessional Conduct and ethics lecture.pptx
Professional Conduct and ethics lecture.pptx
 
PPT Item # 6 - TBG Partners Landscape Architectural Design Services.pdf
PPT Item # 6 - TBG Partners Landscape Architectural Design Services.pdfPPT Item # 6 - TBG Partners Landscape Architectural Design Services.pdf
PPT Item # 6 - TBG Partners Landscape Architectural Design Services.pdf
 
Digital Transformation of the Heritage Sector and its Practical Implications
Digital Transformation of the Heritage Sector and its Practical ImplicationsDigital Transformation of the Heritage Sector and its Practical Implications
Digital Transformation of the Heritage Sector and its Practical Implications
 
ECOSOC YOUTH FORUM 2024 Side Events Schedule-18 April.
ECOSOC YOUTH FORUM 2024 Side Events Schedule-18 April.ECOSOC YOUTH FORUM 2024 Side Events Schedule-18 April.
ECOSOC YOUTH FORUM 2024 Side Events Schedule-18 April.
 

Dealing with poor data quality of OSINT data in fraud risk analysis

  • 1. DEALING WITH POOR DATA QUALITY OF OSINT DATA IN FRAUD RISK ANALYSIS
      Maurice van Keulen
  • 2. “Largest part [of money to reclaim] is due to payments to people who were not entitled to it. … earlier, it didn’t pay off to reclaim the money.” [Telegraaf, Jan 2012]
      Capelle a/d IJssel, 2011: 164 cases of fraud yielded 1.2 million
      (Slide footer: 25 Feb 2015, Dealing with poor data quality of OSINT data in fraud risk analysis)
  • 3. HOW DOES DIGITAL FRAUD DETECTION WORK? (IN CASE OF SOCIAL SECURITY FRAUD)
      “If a strong suspicion of fraud arises, social inspectors start an investigation with the receiver of social security”
      Process steps:
      • Application: data from applicant
      • Coupling: data from governmental databases
      • Fraud risk analysis: extraction of “indicators”; data mining: classification
      • Investigation: selection of cases from risk classes
      Municipalities are responsible for fraud detection. Inspection ISZW (department of Ministry) assists them with training the classifiers.
  • 4. THE “BLIND SPOT”
      Doesn’t work as well as expected
      • Estimation of fraud risk not accurate enough
      Main cause: the data represents a “paper reality”
      Solution: Enrich data with other independent ‘data traces’
      • Independent indicators closer to real-world
      • Discrepancy indicators
      Where can data traces from the real world also be found?
      • Websites, social media → Open Source Intelligence (OSINT)
  • 5. OTHER EXAMPLES
      Auditing (Unit-4)
      • Fraudsters will disguise illegitimate transactions by keeping them “out of the books”
      • If you look only in the books, you find nothing missing
      • Solution = Find indications of missing transactions (involved people, goods, money) … all these leave data traces somewhere …
      Asbestos removal (ISZW)
      • Less obligatory protection measures for a lower price
      • Official price vs. advertised price
      • Bad experiences or suspicions mentioned in web forums
  • 6. INFORMATION COMBINATION AND ENRICHMENT (ICE)
      Sources:
      • Databases / Knowledge bases
      • Information on websites
      • Text fragments from social media
      Web harvesting:
      • Search
      • Navigate
      • Extract
      • Store
      Information extraction (NLP / IR):
      • Entity extraction
      • Entity disambiguation
      • Entity relationships
      • Fact extraction
      • Sentiment / class
      Result: Enriched data → Better indicators → Better risk analysis → Better fraud detection
  • 7. WEB HARVESTING
      [Screenshot: search results for “video” on the Sony Japan website (www.sony.co.jp), about 11,903 hits, showing press-release URLs, result snippets, suggested keywords and result-page navigation]
      Computers don’t understand:
      • Layout of a page
      • Meaning of text fragments
      • The entities & facts we’re looking for
      Advertising & visual techniques are very confusing!
      → Errors in extracted data
  • 8. INFORMATION EXTRACTION FROM UNSTRUCTURED TEXT: CHALLENGING TASK BECAUSE COMPUTERS CAN’T READ
      • What is the name of the hotel? “Essex House Hotel and Suites from $154 USD”
      • Where is the hotel located? More than 60 places named Paris in the world: “This Hilton hotel in Paris looks soooo nice;))”
      • Informal language: “Cancun is a MUST! Check this... Hotel Ocean Spa Cancun 4d 3N w/2 adults from $199 usd”
      Kinds of ambiguity:
      • Extraction ambiguity
      • Structure ambiguity
      • Reference ambiguity
      → Errors in extracted data
  • 9. COMBINING DATA
      Source: Keulen, M. (2012) Managing Uncertainty: The Road Towards Better Data Interoperability. IT - Information Technology, 54 (3). pp. 138-146. ISSN 1611-2776

      Source table 1
      Car brand                 Sales
      B.M.W.                       25
      Mercedes                     32
      Renault                      10

      Source table 2
      Car brand                 Sales
      BMW                          72
      Mercedes-Benz                39
      Renault                      20

      Source table 3
      Car brand                 Sales
      Bayerische Motoren Werke      8
      Mercedes                     35
      Renault                      15

      Combined table
      Car brand                 Sales
      B.M.W.                       25
      Bayerische Motoren Werke      8
      BMW                          72
      Mercedes                     67
      Mercedes-Benz                39
      Renault                      45
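The combination step on this slide can be sketched in a few lines of Python. This is only an illustration of naive combining by exact brand string, assuming the three source tables are available as dictionaries; the names `source_1`, `source_2`, `source_3` and `combine_by_exact_name` are hypothetical and not taken from the talk.

```python
from collections import Counter

# The three source tables from the slide, keyed by the brand string as written.
source_1 = {"B.M.W.": 25, "Mercedes": 32, "Renault": 10}
source_2 = {"BMW": 72, "Mercedes-Benz": 39, "Renault": 20}
source_3 = {"Bayerische Motoren Werke": 8, "Mercedes": 35, "Renault": 15}


def combine_by_exact_name(*sources):
    """Sum sales per brand string; differently spelled brands stay separate rows."""
    combined = Counter()
    for source in sources:
        combined.update(source)  # adds the sales figures of matching keys
    return dict(combined)


combined = combine_by_exact_name(source_1, source_2, source_3)
for brand in sorted(combined):
    print(f"{brand:<26}{combined[brand]:>5}")
```

Only rows whose names match character by character are added up, which is why "Mercedes" (32 + 35 = 67) and "Mercedes-Benz" (39) end up as two separate rows in the combined table.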
  • 10. … AND THE PROBLEM OF SEMANTIC DUPLICATES
      Car brand                 Sales
      B.M.W.                       25
      Bayerische Motoren Werke      8
      BMW                          72
      Mercedes                     67
      Mercedes-Benz                39
      Renault                      45

      Preferred customers …
      SELECT SUM(Sales) FROM CarSales WHERE Sales>100
      Answer: 0 (‘No preferred customers’)
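The effect of semantic duplicates on this query can be reproduced with an in-memory SQLite database. This is a minimal sketch, assuming a table `CarSales(brand, sales)` filled with the combined rows from the slide; the column names are assumptions. One detail differs from the slide: in SQL, `SUM` over an empty selection yields NULL, so the sketch wraps it in `COALESCE` to obtain the 0 shown there.

```python
import sqlite3

# Combined table from the slide, with semantic duplicates left unresolved.
rows = [
    ("B.M.W.", 25),
    ("Bayerische Motoren Werke", 8),
    ("BMW", 72),
    ("Mercedes", 67),
    ("Mercedes-Benz", 39),
    ("Renault", 45),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CarSales (brand TEXT, sales INTEGER)")
conn.executemany("INSERT INTO CarSales VALUES (?, ?)", rows)

# No single row exceeds 100, so the selection is empty; COALESCE maps NULL to 0.
query = "SELECT COALESCE(SUM(sales), 0) FROM CarSales WHERE sales > 100"
(preferred_sales,) = conn.execute(query).fetchone()
print(preferred_sales)
conn.close()
```

Because the Mercedes sales are split over two spellings (67 and 39) and the BMW sales over three (25, 8 and 72), no row passes the threshold, although the merged brands would (106 and 105).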
  • 11. BACK TO FRAUD DETECTION
      Finding You on the Internet
      Input: name, address(es), phone number(s), email address(es)
      → How to find your on-line accounts (twitter, ebay, facebook, runkeeper, …)
      Pipelines shown in the diagram (Twitter accounts):
      • Person Pipeline (Persons → Person data): ByNameFinder, ByLocationFinder, KnownAccountEnumerator, other, PersonUpdater
      • Account Pipeline (candidate accounts → Account data): ProfileExtractor, PhotoExtractor, MsgExtractor, AccountPersister, other
      • Message Pipeline (attributes → Message data): EmailExtractor, PhoneExtractor, LanguageExtractor, other, MsgPersister
      Diagram labels: Candidate accounts; Additional info found
      Experiment:
      • 22 sign up subjects
      • 12 with / 10 without
      • 15 iterations
      → Avg 200 candidates
      → 11 out of 12 found
      • ISZW: 85 subjects
  • 12. RESEARCH FOCUS: DATA INTEROPERABILITY
      • All activities involved in coupling and integration of information systems: data exchange, conversion, information extraction, integration, analysis, cleaning, evolution, migration, etc.
      • Focus: “in an imperfect world”: structural heterogeneity, data conflicts, semantic duplicates, incompleteness, inexactness, ambiguity, errors, etc.
      • Clean correct data is only a special case
      • Treat data quality problems as a fact of life, not as something to be repaired afterwards
  • 13. MOST DATA QUALITY PROBLEMS CAN BE MODELED AS UNCERTAINTY IN DATA
      Example: semantic duplicates

      #  Car brand                 Sales  Condition
      1  B.M.W.                       25
      2  Bayerische Motoren Werke      8
      3  BMW                          72
      4  Mercedes                     67  X=0
      5  Mercedes-Benz                39  X=0
      6  Renault                      45
         Mercedes                    106  X=1, Y=0
         Mercedes-Benz               106  X=1, Y=1

      Variable  Meaning                         Probability
      X=0       4 and 5 different               0.2
      X=1       4 and 5 the same                0.8
      Y=0       “Mercedes” correct name         0.5
      Y=1       “Mercedes-Benz” correct name    0.5

      B.M.W. / BMW / Bayerische Motoren Werke analogously
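The table above can be read as a compact description of several "possible worlds". The following Python sketch enumerates them for the Mercedes rows, assuming X and Y are independent random variables with the probabilities on the slide; the BMW rows are left out for brevity, and the names `P_X`, `P_Y`, `rows` and `possible_worlds` are hypothetical.

```python
from itertools import product

# Probabilities of the random variables as given on the slide.
P_X = {0: 0.2, 1: 0.8}  # X=1: rows 4 and 5 refer to the same brand
P_Y = {0: 0.5, 1: 0.5}  # Y=0: "Mercedes" is the correct name, Y=1: "Mercedes-Benz"

# Each row carries the condition under which it exists (empty dict = always).
rows = [
    ("Mercedes", 67, {"X": 0}),
    ("Mercedes-Benz", 39, {"X": 0}),
    ("Mercedes", 106, {"X": 1, "Y": 0}),
    ("Mercedes-Benz", 106, {"X": 1, "Y": 1}),
    ("Renault", 45, {}),
]


def possible_worlds():
    """Map each distinct world (a tuple of rows) to its total probability."""
    worlds = {}
    for x, y in product(P_X, P_Y):
        assignment = {"X": x, "Y": y}
        world = tuple(
            (brand, sales)
            for brand, sales, condition in rows
            if all(assignment[var] == val for var, val in condition.items())
        )
        # Assumes X and Y are independent; identical worlds are merged.
        worlds[world] = worlds.get(world, 0.0) + P_X[x] * P_Y[y]
    return worlds


for world, probability in possible_worlds().items():
    print(f"{probability:.2f}  {world}")
```

When X=0 the value of Y does not affect which rows exist, so the two assignments with X=0 collapse into a single world, leaving three distinct worlds in total.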
  • 14. IMPORTANT TOOL: PROBABILISTIC DATABASE
      • Looks like ordinary database
      • Several “possible” answers or approximate answers to queries
      • What I showed is discrete uncertainty only; continuous uncertainty possible
      Uncertainty orthogonal to data model
      • Relational (SQL) / XML (XPath) / RDF (SPARQL) / Reasoning (DataLog)
      • Important: Scalability (big data!)
  • 15. INDETERMINISTIC DEDUPLICATION QUERYING AND RISK ASSESSMENT
      Sales of “preferred customers”
      • SELECT SUM(sales) FROM carsales WHERE sales ≥ 100
      • Answer: 106
      • Analyst only bothered with problems that matter
      • Risk = Probability * Impact

      SUM(sales)    P
      0           14%
      105          6%
      106         56%
      211         24%

      Second most likely answer at 24% with impact factor 2 in sales (211 vs 106)
      → Risk of substantially wrong answer
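The answer distribution in this table can be reproduced by enumerating the deduplication outcomes. The sketch below makes several assumptions that are not stated on the slides: the three BMW spellings are either all the same brand or all different, that merge has probability 0.3 (a value chosen because it reproduces the percentages in the table), it is independent of the Mercedes merge (probability 0.8), and "impact" is measured as relative deviation from the most likely answer. All names are hypothetical.

```python
from itertools import product

P_MERCEDES_SAME = 0.8  # from the semantic-duplicates slide (X=1)
P_BMW_SAME = 0.3       # assumption: inferred so the result matches the slide's table


def sum_preferred(mercedes_same, bmw_same):
    """SUM(sales) over rows with sales >= 100 in one deduplication outcome."""
    bmw = [25 + 8 + 72] if bmw_same else [25, 8, 72]
    mercedes = [67 + 39] if mercedes_same else [67, 39]
    sales = bmw + mercedes + [45]  # 45 is the Renault row
    return sum(s for s in sales if s >= 100)


# Enumerate the four outcomes and accumulate the probability of each answer.
distribution = {}
for mercedes_same, bmw_same in product([True, False], repeat=2):
    p = (P_MERCEDES_SAME if mercedes_same else 1 - P_MERCEDES_SAME) * (
        P_BMW_SAME if bmw_same else 1 - P_BMW_SAME
    )
    answer = sum_preferred(mercedes_same, bmw_same)
    distribution[answer] = distribution.get(answer, 0.0) + p

best = max(distribution, key=distribution.get)  # most likely answer

for answer in sorted(distribution):
    probability = distribution[answer]
    impact = abs(answer - best) / best  # assumed impact measure
    print(f"{answer:>4}  P={probability:.2f}  risk={probability * impact:.2f}")
```

Under these assumptions the most likely answer is 106, and the alternative 211 combines a sizeable probability with a large deviation, which is the kind of case the slide flags for the analyst.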
  • 16. PROBABILISTIC DATABASES IN FRAUD DETECTION
      • Web harvesting: layout/navigation/extraction ambiguity
        → Possible values with probabilities and dependencies
      • Information extraction: extraction/structure/reference ambiguity
        → Possible values with probabilities and dependencies
      • Candidate accounts in finding you on the internet
        → Possible (PersonID,AccID) pairs with probabilities
        → Associated extracted data with dependencies
      • Combining / coupling all this data
        → Just more possibilities and dependencies
      • Extraction of indicators = querying
        → Probabilistic indicators: Possible values with probabilities
      • Risk analysis and data mining
        → It’s just statistics; they can easily work with probabilistic data
  • 17. PUTTING IT ALL TOGETHER
      Components in the diagram: Person/Company data; Web / Social media; OSINT harvester; Interpretation; Combination; Probabilistic Database; Indicator extraction; Fraud Risk Analysis; Raw Evidence
      Annotations in the diagram:
      • Batch-wise autonomous harvesting / monitoring
      • Make data quality and trust issues explicit as uncertainty in data
      • Adapted to probabilistic indicators
  • 18. INTERMEZZO ON ETHICS
      • Although data is public, one cannot use it for anything!
      • Cooperation with ethicist: Aimee van Wynsberghe
        Generic guidelines for working with social network data
        van Wynsberghe, A. and Been, H. and van Keulen, M. (2013) To use or not to use: guidelines for researchers using data from online social networking sites
      • Value trade-off between:
        → People investigated
        → People whose account is false positive
        → The ISZW
        → All Dutch citizens
  • 19. CONCLUSIONS
      • OSINT additional data source with traces close to the real world … but hard to extract and produces less quality data
      • OSINT requires more automation, autonomy and robustness
      • Modeling data quality problems as uncertainty in data
      • Probabilistic database approach for scalability
      • In terms of the V’s of Big Data: Volume, Velocity, Variety, Veracity
      Annotation on the slide: my main object of study (while not forgetting about the other two)

Editor's notes

  1. Asbestos is interesting: protection measures are not observable in data. Inspection will work, but here we would like to optimize the inspections.
  2. They already do this for “dossier analysis” on individual basis
  3. With OSINT data, this problem of semantic duplicates is enormous.
  4. Also an illustration of a web harvesting set-up. This is only Twitter, but if done with more social media accounts, a careless tweet may help to resolve, say, a Facebook account.
  5. Notice that all these are “tables”
  6. Isn’t this nice: all these data quality problems are now in one form, readily usable. Many data quality problems need not even be solved!
  7. Although requesting welfare support is not really a free choice, the receiver is not obliged to request it. By requesting welfare support, someone therefore voluntarily gives up some privacy so that the government can investigate whether the request is rightful.