Benford’s Law… Is it magic? Gaetan “Guy” Lion  July 2010
What is the probability that the population figure of any given country starts with each possible first digit: 1, 2, 3, 4, 5, 6, 7, 8, or 9?
…  The Correct Answer
Countries’ populations follow Benford’s Law. Chi-square P value that the two distributions are the same: 0.8.
Benford’s Law
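As a point of reference, the standard statement of Benford’s Law gives the probability of leading digit d as log10(1 + 1/d). The short sketch below only tabulates that formula; the function name `benford_first_digit_probs` is ours and is not taken from the slides.

```python
import math

def benford_first_digit_probs():
    """Return {digit: probability} for leading digits 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for digit, p in probs.items():
    # A leading 1 comes out near 30.1%, a leading 9 near 4.6%.
    print(f"{digit}: {p:.1%}")
```

The nine probabilities sum to 1 because the logarithms telescope to log10(10).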
When does this law work? The data must cross at least one scale (order of magnitude), as shown below. Preferably, you need a sample > 100.
Demographic data follows Benford’s Law very closely. The U.S. has over 3,000 counties. All demographic measures shown follow Benford’s Law pretty closely. This very large sample makes the Chi-square goodness-of-fit test very (if not excessively) rigorous.
NYSE stocks volume. This captures the first-digit frequency of the trading volume of over 2,000 NYSE stocks on June 21st. The fit is excellent both visually and statistically.
PG&E SmartMeter test. This captures 91 observations between April and July 2010 of analog vs. SmartMeter kWh consumption readings. Both the visual and the statistical fit are pretty good.
Tennis pros’ ATP points. The ATP point totals of the first 1,600 professional tennis players closely follow Benford’s Law. Because of the large sample, the associated P value is small.
Even when it is not supposed to work… It kind of does. I investigated Bernie Madoff’s monthly returns vs. those of his closest competitor (GATEX). Although those data sets were not suited to Benford’s Law, the visual fit was surprisingly good.
Is Benford’s Law magic? No, a simple rule is that there are more small things than large things in the universe… [Figure label: “Bacteria >”]
… a simple explanation… The general principle is that there are more small observations than large ones. There are probably nearly twice as many 1s as 2s, three times as many 1s as 3s, etc. Applying this principle throughout gives a frequency distribution that is close to Benford’s Law. We would need a sample > 1,000 to reach statistical significance at the 0.05 level that the two distributions are different.
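One way to make the simple rule concrete is to weight each digit d by 1/d and normalize. That 1/d weighting is our assumed reading of "twice as many 1s as 2s, three times as many 1s as 3s"; the slides do not spell out the exact formula. A minimal sketch comparing it with Benford’s Law:

```python
import math

digits = range(1, 10)

# Benford's Law for the leading digit.
benford = {d: math.log10(1 + 1 / d) for d in digits}

# Simple rule (assumed reading of the slide): weight of digit d is 1/d,
# normalized so that the nine probabilities sum to 1.
total_weight = sum(1 / d for d in digits)
simple_rule = {d: (1 / d) / total_weight for d in digits}

for d in digits:
    print(f"{d}: Benford {benford[d]:.3f}   simple rule {simple_rule[d]:.3f}")
```

Under this reading the simple rule puts somewhat more weight on a leading 1 than Benford’s Law does and slightly less on most other digits.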
Extending Benford’s Law beyond the first digit
Benford vs. Simple Rule for first two digits. When dealing with the first two digits (10–99), Benford’s Law and the Simple Rule have nearly indistinguishable distributions. You would need samples > 700,000 to reach statistical significance at the 0.05 level that the two distributions are different.
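The closeness of the two distributions over 10 to 99 can be checked directly. The sketch below again assumes the Simple Rule means weights proportional to 1/n (our reading, not stated on the slide) and uses the general Benford formula log10(1 + 1/n) for the leading two digits n.

```python
import math

two_digits = range(10, 100)

# Benford's Law for the first two digits n = 10..99.
benford_2 = {n: math.log10(1 + 1 / n) for n in two_digits}

# Simple rule (assumed): weights proportional to 1/n, normalized to sum to 1.
total_weight = sum(1 / n for n in two_digits)
simple_2 = {n: (1 / n) / total_weight for n in two_digits}

# Largest absolute gap between the two probabilities over all 90 categories.
largest_gap = max(abs(benford_2[n] - simple_2[n]) for n in two_digits)
print(f"largest gap: {largest_gap:.4f}")
```

A gap this small in every category is why only a very large sample could tell the two distributions apart.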
Time series growing by 2% per period. A time series growing by 2% per period over 116 periods replicates Benford’s Law frequency distribution almost exactly. This makes sense. Going from 1 to 2 requires a 100% increase, whereas going from 2 to 3 requires only a 50% increase, etc. The series therefore spends many more periods with a leading 1 than with any other digit.
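The growth argument is easy to reproduce. The sketch below assumes the series starts at 1 and includes the starting value, giving 117 observations; the slide does not state the starting value, and since 1.02^116 is just under 10 the series covers roughly one order of magnitude.

```python
import math
from collections import Counter

def first_digit(x):
    """Leading significant digit of a positive number, via scientific notation."""
    return int(f"{x:.15e}"[0])

# Assumed starting value of 1, then 116 periods of 2% growth.
series = [1.02 ** k for k in range(117)]

counts = Counter(first_digit(v) for v in series)
observed = {d: counts[d] / len(series) for d in range(1, 10)}
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}   Benford {benford[d]:.3f}")
```

Each digit's share is simply the number of periods the series needs to grow through that digit's range, which is proportional to log10(1 + 1/d).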
Math properties of Benford’s Law
The Ones Scaling Test. Looking at tax return numbers that followed Benford’s Law (BL) closely, someone used the Ones Scaling Test to see whether the share of numbers with a leading “1” would remain the same after the numbers were multiplied by a constant. In this case, they multiplied the set of numbers by 1.01 and did that 696 times. This corresponds to multiplying the numbers progressively up to a factor of about 1,000, since 1.01^696 ≈ 1,000. As shown, across all iterations the share of 1s remained very stable around the BL-predicted level of 30.1%. Source: “The Scientist and Engineer’s Guide to Digital Signal Processing,” Steve Smith, PhD.
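The scaling test can be sketched on synthetic data. The tax return numbers are not available here, so the example below substitutes a made-up data set: an even grid in log10 space across six orders of magnitude, which conforms to Benford’s Law by construction. It works with logarithms, using the fact that a number has a leading 1 exactly when the fractional part of its log10 is below log10(2).

```python
import math

LOG10_2 = math.log10(2)      # Benford share of leading 1s, about 0.301
STEP = math.log10(1.01)      # multiplying by 1.01 adds this to log10(x)

# Synthetic stand-in data (assumption): log10 values spread evenly over 6 decades.
n_points, decades = 3000, 6
logs = [decades * i / n_points for i in range(n_points)]

def share_of_ones(log_values):
    """Fraction of values whose leading digit is 1."""
    ones = sum(1 for v in log_values if v % 1.0 < LOG10_2)
    return ones / len(log_values)

shares = []
for _ in range(696):
    logs = [v + STEP for v in logs]   # multiply every number by 1.01
    shares.append(share_of_ones(logs))

print(f"min {min(shares):.4f}, max {max(shares):.4f}, Benford {LOG10_2:.4f}")
```

On real data the share would wobble more than on this idealized grid; how much it wobbles is what the test measures.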
A few Benford’s Law applications…
Iran election. Mahmoud Ahmadinejad's vote totals have more '2s' and fewer '1s' than expected. Roukema speculates that Iranian officials replaced 1s with 2s. So, for instance, in some town where he received 1,954 votes, they would report his having received 2,954 votes. Source: Nate Silver, fivethirtyeight.com
Senator Franken vote count. “…This hugely violates Benford's Law -- there are not nearly enough totals beginning in 1 and too many beginning in numbers like 5, 6 and 7. The odds of these anomalies having occurred by chance are greater than a quadrillion to one against… the reason this pattern emerges is because precinct sizes in Minnesota are not truly random. There is a large number of precincts in Minnesota that are designed to serve between 1,000 and 2,000 voters; since Franken won about 42 percent of the votes statewide, this leads to a relatively high number of instances where his vote totals are in the high single digits (672, 704, 588, etc.)” Source: Nate Silver, fivethirtyeight.com
Inspector Clouseau demonstrates how to run a fraud investigation
Detecting fraud (an example). Step 1. A company issued 483 checks in 2009 Q4; that quarter was audited and everything checked out. It also issued 522 checks in 2010 Q1. A fraud investigator notes that the 2009 Q4 pattern fit Benford’s Law very closely (P value 0.84). He notes that the fit deteriorated in 2010 Q1 (P value 0.06).
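A first-digit goodness-of-fit check of this kind can be sketched with the standard library alone. The slides do not say how the P values were computed; the version below assumes a Pearson Chi-square test with 8 degrees of freedom and uses the closed-form upper tail that exists for even degrees of freedom. The digit counts are made up for illustration and are not the company's data.

```python
import math

def benford_chi_square(observed_counts):
    """Pearson Chi-square statistic; observed_counts lists counts for digits 1..9."""
    n = sum(observed_counts)
    stat = 0.0
    for d, obs in enumerate(observed_counts, start=1):
        expected = n * math.log10(1 + 1 / d)
        stat += (obs - expected) ** 2 / expected
    return stat

def chi_square_p_value_df8(stat):
    """Upper-tail P value for 8 degrees of freedom (closed form for even df)."""
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(4))

# Hypothetical first-digit counts, NOT taken from the slides.
conforming = [301, 176, 125, 97, 79, 67, 58, 51, 46]
skewed = [260, 170, 120, 95, 80, 110, 60, 55, 50]

for name, counts in (("conforming", conforming), ("skewed", skewed)):
    stat = benford_chi_square(counts)
    print(f"{name}: chi-square {stat:.2f}, P value {chi_square_p_value_df8(stat):.4f}")
```

A high P value means the counts are consistent with Benford’s Law; a low one flags a deviation that may deserve a closer look, without proving fraud.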
Step 2. Focus on the difference. As shown, the company has issued many more checks starting with the digit ‘6’ than expected (60 vs. 35 expected under Benford’s Law).
Step 3. Focus on the first two digits of the 6s. We have 28 checks out of 522 starting with the two digits 66 vs. 3.4 expected per Benford’s Law. This calls for further investigation.
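The expected counts quoted in Steps 2 and 3 follow from the general Benford formula, in which the probability that a number starts with the digit string n is log10(1 + 1/n). A small sketch, with a helper name (`expected_count`) of our own choosing:

```python
import math

def expected_count(leading_digits, n_items):
    """Expected number of items whose amount starts with the given digits."""
    return n_items * math.log10(1 + 1 / leading_digits)

n_checks = 522  # checks issued in 2010 Q1, per the slides

print(round(expected_count(6, n_checks)))       # 35 checks expected to start with 6
print(round(expected_count(66, n_checks), 1))   # 3.4 checks expected to start with 66
```

Against these baselines, the observed 60 and 28 checks stand out clearly.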
Step 4. Focus on the first three digits of the 66s. Carrying this analysis to the first three digits, we see an unusual number of checks starting with ‘666’ and ‘668.’ Later, we find that the checks starting with ‘666’ were legitimate ones that four employees wrote to pay for a monthly service that cost $5.95 per month plus tax, or $6.66 with tax. Meanwhile, 9 of the 10 checks starting with ‘668’ were fraudulent.
Replicating Clouseau’s success
The Key
