Hadoop for Large-scale Biometric Databases
Jason Trost
Cloud Computing Team
Booz | Allen | Hamilton
Session Agenda
This session shows the application of Hadoop and a large-scale, low-latency distributed fuzzy matching database to Biometrics.
- Background: what you need to know about Biometrics
- The Problem: Big Data and unordered fuzzy matching
- A Solution: Hadoop Applications for Biometrics
Key Takeaways from this Session
- Searching large-scale Biometric Databases is a hard problem
- Hadoop is a potential solution to this problem
- Hadoop is a great platform for solving all sorts of Big Data and distributed computing problems, even low-latency searching
Introduction to Biometrics
- Biometrics: The science of establishing the identity of an individual based on the physical, chemical, or behavioral attributes of the person*
- Modality: Physical or behavioral characteristics of an individual used to establish identity*
- Template: A symbolic or numeric representation of a modality optimized for storage and/or matching
- Example modalities (pictured): Iris, Face, Fingerprint, Palm Print, Gait, Hand Geometry, Signature, Ear, Voice, Keystroke Pattern, Facial Thermogram, Vein Pattern
* Handbook of Biometrics. A. Jain, P. Flynn, A. Ross.
Why are Biometrics Important?
- Biometrics have many useful applications where establishing identity is important
- Banks and financial services companies are using biometrics to prevent banking and identity fraud
- National governments are creating biometric databases for law enforcement and security reasons:
  - Assist with criminal investigations (e.g. crime scene fingerprints)
  - Identify individuals entering and leaving the country
  - Surveillance
Enrollment: Adding New Identities and Biometric Data to the Database
- Collect biographic information from an individual, such as name, address, SSN, etc.
- Capture biometric data in raw form (e.g. high-resolution images)
- Transform the raw biometric data into an encoded biometric template (feature vector)
- Store all this information in the biometrics database
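The four enrollment steps can be sketched as a tiny in-memory store. Everything here is hypothetical: the class and field names, the `Map` standing in for the biometrics database, and the `encoder` function standing in for whatever modality-specific feature extractor turns a raw capture into a template.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Minimal in-memory sketch of enrollment. EnrollmentSketch, Identity, and the encoder
// function are hypothetical stand-ins; a real system persists to a biometrics database.
final class EnrollmentSketch {

    // One enrolled identity: biographic data, raw captures, and encoded templates.
    static final class Identity {
        final String id;
        final Map<String, String> biographic;          // e.g. "name", "address"
        final List<byte[]> rawCaptures = new ArrayList<>();
        final List<double[]> templates = new ArrayList<>();

        Identity(String id, Map<String, String> biographic) {
            this.id = id;
            this.biographic = biographic;
        }
    }

    private final Map<String, Identity> database = new HashMap<>();
    private final Function<byte[], double[]> encoder; // raw capture -> feature vector (assumed)

    EnrollmentSketch(Function<byte[], double[]> encoder) {
        this.encoder = encoder;
    }

    // Record biographic data, keep the raw capture, encode it, and store the template.
    // In this sketch biographic data is only recorded on the first enrollment of an id.
    Identity enroll(String id, Map<String, String> biographic, byte[] rawCapture) {
        Identity identity = database.computeIfAbsent(id, key -> new Identity(key, biographic));
        identity.rawCaptures.add(rawCapture);
        identity.templates.add(encoder.apply(rawCapture));
        return identity;
    }

    Identity lookup(String id) {
        return database.get(id);
    }
}
```

Keeping every raw capture alongside its template matches the point that raw files and templates accumulate per identity over time.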
Verification: One-to-One Matching
- Look up the biometric template for a particular individual
- Verify that the stored template and the newly captured template match
- Fuzzy matching is used to compare the biometric templates
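A one-to-one check reduces to "score the two templates, compare against a threshold". The sketch below assumes templates are fixed-length feature vectors and uses cosine similarity purely as a stand-in score; real matchers such as BOZORTH3 compute modality-specific scores, and the threshold would be tuned per system.

```java
// Sketch of one-to-one verification. Cosine similarity and the caller-supplied threshold
// are illustrative assumptions, not the scoring used by any particular biometric matcher.
final class Verifier {

    // Cosine similarity in [-1, 1]; 1.0 means the vectors point in the same direction.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("templates must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an all-zero template carries no information
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Does the fresh capture match the template stored for the claimed identity?
    static boolean verify(double[] storedTemplate, double[] probeTemplate, double threshold) {
        return cosineSimilarity(storedTemplate, probeTemplate) >= threshold;
    }
}
```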
Identification: One-to-Many Searching
- Capture some number of raw biometric features and convert them into biometric templates
- Perform fuzzy matching against a large number of stored biometric templates to determine the identity
- If latency is not an issue, this is relatively straightforward, especially in MapReduce
- This is a hard problem for low-latency applications, and it becomes harder as these databases grow
- There is a speed/accuracy tradeoff
- The search space can be reduced using clustering techniques, but this only goes so far
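Without any indexing or clustering, one-to-many identification is an exhaustive scan: score the probe against every stored template and keep the best candidates. This sketch assumes an in-memory gallery keyed by identity id, a caller-supplied similarity function, and a "top k above threshold" result policy; all three are illustrative choices. Its cost grows linearly with the gallery, which is exactly what makes low-latency identification hard at hundreds of millions of identities.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.ToDoubleBiFunction;

// Exhaustive one-to-many identification: O(gallery size) similarity computations per probe.
final class Identifier {

    record Candidate(String identityId, double score) {}

    static List<Candidate> identify(double[] probe,
                                    Map<String, double[]> gallery,
                                    ToDoubleBiFunction<double[], double[]> similarity,
                                    double threshold,
                                    int k) {
        // Min-heap on score: the weakest of the current top-k candidates sits at the head.
        PriorityQueue<Candidate> best =
                new PriorityQueue<>(Comparator.comparingDouble(Candidate::score));
        for (Map.Entry<String, double[]> e : gallery.entrySet()) {
            double score = similarity.applyAsDouble(probe, e.getValue());
            if (score < threshold) {
                continue; // below threshold: not a plausible match
            }
            best.add(new Candidate(e.getKey(), score));
            if (best.size() > k) {
                best.poll(); // drop the lowest-scoring candidate
            }
        }
        List<Candidate> result = new ArrayList<>(best);
        result.sort(Comparator.comparingDouble(Candidate::score).reversed()); // best first
        return result;
    }
}
```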
What is Fuzzy Matching?
- Fuzzy matching is an operation performed on two objects that determines how similar they are to each other
- Typically this operation produces a numeric similarity score
- It is necessary when the data collected from sensors is noisy and matching needs to be very accurate
- Almost all biometric matching algorithms perform some sort of fuzzy matching:
  - Elastic Bunch Graph Matching: face recognition algorithm
  - BOZORTH3: minutiae-based fingerprint matching algorithm
  - IrisCode: iris matching algorithm
- Other examples:
  - Image comparison
  - Audio comparison
  - Video comparison
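As a concrete example of a numeric similarity score, iris templates in the IrisCode family are bit strings compared by fractional Hamming distance: the fraction of bits on which two codes disagree. The sketch below is a simplified version under that assumption; real IrisCode matching also applies occlusion masks and tries several rotations, both omitted here.

```java
import java.util.BitSet;

// Simplified IrisCode-style matcher: similarity = 1 - fractional Hamming distance.
// Masks and rotation search used by real iris matchers are intentionally left out.
final class BitTemplateMatcher {

    // Fraction of disagreeing bits over the first `length` bits, in [0, 1].
    static double fractionalHammingDistance(BitSet a, BitSet b, int length) {
        BitSet diff = (BitSet) a.clone();
        diff.xor(b); // set bits mark positions where the two templates disagree
        int disagreements = diff.get(0, length).cardinality();
        return (double) disagreements / length;
    }

    // 1.0 for identical templates, 0.0 when every bit differs.
    static double similarity(BitSet a, BitSet b, int length) {
        return 1.0 - fractionalHammingDistance(a, b, length);
    }
}
```

Because noisy captures of the same iris rarely produce identical codes, a match is declared when the distance falls below a threshold rather than when it is exactly zero.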
Why Fuzzy Matching?
- Biometric data is inherently noisy and dirty
- Conditions differ between when the original biometric data was captured (Enrollment) and when a new reading occurs (Identification):
  - Different types of cameras and sensors made by different companies
  - Partial or smudged fingerprints (e.g. crime scene)
  - Changes in skin tone, facial hair, makeup
  - Different lighting conditions
  - Aging and skin damage
  - Weight gain, weight loss
  - Injury
Image derived from http://www.flickr.com/photos/glennji/3558118429/. Licensed under Creative Commons.
Existing Large-scale Biometric Databases
- US Visitor & Immigrant Status Indicator Technology (US-VISIT)*
  - International travelers' biometrics (fingerprint and face)
  - Collected at US ports of entry, Immigration Services, and the State Department
  - Used to support the Department of Homeland Security's mission
- FBI Integrated Automated Fingerprint Identification System (IAFIS)**
  - Used to solve and prevent crime and catch criminals and terrorists
  - Includes fingerprints, criminal histories, mug shots, scar and tattoo photos, physical characteristics like height, weight, and hair and eye color, and aliases
- AllTrust Networks Paycheck Secure System***
  - Uses fingerprints to support secure check cashing
  - Designed to stop fraud and speed up check cashing
- Plus many more
*   One Team, One Mission, Securing our Homeland. US DHS.
**  http://www.fbi.gov/hq/cjisd/iafis/iafis_facts.htm
*** http://www.alltrustnetworks.com/News/6Million/tabid/378/Default.aspx
Growth of Biometric Databases
- Combined U.S. government biometric databases are expected to grow to hold billions of identities
- The DHS's US-VISIT program has the world's largest and fastest biometric database (called IDENT), with over 110 million identities and roughly 145,000 identities enrolled or verified daily*
- The FBI's Integrated Automated Fingerprint Identification System (IAFIS) alone holds 66.5 million identities, with 8,000-10,000 more subjects added each day**
- India is reportedly creating a biometric database to hold fingerprints and face images for each of its 1.2 billion citizens as part of its Unique Identification Project***
- The European Union's Biometric Matching System (EU-BMS) is expected to hold biometric information on 70 million people to support visa applications, border control, and immigration****
- AllTrust Networks' Paycheck Secure system has enrolled over 6 million users and performed over 70 million transactions*****
*     US-VISIT: The world's largest biometric application. William Graves.
**    http://www.fbi.gov/hq/cjisd/iafis/iafis_facts.htm
***   http://www.business-standard.com/india/news/national-population-register-to-start-biometrics-data-collectiondec/399135/
****  http://www.findbiometrics.com/articles/i/5220/
***** http://www.alltrustnetworks.com/News/6Million/tabid/378/Default.aspx
Biometric Databases are a Big Data Problem
- Large-scale operations: searching and storing 100 million to 1 billion identities
- Multiple biometric templates and raw files per identity for multimodal matching (fingerprints, faces, and iris)
- Typically, new raw files and templates are stored after each Verification and Identification operation, because biometric readings change over time
- Raw images: 500M identities × 16 KB-300 KB* per file × 10-20 files per identity ≈ 80 TB-3 PB
- Biometric templates: 500M identities × 256 B-3 KB** per template × 10-20 templates per identity ≈ 1.3-30 TB
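The storage estimates are a three-way product (identities × size per file × files per identity), so a small helper makes them easy to recheck with other assumptions. The sketch uses decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and reads the smallest template size as 256 bytes, the size of a 2048-bit IrisCode; both are assumptions made for the example.

```java
// Back-of-the-envelope storage estimate in decimal units.
final class StorageEstimate {

    // identities x bytes per file x files per identity
    static double totalBytes(long identities, double bytesPerFile, int filesPerIdentity) {
        return identities * bytesPerFile * filesPerIdentity;
    }

    static double toTB(double bytes) {
        return bytes / 1e12;
    }

    static double toPB(double bytes) {
        return bytes / 1e15;
    }
}
```

For example, 500M identities × 16 KB × 10 files gives 8 × 10^13 bytes (80 TB), while 500M × 300 KB × 20 gives 3 × 10^15 bytes (3 PB). The spread is wide because both the per-file size and the file count vary.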
Biometric Databases Must Perform Fuzzy Matching
- Most applications require low-latency fuzzy match searches in order to be useful
- The objects being searched for cannot be ordered effectively to speed up searches
- Clustering techniques can be used to reduce the search space, but this only goes so far
- Fuzzy match searches are expensive, and typically a large number of objects must be searched to find a match
Hadoop and Biometric Databases
- HDFS as file storage for petabytes' worth of images:
  - Redundancy
  - Distribution
  - Opens the door to storing more and more raw images, at higher resolutions
- MapReduce can be used to improve feature selection by analyzing the entire database to select the features that are most effective at distinguishing identities
- Easy to test and deploy new algorithms against all data at scale
- N-to-N matching search (a special type of Identification search) to cleanse the database and find people trying to circumvent the system (identity fraud, etc.)
- MapReduce can be used for batched searching where latency doesn't matter
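N-to-N matching compares every stored template with every other one, for example to find one person enrolled under two identities. The single-machine sketch below shows the comparison pattern and the N(N-1)/2 pair count that makes this a natural batch job; distributing the pairs with MapReduce is not shown, and the similarity function and threshold are caller-supplied assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// N-to-N matching: score each unordered pair of templates once and report likely duplicates.
// This loop is O(N^2); at database scale the comparisons would be spread across a cluster.
final class NToNMatcher {

    record MatchPair(int first, int second, double score) {}

    static List<MatchPair> findDuplicates(List<double[]> templates,
                                          ToDoubleBiFunction<double[], double[]> similarity,
                                          double threshold) {
        List<MatchPair> pairs = new ArrayList<>();
        for (int i = 0; i < templates.size(); i++) {
            for (int j = i + 1; j < templates.size(); j++) { // j > i: each pair exactly once
                double score = similarity.applyAsDouble(templates.get(i), templates.get(j));
                if (score >= threshold) {
                    pairs.add(new MatchPair(i, j, score));
                }
            }
        }
        return pairs;
    }
}
```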
What about low-latency searching…?
[Figure: Fuzzy Table Architecture]
Fuzzy Table: Bulk Data Processing Component
- The centroids from K-means clustering are used to create a "Bin classifier" that determines the best bins to search for a given key
  - This makes searching faster because only a small subset of the data must be processed
- {Key, Value} records are stored as SequenceFiles in HDFS, and the files are laid out to spread these records across the cluster for optimal parallel searching
- MapReduce is used for all other bulk or batch data processing, including:
  - Re-encoding the raw files into feature vectors
  - Performing large-scale feature evaluation to improve clustering
  - Batch fuzzy match searching
- This concept is based on work done in academia*,**
*  Efficient Search and Retrieval in Biometric Databases. Amit Mhatre, Srinivas Palla, Sharat Chikkerur, Venu Govindaraju.
** Efficient fingerprint search based on database clustering. Manhua Liu, Xudong Jiang, Alex Chichung Kot.
Bulk Clustering and Real-time Classification
- The classifier determines which bins need to be searched in order to find the most likely matching keys
- This makes searching for keys faster because only a small subset of the entire dataset needs to be processed using fuzzy matching
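The split between offline clustering and real-time classification can be sketched in plain Java. This assumes Lloyd's k-means with Euclidean distance over fixed-length feature vectors; in the described system the clustering runs as a batch job over the whole database, whereas here it is an in-memory loop. `classify()` plays the role of the Bin classifier, returning the bins whose centroids are nearest the query key. The class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.Random;
import java.util.stream.IntStream;

// Bulk step: k-means builds one centroid per bin. Real-time step: rank bins by centroid distance.
final class BinClassifier {

    final double[][] centroids; // one centroid per bin

    BinClassifier(double[][] centroids) {
        this.centroids = centroids;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static int nearest(double[][] centroids, double[] p) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's k-means for a fixed number of iterations. Requires 1 <= k <= points.length.
    static BinClassifier train(double[][] points, int k, int iterations, long seed) {
        Random rnd = new Random(seed);
        int[] order = IntStream.range(0, points.length).toArray();
        for (int i = order.length - 1; i > 0; i--) { // Fisher-Yates shuffle of point indices
            int j = rnd.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[order[c]].clone(); // k distinct points as initial centroids
        }
        int dims = points[0].length;
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (double[] p : points) { // assignment step
                int c = nearest(centroids, p);
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += p[d];
                }
            }
            for (int c = 0; c < k; c++) { // update step
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return new BinClassifier(centroids);
    }

    // The `binsToSearch` bins whose centroids are closest to the query key, nearest first.
    int[] classify(double[] key, int binsToSearch) {
        return IntStream.range(0, centroids.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer c) -> squaredDistance(key, centroids[c])))
                .limit(binsToSearch)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

Searching more than one bin per query (`binsToSearch > 1`) is one way to trade latency for accuracy when a key falls near a cluster boundary.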
Fuzzy Table: Data Storage and Bins
- Bins are represented as directories in HDFS containing one or more chunk files (stored as SequenceFiles):
  /fuzzytable/_table_fingerprints/_bin_000001/_chunk_000001
- Chunk files contain many {Key, Value} pairs and are a small multiple of the HDFS block size
- Chunk files are distributed uniformly and randomly across the Data Servers in the cluster
  - This ensures that the bins are striped across the cluster for optimal parallel searching
- Chunk files are also replicated across the Data Servers using the HDFS replication mechanism
- Data Servers only search through chunk files that reside locally, and results are returned in real time as soon as a match is found
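The directory layout and the random striping of chunks can be illustrated with two hypothetical helpers. The path pattern mirrors the example path on this slide (six-digit, zero-padded bin and chunk numbers). The uniform random assignment of chunks to Data Servers is only a model of the striping idea: in HDFS the NameNode decides block placement and replication, and replicas are not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

// Hypothetical helpers modeling the bin/chunk layout and uniform random chunk placement.
final class ChunkLayout {

    // e.g. chunkPath("fingerprints", 1, 1) -> /fuzzytable/_table_fingerprints/_bin_000001/_chunk_000001
    static String chunkPath(String table, int bin, int chunk) {
        return String.format(Locale.ROOT, "/fuzzytable/_table_%s/_bin_%06d/_chunk_%06d", table, bin, chunk);
    }

    // Assign every chunk of every bin to a data server chosen uniformly at random, so each
    // bin ends up spread over many servers and can be scanned in parallel.
    static Map<String, List<String>> placeChunks(String table, int bins, int chunksPerBin,
                                                 List<String> dataServers, long seed) {
        Random rnd = new Random(seed);
        Map<String, List<String>> byServer = new HashMap<>();
        for (String server : dataServers) {
            byServer.put(server, new ArrayList<>());
        }
        for (int b = 1; b <= bins; b++) {
            for (int c = 1; c <= chunksPerBin; c++) {
                String server = dataServers.get(rnd.nextInt(dataServers.size()));
                byServer.get(server).add(chunkPath(table, b, c));
            }
        }
        return byServer;
    }
}
```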
Fuzzy Table: Low Latency Fuzzy Matching Component
- The low-latency component consists of three main parts:
  - Client: submits queries for Keys and gets back {Key, Value} pairs
  - Master Server: serves metadata about which Data Servers host which bins
  - Data Servers: actually perform the fuzzy matching searches
- Data Servers perform fuzzy matching against Keys in order to find {Key, Value} records:
    double score = fuzzyMatcher.match(key, storedRec.getKey());
    if (score >= threshold)
        return storedRec;
- Fuzzy matching searches are performed in parallel across many Data Servers
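The Data Server loop and the client's parallel fan-out can be modeled in memory. `FuzzyMatcher`, `StoredRecord`, and `DataServer` are hypothetical stand-ins named after the fragment above, and a thread pool substitutes for separate machines. Unlike the fragment, which returns the first record over the threshold, each server here keeps scanning and pushes every match into a shared concurrent queue as soon as it is found. The Master Server lookup of which servers host which bins is omitted; the client simply queries all servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// In-memory model of the low-latency search path: each data server scans only its local
// records, and the client queries all servers in parallel and gathers the matches.
final class FuzzySearch {

    interface FuzzyMatcher {
        double match(double[] queryKey, double[] storedKey); // higher = more similar
    }

    static final class StoredRecord {
        private final double[] key;
        private final String value;

        StoredRecord(double[] key, String value) {
            this.key = key;
            this.value = value;
        }

        double[] getKey() { return key; }
        String getValue() { return value; }
    }

    static final class DataServer {
        private final List<StoredRecord> localRecords; // records from chunk files on this node

        DataServer(List<StoredRecord> localRecords) {
            this.localRecords = localRecords;
        }

        // Scan local records and hand each match to `sink` the moment it is found.
        void search(double[] key, FuzzyMatcher fuzzyMatcher, double threshold,
                    ConcurrentLinkedQueue<StoredRecord> sink) {
            for (StoredRecord storedRec : localRecords) {
                double score = fuzzyMatcher.match(key, storedRec.getKey());
                if (score >= threshold) {
                    sink.add(storedRec);
                }
            }
        }
    }

    // Client side: fan the query out to every data server (non-empty list) and wait for all.
    static List<StoredRecord> query(List<DataServer> servers, double[] key,
                                    FuzzyMatcher matcher, double threshold) {
        ConcurrentLinkedQueue<StoredRecord> matches = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(servers.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (DataServer server : servers) {
                pending.add(pool.submit(() -> server.search(key, matcher, threshold, matches)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for this server and surface any failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for data servers", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("data server search failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(matches);
    }
}
```

A streaming client could instead poll the queue while searches are still running, which is closer to returning results "as soon as a match is found".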
[Figure: Fuzzy Table Query]

Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Último

Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Último (20)

Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Biometric Databases and Hadoop (Hadoop Summit 2010)

  • 1. Hadoop for Large-scale Biometric Databases. Jason Trost, Cloud Computing Team, Booz Allen Hamilton
  • 2. Session Agenda. This session shows how Hadoop and a large-scale, low-latency, distributed fuzzy matching database can be applied to biometrics. Background: what you need to know about biometrics. The Problem: Big Data and unordered fuzzy matching. A Solution: Hadoop applications for biometrics.
  • 3. Key Takeaways from this Session. Searching large-scale biometric databases is a hard problem. Hadoop is a potential solution to this problem. Hadoop is a great platform for solving all sorts of Big Data and distributed computing problems, even low-latency searching.
  • 4. Introduction to Biometrics. Biometrics: the science of establishing the identity of an individual based on the physical, chemical, or behavioral attributes of the person.* Modality: a physical or behavioral characteristic of an individual used to establish identity.* Template: a symbolic or numeric representation of a modality optimized for storage and/or matching. Example modalities: iris, face, fingerprint, palm print, gait, hand geometry, signature, ear, voice, keystroke pattern, facial thermogram, vein pattern. * Handbook of Biometrics. A. Jain, P. Flynn, A. Ross.
  • 6. Biometrics has many useful applications where establishing identity is important.
  • 7. Banks and Financial Services companies are using biometrics to prevent banking and identity fraud
  • 9. Enrollment: Adding New Identities and Biometric Data to the Database. Collect biographic information from the individual, such as name, address, and SSN. Capture biometric data in raw form (e.g. high-resolution images). Transform the raw biometric data into an encoded biometric template (feature vector). Store all of this information in the biometrics database.
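The four enrollment steps can be sketched in memory as follows. This assumes a pluggable feature extractor. The record types, `EnrollmentService`, and the extractor function are hypothetical illustrations and are not part of Fuzzy Table or any biometric SDK. A real system would persist to the biometrics database (for example, files in HDFS) rather than a `HashMap`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Biographic data collected at enrollment (hypothetical, deliberately minimal fields). */
record Biographic(String name, String address) {}

/** One enrolled identity: biographic info, raw captures, and the templates derived from them. */
record Identity(String id, Biographic bio, List<byte[]> rawCaptures, List<double[]> templates) {}

/** Hypothetical in-memory stand-in for the biometrics database. */
class EnrollmentService {
    private final Map<String, Identity> db = new HashMap<>();
    // Hypothetical feature extractor: raw capture (e.g. image bytes) -> template (feature vector).
    private final Function<byte[], double[]> extractor;

    EnrollmentService(Function<byte[], double[]> extractor) {
        this.extractor = Objects.requireNonNull(extractor);
    }

    /** Collect biographic data, keep the raw captures, derive templates, store everything. */
    Identity enroll(String id, Biographic bio, List<byte[]> rawCaptures) {
        if (db.containsKey(id)) {
            throw new IllegalArgumentException("already enrolled: " + id);
        }
        List<double[]> templates = rawCaptures.stream().map(extractor).toList();
        Identity identity = new Identity(id, bio, List.copyOf(rawCaptures), templates);
        db.put(id, identity);
        return identity;
    }

    Identity lookup(String id) {
        return db.get(id);
    }

    int size() {
        return db.size();
    }
}
```

Keeping the raw captures next to the templates mirrors the slide's point that everything is stored. It also lets templates be regenerated later if the extraction algorithm changes.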
  • 10. Verification: One-to-One Matching. Look up the biometric template for a particular individual. Verify that the stored template and the newly captured template match. Fuzzy matching is used to compare the biometric templates.
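One-to-one verification amounts to a lookup followed by a single fuzzy match against a threshold. In the sketch below, `Verifier`, the map-based template store, and the "higher score means more similar" convention are assumptions for illustration. The matcher is any `ToDoubleBiFunction`, which stands in for a real biometric matcher.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.ToDoubleBiFunction;

/** One-to-one verification against the template on file for a claimed identity. */
class Verifier {
    private final Map<String, byte[]> templatesById;               // hypothetical template store
    private final ToDoubleBiFunction<byte[], byte[]> fuzzyMatcher; // higher score = more similar
    private final double threshold;

    Verifier(Map<String, byte[]> templatesById,
             ToDoubleBiFunction<byte[], byte[]> fuzzyMatcher, double threshold) {
        this.templatesById = Objects.requireNonNull(templatesById);
        this.fuzzyMatcher = Objects.requireNonNull(fuzzyMatcher);
        this.threshold = threshold;
    }

    /** Empty if the claimed ID is unknown; otherwise true (accept) or false (reject). */
    Optional<Boolean> verify(String claimedId, byte[] probeTemplate) {
        byte[] stored = templatesById.get(claimedId); // step 1: look up the stored template
        if (stored == null) {
            return Optional.empty();
        }
        double score = fuzzyMatcher.applyAsDouble(probeTemplate, stored); // step 2: fuzzy match
        return Optional.of(score >= threshold);
    }
}
```

Verification is cheap because only one comparison is made. The threshold trades false accepts against false rejects.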
  • 11. Identification: One-to-Many Searching. Capture some number of raw biometric features and convert them into biometric templates. Perform fuzzy matching against a large number of stored biometric templates to determine the identity. If latency is not an issue, this is relatively straightforward, especially in MapReduce. It is a hard problem for low-latency applications, and it grows harder as these databases grow. There is a speed/accuracy tradeoff. The search space can be reduced using clustering techniques, but this only goes so far.
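The brute-force form of one-to-many identification scores the probe against every stored template. It keeps the candidates above a threshold and returns the best few. This is what makes latency grow with database size. `Identifier`, `Candidate`, and the top-k cutoff are illustrative assumptions, not the deck's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** A candidate identity returned by a one-to-many search. */
record Candidate(String id, double score) {}

/** Brute-force one-to-many identification: score the probe against every stored template. */
class Identifier {
    static List<Candidate> identify(byte[] probe, Map<String, byte[]> gallery,
                                    ToDoubleBiFunction<byte[], byte[]> matcher,
                                    double threshold, int maxCandidates) {
        List<Candidate> hits = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : gallery.entrySet()) {
            // One fuzzy match per stored template: cost is linear in the gallery size.
            double score = matcher.applyAsDouble(probe, e.getValue());
            if (score >= threshold) {
                hits.add(new Candidate(e.getKey(), score));
            }
        }
        // Best candidates first.
        hits.sort(Comparator.comparingDouble(Candidate::score).reversed());
        return hits.size() > maxCandidates ? new ArrayList<>(hits.subList(0, maxCandidates)) : hits;
    }
}
```

Every call touches the whole gallery. This is why the deck turns to clustering and distributed search for low-latency use.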
  • 12. What is Fuzzy Matching? Fuzzy matching is an operation performed on two objects that determines how similar they are to each other. Typically this operation produces a numeric similarity score. It is necessary when the data collected from a sensor is noisy and matching needs to be very accurate. Almost all biometric matching algorithms perform some sort of fuzzy matching: Elastic Bunch Graph Matching (face recognition), BOZORTH3 (minutiae-based fingerprint matching), IrisCode (iris matching). Other examples: image comparison, audio comparison, video comparison.
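As a concrete similarity score, IrisCode comparison is commonly described as a fractional Hamming distance. It is computed only over bits that are valid (not occluded by eyelids or reflections) in both codes. The sketch below is a simplified version of that idea. It omits the search over rotated codes that real iris matchers perform, and the class name and the packing of bits into `long[]` arrays are assumptions.

```java
/** Simplified IrisCode-style comparison: fractional Hamming distance over mutually valid bits. */
class FractionalHamming {
    /** Codes and masks are packed bit arrays of equal length; a mask bit of 1 means "usable". */
    static double distance(long[] codeA, long[] maskA, long[] codeB, long[] maskB) {
        if (codeA.length != codeB.length || maskA.length != codeA.length || maskB.length != codeA.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        long disagree = 0;
        long usable = 0;
        for (int i = 0; i < codeA.length; i++) {
            long valid = maskA[i] & maskB[i];                          // bits usable in both codes
            disagree += Long.bitCount((codeA[i] ^ codeB[i]) & valid);  // differing usable bits
            usable += Long.bitCount(valid);
        }
        // Nothing comparable: treat as maximally different (an assumed convention).
        return usable == 0 ? 1.0 : (double) disagree / usable;
    }

    /** Similarity in [0, 1], so that "score >= threshold" means "similar enough". */
    static double similarity(long[] codeA, long[] maskA, long[] codeB, long[] maskB) {
        return 1.0 - distance(codeA, maskA, codeB, maskB);
    }
}
```

Masking is what makes the match "fuzzy" in a principled way: bits that could not be read reliably in either capture simply do not vote.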
  • 13. Why Fuzzy Matching? Biometric data is inherently noisy and dirty. Conditions are not exactly the same when the original biometric data is captured (Enrollment) and when a new reading occurs (Identification). Sources of variation include:
    different types of cameras and sensors made by different companies;
    partial or smudged fingerprints (e.g. at a crime scene);
    changes in skin tone, facial hair, and makeup;
    different lighting conditions;
    aging and skin damage;
    weight gain or weight loss;
    injury.
    Image derived from http://www.flickr.com/photos/glennji/3558118429/, licensed under Creative Commons.
  • 14. Existing Large-scale Biometric Databases. US Visitor & Immigrant Status Indicator Technology (US-VISIT):* international travelers' biometrics (fingerprint and face), collected at US ports of entry, by Immigration Services, and by the State Department, and used to support the Department of Homeland Security's mission. FBI Integrated Automated Fingerprint Identification System (IAFIS):** used to solve and prevent crime and to catch criminals and terrorists; includes fingerprints, criminal histories, mug shots, scar and tattoo photos, physical characteristics such as height, weight, and hair and eye color, and aliases. AllTrust Networks Paycheck Secure System:*** uses fingerprints to support secure check cashing; designed to stop fraud and speed up check cashing. Plus many more. * One Team, One Mission, Securing our Homeland. US DHS. ** http://www.fbi.gov/hq/cjisd/iafis/iafis_facts.htm *** http://www.alltrustnetworks.com/News/6Million/tabid/378/Default.aspx
  • 16. Growth of Biometric Databases. Combined U.S. government biometric databases are expected to grow to hold billions of identities. The DHS's US-VISIT program has the world's largest and fastest biometric database (called IDENT), with over 110 million identities and roughly 145,000 identities enrolled or verified daily.* The FBI's Integrated Automated Fingerprint Identification System (IAFIS) alone holds 66.5 million identities, with 8,000-10,000 more subjects added each day.** India is reportedly creating a biometric database to hold the fingerprints and face images of each of its 1.2 billion citizens as part of its Unique Identification Project.*** The European Union's Biometric Matching System (EU-BMS) is expected to hold biometric information on 70 million people to support visa applications, border control, and immigration.**** AllTrust Networks' Paycheck Secure system has enrolled over 6 million users and has performed over 70 million transactions.***** * US-VISIT: The world's largest biometric application. William Graves. ** http://www.fbi.gov/hq/cjisd/iafis/iafis_facts.htm *** http://www.business-standard.com/india/news/national-population-register-to-start-biometrics-data-collectiondec/399135/ **** http://www.findbiometrics.com/articles/i/5220/ ***** http://www.alltrustnetworks.com/News/6Million/tabid/378/Default.aspx
  • 17. Biometric Databases are a Big Data Problem. Large-scale operations: searching and storing 100 million to 1 billion identities. Multiple biometric templates and raw files per identity for multimodal matching (fingerprints, faces, and irises). Typically, new raw files and templates are stored after each Verification and Identification operation because biometric readings change over time. Raw images: 500M identities x 16 KB-300 KB* x 10-20 = 1-2 PB. Biometric templates: 500M identities x 256 B-3 KB** x 10-20 = 2-27 TB.
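Each storage estimate on this slide is the product of three factors: identities, size per stored item, and items per identity. The small calculator below makes that arithmetic explicit. It assumes decimal units and reads "256b" as 256 bytes. Taken at face value, the quoted ranges bound raw images between about 80 TB and 3 PB, and templates between about 1.28 TB and 30 TB. The slide's 1-2 PB and 2-27 TB figures fall inside those bounds.

```java
/** Back-of-envelope storage estimate: identities x bytes per item x items per identity. */
class StorageEstimate {
    static final double TB = 1e12; // decimal terabyte (assumption)
    static final double PB = 1e15; // decimal petabyte (assumption)

    static double totalBytes(long identities, double bytesPerItem, int itemsPerIdentity) {
        return identities * bytesPerItem * itemsPerIdentity;
    }

    /** Lower bound pairs the small ends of both ranges; upper bound pairs the large ends. */
    static double[] range(long identities, double minBytes, double maxBytes, int minItems, int maxItems) {
        return new double[] {
            totalBytes(identities, minBytes, minItems),
            totalBytes(identities, maxBytes, maxItems)
        };
    }
}
```

For example, `StorageEstimate.range(500_000_000L, 16e3, 300e3, 10, 20)` gives 8e13 and 3e15 bytes, that is, 80 TB and 3 PB.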
  • 19. Most applications require low-latency fuzzy match searches in order to be useful.
  • 20. The objects being searched for cannot be ordered effectively to speed up searches.
  • 21. Clustering techniques can be used to reduce the search space, but this only goes so far.
  • 22. Fuzzy match searches are expensive, and typically a large number of objects must be searched to find a match.
  • 25. MapReduce can be used to improve feature selection by analyzing the entire database to select the features that are most effective at distinguishing identities.
  • 26. It is easy to test and deploy new algorithms against all of the data at scale.
  • 27. N-to-N matching search (a special type of Identification search) can cleanse the database and find people trying to circumvent the system (identity fraud, etc.).
  • 28. MapReduce can be used for batched searching where latency doesn't matter.
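The N-to-N cleansing search fits the map/shuffle/reduce shape. The sketch below is an in-memory imitation of one MapReduce pass, not Hadoop `Mapper`/`Reducer` code. It assumes each record already carries a bin key, for example from the clustering step described later in the deck. The "map" groups records by bin, and the "reduce" fuzzy-matches every pair within a bin and reports different identities whose templates match. Pairs that fall in different bins are not compared, which is the usual cost of blocking. `Rec`, `DuplicatePair`, and `NToNDedup` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

/** A stored template plus the identity it is enrolled under. */
record Rec(String identityId, byte[] template) {}

/** A suspected duplicate: two different identities whose templates match. */
record DuplicatePair(String idA, String idB, double score) {}

/** In-memory imitation of one MapReduce pass for N-to-N matching (not Hadoop API code). */
class NToNDedup {
    static List<DuplicatePair> run(List<Rec> records, Function<Rec, String> binOf,
                                   ToDoubleBiFunction<byte[], byte[]> matcher, double threshold) {
        // "Map" + shuffle: group every record under its bin key.
        Map<String, List<Rec>> groups = new TreeMap<>();
        for (Rec r : records) {
            groups.computeIfAbsent(binOf.apply(r), k -> new ArrayList<>()).add(r);
        }
        // "Reduce": within each bin, fuzzy-match every pair of records.
        List<DuplicatePair> out = new ArrayList<>();
        for (List<Rec> bin : groups.values()) {
            for (int i = 0; i < bin.size(); i++) {
                for (int j = i + 1; j < bin.size(); j++) {
                    Rec a = bin.get(i);
                    Rec b = bin.get(j);
                    if (a.identityId().equals(b.identityId())) {
                        continue; // same person: a match is expected, not suspicious
                    }
                    double s = matcher.applyAsDouble(a.template(), b.template());
                    if (s >= threshold) {
                        out.add(new DuplicatePair(a.identityId(), b.identityId(), s));
                    }
                }
            }
        }
        return out;
    }
}
```

Because latency does not matter here, the quadratic work inside each bin is acceptable as a batch job spread across reducers.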
  • 32. This makes searching faster because only a small subset of the data must be processed.
  • 33. This concept is based on work done in academia: Efficient Search and Retrieval in Biometric Databases (Amit Mhatre, Srinivas Palla, Sharat Chikkerur, Venu Govindaraju) and Efficient fingerprint search based on database clustering (Manhua Liu, Xudong Jiang, Alex Chichung Kot).
  • 34. Bulk Clustering and Real-time Classification. This makes searching for keys faster because only a small subset of the entire dataset needs to be processed using fuzzy matching. The classifier determines which bins need to be searched in order to find the most likely matching keys.
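The Future Work slide notes that the prototype uses K-means clustering for binning. Assuming the centroids come from such an offline (bulk, MapReduce) K-means job, real-time classification can be as simple as ranking centroids by distance to the probe's feature vector and searching only the nearest few bins. `BinClassifier` and the "nearest m bins" policy are illustrative assumptions, not the Fuzzy Table API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/** Picks which bins to search: the m bins whose centroids are nearest to the probe. */
class BinClassifier {
    private final double[][] centroids; // e.g. produced offline by a bulk k-means job

    BinClassifier(double[][] centroids) {
        // Defensive copy so later changes to the caller's arrays don't affect classification.
        this.centroids = Arrays.stream(centroids).map(double[]::clone).toArray(double[][]::new);
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Bin indices ordered from nearest to farthest centroid, truncated to m. */
    int[] binsToSearch(double[] probe, int m) {
        return IntStream.range(0, centroids.length)
                .boxed()
                .sorted(Comparator.comparingDouble(i -> squaredDistance(probe, centroids[i])))
                .limit(m)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

Searching more than one bin (m > 1) hedges against probes that fall near a cluster boundary. It also reflects the speed/accuracy tradeoff mentioned earlier.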
  • 35. Fuzzy Table: Data Storage and Bins. Bins are represented as directories in HDFS containing one or more chunk files (stored as SequenceFiles), e.g. /fuzzytable/_table_fingerprints/_bin_000001/_chunk_000001. Chunk files contain many {Key, Value} pairs and are a small multiple of the HDFS block size. Chunk files are distributed uniformly and randomly across the Data Servers in the cluster, which ensures that the bins are striped across the cluster for optimal parallel searching. Chunk files are also replicated across the Data Servers using the HDFS replication mechanism. Data Servers search only through chunk files that reside locally, and results are returned in real time as soon as a match is found.
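The directory layout and the striping idea can be illustrated with a path builder that follows the slide's example path. It is paired with a toy placement that assigns each chunk to a randomly chosen server. In real HDFS, block placement and replication are handled by the NameNode's placement policy, so `placeChunks` is only a simulation of why random striping spreads a bin's chunks across servers. `ChunkLayout` and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

/** Naming scheme and placement sketch for Fuzzy Table bins and chunks. */
class ChunkLayout {
    /** Builds a path in the style /fuzzytable/_table_<name>/_bin_NNNNNN/_chunk_NNNNNN. */
    static String chunkPath(String table, int bin, int chunk) {
        // Locale.ROOT keeps the zero-padded digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "/fuzzytable/_table_%s/_bin_%06d/_chunk_%06d", table, bin, chunk);
    }

    /** Simulation only: assigns each chunk to a server chosen uniformly at random. */
    static Map<String, List<String>> placeChunks(List<String> chunkPaths, List<String> servers, long seed) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        Random rnd = new Random(seed);
        Map<String, List<String>> byServer = new HashMap<>();
        for (String s : servers) {
            byServer.put(s, new ArrayList<>());
        }
        for (String p : chunkPaths) {
            byServer.get(servers.get(rnd.nextInt(servers.size()))).add(p);
        }
        return byServer;
    }
}
```

Keeping chunks to a small multiple of the block size means each bin spans several chunks. Those chunks can then land on different servers and be scanned concurrently.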
  • 36. Fuzzy Table: Low-Latency Fuzzy Matching Component. The low-latency component consists of three main parts: Client (submits queries for Keys and gets back {Key, Value} pairs); Master Server (serves metadata about which Data Servers host which bins); Data Servers (actually perform the fuzzy matching searches). Data Servers perform fuzzy matching against Keys in order to find {Key, Value} records: double score = fuzzyMatcher.match(key, storedRec.getKey()); if (score >= threshold) return storedRec; Fuzzy matching searches are performed in parallel across many Data Servers.
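The slide's two-line snippet can be expanded into a runnable shape. Each Data Server scans only its own records and returns the first {Key, Value} pair whose score reaches the threshold. The client fans the query out in parallel and takes whichever answer arrives first. Threads stand in for separate servers here. `KV`, `ParallelFuzzySearch`, and the use of `ToDoubleBiFunction` in place of the slide's `fuzzyMatcher.match` are assumptions. The Master Server lookup and all networking are omitted.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToDoubleBiFunction;

/** A stored {Key, Value} pair: the key is a biometric template, the value e.g. an identity ID. */
record KV(byte[] key, String value) {}

/** Client fans one query out to "data servers", each scanning only its local records. */
class ParallelFuzzySearch {
    /** Mirrors the slide's loop: return the first local record scoring at or above the threshold. */
    static Optional<KV> searchLocal(List<KV> localRecords, byte[] key,
                                    ToDoubleBiFunction<byte[], byte[]> fuzzyMatcher, double threshold) {
        for (KV storedRec : localRecords) {
            if (Thread.currentThread().isInterrupted()) {
                return Optional.empty(); // the client already has an answer
            }
            double score = fuzzyMatcher.applyAsDouble(key, storedRec.key());
            if (score >= threshold) {
                return Optional.of(storedRec);
            }
        }
        return Optional.empty();
    }

    /** Queries every server in parallel and returns the first match to arrive, or empty. */
    static Optional<KV> search(List<List<KV>> recordsPerServer, byte[] key,
                               ToDoubleBiFunction<byte[], byte[]> fuzzyMatcher, double threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, recordsPerServer.size()));
        try {
            CompletionService<Optional<KV>> cs = new ExecutorCompletionService<>(pool);
            for (List<KV> local : recordsPerServer) {
                cs.submit(() -> searchLocal(local, key, fuzzyMatcher, threshold));
            }
            for (int i = 0; i < recordsPerServer.size(); i++) {
                Optional<KV> hit = cs.take().get(); // results arrive in completion order
                if (hit.isPresent()) {
                    return hit;
                }
            }
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("data server search failed", e.getCause());
        } finally {
            pool.shutdownNow(); // interrupt servers that are still scanning
        }
    }
}
```

Returning on the first completed hit, rather than waiting for all servers, is what gives the "results are returned as soon as a match is found" behavior described on the previous slide.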
  • 45. Future Work. Fuzzy Table is still a research prototype, but we plan to keep building it out to support this biometrics work: Locality Sensitive Hashing instead of K-means clustering for binning and search-space reduction; distributed/replicated master servers (and ZooKeeper integration); real-time ingest. Hopefully we will have performance/scalability metrics, as well as more features and example applications, to share within the next few months.
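The slide does not say which Locality Sensitive Hashing family would replace K-means. For binary templates compared by Hamming distance, such as IrisCode, one standard choice is bit sampling (Indyk and Motwani). The bin key is a fixed random sample of k bit positions. Sampling with replacement from n bits, two codes at Hamming distance d share a key with probability (1 - d/n)^k. The sketch below shows this for a single hash table and is an assumed illustration. Practical schemes use several tables to trade recall against bin size.

```java
import java.util.Random;

/** Bit-sampling LSH for binary templates under Hamming distance: one hash table's key function. */
class BitSamplingLsh {
    private final int[] positions; // k sampled bit positions; k <= 63 so the key fits in a long

    BitSamplingLsh(int totalBits, int k, long seed) {
        if (k < 1 || k > 63 || totalBits < 1) {
            throw new IllegalArgumentException("need 1 <= k <= 63 and totalBits >= 1");
        }
        Random rnd = new Random(seed);
        positions = new int[k];
        for (int i = 0; i < k; i++) {
            positions[i] = rnd.nextInt(totalBits); // uniform sampling with replacement
        }
    }

    /** Reads bit `pos` from a code packed into 64-bit words. */
    static boolean bit(long[] code, int pos) {
        return ((code[pos >>> 6] >>> (pos & 63)) & 1L) == 1L;
    }

    /** Bin key: the sampled bits packed into a long. Codes agreeing on all of them share a bin. */
    long binKey(long[] code) {
        long key = 0;
        for (int i = 0; i < positions.length; i++) {
            if (bit(code, positions[i])) {
                key |= 1L << i;
            }
        }
        return key;
    }
}
```

Unlike K-means, this needs no training pass over the data. That would also make real-time ingest simpler, since a new template's bin is computed directly from its bits.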
  • 46. Conclusion. Searching large-scale biometric databases is a hard problem. Hadoop is a potential solution to this problem. We used MapReduce for bulk processing to enable distributed, low-latency fuzzy matching over HDFS. Hadoop is a great platform for solving all sorts of Big Data and distributed computing problems, even low-latency searching.
  • 47. Contributors. Cloud Computing Team: Jason Trost, Lalit Kapoor, Daniel Neuberger, Michael Beck, Edmund Kohlwey, Josh Sullivan. Identity Management/Biometrics Team: Abel Sussman, Eric Karlinsky, Deanna Walters, Joel Rader, Allen Wight.
  • 49. Contact Information: Cloud Computing Team. Joshua Sullivan, Senior Associate, (301) 543-4611, sullivan_joshua@bah.com. Lalit Kapoor, Senior Consultant, (301) 821-8000, kapoor_lalit@bah.com. Michael Beck, Senior Consultant. Daniel Neuberger, Senior Consultant. Jason Trost, Associate, (301) 543-4400, trost_jason@bah.com. Edmund Kohlwey, Consultant, (301) 617-3523, kohlwey_edmund@bah.com. All at Booz Allen Hamilton Inc., 134 National Business Parkway, Annapolis Junction, Maryland 20701.
  • 50. Contact Information: Identity Management Team. Joel Rader, Identity Analyst, (703) 984-0312, rader_joel@bah.com. Eric Karlinsky, Identity Analyst, (703) 984-3532, Karlinsky_eric@bah.com. Deanna Walters, Biometrics Analyst, (703) 984-1982, walters_deanna@bah.com. Allen Wight, Biometrics Analyst, (703) 984-1978, wight_allen@bah.com. Abel Sussman, Biometrics Subject Matter Expert, (703) 984-7663, sussman_abel@bah.com. All at Booz Allen Hamilton Inc., 13200 Woodland Park Rd., Herndon, VA 20171.