How to search extracted data
© Copyright 2015 NowSecure, Inc.
Javier Collado

Data extraction in mobile devices
● It’s hard to decode data for each application with limited resources
○ There are a lot of applications
○ Each application version might change:
■ format (file type, database schema)
■ content (new and interesting data)
● Many applications store data in SQLite databases
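
File names, paths and extensions differ between applications and versions, so one way to locate databases in an extracted file system is by content. Every SQLite 3 database file begins with the 16-byte header string `SQLite format 3` followed by a NUL byte. The Python sketch below assumes the extraction has been copied to a local directory. `find_sqlite_files` is an illustrative name and is not part of any tool mentioned here.

```python
import os

# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Return True if the file at `path` starts with the SQLite 3 header."""
    try:
        with open(path, "rb") as f:
            return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
    except OSError:
        # Unreadable entries (permissions, broken links) are skipped.
        return False


def find_sqlite_files(root):
    """Yield the paths of SQLite databases found anywhere under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_sqlite_file(path):
                yield path
```

Rollback journals and WAL files have their own headers, so this check does not match them, even though they can hold recent data.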

Index and search
● Libraries
○ Low-level interface
○ Examples: Lucene, Xapian, Whoosh
● Servers
○ High-level interface
○ Examples: Solr, Elasticsearch, Sphinx
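
Both kinds of tool are built around an inverted index, which maps each term to the documents that contain it. The toy Python class below only illustrates that idea. It is not how Lucene, Xapian or Whoosh are implemented; real libraries add analyzers, relevance scoring and persistent storage. The naive lowercase tokenizer is an assumption made for brevity.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class InvertedIndex:
    """Toy inverted index: term -> ids of the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids creating empty postings for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

A library exposes roughly this level of control (what to index and how to tokenize) to the calling program. A server wraps it behind a network API.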

SQLite
● Very flexible and permissive: each value, not each column, has its own type
● Storage class: group of related datatypes (different lengths, encodings, …)
● Type affinity: preferred storage class for a column, based on its declared type
● Not all the content should be indexed:
○ sqlite_master, sqlite_sequence
○ FTS tables
○ BLOBs
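
The filtering step in the list above can be sketched in Python with only the standard `sqlite3` module. The sketch uses three rules:
- Internal tables are recognized by the reserved `sqlite_` prefix.
- FTS tables are recognized by the `USING fts3`, `fts4` or `fts5` clause stored in `sqlite_master`.
- Their shadow tables are recognized by `<fts table>_<suffix>` names.

Name matching is a heuristic, and the function names are hypothetical. Because every value carries its own type, BLOBs are dropped per value rather than per column.

```python
import re
import sqlite3

# Suffixes of the shadow tables that FTS3, FTS4 and FTS5 create.
FTS_SHADOW_SUFFIXES = {"content", "segments", "segdir", "docsize", "stat",
                       "data", "idx", "config"}


def indexable_tables(con):
    """Return names of user tables worth indexing.

    Skips internal tables (such as sqlite_sequence and sqlite_stat1),
    FTS virtual tables, and the shadow tables that store their data.
    """
    rows = con.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    # FTS tables are declared as "CREATE VIRTUAL TABLE x USING fts4(...)".
    fts = {name for name, sql in rows
           if sql and re.search(r"USING\s+FTS\d", sql, re.IGNORECASE)}
    tables = []
    for name, _sql in rows:
        if name.startswith("sqlite_") or name in fts:
            continue
        prefix, _, suffix = name.rpartition("_")
        if prefix in fts and suffix in FTS_SHADOW_SUFFIXES:
            continue  # e.g. notes_content, notes_segdir
        tables.append(name)
    return tables


def without_blobs(row):
    """Drop BLOB values (returned as bytes by sqlite3) from a row dict."""
    return {col: val for col, val in row.items() if not isinstance(val, bytes)}
```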

SQLite
sqlite> CREATE TABLE names (id INTEGER, name TEXT);
sqlite> INSERT INTO names VALUES (1, 'Alice');
sqlite> INSERT INTO names VALUES ('Bob', 2);
sqlite> SELECT typeof(id), id, typeof(name), name FROM names;
integer|1|text|Alice
text|Bob|text|2
sqlite>
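
The session above shows that a declared column type does not guarantee the type of each stored value. Before indexing, the same `typeof()` function can find columns whose values mix storage classes. The Python sketch below uses the standard `sqlite3` module. `storage_classes`, `mixed_columns` and `quote_ident` are illustrative names.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier (table or column name)."""
    return '"' + name.replace('"', '""') + '"'


def storage_classes(con, table):
    """Map each column of `table` to the set of storage classes found in it.

    NULLs are ignored, since they do not conflict with any type.
    """
    columns = [row[1] for row in
               con.execute(f"PRAGMA table_info({quote_ident(table)})")]
    result = {}
    for col in columns:
        q = (f"SELECT DISTINCT typeof({quote_ident(col)}) "
             f"FROM {quote_ident(table)} WHERE {quote_ident(col)} IS NOT NULL")
        result[col] = {row[0] for row in con.execute(q)}
    return result


def mixed_columns(con, table):
    """Names of columns whose values use more than one storage class."""
    return [col for col, classes in storage_classes(con, table).items()
            if len(classes) > 1]


if __name__ == "__main__":
    # Reproduce the shell session above.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE names (id INTEGER, name TEXT)")
    con.execute("INSERT INTO names VALUES (1, 'Alice')")
    con.execute("INSERT INTO names VALUES ('Bob', 2)")
    print(storage_classes(con, "names"))
    print(mixed_columns(con, "names"))
```

For the `names` table, only `id` is reported as mixed. The integer `2` inserted into `name` was converted to text by the column's TEXT affinity.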

SQLite
sqlite> CREATE TABLE names (id INTEGER name TEXT);
sqlite> .schema names
CREATE TABLE names (id INTEGER name TEXT);
sqlite> INSERT INTO names VALUES (1, 'Alice');
Error: table names has 1 columns but 2 values were supplied
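
The missing comma is not a syntax error in SQLite. `INTEGER name TEXT` is accepted as the declared type of a single column, `id`, which is why the insert fails. When a schema from an extracted database looks odd, `PRAGMA table_info` shows which columns SQLite actually created. The Python sketch below uses the standard `sqlite3` module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Same statement as in the session: without the comma, "INTEGER name TEXT"
# is parsed as the declared type of one column called "id".
con.execute("CREATE TABLE names (id INTEGER name TEXT)")

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull, default value, pk)
columns = con.execute("PRAGMA table_info(names)").fetchall()
for cid, name, decltype, _notnull, _default, _pk in columns:
    print(f"column {cid}: {name!r}, declared type {decltype!r}")

# Two values into a one-column table fail, as in the shell session.
insert_error = None
try:
    con.execute("INSERT INTO names VALUES (1, 'Alice')")
except sqlite3.OperationalError as exc:
    insert_error = str(exc)
print("insert failed:", insert_error)
```

Because the declared type contains `INT`, the column still gets INTEGER affinity.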

Elasticsearch
● Search server
● Document oriented (JSON)
● RESTful API
● Schema (mapping) not required, but an explicit one is needed to avoid errors caused by SQLite's flexible typing
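
One way to supply that mapping is to derive it from each column's declared type, using the affinity rules from the SQLite documentation. This is sketched here as an assumption, not a recommended scheme. The rules are checked in this order:
- A type containing `INT` gets INTEGER affinity.
- `CHAR`, `CLOB` or `TEXT` gives TEXT.
- `BLOB`, or no type at all, gives BLOB.
- `REAL`, `FLOA` or `DOUB` gives REAL.
- Anything else gives NUMERIC.

The affinity-to-field-type table (`ES_TYPES`) and the function names are hypothetical.

```python
def column_affinity(declared_type):
    """Derive a column's affinity from its declared type (SQLite's rules, in order)."""
    t = (declared_type or "").upper()
    if "INT" in t:
        return "INTEGER"
    if "CHAR" in t or "CLOB" in t or "TEXT" in t:
        return "TEXT"
    if "BLOB" in t or not t:
        return "BLOB"
    if "REAL" in t or "FLOA" in t or "DOUB" in t:
        return "REAL"
    return "NUMERIC"


# Assumption: affinity -> field type, using the 1.x-era type names
# ("string" was replaced by "text"/"keyword" in Elasticsearch 5.0).
# BLOB columns are deliberately left unmapped.
ES_TYPES = {"INTEGER": "long", "REAL": "double", "NUMERIC": "double", "TEXT": "string"}


def mapping_for(columns):
    """Build a mapping body from (column name, declared type) pairs."""
    properties = {}
    for name, declared in columns:
        es_type = ES_TYPES.get(column_affinity(declared))
        if es_type is not None:
            properties[name] = {"type": es_type}
    return {"properties": properties}
```

For the `names` table, `mapping_for([("id", "INTEGER"), ("name", "TEXT")])` yields `long` for `id` and `string` for `name`. This is the same mapping Elasticsearch infers in the next session. Defining it up front means the types no longer depend on whichever document happens to be indexed first. NUMERIC columns remain a weak spot, because they can still hold text.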

Elasticsearch
$ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{"id": 1, "name": "Alice"}'
{"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_version":1,"created":true}
$ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{"id": "Bob", "name": 2}'
{"error":"MapperParsingException[failed to parse [id]]; nested: NumberFormatException[For input string: \"Bob\"]; ","status":400}
$ curl -XGET 'http://localhost:9200/dfrws/_mapping/names'
{"dfrws":{"mappings":{"names":{"properties":{"id":{"type":"long"},"name":{"type":"string"}}}}}}
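
The second request fails because the first document had already caused `id` to be mapped as `long`. One way to keep a bulk import going is sketched below as an assumption, not the approach of any particular tool. Each SQLite value is coerced to its mapped type before sending, and values that cannot be converted are set aside so an analyst can still review them. The mapping dictionary mirrors the `_mapping` output above. `coerce` and `to_document` are hypothetical helpers.

```python
def coerce(value, es_type):
    """Convert a SQLite value to the mapped field type, or raise ValueError."""
    if value is None:
        return None                      # missing values are fine for any type
    if isinstance(value, bytes):
        raise ValueError("BLOB values are not indexed")
    if es_type == "long":
        if isinstance(value, float) and not value.is_integer():
            raise ValueError(f"{value!r} is not a whole number")
        return int(value)                # int("Bob") raises ValueError
    if es_type == "double":
        return float(value)              # float("Bob") raises ValueError
    return str(value)                    # "string" fields accept any text


def to_document(row, mapping):
    """Split a row into a document that fits `mapping` and the values that do not."""
    properties = mapping["properties"]
    doc, rejected = {}, {}
    for field, value in row.items():
        es_type = properties.get(field, {}).get("type")
        if es_type is None:
            continue                     # columns without a mapping are skipped
        try:
            doc[field] = coerce(value, es_type)
        except ValueError:
            rejected[field] = value
    return doc, rejected
```

With that mapping, the row `{"id": "Bob", "name": 2}` becomes the document `{"name": "2"}`, and `"Bob"` is set aside instead of failing the whole request.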

Elasticsearch
$ curl -XPOST 'http://localhost:9200/dfrws/_names' -d '{"id": 1, "name": "Alice"}'
{"error":"InvalidTypeNameException[mapping type name [_names] can't start with '_']","status":400}
$ curl -XGET 'http://localhost:9200/dfrws/names/_search' -d '{"query": {"match": {"name": "Alice"}}}'
{"took":27,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_score":0.30685282,"_source":{"id": 1, "name": "Alice"}}]}}
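
SQLite accepts table names such as `_names`, but, as the first request shows, Elasticsearch rejects type names that start with `_`. Table names therefore need a translation step. The sketch below also builds the `match` query body used in the second request. Prefixing offending names with `table` is an arbitrary, hypothetical rule that could collide with an existing table name. The helper names are also hypothetical.

```python
import json


def type_name_for(table):
    """Hypothetical rule: prefix table names that start with '_' (e.g. '_names')."""
    return "table" + table if table.startswith("_") else table


def search_url(host, index, doc_type):
    """URL of the per-type _search endpoint used in the session above."""
    return f"http://{host}/{index}/{doc_type}/_search"


def match_query(field, text):
    """Request body for a full-text match query on a single field."""
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    url = search_url("localhost:9200", "dfrws", type_name_for("names"))
    body = json.dumps(match_query("name", "Alice"))
    # Prints the equivalent curl command line.
    print(f"curl -XGET '{url}' -d '{body}'")
```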

Example tool
● https://github.com/jcollado/esis
● Command-line tool written in Python
○ Indexes every row in every table in every database file found under a given directory
○ Searches the indexed content using simple queries
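
The sketch below illustrates the indexing step only; it is not taken from the esis source code. It turns every row of one table into a request body for the Elasticsearch bulk API. That body is newline-delimited JSON: an action line followed by a document line for each row, with every line ending in a newline. The `_type` key matches the 1.x-era API used in these slides; mapping types were removed in later versions. BLOB values are dropped, as the SQLite slide suggests. `bulk_body` and `quote_ident` are hypothetical names.

```python
import json
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier (table or column name)."""
    return '"' + name.replace('"', '""') + '"'


def bulk_body(con, index, table, doc_type):
    """Build a newline-delimited _bulk body with one index action per row of `table`."""
    cur = con.execute(f"SELECT * FROM {quote_ident(table)}")
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        action = {"index": {"_index": index, "_type": doc_type}}
        # BLOBs come back as bytes and are not JSON-serializable; drop them.
        source = {c: v for c, v in zip(columns, row) if not isinstance(v, bytes)}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last one, to end with "\n".
    return "".join(line + "\n" for line in lines)
```

The resulting string could be saved to a file and sent with `curl -XPOST 'http://localhost:9200/_bulk' --data-binary @body.ndjson`. Use `--data-binary` rather than `-d`, because `-d` strips the newlines that the bulk format depends on.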

Conclusions
● SQLite content can be indexed in Elasticsearch, but…
○ Types need to be consistent
○ Irrelevant information needs to be discarded

Future work
● Index text information from other file types (Apache Tika)
● Regular expressions
● Highlight search results
● Search suggestions
● Language detection and custom analyzers
● Proximity matching (match vs. match_phrase)
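
Several of these items map onto existing parts of the Elasticsearch query DSL. The sketch below shows what the request bodies could look like; the field and helper names are illustrative.
- A `match` query returns documents containing any of the analyzed terms, regardless of position.
- `match_phrase` requires the terms in order, and `slop` allows some distance between them.
- `regexp` queries cover the regular-expression item.
- A top-level `highlight` section covers highlighting of search results.

```python
def match_body(field, text):
    """Documents containing any of the analyzed terms, in any position."""
    return {"query": {"match": {field: text}}}


def match_phrase_body(field, text, slop=0):
    """Terms must appear in order; `slop` allows that many positions of movement."""
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


def regexp_body(field, pattern):
    """Regular expression matched against the indexed terms, not the raw text."""
    return {"query": {"regexp": {field: pattern}}}


def with_highlight(body, field):
    """Copy of a search body that also requests highlighted fragments of `field`."""
    return {**body, "highlight": {"fields": {field: {}}}}
```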
Thanks
