Deep Web and Digital 
Investigations 
Damir Delija 
Milano 2014 
What we will talk about 
• Web and “Deep Web” 
• Web and documents 
• Definitions 
• Technical issues 
• Forensic issues 
• I’m not an expert on deep or dark web 
• Discussion based on many sources and 
references
Inaccessible Web 
• Deep Web is a name for data inaccessible by 
regular search engines on the Internet 
• Deep Web sounds much better than 
inaccessible 
• Searchable / Accessible web is also called 
surface web 
• Dark web is the part of the WWW with illegal or 
immoral content 
• Dark web is not the Deep Web; it is part of it, but 
dark pages are on the surface web too
Inaccessible Resources 
• Inaccessible resources 
– it exists but we don’t know about it or its location 
– we can’t use it 
• It is an old problem 
– you have it, even in your own room 
• Is there any solution ? 
– the idea of an index comes from Gopher days (Veronica) 
– it works well with static pages and data 
– abandoned in early web days, it later became a source of 
tremendous power and wealth for search engines
Web and Internet and Documents 
• WWW is not the Internet ☺ 
– also, the full data or document space of each networked 
computer is not part of the Internet 
• WWW is a hypertext, document-based structure 
– we have links among documents 
– a document is not necessarily a web page 
– documents must have a presentation ability to be visible 
through the web interface (transcription layer, often 
dynamically generated) 
– links, web pages and documents can be static or 
dynamically generated 
– dynamic documents are here because of the volume of data 
(it can’t be organised in static pages) 
Definitions are crucial in understanding deep and 
surface web
Volume of Data 
• For each document there are on average 11 
copies in the system 
– enterprise measurements, pre-SAN calculation 
• Shows how rapidly the document space expands 
• Even simple mail can cause data avalanches 
• From the surface web point of view ? 
– mostly invisible 
• From the Deep Web point of view ? 
– data/document copies are probably floating 
around, inaccessible to us
Web and Search Engines 
• The web gives access only to material that is 
referenced by a link and is not access 
protected 
• Today we mostly assume that search engine span 
equals the web and the Internet 
• To be effective, search engines must have 
pre-organised data to answer a query 
• Enormous changing volume of collected data 
and propagation lag 
http://en.wikipedia.org/wiki/List_of_search_engines
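The point about pre-organised data can be illustrated with a toy inverted index. This is only a minimal sketch: the page URLs and texts are hypothetical, and real search engines use far more elaborate structures. It shows why a page that was never collected cannot appear in any answer.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages that a crawler managed to collect.
crawled = {
    "http://example.org/a": "deep web forensics",
    "http://example.org/b": "surface web search",
}
index = build_index(crawled)
```

A query is answered from the index alone, so anything the crawler did not collect is invisible to the engine even though it exists on the network.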
Deep Resources 
• Deep Web depends on the method of how 
search engines acquire and store data 
• Web can be crawled or explored as link space 
• Hints are cache, proxy, protocol traffic 
• No clear boundary between deep resources 
and surface resources
Uncollectible Resources 
Deep Web Resources 
• Dynamic Web Pages 
– returned in response to a query or accessed only through a form 
• Unlinked Contents 
– Pages without any backlinks 
• Private Web 
– sites requiring registration and login (password-protected resources) 
• Limited Access web 
– Sites with CAPTCHAs or no-cache Pragma HTTP headers 
• Scripted Pages 
– Pages produced by JavaScript, Flash, AJAX etc. 
• Non HTML contents 
– Multimedia files e.g. images or videos
Uncollectible Resources 
Documents and Disk Space 
• This comes close to e-discovery field 
• Is this part of Deep Web ? 
• Documents not in the web tree 
• accessible only by direct filesystem access 
• or by dedicated script effort 
• Files in general, on web server and non-web 
server machines 
– accessible only by direct filesystem access
Forgotten Data 
• From the security aspect, forgotten data is a 
very interesting part of Deep Web 
• What is forgotten data – maybe data without 
custodian ? 
• Verizon's 2008 data breach report: 
– unknown data was part of the breach in 66% of 
incidents
Data Lifecycle 
• Data creation and circulation 
• How to find data and correlate it 
• Search engines 
• Proxies 
• Metadata, logs, feeds 
• Very interesting ideas in “Programming 
Collective Intelligence” By: Toby Segaran, 
O'Reilly Media, August 16, 2007
Hidden Data in Surface web ? 
• Web handles data available through HTML and 
extensions 
• What about metadata and embedded data that 
are not accessible to search engines ?
Surface Web and Deep Issues 
• “Hidden Data in Internet Published Documents” 
– deep forensic impact 
• Specific data formats can have embedded 
elements which are not visible to search engines 
– like thumbnail views embedded in pictures 
– exif data in images 
– metadata in documents 
– stego
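As an illustration of "metadata in documents", here is a minimal sketch that reads author fields from an Office Open XML package (for example a .docx file), which is a ZIP archive carrying its core properties in docProps/core.xml. The sample package is built in memory and is a stand-in, not a complete document; the author names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of the package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(package_bytes):
    """Return creator and lastModifiedBy from docProps/core.xml, if present."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    creator = root.find("dc:creator", NS)
    if creator is not None:
        found["creator"] = creator.text
    modifier = root.find("cp:lastModifiedBy", NS)
    if modifier is not None:
        found["lastModifiedBy"] = modifier.text
    return found


def make_sample_package():
    """Build a tiny stand-in package in memory (not a complete .docx)."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>Example Author</dc:creator>"
        "<cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    ) % (NS["cp"], NS["dc"])
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    return buffer.getvalue()


sample_metadata = read_core_properties(make_sample_package())
```

A search engine that indexes only the rendered text would not expose these fields, yet they travel with every published copy of the file.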
Idea of Treasure Island 
• What is not on the map is unknown 
• Hidden like a treasure island 
• Idea of the unexplored and uncharted, with big gains 
• Because of its size, also the idea of an iceberg
Why Deep Web Exists ? 
• Why do search engines fail? 
– Technology 
• Most of the web data is behind dynamically 
generated pages (web gateways) 
– Web crawlers cannot reach them, or the data is not announced 
– Can only be obtained if we have access to the system 
containing the information 
– Forms have to be populated with values 
– requires understanding the semantics of the web gateway and 
the data behind it
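The form problem can be made concrete with a small sketch that lists the fields a crawler would have to populate before it could reach the data behind a web gateway. It uses Python's standard html.parser; the sample page, the /search action and the field names are hypothetical.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect, per form, the action URL and the names of its input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._inside_form = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._inside_form = True
            self.forms.append({"action": attributes.get("action", ""), "fields": []})
        elif tag in ("input", "select", "textarea") and self._inside_form:
            name = attributes.get("name")
            if name:  # unnamed controls (e.g. a bare submit button) send no value
                self.forms[-1]["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside_form = False


def find_forms(html):
    """Return the list of forms found in an HTML string."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.forms


SAMPLE_PAGE = """
<html><body>
<a href="/about.html">About</a>
<form action="/search" method="get">
  <input type="text" name="title">
  <select name="year"><option>2014</option></select>
  <input type="submit" value="Go">
</form>
</body></html>
"""
forms = find_forms(SAMPLE_PAGE)
```

Finding the fields is the easy part. Choosing meaningful values for them requires the semantic understanding of the gateway that the slide mentions, which this sketch does not attempt.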
Measuring the Deep Web 
• How to measure – estimates are based on known 
examples 
• Try to generate pages based on known home pages 
and explore the link space, based on hop distances 
• First Attempt: Bergman (2000) 
– Size of surface web is around 19 TB 
– Size of Deep Web is around 7500 TB 
– Deep Web is nearly 400 times larger than the Surface Web 
• 2004: Mitesh classified the Deep Web more accurately 
– Most of the HTML forms are two hops from the home page
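Exploring the link space by hop distance amounts to a breadth-first search from the home page. The sketch below assumes a hypothetical site given as an in-memory link map; a real measurement would fetch pages instead.

```python
from collections import deque


def hop_distances(links, home):
    """Breadth-first search: number of link hops from home to each reachable page."""
    distance = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical site: the search form sits two hops from the home page,
# and /orphan.html has no incoming link at all.
SITE_LINKS = {
    "/": ["/about.html", "/catalog/"],
    "/catalog/": ["/catalog/search-form.html"],
    "/orphan.html": ["/"],
}
distances = hop_distances(SITE_LINKS, "/")
```

In this example the form page gets distance 2, while /orphan.html never appears in the result because nothing links to it, which is the "unlinked contents" case.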
Deep Web Size 
Current Estimates 2014 
• Deep Web about 7500 Terabytes 
• Surface Web about 19 terabytes 
• Deep Web has between 400 and 550 times more 
public information than the Surface Web. 
• 95% of the Deep Web is publicly accessible 
• More than 200,000 Deep Web sites currently exist. 
• 550 billion documents on Deep Web 
• 1 billion documents on Surface Web
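As a quick arithmetic check of the figures above (a sketch that only restates the slide's numbers), the "400 to 550 times" range appears to correspond to the ratio by volume and the ratio by document count:

```python
# Figures as quoted on the slide.
deep_tb, surface_tb = 7500, 19
deep_docs, surface_docs = 550_000_000_000, 1_000_000_000

size_ratio = deep_tb / surface_tb      # ratio by volume, roughly 395
doc_ratio = deep_docs / surface_docs   # ratio by document count, 550
```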
History of Deep Web 
• Start: static HTML pages that web crawlers can easily 
reach, only a few CGI scripts 
• In the mid-90s: introduction of dynamic pages, 
generated as a result of a query or link access 
• In 1994: Jill Ellsworth used the term “Invisible 
Web” to refer to these websites. 
• In 2001, Bergman coined it as “Deep Web” 
• Dark web develops in parallel as crime starts to spread 
over the Internet
Rough Timeline 
• 2001: Raghavan et al -> Hidden Web Exposure 
– domain specific human assisted crawler 
• 2002: Stumbleupon used Human Crawler 
– human crawlers can find relevant links that algorithmic crawlers miss. 
• 2003: Bergman introduced LexiBot 
– used for quantifying the Deep Web 
• 2004: Yahoo! Content Acquisition Program 
– paid inclusion for webmasters 
• 2005: Yahoo! Subscriptions 
– Yahoo started searching subscription-only sites 
• 2005: Ntoulas et al. -> Hidden Web Crawler 
– automatically generated meaningful queries to issue against search forms 
• 2005: Google Sitemaps 
– allows webmasters to inform search engines about URLs on their websites that 
are available for crawling 
• Web 2.0 infrastructure 
• Today: mobile devices and Internet of things 
– each gadget can have (and has) a web server for configuration
Forensic Issues
From Digital Forensic Viewpoint 
• Is there a way to carry out forensically sound 
actions on Deep Web ? 
• Can we apply standard digital forensic 
procedures and best practices ? 
• In both cases yes, 
– we are always limited in digital forensics, but that 
does not prevent reliable results
Web and Digital Forensic 
• Web is web ☺ 
• Web artifacts are web artifacts 
• The type of investigation determines how we 
handle web data 
– key element is: legal 
• Many possible scenarios and situations 
– follow the forensic principles and best practices as 
in any other situation 
– use scientific method 
– test and experiment to prove the method
Deep Web and Forensic Tasks 
• How to prove access to Deep Web resources 
– same as ordinary resources, because it is mostly 
through browsers 
– an advantage over blind Deep Web access: history, cache 
and log artifacts show which 
Deep Web resource was accessed 
• Deep Web artifacts 
– Mostly like any other web artifacts 
– Hidden Data in Internet Published Documents 
– Dark web as a specific subrange
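A minimal sketch of how history artifacts can show access to a specific resource. It assumes a Firefox-style history database with a moz_places table holding url and visit_count columns; the demo table is a simplified stand-in built in memory, and the .onion address is hypothetical. Whether a given browser or privacy mode leaves such records has to be tested per case, and real work would be done on a verified working copy of the evidence, never on the original.

```python
import sqlite3


def onion_visits(connection):
    """Return (url, visit_count) rows whose URL points at a .onion host."""
    rows = connection.execute(
        "SELECT url, visit_count FROM moz_places "
        "WHERE url LIKE '%.onion/%' OR url LIKE '%.onion' "
        "ORDER BY url"
    )
    return rows.fetchall()


def make_demo_history():
    """Build a simplified, in-memory stand-in for a browser history database."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, visit_count INTEGER)"
    )
    connection.executemany(
        "INSERT INTO moz_places (url, visit_count) VALUES (?, ?)",
        [
            ("http://example.org/", 5),
            ("http://exampleonionaddress.onion/market", 2),
        ],
    )
    return connection


hits = onion_visits(make_demo_history())
```

The same query-and-filter pattern applies to cache and proxy logs. The pattern match is deliberately simple and would need tightening (and validation against test data) before being relied on.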
Forensic Tools Issues 
• Forensics of specialised browsers and access tools 
– Tor / onion 
– Unusual browsers and access tools: links, lynx, wget 
– Other networks: I2P, Freenet 
• Key Question: Does our forensic framework 
support such tools? 
– Internet Evidence Finder 
– EnCase 
– FTK 
– If not how to handle artifacts and data ? 
• What about mobile devices?
Conclusion and Questions 
• Challenging field 
• Size will grow with the IPv6 takeover and the 
“Internet of things” concept 
• Cloud concept is important (size, access, legal 
issues) 
• Each new technology will add a new layer of 
invisibility, i.e. complexity 
• The size of available data simply forces the use of 
dynamic web pages
References 
Too many links ... 
• http://papergirls.wordpress.com/2008/10/07/timeline-deep-web 
• http://deepwebtechblog.com/federated-search-finds-content-that-google-can’t-reach-part-i-of-iii 
• http://deepwebtechblog.com/a-federated-search-primer-part-ii-of-iii 
• http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html 
• http://www.online-college-blog.com/features/100-useful-tips-and-tools-to-research-the-deep-web/

Deep Web and Digital Investigations

  • 1. Deep Web and Digital Investigations Damir Delija Milano 2014 1
  • 2. What we will talk about • Web and “Deep Web” • Web and documents • Definitions • Technical issues • Forensic issues • I’m not an expert on deep or dark web • Discussion based on many sources and references
  • 3. Inaccessible Web • Deep Web is a name for data inaccessible by regular search engines on the Internet • Deep Web sounds much better than inaccessible • Searchable / Accessible web is also called surface web • Dark web is part of www with illegal or immoral content • Dark web is not Deep Web it is part of it, but dark pages are on the surface web too
  • 4. Inaccessible Resources • Inaccessible resources – it exists but we don’t know about it or it’s location – we can’t use it • It is an old problem – you have it, even in your own room • Is there any solution ? – idea from Gopher days, Veronica – it works well with static pages and data – abandoned in web days, becomes a source of tremendous power and wealth for Search Engines
  • 5. Web and Internet and Documents • WWW is not the Internet ☺ – also full data or document space of each networked computer is not part of the Internet • WWW is hypertext document based structure – we have links among documents – a document is not necessarily a web page – documents must have a presentation ability to be visible through the web interface (transcription layer, often dynamicaly generated) – Links, web pages and documents can be static or dynamically generated – Dynamic documents are here because of volume of data (can’t be organised in static pages) Definitions are crucial in understandig deep and surface web
  • 6. Volume of Data • For each document there is in average of 11 copies in the system – enterprise measurements pre SAN calculation • Shows how document space expands rapidly • Even simple mail can cause data avalanches • From sourface web point of view ? • Mostly invisible • From Deep Web point of view ? • Data/documents copies are probably floating around, inaccessible to us
  • 7. Web and Search Engines • Web can access material which is only referenced by a link and is not access protected • Today mostly we assumes search engine span equals web and Internet • To be effective search engines must have pre organised data to answer query • Enormous changing volume of collected data and propagation lag http://en.wikipedia.org/wiki/List_of_search_engines
  • 8. Deep Resources • Deep Web depends on the method of how search engines acquire and store data • Web can be crawled or explored as link space • Hints are cache, proxy, protocol traffic • No clear boundary between deep resources and surface resources
  • 9. Uncollectible Resources: Deep Web Resources • Dynamic Web Pages – returned in response to a query or accessed only through a form • Unlinked Contents – pages without any backlinks • Private Web – sites requiring registration and login (password-protected resources) • Limited Access Web – sites with CAPTCHAs, no-cache pragma HTTP headers • Scripted Pages – pages produced by JavaScript, Flash, AJAX etc. • Non-HTML Contents – multimedia files, e.g. images or videos
  • 10. Uncollectible Resources: Documents and Disk Space • This comes close to the e-discovery field • Is this part of Deep Web ? • Documents not in the web tree • accessible only by direct filesystem access • or by dedicated script effort • Files in general, on web servers and on non-web-server machines – accessible only by direct filesystem access
  • 11. Forgotten Data • From the security aspect, forgotten data is a very interesting part of Deep Web • What is forgotten data – maybe data without a custodian ? • Verizon's 2008 report on data breaches – data the victim did not know about was part of the breach in 66% of incidents
  • 12. Data Lifecycle • Data creation and circulation • How to find data and correlate it • Search engines • Proxies • Metadata, Logs, Feeds • Very interesting ideas in “Programming Collective Intelligence” by Toby Segaran, O'Reilly Media, August 16, 2007
  • 13. Hidden Data in Surface web ? • Web handles data available through HTML and its extensions • What about metadata and embedded data which is not accessible to search engines ?
  • 14. Surface Web and Deep Issues • “Hidden Data in Internet Published Documents” – deep forensic impact • Specific data formats can have embedded elements which are not visible to search engines – thumbnails embedded in pictures – EXIF data in images – metadata in documents – steganography
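As an illustration of embedded data that a search engine does not index, the sketch below walks the segment headers of a JPEG byte stream and returns the payload of an Exif APP1 segment if one is present. The helper names and the synthetic sample are assumptions for demonstration; the parser is simplified (it does not handle fill bytes or standalone markers before the scan data) and is not a forensic tool.

```python
import struct

def find_exif_segment(data):
    """Return the raw Exif payload of a JPEG byte string, or None if absent."""
    if data[:2] != b"\xff\xd8":            # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker >> 8 != 0xFF or marker == 0xFFDA:
            break                          # lost sync, or start of scan data
        # The length field counts itself (2 bytes) but not the marker.
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xFFE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]             # TIFF-structured metadata follows
        pos += 2 + length
    return None

def _segment(marker, payload):
    """Build one JPEG segment (helper for the synthetic sample only)."""
    return struct.pack(">HH", marker, len(payload) + 2) + payload

# Synthetic sample: SOI, a JFIF segment, an Exif segment, then start of scan.
sample = (b"\xff\xd8" + _segment(0xFFE0, b"JFIF\x00")
          + _segment(0xFFE1, b"Exif\x00\x00" + b"TIFFDATA") + b"\xff\xda")
exif = find_exif_segment(sample)
no_exif = find_exif_segment(b"\xff\xd8" + _segment(0xFFE0, b"JFIF\x00"))
```

On a real image the returned bytes would still need a TIFF/EXIF parser to decode fields such as camera model, timestamps or an embedded thumbnail.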
  • 15. Idea of Treasure Island • What is not on the map is unknown • Hidden like a treasure island • Idea of the unexplored and uncharted, with big gains • Because of its size, the idea of an iceberg
  • 16. Why Deep Web Exists ? • Why do search engines fail? – Technology • Most of the web data is behind dynamically generated pages (web gateways) – a web crawler cannot reach them, or the data is not announced – the data can only be obtained if we have access to the system containing the information – forms have to be populated with values – this requires understanding the semantics of the web gateway and the data behind it
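The point that forms have to be populated with values can be sketched as follows: given candidate values for each form field, a crawler enumerates the combinations and issues one query per combination. The URL, field names and values are hypothetical, and the sketch assumes a simple GET form; choosing meaningful values is the hard part and is not addressed here.

```python
from itertools import product
from urllib.parse import urlencode

def candidate_queries(action_url, fields):
    """Yield one GET URL per combination of candidate field values."""
    names = sorted(fields)
    for combo in product(*(fields[name] for name in names)):
        yield action_url + "?" + urlencode(list(zip(names, combo)))

# Hypothetical form with two fields and two candidate values each.
fields = {"category": ["books", "maps"], "year": ["2013", "2014"]}
urls = list(candidate_queries("http://example.org/search", fields))
for url in urls:
    print(url)
```

The number of queries grows as the product of the value counts per field, which is one reason surfacing content behind forms is expensive.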
  • 17. Measuring the Deep Web • How to measure – estimates are based on known examples • Try to generate pages based on known home pages and explore the link space, based on hop distances • First attempt: Bergman (2000) – Size of Surface Web is around 19 TB – Size of Deep Web is around 7500 TB – Deep Web is nearly 400 times larger than the Surface Web • 2004: Mitesh classified the Deep Web more accurately – Most of the HTML forms are two hops from the home page
  • 18. Deep Web Size: Current Estimates 2014 • Deep Web about 7500 terabytes • Surface Web about 19 terabytes • Deep Web has between 400 and 550 times more public information than the Surface Web • 95% of the Deep Web is publicly accessible • More than 200,000 Deep Web sites currently exist • 550 billion documents on Deep Web • 1 billion documents on Surface Web
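The "400 to 550 times" range follows from the figures on the slide, depending on whether size is compared by volume or by document count. A short check, using only the numbers quoted above:

```python
deep_tb, surface_tb = 7500, 19                          # terabytes
deep_docs, surface_docs = 550_000_000_000, 1_000_000_000

ratio_by_volume = deep_tb / surface_tb                  # roughly 395
ratio_by_documents = deep_docs / surface_docs           # 550

print(f"{ratio_by_volume:.0f}x by volume, "
      f"{ratio_by_documents:.0f}x by document count")
```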
  • 19. History of Deep Web • Start: static HTML pages that web crawlers can easily reach, only a few CGI scripts • In the mid-90s: introduction of dynamic pages, generated as a result of a query or link access • In 1994: Jill Ellsworth used the term “Invisible Web” to refer to these websites • In 2001: Bergman coined the term “Deep Web” • Dark web develops in parallel as crime starts to spread over the Internet
  • 20. Rough Timeline • 2001: Raghavan et al. -> Hidden Web Exposure – domain-specific, human-assisted crawler • 2002: StumbleUpon used human crawlers – human crawlers can find relevant links that algorithmic crawlers miss • 2003: Bergman introduced LexiBot – used for quantifying the Deep Web • 2004: Yahoo! Content Acquisition Program – paid inclusion for webmasters • 2005: Yahoo! Subscriptions – Yahoo started searching subscription-only sites • 2005: Ntoulas et al. -> Hidden Web Crawler – automatically generated meaningful queries to issue against search forms • 2005: Google Sitemaps – allows webmasters to inform search engines about URLs on their websites that are available for crawling • Web 2.0 infrastructure • Today: mobile devices and the Internet of Things – each gadget can have (and has) a web server for configuration
  • 22. From the Digital Forensic Viewpoint • Is there a way to carry out forensically sound actions on Deep Web ? • Can we apply standard digital forensic procedures and best practices ? • In both cases yes – we are always limited in digital forensics, but that does not prevent reliable results
  • 23. Web and Digital Forensics • Web is web ☺ • Web artifacts are web artifacts • The type of investigation determines how we handle web data – the key element is the legal one • Many possible scenarios and situations – follow the forensic principles and best practices as in any other situation – use the scientific method – test and experiment to prove the method
  • 24. Deep Web and Forensic Tasks • How to prove access to Deep Web resources – same as for ordinary resources, because access is mostly through browsers – an advantage over blind Deep Web access, since there are history, cache and log artifacts which show which Deep Web resource was accessed • Deep Web artifacts – mostly like any other web artifacts – Hidden Data in Internet Published Documents – Dark web as a specific subrange
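Because access is mostly through browsers, history artifacts can show which resources were visited. The sketch below queries a SQLite history database for URLs matching a pattern. It assumes a simplified table modelled on the `moz_places` table of Firefox's `places.sqlite` (only three columns are recreated here); the sample rows are invented, and an in-memory database stands in for a working copy of the evidence file. Privacy-oriented browsers may not keep such history at all.

```python
import sqlite3

def find_visits(conn, pattern):
    """Return (url, visit_count) rows whose URL matches a SQL LIKE pattern."""
    cur = conn.execute(
        "SELECT url, visit_count FROM moz_places WHERE url LIKE ? ORDER BY url",
        (pattern,),
    )
    return cur.fetchall()

# In-memory stand-in for a copied history database (simplified, assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_places (url TEXT, title TEXT, visit_count INTEGER)")
conn.executemany(
    "INSERT INTO moz_places VALUES (?, ?, ?)",
    [
        ("http://example.org/", "Example", 3),
        ("http://exampleonionaddress.onion/forum", "Forum", 2),
        ("http://example.org/search?q=report", "Search", 1),
    ],
)

hits = find_visits(conn, "%.onion/%")
print(hits)
```

In practice the query would run against a verified working copy of the database, never the original evidence, in line with the forensic principles mentioned above.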
  • 25. Forensic Tools Issues • Forensics of specialised browsers and access tools – Tor / onion – unusual browsers and access tools: links, lynx, wget – other networks: I2P, Freenet • Key question: does our forensic framework support such tools? – Internet Evidence Finder – EnCase – FTK – if not, how do we handle artifacts and data ? • What about mobile devices?
  • 26. Conclusion and Questions • Challenging field • Size will grow with the IPv6 takeover and the “Internet of things” concept • Cloud concept is important (size, access, legal issues) • Each new technology will add a new layer of invisibility, i.e. complexity • The size of available data simply forces the use of dynamic web pages
  • 27. References Too many links ... • http://papergirls.wordpress.com/2008/10/07/timeline-deep-web • http://deepwebtechblog.com/federated-search-finds-content-that-google-can’t-reach-part-i-of-iii • http://deepwebtechblog.com/a-federated-search-primer-part-ii-of-iii • http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html • http://www.online-college-blog.com/features/100-useful-tips-and-tools-to-research-the-deep-web/