The Information Security class I teach at the University of Wisconsin-Madison is a broad survey course. To be honest, the material sometimes gets a bit dry. So tomorrow's lecture, which was supposed to cover secure network architecture, will instead be about the Deep Web, the scary and mysterious part of the Internet associated with spooky, nefarious, and illegal activity. I think it is good to give the students a break from the classic course material and spend some time on this tangentially related topic. I am putting together a class discussion exercise to go along with it.
Scary Halloween Lecture on Deep Web
1. Scary Halloween Lecture 365/765
The Deep Web—From Spooky to Creepy
Presented by Nicholas Davis, CISSP, CISA
2. This presentation contains explicit content,
which some people may find offensive.
The examples shown do not represent my
views or opinions, and are used for
demonstration only.
I do not endorse the use of the Deep Web
for unethical or illicit activities.
10/26/16 UNIVERSITY OF WISCONSIN 2
3. Session Overview
Introduction and Warning
The Deep Web Defined
Dynamic Content
Unlinked Content
Private Web
Contextual Web
Limited Access Content
Scripted Content
Non-HTML Content
Deep Web Search Engines & Tor Client
Examples of what can be found on the Deep Web
Exciting Documentary Video
Question and Answer session
4. Some Definitions
The Deep Web (also called the Deep Net,
Invisible Web, or Hidden Web) is the part
of the Web that standard search engines
do not index, as opposed to the Surface
Web (that which is normally accessed).
Do not confuse it with the Dark Internet,
which refers to computers that can no
longer be reached over the Internet.
Some people think that the Deep Web is
a haven for serious criminality, and I
agree with them.
5. Normal Web Search
vs. Deep Web Search
Searching on the Internet today can be
compared to dragging a net across the
surface of the ocean: a great deal may be
caught in the net, but there is a wealth of
information that is deep and therefore
missed
6. Normal Web Search
vs. Deep Web Search
Traditional search engines cannot see or
retrieve content in the deep Web—those
pages do not exist until they are created
dynamically as the result of a specific
search. As of 2001, the deep Web was
several orders of magnitude larger than
the surface Web
7. Deep Web Size
It is difficult to measure
or reliably estimate the
size of the deep web
because the majority of the
information is hidden or
locked inside databases.
Early estimates suggested
that the deep web is 4,000
to 5,000 times larger than
the surface web
8. Deep Web Resources
Dynamic Content
Dynamic pages which are returned in
response to a submitted query or
accessed only through a form, especially
if open-domain input elements (such as
text fields) are used; such fields are hard
to navigate without domain knowledge.
9. Deep Web Resources
Unlinked Content
Unlinked content: pages which are not
linked to by other pages, which may
prevent Web crawling programs from
accessing the content. This content is
referred to as pages without backlinks
(or inlinks).
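To make the point about backlinks concrete, here is a minimal sketch of a link-following crawler in Python. The site map `LINKS` and the page names are hypothetical, invented only for illustration; real crawlers are far more elaborate. The sketch shows that a page with no inlinks is never reached, no matter how thoroughly the crawler follows links.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "/home": ["/about", "/news"],
    "/about": ["/home"],
    "/news": ["/news/2016"],
    "/news/2016": [],
    "/unlinked-report": ["/home"],  # links out, but no page links in
}

def crawl(start, links):
    """Breadth-first crawl: return every page reachable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

reachable = crawl("/home", LINKS)
# Pages that exist on the site but were never discovered by the crawl.
missed = sorted(set(LINKS) - reachable)
print(missed)
```

The page `/unlinked-report` exists and even links back to the home page, but since nothing links to it, the crawl starting at `/home` never finds it.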
10. Deep Web Resources
Private Web
Private Web: sites that require
registration and login (password-
protected resources).
11. Deep Web Resources
Contextual Web
Contextual Web:
pages with content
varying for different
access contexts (e.g.,
ranges of client IP
addresses or previous
navigation sequence).
12. Deep Web Resources
Limited Access Content
Limited access content: sites that limit
access to their pages in a technical way
(e.g., using the Robots Exclusion
Standard, CAPTCHAs, or the no-store
directive), which prohibits search engines
from browsing them and creating
cached copies.
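As a small illustration of the Robots Exclusion Standard, the sketch below uses Python's standard `urllib.robotparser` module to check whether a well-behaved crawler may fetch a URL. The robots.txt content, the `example.com` URLs, and the `ExampleBot` user agent are assumptions made up for this example. Note that robots.txt is a request to crawlers, not an access control; it only keeps out crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(url, agent="ExampleBot"):
    """Return True if the rules above allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)

print(may_crawl("http://example.com/public/index.html"))
print(may_crawl("http://example.com/private/data.html"))
```

A search engine that respects these rules skips everything under `/private/`, so those pages stay out of its index even though anyone with the link can open them.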
13. Deep Web Resources
Scripted Content
Scripted content: pages that are only
accessible through links produced by
JavaScript as well as content
dynamically downloaded from Web
servers via Flash or Ajax solutions.
14. Deep Web Resources
Non-HTML Content
Non-HTML/text
content: textual
content encoded
in multimedia
(image or video)
files or specific
file formats not
handled by
search engines.
15. Accessing the Deep Web
While it is not always possible to
discover a specific web server's external
IP address, theoretically almost any site
can be accessed via its IP address,
regardless of whether or not it has been
indexed.
16. Accessing the Deep Web
Certain content is
intentionally hidden from
the regular internet,
accessible only with special
software, such as Tor. Tor
allows users to access
websites using the .onion
host suffix anonymously,
hiding their IP address.
Other such software includes
I2P and Freenet.
17. The Onion Router (Tor)
Tool For the Deep Web
Tor is software, most often used through
the Tor Browser, that sets up the specific
connections you need to access dark
Web sites. Critically, Tor encrypts your
traffic and helps people maintain
anonymity online. It does this in part by
routing connections through servers
around the world, making them much
harder to track.
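The routing idea is often described as layered ("onion") encryption: the sender wraps a message in one layer per relay, and each relay removes only its own layer. The Python sketch below is a toy model of that idea only. The XOR keystream cipher, the key values, and the function names are assumptions invented for illustration; this is not Tor's actual cryptography and is not secure.

```python
import hashlib

def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256-derived keystream (NOT secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    # zip() stops at len(data), so surplus keystream bytes are ignored.
    return bytes(b ^ k for b, k in zip(data, stream))

# Hypothetical per-relay keys; real Tor negotiates a fresh key with each hop.
RELAY_KEYS = [b"entry-key", b"middle-key", b"exit-key"]

def wrap(message: bytes, keys) -> bytes:
    """Sender side: add the exit relay's layer first, the entry relay's last."""
    for key in reversed(keys):
        message = keystream_xor(message, key)
    return message

def route(onion: bytes, keys) -> bytes:
    """Each relay, in path order, peels exactly one layer."""
    for key in keys:
        onion = keystream_xor(onion, key)
    return onion

onion = wrap(b"hello", RELAY_KEYS)
print(route(onion, RELAY_KEYS))
```

Because XOR layers commute, this toy does not enforce the order in which layers are removed, which a real design does. It is meant only to show that no single relay key is enough to read the message.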
18. Who Invented Tor?
Oddly enough, Tor grew out of onion routing
research done by the U.S. Naval Research
Laboratory, originally to protect government
communications online. Today it also allows
political dissidents and whistleblowers
to communicate without fear of
reprisal.
19. Tor Client Available
For Download
20. Accessing the Deep Web
.onion
.onion is a pseudo-top-level domain host
suffix designating an anonymous hidden
service reachable via the Tor network.
Such addresses are not actual DNS
names, and the .onion TLD is not in the
Internet DNS root, but with the
appropriate proxy software installed,
Internet programs such as Web
browsers can access sites with .onion
addresses by sending the request
through the network of Tor servers.
21. Accessing the Deep Web
Tor2web
22. What Deep Web Links
Look Like
Deep Web links
appear as a random-looking
string of letters and digits
followed by the .onion
TLD. For example,
http://xmh57jrzrnw6insl
followed by .onion
links to TORCH, the
Tor search engine web
page.
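The string is not actually random. For the older 16-character (version 2) addresses like the example on this slide, the label is derived from the service's public key: the first 80 bits of the key's SHA-1 hash, written in base32. The Python sketch below illustrates that derivation. The function name and the placeholder key bytes are assumptions for illustration; a real service hashes its DER-encoded RSA public key, and newer onion addresses use a longer format.

```python
import base64
import hashlib

def v2_onion_label(public_key_der: bytes) -> str:
    """Derive a 16-character v2-style onion label from public key bytes."""
    digest = hashlib.sha1(public_key_der).digest()
    # First 10 bytes = 80 bits = exactly 16 base32 characters, no padding.
    return base64.b32encode(digest[:10]).decode("ascii").lower()

# Placeholder bytes standing in for a real DER-encoded public key.
label = v2_onion_label(b"not a real public key")
print(label + ".onion")
```

This is why the address doubles as a check on the service's identity: a different key produces a different label.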
23. Searching the Deep Web
To discover content on the
Web, search engines use web
crawlers that follow
hyperlinks through known
protocol virtual port
numbers. This technique is
ideal for discovering
resources on the surface
Web but is often ineffective
at finding Deep Web
resources.
24. Give the People What They
Came Here For, Tonight!
Just like general web search, searching
the Invisible Web is also about looking
for the needle in the haystack. Only
here, the haystack is much bigger. The
Invisible Web is definitely not for the
casual searcher. It is deep but not dark,
because if you know what you are
searching for, enlightenment is a few
keywords away.
25. Deep Web Search
Engines
26. Deep Web Search
Infomine
http://infomine.ucr.edu/
Infomine has been built by a pool of
libraries in the United States. Some of them
are University of California, Wake Forest
University, California State University, and
the University of Detroit. Infomine ‘mines’
information from databases, electronic
journals, electronic books, bulletin boards,
mailing lists, online library card catalogs,
articles, directories of researchers, and
many other resources.
27. Deep Web Search
The WWW Virtual Library
http://vlib.org/
This is considered to be the oldest
catalog on the web and was
started by Tim Berners-Lee, the creator
of the web. So, isn’t it strange that it
finds a place in the list of Invisible Web
resources? Maybe, but the WWW
Virtual Library lists quite a lot of
relevant resources on quite a lot of
subjects.
28. Deep Web Search
Complete Planet
http://aip.completeplanet.com/
Complete Planet calls itself the ‘front door to
the Deep Web’. This free and well designed
directory resource makes it easy to access the
mass of dynamic databases that are cloaked
from a general purpose search. The databases
indexed by Complete Planet number around
70,000 and range from Agriculture to Weather.
Also thrown in are databases like Food & Drink
and Military.
For a really effective Deep Web search, try out
the Advanced Search options where among
other things, you can set a date range.
29. Deep Web Search
DeepPeep
http://www.deeppeep.org/
DeepPeep aims to enter the Invisible Web
through forms that query databases and web
services for information. Typed queries open
up dynamic but short lived results which
cannot be indexed by normal search engines.
By indexing databases, DeepPeep hopes to
track 45,000 forms across 7 domains.
The domains covered by DeepPeep (Beta) are
Auto, Airfare, Biology, Book, Hotel, Job, and
Rental. Being a beta service, there are
occasional glitches as some results don’t load in
the browser.
30. Deep Web Search
IncyWincy
http://www.incywincy.com/
IncyWincy is an Invisible Web search
engine and it behaves as a meta-search
engine by tapping into other search
engines and filtering the results. It
searches the web, directory, forms, and
images. With a free registration, you can
track search results with alerts.
31. Deep Web Search
DeepWebTech
http://www.deepwebtech.com/
DeepWebTech gives you five search
engines (and browser plugins) for
specific topics. The search engines cover
science, medicine, and business. Using
these topic specific search engines, you
can query the underlying databases in
the Deep Web.
32. Deep Web Search
Scirus
http://www.scirus.com/srsapp/
Scirus has a pure scientific focus. It is a
far reaching research engine that can
scour journals, scientists’ homepages,
courseware, pre-print server material,
patents and institutional intranets.
33. Deep Web Search
TechXtra
http://www.techxtra.ac.uk/index.html
TechXtra concentrates on engineering,
mathematics and computing. It gives
you industry news, job announcements,
technical reports, technical data, full text
eprints, teaching and learning resources
along with articles and relevant website
information.
34. Bitcoin, The Currency of the
Deep Web
• While not completely
anonymous, when
used correctly, it is
very difficult to track
down the true
owner/identity
• Not regulated by any
government or
corporate entity
35. Be Careful of What You
Search For, You Might Just Find It
38. Deep Web, Dangerous Web
Steganography
(ste-g&n-o´gr&-fē) (n.) The art and
science of hiding information by
embedding messages within other,
seemingly harmless messages
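One common way to do this is least-significant-bit (LSB) embedding, where each bit of the secret replaces the lowest bit of one byte of a cover file, such as pixel data in an image. The Python sketch below shows the idea on raw bytes. The function names and the sample cover data are assumptions for illustration; real tools work on actual image or audio formats and usually encrypt the payload first.

```python
def embed(cover: bytes, secret: bytes) -> bytes:
    """Hide secret in the least significant bit of each cover byte."""
    # Split every secret byte into 8 bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)

def extract(stego: bytes, length: int) -> bytes:
    """Recover `length` hidden bytes from the low bits of stego."""
    out = bytearray()
    for i in range(length):
        value = 0
        for carrier in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (carrier & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(64))  # stand-in for pixel data
stego = embed(cover, b"hi")
print(extract(stego, 2))
```

Each cover byte changes by at most 1, which in an image is a color shift too small to notice by eye.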