This document discusses web filtering techniques. It describes content filtering as the analysis of email and web content to block malware and spam and to keep sensitive intellectual property from leaving the network. Basic network models are the simple network, which forwards traffic uninspected, and the proxied network, in which an application alters traffic in both directions. Common web filtering techniques are firewalls, URL-based filtering, and content analysis using keyword and image scanning. Email filtering also uses challenge/response, heuristic, and Bayesian filters. The document notes the problems of over-blocking allowed content and under-blocking prohibited content, and lists popular free content filtering options such as DansGuardian, K9, OpenDNS, and hosts-file editing.
4. Content Filtering
Analysis of Email and Web content
Prevent malware and spam from
entering the network
Determine whether incoming data
is harmful to the network or
outgoing data includes intellectual
property
Such data is then blocked from
entering or leaving the network
Simple network layout
Network with traffic proxy
Web Security and Anti-Virus
5. Network Layout
Connection is made by forwarding the uninspected traffic
straight to the destination.
Network with Proxied Traffic
Traffic is received by an application, which can alter it
in both directions before sending it on to the destination.
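The difference between the two layouts can be sketched in a few lines. This is only an illustration: messages are plain strings, and the names forward_directly, forward_via_proxy, echo_server, strip_tracking, and redact are hypothetical, not part of any real proxy product.

```python
from typing import Callable

# Assumption for illustration: a message is just a string.
Handler = Callable[[str], str]


def forward_directly(request: str, destination: Handler) -> str:
    """Simple layout: the request reaches the destination uninspected."""
    return destination(request)


def forward_via_proxy(
    request: str,
    destination: Handler,
    filter_request: Handler,
    filter_response: Handler,
) -> str:
    """Proxied layout: the proxy may alter traffic in both directions."""
    outbound = filter_request(request)      # client -> destination
    response = destination(outbound)
    return filter_response(response)        # destination -> client


def echo_server(request: str) -> str:
    """Hypothetical destination that echoes what it received."""
    return f"response to {request}"


def strip_tracking(request: str) -> str:
    """Hypothetical outbound rule: drop a tracking parameter."""
    return request.replace("?track=1", "")


def redact(response: str) -> str:
    """Hypothetical inbound rule: mask a sensitive word."""
    return response.replace("secret", "[blocked]")


print(forward_directly("/secret?track=1", echo_server))
print(forward_via_proxy("/secret?track=1", echo_server, strip_tracking, redact))
```

In the direct case the destination sees the request unchanged; in the proxied case both the request and the response pass through filter functions first.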
6. Web Filtering Techniques
Firewalls
URL Based Filtering
Content Analysis
Firewalls
Basic level of web filtering
Inspects traffic to identify the requested site
and decides whether to allow or block it
Black Lists: Undesirable Web addresses
White Lists: Desirable Web addresses
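A minimal sketch of an allow/block decision based on black and white lists is shown here. The host names, the decide function, and the rule that the white list takes precedence are assumptions made for illustration; real firewalls use their own rule formats.

```python
from urllib.parse import urlsplit

# Hypothetical lists; a real deployment would load these from configuration.
BLACK_LIST = {"malware.example", "spam.example"}
WHITE_LIST = {"intranet.example"}


def decide(url: str, default_allow: bool = True) -> str:
    """Return "allow" or "block" for the site named in the URL."""
    host = (urlsplit(url).hostname or "").lower()
    if host in WHITE_LIST:      # assumption: white list wins over black list
        return "allow"
    if host in BLACK_LIST:
        return "block"
    # Sites on neither list fall back to the default policy.
    return "allow" if default_allow else "block"


print(decide("http://malware.example/download"))
print(decide("https://intranet.example/"))
```

Setting default_allow to False turns the same function into a white-list-only policy, where anything not explicitly listed is blocked.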
7. URL Based Filtering
Database of web addresses
Database loaded onto proxy
servers, firewalls and other
network infrastructure devices
Supports granular blocking
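One way to picture granular blocking is a database that maps parts of a site to categories, with a policy that blocks chosen categories. The sketch below assumes a tiny in-memory table keyed by host and path prefix; the entries, category names, and the functions categorize and is_blocked are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical URL database: (host, path prefix) -> category.
URL_DATABASE = {
    ("video.example", "/"): "streaming",
    ("news.example", "/"): "news",
    ("news.example", "/games/"): "games",
}

# Hypothetical policy: categories that should be blocked.
BLOCKED_CATEGORIES = {"streaming", "games"}


def categorize(url):
    """Return the category of the longest matching prefix, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    best, best_len = None, -1
    for (db_host, prefix), category in URL_DATABASE.items():
        if host == db_host and path.startswith(prefix) and len(prefix) > best_len:
            best, best_len = category, len(prefix)
    return best


def is_blocked(url):
    """Block only URLs whose category is in the blocked set."""
    return categorize(url) in BLOCKED_CATEGORIES


print(is_blocked("http://news.example/world/story"))
print(is_blocked("http://news.example/games/chess"))
```

Here the news site stays reachable while its games section is blocked, which is the kind of distinction a plain per-site black list cannot make.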
Content Analysis
Keyword Scanning
Image Scanning
8. Keyword Scanning
Tag words: each carries a positive or negative
score
Block:
when Sum(score) > Threshold
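The scoring rule above can be sketched as follows. The tag words, their scores, and the threshold value are invented for illustration, and the tokenizer is deliberately simple (lowercase runs of letters).

```python
import re

# Hypothetical tag words: positive scores push toward blocking,
# negative scores pull away from it.
KEYWORD_SCORES = {
    "casino": 5,
    "jackpot": 4,
    "poker": 3,
    "medical": -3,
    "education": -4,
}
THRESHOLD = 6  # assumed value


def page_score(text):
    """Sum the scores of all tag words found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(KEYWORD_SCORES.get(word, 0) for word in words)


def should_block(text):
    """Block when the summed score exceeds the threshold."""
    return page_score(text) > THRESHOLD


print(page_score("Casino jackpot tonight"), should_block("Casino jackpot tonight"))
print(page_score("Poker in education: a medical study"))
```

Negative scores give pages that mention a tag word in a harmless context a chance to stay under the threshold.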
Image Scanning
Content Analysis
9. Email Filtering
Email is a primary communication
channel
Need to control spam and virus-laden
mail
Filtering Techniques
Challenge/Response:
Sender must perform a task
Heuristic Filters: Score words
or phrases
Bayesian Filters: Mathematical
Probability
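As an illustration of the Bayesian approach, the sketch below is a small naive Bayes classifier with add-one (Laplace) smoothing. It is not the method of any particular mail product: the class name NaiveBayesFilter, the whitespace tokenizer, and the training messages are all assumptions, and it expects at least one training message per class.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


class NaiveBayesFilter:
    """Hypothetical naive Bayes spam filter, for illustration only."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text); needs training data for both classes."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Start from the class prior, then add per-word log likelihoods.
            log_p = math.log(self.message_counts[label] / total_messages)
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1  # add-one smoothing
                log_p += math.log(count / (total_words + len(vocab)))
            log_scores[label] = log_p
        # Normalize the two log scores into a probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesFilter()
spam_filter.train("win cash prize now", "spam")
spam_filter.train("cheap prize win", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

print(spam_filter.spam_probability("win a prize"))
print(spam_filter.spam_probability("monday meeting"))
```

Unlike a heuristic filter with hand-assigned scores, the word weights here are learned from labeled mail, so the filter adapts as more messages are marked as spam or not spam.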
10. Circumventing Web Filtering
Getting around them:
Traffic is not passed through the filter.
Getting through them:
Traffic is passed through the filter by obscuring
the address of content and/or the content
itself.
Internet Filtering Test

Type of Content Tested                                 Accuracy Percentage
Content of an Adult Nature – direct URL access         87%
Content of an Adult Nature – keyword searches          81%
Content not of an Adult Nature – direct URL access     86%
Content not of an Adult Nature – keyword searches      69%
Image Searches                                         44%
Email Attachments                                      25%
RSS Feeds                                              48%
Catalog Searches                                       75%
Database Searches                                      88%
11. Problem with filtering
"It could be expected that allowed
content would be blocked. If all
pornographic content is to be
blocked, other content with a
resemblance in features will also be
blocked; e.g. Adult education,
medical information, erotic content
etc."
"All filters over-block. All filters under-block.
No filter is 100% accurate because no
one agrees on what being 100%
accurate is."
Conclusion