I was reading the Messing with Web Filtering Gateways post from GNU Citizen, and here are some comments / ideas:
- The problem is the impedance mismatch between how the filtering software parses the headers and how the web server parses them. There will always be corner cases... For example, it would be interesting to see how filtering gateways / proxies / web servers react to requests like the following (and what discrepancies exist between their behaviors):
GET / HTTP/1.1
Host: some.random.hostname.example.com
Host: the.real.hostname
...
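A quick sketch of how one might probe this discrepancy (hostnames are placeholders; the raw socket is needed because high-level HTTP libraries would normalize away the duplicate header):

```python
import socket

def build_request(decoy_host: str, real_host: str) -> bytes:
    """Build a GET request with two conflicting Host headers -- the kind
    of ambiguity a gateway and an origin server may resolve differently
    (first wins, last wins, or reject the request outright)."""
    return (
        "GET / HTTP/1.1\r\n"
        f"Host: {decoy_host}\r\n"
        f"Host: {real_host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def probe(real_host: str, decoy_host: str, port: int = 80) -> bytes:
    """Send the ambiguous request and return the raw response; the
    status line hints at which Host header the server honored."""
    with socket.create_connection((real_host, port), timeout=5) as s:
        s.sendall(build_request(decoy_host, real_host))
        chunks = []
        while chunk := s.recv(4096):
            chunks.append(chunk)
        return b"".join(chunks)
```

Comparing what the filtering gateway logs against what the web server actually serves for such a request is where the interesting mismatches show up.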
- As I recently found out, the translate-from-English-to-English trick is now blocked by Google Translate (and also by Babelfish). They also blocked the "autodetect to English" variant. Nice :-) Some alternatives are enumerated in the comments of the GNU Citizen post.
- I tried the Yahoo Pipes version. It works great; the only problem is that it obeys robots.txt :-), which can block it from some sites.
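You can check in advance whether a robots.txt would block such a fetcher; a minimal sketch using Python's standard library (the user-agent string and paths are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Feed the rules directly instead of fetching them over the network.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A robots.txt-obeying fetcher (like Yahoo Pipes) would honor these:
blocked = rp.can_fetch("SomeFetcher", "http://example.com/private/page")
allowed = rp.can_fetch("SomeFetcher", "http://example.com/public/page")
print(blocked, allowed)
```

If the page you want proxied sits under a disallowed path, the Pipes approach simply won't work for it.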
- I also tried to use Google Translate twice (for example EN -> RO -> EN), but this seems to create an infinite recursion :-). So I did the following: EN -> GE with Babelfish and then GE -> EN with Google Translate, but as you can guess, the results are not spectacular :-). Take a look here at the proxied Slashdot page.
Conclusion? Web filtering is useful, especially for keeping out really bad stuff (like malware). However, don't rely on it to solve your policy violation problems.