source: www/robots.txt @ 936

Last change on this file since 936 was 934, checked in by Sam Hocevar, 15 years ago
  • Added .htaccess and robots.txt skeleton. Also testing automatic updates.
  • Property svn:keywords set to Id
File size: 1.0 KB
# $Id: robots.txt 934 2006-05-05 17:24:09Z sam $

# Do not crawl CVS and .svn directories (they are 403 Forbidden anyway)
User-agent: *
Disallow: CVS
Disallow: .svn

# "This robot collects content from the Internet for the sole purpose of
# helping educational institutions prevent plagiarism. [...] we compare
# student papers against the content we find on the Internet to see if we
# can find similarities." (http://www.turnitin.com/robot/crawlerinfo.html)
#  --> fuck off.
User-Agent: TurnitinBot
Disallow: /

# "NameProtect engages in crawling activity in search of a wide range of
# brand and other intellectual property violations that may be of interest
# to our clients." (http://www.nameprotect.com/botinfo.html)
#  --> fuck off.
User-Agent: NPBot
Disallow: /

# "iThenticate® is a new service we have developed to combat the piracy
# of intellectual property and ensure the originality of written work for
# publishers, non-profit agencies, corporations, and newspapers."
# (http://www.slysearch.com/)
#  --> fuck off.
User-Agent: SlySearch
Disallow: /
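As a rough way to see how these rules are interpreted, the sketch below feeds the rule lines (comments omitted) to Python's standard `urllib.robotparser` and asks whether a few agents may fetch a few paths. The agent name `SomeOtherBot` and the paths being checked are hypothetical, chosen only for illustration. This shows how one particular parser reads the file; real crawlers may interpret it differently.

```python
from urllib.robotparser import RobotFileParser

# Rule lines copied from the file above, without the comments.
RULES = """\
User-agent: *
Disallow: CVS
Disallow: .svn

User-Agent: TurnitinBot
Disallow: /

User-Agent: NPBot
Disallow: /

User-Agent: SlySearch
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The three named bots are denied everything by "Disallow: /".
for agent in ("TurnitinBot", "NPBot", "SlySearch"):
    print(agent, parser.can_fetch(agent, "/index.html"))  # False for each

# A hypothetical bot with no section of its own falls back to "User-agent: *".
print(parser.can_fetch("SomeOtherBot", "/index.html"))  # True

# urllib.robotparser matches rules as prefixes of the URL path, which always
# begins with "/". A rule written without a leading slash ("CVS") therefore
# never matches in this parser, so the path below is reported as allowed.
print(parser.can_fetch("SomeOtherBot", "/CVS/Entries"))  # True
```

With this parser, the `CVS` and `.svn` rules have no effect as written. The file's own comment notes that those directories return 403 Forbidden anyway, so access is still refused on the server side.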