source: web/trunk/www/robots.txt @ 4307

Last change on this file since 4307 was 4307, checked in by Sam Hocevar, 10 years ago

Tell search engine bots not to crawl our git mirrors.

  • Property svn:keywords set to Id
File size: 1.2 KB
# $Id: robots.txt 4307 2010-01-29 22:18:58Z sam $

# Do not crawl CVS and .svn directories (they are 403 Forbidden anyway)
User-agent: *
Disallow: CVS
Disallow: .svn

# Prevent excessive search engine hits
Disallow: /cgi-bin/trac.cgi
Disallow: /log

# Don’t crawl git repos
Disallow: /git/*.git/*
Disallow: /git/*.git.broken/*

# "This robot collects content from the Internet for the sole purpose of
# helping educational institutions prevent plagiarism. [...] we compare
# student papers against the content we find on the Internet to see if we
# can find similarities." (http://www.turnitin.com/robot/crawlerinfo.html)
#  --> fuck off.
User-Agent: TurnitinBot
Disallow: /

# "NameProtect engages in crawling activity in search of a wide range of
# brand and other intellectual property violations that may be of interest
# to our clients." (http://www.nameprotect.com/botinfo.html)
#  --> fuck off.
User-Agent: NPBot
Disallow: /

# "iThenticate® is a new service we have developed to combat the piracy
# of intellectual property and ensure the originality of written work for
# publishers, non-profit agencies, corporations, and newspapers."
# (http://www.slysearch.com/)
#  --> fuck off.
User-Agent: SlySearch
Disallow: /
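The Disallow patterns in this file rely on `*` wildcards, which individual crawlers may interpret differently. As a rough illustration only, the sketch below applies RFC 9309 style matching (`*` matches any run of characters, a trailing `$` anchors the end, anything else is a literal prefix) to a few paths. The function name `rule_matches` and every example path are hypothetical and do not come from the repository; how any particular crawler actually read these rules is not something the file itself tells us.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Sketch of RFC 9309 style matching, not any crawler's real code:
    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is treated as a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' per '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


# Hypothetical paths, chosen only for illustration.
checks = [
    ("/git/*.git/*", "/git/example.git/tree/master"),
    ("/git/*.git/*", "/git/example.git.broken/HEAD"),
    ("/git/*.git.broken/*", "/git/example.git.broken/HEAD"),
    ("/log", "/log/trunk"),
    ("CVS", "/CVS/Entries"),
]
for pattern, path in checks:
    print(f"{pattern!r:24} {path!r:34} {rule_matches(pattern, path)}")
```

Under this reading, the two git patterns each cover only their own directory suffix, which would explain why the file lists both. A pattern without a leading slash, such as `CVS`, never matches a URL path that begins with `/`, so the first two rules may have no effect for parsers that match this way.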