source: web/trunk/www/robots.txt

Last change on this file was 4868, checked in by Sam Hocevar, 7 years ago

web: ask crawler bots to hit us less aggressively.

  • Property svn:keywords set to Id
File size: 1.2 KB
# $Id: robots.txt 4868 2013-08-14 10:12:39Z sam $

# Do not crawl CVS and .svn directories (they are 403 Forbidden anyway).
# Rule paths must start with "/"; bare "CVS" and ".svn" would match nothing.
User-agent: *
Disallow: /CVS/
Disallow: /.svn/
# Be gentle (Crawl-delay is non-standard but widely honoured). Kept in the
# same record as the rules above: many parsers use only the first matching
# group, and a blank line would start a new record.
Crawl-delay: 5
# Prevent excessive search engine hits
Disallow: /cgi-bin/trac.cgi
Disallow: /log
# Don’t crawl git repos ("*" mid-path is a de-facto wildcard extension,
# not part of the original standard)
Disallow: /git/*.git/*
Disallow: /git/*.git.broken/*

# "This robot collects content from the Internet for the sole purpose of
# helping educational institutions prevent plagiarism. [...] we compare
# student papers against the content we find on the Internet to see if we
# can find similarities." (http://www.turnitin.com/robot/crawlerinfo.html)
#  --> fuck off.
User-Agent: TurnitinBot
Disallow: /

# "NameProtect engages in crawling activity in search of a wide range of
# brand and other intellectual property violations that may be of interest
# to our clients." (http://www.nameprotect.com/botinfo.html)
#  --> fuck off.
User-Agent: NPBot
Disallow: /

# "iThenticate® is a new service we have developed to combat the piracy
# of intellectual property and ensure the originality of written work for
# publishers, non-profit agencies, corporations, and newspapers."
# (http://www.slysearch.com/)
#  --> fuck off.
User-Agent: SlySearch
Disallow: /
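One way to sanity-check rules like these is Python's standard-library `urllib.robotparser`. The sketch below feeds it a condensed excerpt of the file (hypothetical host and bot names; directory rules written with the leading "/" the protocol requires). Note that `robotparser` does plain prefix matching only and does not expand "*" wildcards inside paths, so the `/git/*.git/*` rules are omitted here.

```python
# Sanity-check a condensed excerpt of the robots.txt above using the
# standard-library parser (prefix matching only, no "*" path wildcards).
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /CVS/
Disallow: /.svn/
Crawl-delay: 5
Disallow: /cgi-bin/trac.cgi
Disallow: /log

User-Agent: TurnitinBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The blanket ban on TurnitinBot applies to every URL.
print(rp.can_fetch("TurnitinBot", "http://example.org/any/page"))      # False
# Ordinary crawlers may fetch normal pages...
print(rp.can_fetch("SomeBot", "http://example.org/news"))              # True
# ...but not the Trac CGI entry point.
print(rp.can_fetch("SomeBot", "http://example.org/cgi-bin/trac.cgi"))  # False
# Crawl-delay is exposed for agents matched by the "User-agent: *" group.
print(rp.crawl_delay("SomeBot"))                                       # 5
```

Because `robotparser` ignores wildcard patterns, checking the `/git/` rules would need a wildcard-aware library or a hand-rolled matcher.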