# Robots Crawler file.
# ====================
#
# This file is accessed by all robots, not just Google.
# It will list the Sitemap URL - a list of all pages to be indexed.
#
# This file will list all pages NOT to be indexed.
# This file is an 'exclusion' list.
#
# The exclusion list is only a set of requests - not orders.
# Similar to 'Private - Keep Out' on an unlocked door.
# Nasty robots will ignore all such requests.
#
#
# The original robots.txt standard does not define an 'Allow' statement,
# although Google and other major crawlers (and RFC 9309) now accept one.
# It does support a 'Sitemap' statement, which acts as an 'include' file.
# e.g. Sitemap: http://www.pastpages.co.uk/crawler.txt
# This is an indicator to robots to examine all URLs in that file.
#
#
# Types of Site Files:
#
# sitemap.xml      Used by Google, and maybe a few others (submitted)
# sitemap.xml.gz   Used by Google, and maybe a few others
# urllist.txt      Used by Yahoo and a few others (submitted)
# ror.xml
#
#
# This file is used by automatic site file generators.
# They normally include every file they come across, unless told otherwise.
# Robots.txt should specify all files and folders that should be ignored.
# It may take several passes to refine the exclusions.
#
#
#
#
# End of General Notes and Comments
# ===========================================================================
#
# Notes on wildcards.
#
# A rule matches from the start of the URL path; '*' matches any run of
# characters and '$' anchors the match to the end of the URL.
#
#Disallow: /K-*    this Disallows URLs starting with '/K-'
#                  (the trailing '*' is redundant; /K- does the same)
#Disallow: /*K-    this Disallows 'K-' anywhere in URL string
#Disallow: /*K-*   same as /*K- (the trailing '*' is redundant)
#Disallow: /*K-$   this Disallows URLs ending with 'K-'
# The /*K- rule will stop all the small image files from being indexed.
# Robots.txt does not treat '?' as a wildcard; it is matched literally,
# as this character is used extensively in search queries and is used
# in rules to suppress search URLs.
#
#Disallow: /*.jpg$ this Disallows jpg files. The '$' anchors the match
#                  to the very end of the string.
#
#
#
# ===========================================================================
# The sitemap listed below holds all useful pages on the site,
# and is a kind of 'Include' file.
Sitemap: http://www.oldkentmaps.co.uk/urllist.txt
# .............................................
# The lines below are all exclusion statements
# They eliminate the files that
# (a) should not be included in any automated Site File - very important
# (b) should not be viewed by a robot - not so important
# User-agent: * means any robot, not specifically Google or Yahoo.
User-agent: *
Disallow: /*.css$
Disallow: /*.asp$
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /refresh*
Disallow: /no_frames.htm
Disallow: /transfer-list.htm
Disallow: /images/
Disallow: /off-line/
Disallow: /site-files/
Disallow: /*K-