Web robots, also known as WWW robots or Internet bots, are automated programs used to analyze, crawl, and index websites for search engines. Web crawling is the process of making a copy of web pages and indexing them in a search engine. A web robot first checks the website's robots.txt file; each site has its own robots.txt file.
User-agent: the robot the following rule applies to
Disallow: the URL you want to block
User-agents used in the examples below include Googlebot, Googlebot-Image, and Mediapartners-Google.
Note: / (slash) is the root directory, which contains all the files of the website.
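To see how a crawler applies such rules, here is a minimal Python sketch using the standard urllib.robotparser module; the domain example.com and the paths are hypothetical placeholders.

```python
import urllib.robotparser

# Point the parser at the site's robots.txt (example.com is a placeholder).
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the file

# Ask whether a given user-agent may fetch a given URL.
print(rp.can_fetch("Googlebot", "https://example.com/private_file.html"))
print(rp.can_fetch("*", "https://example.com/junk-directory/page.html"))
```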
- To block the entire site
Disallow: /
- To block the entire site for all search engines.
User-agent: *
Disallow: /
- To block the entire site for a particular search engine only.
User-agent: Googlebot
Disallow: /
- To block a directory and its contents
Disallow: /junk-directory/
- To block a page
Disallow: /private_file.html
- To remove a specific image from Google search
User-agent: Googlebot-Image
Disallow: /images/tobby.jpg
- To remove all images from Google Image Search
User-agent: Googlebot-Image
Disallow: /
- To block files of a specific file type (for example, .gif)
User-agent: Googlebot
Disallow: /*.gif$
All .gif files will be blocked and will not be indexed in Google Search.
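The * and $ characters are wildcard extensions understood by Googlebot: * matches any sequence of characters, and a trailing $ anchors the match to the end of the URL. As a rough sketch of how such a pattern is applied, the hypothetical pattern_to_regex helper below translates it into a regular expression.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Hypothetical helper: translate Googlebot's robots.txt wildcards to a regex.
    # '*' matches any sequence of characters; a trailing '$' anchors the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

rule = pattern_to_regex("/*.gif$")
print(bool(rule.match("/images/photo.gif")))      # True  -> URL is blocked
print(bool(rule.match("/images/photo.gif?v=2")))  # False -> '$' requires the URL to end in .gif
print(bool(rule.match("/images/photo.png")))      # False -> different file type
```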
- Google AdSense works by crawling the website (ads are displayed based on the site's content). Blocking pages removes them from search results, but the following rule keeps Mediapartners-Google able to crawl the site so that ads can still be displayed.
User-agent: Mediapartners-Google
Allow: /
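A common way to combine this with the earlier rules is to disallow everything for all crawlers while allowing only Mediapartners-Google. The Python sketch below (again using urllib.robotparser, with a hypothetical example.com URL) shows the combined effect.

```python
import urllib.robotparser

# Combined rules: block every crawler, but allow Mediapartners-Google
# so AdSense can still analyze the pages to choose relevant ads.
rules = """\
User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/page.html"))             # False
print(rp.can_fetch("Mediapartners-Google", "https://example.com/page.html"))  # True
```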