MediaMustard

+44 (0)20 7183 5689

sales@mediamustard.com

Make effective use of robots.txt

A "robots.txt" file tells search engines whether they can access and therefore crawl parts of your site. This file, which must be named "robots.txt", is placed in the root directory of your site.

You may not want certain pages of your site crawled because they might not be useful to users if found in a search engine's search results. If you do want to prevent search engines from crawling your pages, Google Webmaster Tools has a friendly robots.txt generator to help you create this file.
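A robots.txt file is plain text made of "User-agent" groups, each followed by "Disallow" lines. As a rough sketch of what a generator tool writes out, the Python function below renders such groups from a dictionary. The function name `build_robots_txt` and the example paths `/search` and `/tmp/` are hypothetical.

```python
from __future__ import annotations


def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render robots.txt text from a mapping of user-agent -> disallowed path prefixes."""
    groups = []
    for user_agent, paths in rules.items():
        lines = [f"User-agent: {user_agent}"]
        if paths:
            lines.extend(f"Disallow: {path}" for path in paths)
        else:
            # An empty Disallow value means nothing is blocked for this agent.
            lines.append("Disallow:")
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt({"*": ["/search", "/tmp/"]}))
```

The output still has to be saved as "robots.txt" in the site's root directory before crawlers will find it.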

(Note that if your site uses subdomains and you wish to have certain pages not crawled on a particular subdomain, you'll have to create a separate robots.txt file for that subdomain.)
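Because crawlers look for robots.txt at the root of each host, the file that governs a page can be derived from the page's URL alone. The short sketch below shows why a subdomain needs its own file; the helper name `robots_txt_url` and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Each host has its own file (and, per Google's documentation, each
    scheme and port), so blog.example.com does not inherit the rules
    of www.example.com.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in ("https://www.example.com/shop/item?id=7",
                "https://blog.example.com/2009/01/post.html"):
        print(robots_txt_url(url))
```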

There are a handful of other ways to prevent content appearing in search results, such as adding "NOINDEX" to your robots meta tag, using .htaccess to password protect directories, and using Google Webmaster Tools to remove content that has already been crawled.
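To make the robots meta tag option concrete, here is a minimal Python sketch that checks whether a page asks not to be indexed. It only looks at `<meta name="robots">` tags, treats directives as comma-separated and case-insensitive, and the names `RobotsMetaParser` and `is_noindex` are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # Directives are comma-separated, e.g. "NOINDEX, follow".
            for token in attr_map.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def is_noindex(html: str) -> bool:
    """True if the page carries a robots meta tag containing "noindex"."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```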

Tips

  • Use more secure methods for sensitive content - You shouldn't feel comfortable using robots.txt to block sensitive or confidential material. One reason is that search engines could still reference the URLs you block (showing just the URL, no title or snippet) if there happen to be links to those URLs somewhere on the Internet (like referrer logs). Also, non-compliant or rogue search engines that don't acknowledge the Robots Exclusion Standard could disobey the instructions of your robots.txt. Finally, a curious user could examine the directories or subdirectories in your robots.txt file and guess the URL of the content that you don't want seen. Encrypting the content or password-protecting it with .htaccess are more secure alternatives.
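The last point is easy to demonstrate: robots.txt is served publicly, so anyone can read the paths it lists. The Python sketch below extracts every Disallow path, much as a curious visitor would; the sample file and its `/private-reports/` entry are hypothetical.

```python
from __future__ import annotations


def disallowed_paths(robots_txt: str) -> list[str]:
    """List every Disallow path in a robots.txt file."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


SAMPLE = """User-agent: *
Disallow: /search
Disallow: /private-reports/   # confidential!
"""

if __name__ == "__main__":
    # Blocking a path from crawlers also advertises it to anyone reading the file.
    print(disallowed_paths(SAMPLE))
```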
  • Avoid:
    • allowing search result-like pages to be crawled (users dislike leaving one search result page and landing on another search result page that doesn't add significant value for them)
    • allowing a large number of auto-generated pages with the same or only slightly different content to be crawled: "Should these 100,000 near-duplicate pages really be in a search engine's index?"
    • allowing URLs created as a result of proxy services to be crawled
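Before publishing rules like these, they can be checked with Python's standard `urllib.robotparser` module. The sketch below assumes, hypothetically, that the site's internal search results live under `/search` and that proxy-generated URLs live under `/proxy/`. Note that Python's parser applies the first matching rule in file order, while Google documents longest-match precedence; the rules here don't overlap, so both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep internal search results and proxy-generated URLs out of the crawl.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /proxy/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.example.com/search?q=blue+widgets",
    "https://www.example.com/proxy/https://other.example/",
    "https://www.example.com/widgets/blue.html",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{'crawl' if allowed else 'blocked':7} {url}")
```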

Be aware of rel="nofollow" for links

Setting the value of the "rel" attribute of a link to "nofollow" tells Google that the link shouldn't be followed and shouldn't pass your page's reputation to the page it points to. To nofollow a link, add rel="nofollow" inside the link's anchor tag.
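The "rel" attribute holds a space-separated list of keywords, so "nofollow" can appear alongside others, as in rel="external nofollow". A minimal Python sketch of the check a link audit might perform, assuming case-insensitive keyword matching; the function name `is_nofollow` is hypothetical.

```python
from __future__ import annotations


def is_nofollow(rel: str | None) -> bool:
    """True if a link's rel attribute value contains the "nofollow" keyword."""
    if not rel:
        return False
    # Split on whitespace so "nofollower" or "external-nofollow" do not count as matches.
    return "nofollow" in rel.lower().split()
```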

If you link to a site that you don't trust and don't want to pass your site's reputation to, use nofollow.

When would this be useful? If your site has a blog with public commenting turned on, links within those comments could pass your reputation to pages that you may not be comfortable vouching for. Blog comment areas on pages are highly susceptible to comment spam. Nofollowing these user-added links ensures that you're not giving your page's hard-earned reputation to a spammy site. Many blogging software packages automatically nofollow user comments, but those that don't can most likely be manually edited to do this.

This advice also goes for other areas of your site that may involve user-generated content, such as guestbooks, forums, shout-boards, referrer listings, etc. If you're willing to vouch for links added by third parties (e.g. if a commenter is trusted on your site), then there's no need to use nofollow on links; however, linking to sites that Google considers spammy can affect the reputation of your own site.
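If your comment or guestbook software doesn't nofollow links for you, the rule described above can be applied when the link is rendered. The Python sketch below adds rel="nofollow" unless the commenter is trusted; the function name and the `trusted` flag are hypothetical, and the sketch escapes user input but does not validate the URL scheme.

```python
from html import escape


def render_comment_link(url: str, text: str, trusted: bool = False) -> str:
    """Build the anchor tag for a link posted in user-generated content.

    Links from untrusted commenters get rel="nofollow"; links from
    commenters you are willing to vouch for are rendered as normal links.
    """
    # Escape user input so a comment cannot inject markup or break out of the attribute.
    href = escape(url, quote=True)
    label = escape(text)
    rel = "" if trusted else ' rel="nofollow"'
    return f'<a href="{href}"{rel}>{label}</a>'
```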

Another use of nofollow is when you're writing content and wish to reference a website, but don't want to pass your reputation on to it. For example, imagine that you're writing a blog post on the topic of comment spamming and you want to call out a site that recently comment spammed your blog. You want to warn others of the site, so you include the link to it in your content; however, you certainly don't want to give the site some of your reputation from your link. This would be a good time to use nofollow.

Lastly, if you're interested in nofollowing all of the links on a page, you can use "nofollow" in your robots meta tag, which is placed inside the <head> tag of that page's HTML. The Webmaster Central Blog provides a helpful post on using the robots meta tag. The tag is written as <meta name="robots" content="nofollow"> and nofollows every link on the page.
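The page-level tag and per-link attributes combine in a simple way: a page-wide nofollow covers every link, otherwise each link's own "rel" attribute decides. A minimal Python sketch of that decision for a crawler that honors nofollow; the class and function names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Records robots meta directives and every <a href> with its rel keywords."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_directives: set[str] = set()
        self.links: list[tuple[str, set[str]]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            self.meta_directives.update(
                d.strip().lower() for d in attr_map.get("content", "").split(",") if d.strip()
            )
        elif tag == "a" and "href" in attr_map:
            rel_keywords = set(attr_map.get("rel", "").lower().split())
            self.links.append((attr_map["href"], rel_keywords))


def followable_links(html: str) -> list[str]:
    """Hrefs that a crawler honoring nofollow would follow on this page."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    if "nofollow" in auditor.meta_directives:
        # <meta name="robots" content="nofollow"> applies to every link on the page.
        return []
    return [href for href, rel in auditor.links if "nofollow" not in rel]
```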

Promote your website in the right ways

While most of the links to your site will be gained gradually, as people discover your content through search or other ways and link to it, Google understands that you'd like to let others know about the hard work you've put into your content. Effectively promoting your new content will lead to faster discovery by those who are interested in the same subject. As with most points covered in this document, taking these recommendations to an extreme could actually harm the reputation of your site.

Tips

  • Blog about new content or services - A blog post on your own site letting your visitor base know that you added something new is a great way to get the word out about new content or services. Other webmasters who follow your site or RSS feed could pick the story up as well.
  • Don't forget about offline promotion - Putting effort into the offline promotion of your company or site can also be rewarding. For example, if you have a business site, make sure its URL is listed on your business cards, letterhead, posters, etc. You could also send out recurring newsletters to clients through the mail letting them know about new content on the company's website.
  • Know about social media sites - Sites built around user interaction and sharing have made it easier to match interested groups of people up with relevant content.
  • Avoid:
    • attempting to promote each new, small piece of content you create; go for big, interesting items
    • involving your site in schemes where your content is artificially promoted to the top of these services
  • Add your business to Google's Local Business Center - If you run a local business, adding its information to Google's Local Business Center will help you reach customers on Google Maps and web search. The Webmaster Help Center has more tips on promoting your local business.
  • Reach out to those in your site's related community - Chances are, there are a number of sites that cover topic areas similar to yours. Opening up communication with these sites is usually beneficial. Hot topics in your niche or community could spark additional ideas for content or for building a good community resource.
  • Avoid:
    • spamming link requests out to all sites related to your topic area
    • purchasing links from another site with the aim of getting PageRank instead of traffic

Make use of free webmaster tools

Major search engines, including Google, provide free tools for webmasters. Google's Webmaster Tools help webmasters better control how Google interacts with their websites and get useful information from Google about their site. Using Webmaster Tools won't help your site get preferential treatment; however, it can help you identify issues that, if addressed, can help your site perform better in search results. With the service, webmasters can:

  • see which parts of a site Googlebot had problems crawling
  • upload an XML Sitemap file
  • analyze and generate robots.txt files
  • remove URLs already crawled by Googlebot
  • specify the preferred domain
  • identify issues with title and description meta tags
  • understand the top searches used to reach a site
  • get a glimpse at how Googlebot sees pages
  • remove unwanted sitelinks that Google may use in results
  • receive notification of quality guideline violations and file for a site reconsideration
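The list above mentions uploading an XML Sitemap file. As a rough sketch of what that file contains, the Python below builds a minimal sitemap using the sitemaps.org 0.9 namespace, with the required <loc> element and the optional <lastmod> date for each URL. The function name, URLs and dates are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[tuple[str, str]]) -> bytes:
    """Build a minimal XML Sitemap from (location, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL, as the protocol requires.
        ET.SubElement(url_el, "loc").text = loc
        ET.SubElement(url_el, "lastmod").text = lastmod  # W3C date, e.g. 2009-01-31
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([
        ("https://www.example.com/", "2009-01-31"),
        ("https://www.example.com/widgets/blue.html", "2009-02-01"),
    ]).decode("utf-8"))
```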

Yahoo! (Yahoo! Site Explorer) and Microsoft (Live Search Webmaster Tools) also offer free tools for webmasters.

Take advantage of web analytics services

If you've improved the crawling and indexing of your site using Google Webmaster Tools or other services, you're probably curious about the traffic coming to your site. Web analytics programs like Google Analytics are a valuable source of insight for this. You can use these to:

  • get insight into how users reach and behave on your site
  • discover the most popular content on your site
  • measure the impact of optimizations you make to your site (e.g. did changing those title and description meta tags improve traffic from search engines?)
For advanced users, the information an analytics package provides, combined with data from your server log files, can provide even more comprehensive information about how visitors are interacting with your documents (such as additional keywords that searchers might use to find your site).

Lastly, Google offers another tool called Google Website Optimizer that allows you to run experiments to find what on-page changes will produce the best conversion rates with visitors. This, in combination with Google Analytics and Google Webmaster Tools (see our video on using the "Google Trifecta"), is a powerful way to begin improving your site.
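The server-log idea mentioned above can be sketched in Python: parse lines in the Apache/Nginx "combined" log format, count successfully served pages, and pull search keywords out of referrer URLs. The regex, the `SEARCH_ENGINE_HOSTS` tuple and the assumption that the query travels in a "q" parameter are all illustrative; many search engines no longer include the query in the referrer, so expect sparse keyword data from modern logs.

```python
from __future__ import annotations

import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: these engines pass the search query in a "q" parameter.
SEARCH_ENGINE_HOSTS = ("google.", "bing.")


def summarize(log_lines) -> tuple[Counter, Counter]:
    """Count popular pages and search keywords found in referrer URLs."""
    pages, keywords = Counter(), Counter()
    for line in log_lines:
        m = COMBINED.match(line)
        if not m or m["status"] != "200":
            continue  # skip unparseable lines and unsuccessful requests
        pages[m["path"]] += 1
        ref = urlsplit(m["referer"])
        if any(host in ref.netloc for host in SEARCH_ENGINE_HOSTS):
            for query in parse_qs(ref.query).get("q", []):
                keywords[query.lower()] += 1
    return pages, keywords
```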

Except as otherwise noted, the main content of this document is licensed under the Creative Commons Attribution 3.0 License. Source: Google.