SEO Tip of the Week: Crawl Budgets

90 Digital CEO Nick Garner gives us tips on crawl budgets in this edition of CalvinAyre.com’s SEO Tip of the Week.

Hi there, this is my SEO tip of the week – extended version!

Getting the best rankings from big sites, i.e. bookmaker sites or huge affiliate sites

Have you heard of ‘crawl budget’ and ‘crawl rank’? My tip is to account for this if you have a BIG website. That’s because Google gives you a crawl budget in line with your PageRank. If you have pages with a tiny amount of PageRank, they are unlikely to be crawled often, if at all. This means that if competitors are being crawled more frequently than you for a similar page, they will outrank you. Therefore, steer Google towards the pages you care about most so they get more of the crawl budget and you have a better chance of ranking them.

Matt Cutts talks about crawl budget:

“The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we’ll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we’ll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline. Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank. The pages that get linked to a lot tend to get discovered and crawled quite quickly. The lower PageRank pages are likely to be crawled not quite as often.”

What Dawn found with her 1,500,000-page site was that when she asked Google to index everything through XML sitemaps, the more Google indexed, the less it crawled. The upshot was a huge cleanup of her site, focusing the crawl budget she did have onto the parts of the site that really mattered to her. For an operator, this means concentrating on pages that have potential traffic (obviously!), for instance making sure Google doesn’t spend time on pages for individual horse names, but focuses on pages for particular horse races.
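
To make that steering concrete, here is a minimal sketch in Python. It assumes a hypothetical robots.txt and made-up URL patterns (/horses/ for individual horse pages, /races/ for race pages, example.com as the domain), none of which come from Dawn's site, and simply checks which URLs Googlebot would be allowed to fetch:

    # Minimal sketch: confirm that low-value horse pages are blocked while race pages
    # stay crawlable. The robots.txt rules, domain and paths are invented for illustration.
    from urllib.robotparser import RobotFileParser

    ROBOTS_TXT = """\
    User-agent: Googlebot
    Disallow: /horses/
    Allow: /races/
    """

    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())

    for url in ("https://example.com/horses/red-rum",
                "https://example.com/races/grand-national"):
        print(url, "->", "crawlable" if parser.can_fetch("Googlebot", url) else "blocked")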

This is what she did to fix her site up:

  1. Find out where Googlebot goes and keep watching (see the log-parsing sketch after this list)

  2. Ensure URLs return the correct server response and keep checking (a response-checking sketch follows this list)

  3. Ensure that dynamic variables validate & watch out for infinite loops

  4. Don’t be afraid of hard 404s – give a 410 response where necessary – avoid soft 404s

  5. Check XML sitemaps – thoroughly

  6. Categorise XML sitemaps (see the sitemap index sketch after this list)

  7. Gain access to a testing / dev environment before template changes go live

  8. Ensure your important pages have the most internal links

  9. Understand and manage parameters and URL rewrites

  10. Use robots.txt well

  11. Avoid phoney .htaccess folders

  12. Avoid deep architectures

  13. Avoid a jumble sale – too flat

  14. Visit Webmaster Tools daily (at least)

  15. Optimise your internal structure – utilise cross-module linking if necessary

http://www.move-it-marketing.co.uk/seo/website-architecture/infinite-loops-dirty-architecture-too-many-indexed-urls
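
For point 1 above, here is a minimal sketch in Python that counts Googlebot hits per top-level site section from a standard combined-format access log. The log path and the idea of grouping by first path segment are assumptions for illustration, not details from Dawn's write-up:

    # Minimal sketch: tally Googlebot requests by top-level section of the site.
    from collections import Counter
    from urllib.parse import urlsplit

    LOG_FILE = "access.log"  # hypothetical path to a combined-format access log

    hits = Counter()
    with open(LOG_FILE, encoding="utf-8", errors="ignore") as log:
        for line in log:
            if "Googlebot" not in line:
                continue
            try:
                path = line.split('"')[1].split()[1]  # request line looks like 'GET /path HTTP/1.1'
            except IndexError:
                continue
            section = "/" + urlsplit(path).path.lstrip("/").split("/")[0]
            hits[section] += 1

    for section, count in hits.most_common(20):
        print(f"{count:7d}  {section}")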
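
For points 2 and 4, a similar sketch that checks a handful of URLs against the status codes you expect, which also surfaces soft 404s (a missing page that answers 200). It relies on the third-party requests library, and the URLs and expected codes are hypothetical:

    # Minimal sketch: flag URLs whose live status code differs from what you expect.
    import requests

    EXPECTED = {
        "https://example.com/races/grand-national": 200,
        "https://example.com/old-promo": 410,  # gone for good: a hard 410 beats a soft 404
    }

    for url, expected in EXPECTED.items():
        status = requests.get(url, allow_redirects=False, timeout=10).status_code
        flag = "OK" if status == expected else "CHECK"
        print(f"{flag:5s} {status} (expected {expected})  {url}")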
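
And for points 5 and 6, a sketch that writes one sitemap per category plus a sitemap index pointing at them, which makes it easier to see per-section indexing in Webmaster Tools. The categories, URLs and filenames are invented for illustration:

    # Minimal sketch: one sitemap file per category, plus an index that lists them all.
    from xml.etree.ElementTree import Element, SubElement, ElementTree

    NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
    CATEGORIES = {
        "races": ["https://example.com/races/grand-national"],
        "betting-guides": ["https://example.com/guides/each-way-betting"],
    }

    index = Element("sitemapindex", xmlns=NS)
    for category, urls in CATEGORIES.items():
        urlset = Element("urlset", xmlns=NS)
        for url in urls:
            SubElement(SubElement(urlset, "url"), "loc").text = url
        ElementTree(urlset).write(f"sitemap-{category}.xml", xml_declaration=True, encoding="utf-8")
        SubElement(SubElement(index, "sitemap"), "loc").text = f"https://example.com/sitemap-{category}.xml"

    ElementTree(index).write("sitemap-index.xml", xml_declaration=True, encoding="utf-8")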

Recap: It’s a nice idea to have Google index every URL, but from personal experience, it does not mean more rankings. It’s far better to concentrate Google on the pages you care about, so they get more frequent crawls, and aim to rank those.

——————————————-

Other ideas:

Organising international sites properly

http://www.koozai.com/blog/news/events-news/smx-london-2014-day-1/ John Mueller

Hreflang markup is essential for international websites, especially if the wrong country or language version is showing in the SERPs. It is the best way to help Google understand international sites.
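
As a small illustration, here is a minimal Python sketch that prints a reciprocal set of hreflang link tags for one page, using a made-up mapping of locales to URLs. Every country or language version of the page should carry the full set, including a tag pointing back at itself:

    # Minimal sketch: generate hreflang link tags for the <head> of every version of a page.
    ALTERNATES = {
        "en-gb": "https://example.com/uk/betting-odds/",
        "en-us": "https://example.com/us/betting-odds/",
        "x-default": "https://example.com/betting-odds/",
    }

    for hreflang, href in ALTERNATES.items():
        print(f'<link rel="alternate" hreflang="{hreflang}" href="{href}" />')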

For canonical tags, content should be identical, not just similar (excluding dynamic content, ads, etc.).

————————————-

See if you have bad links

http://spyonweb.com/