Every once in a while it’s a good idea to audit the search engine results for duplicate pages, 404s, and anything else we don’t want showing up. This might sound super OCD to some people, but presenting your site to Google in the best possible way gives you the advantages you’re after. To see which of our pages are indexed in Google, Bing, or Yahoo, we use a search operator: “site:domainname.com“. It limits the results to this one website, which is all we want to look at.
Tip: Looking at one page with 100 results is much easier than clicking through 10 pages of 10 results each. Use this format: https://www.google.com/search?num=100&q=site:tumblr.com
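If you run this audit often, a small helper can build the tip’s URL for any domain. This is a minimal Python sketch. It assumes Google still honors the `num` parameter, which is up to Google. The function name `site_search_url` is only for illustration.

```python
from urllib.parse import urlencode


def site_search_url(domain, per_page=100):
    """Build a Google results URL limited to one site, with a larger page size."""
    # "num" is the results-per-page parameter from the tip (assumed to be honored).
    params = {"num": per_page, "q": f"site:{domain}"}
    # safe=":" keeps "site:" readable instead of encoding it as "site%3A".
    return "https://www.google.com/search?" + urlencode(params, safe=":")


print(site_search_url("tumblr.com"))
# https://www.google.com/search?num=100&q=site:tumblr.com
```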
Some pages are really just there to help with the user experience, and we want to let Google know that. Examples of pages we would likely want to noindex: category pages, the eBooks index page, tags, and other miscellaneous archives like the author page. You generally won’t come across many situations where you want to noindex, nofollow a page. That combination asks bots not to index the page and not to follow the links on it.
When I was getting started on this site, I played around a lot with the category structure for one major reason. I wanted a layout that would make it easy for anyone visiting the blog to find related topics. It also helps Google see the authority within my site through all the supporting articles. Tags were another idea I was thinking about using, to be very specific in the breadcrumbs. After playing around with them, I felt they were overkill. Those old tag and category archives were flooding the SERP, and it’s time to remove them.
Reviewing the Indexed Pages in Google:
The changes to the category/link structure created a lot of duplicate links in the SERP. Google sees them and thinks I have 5+ different blog posts on the same topic with the same content, just at different URLs. That is not something we want, and it will be interesting to see how the rankings improve after cleaning this up.
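One way to spot these duplicates in a list of indexed URLs is to group them by post slug. This sketch assumes your permalinks end in the post slug (for example, a `/%category%/%postname%/` structure). The URLs in the usage example are made up.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find_duplicate_slugs(urls):
    """Group URLs by their last path segment and keep slugs seen more than once."""
    groups = defaultdict(list)
    for url in urls:
        # Split the path into non-empty segments; the last one is the slug.
        segments = [s for s in urlparse(url).path.split("/") if s]
        if segments:
            groups[segments[-1]].append(url)
    return {slug: found for slug, found in groups.items() if len(found) > 1}


# Hypothetical indexed URLs left over from old category structures.
indexed = [
    "https://example.com/seo/audit-serp-results/",
    "https://example.com/wordpress/audit-serp-results/",
    "https://example.com/blog/audit-serp-results/",
    "https://example.com/blog/fix-404-errors/",
]
for slug, found in find_duplicate_slugs(indexed).items():
    print(slug, len(found))
# audit-serp-results 3
```

Each group lists the stale URLs to watch for after the next crawl, next to the one you want to keep.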
The good news is that after category-level changes, WordPress still redirects the old URLs internally to the correct page, so those should update the next time Google does a full crawl. If I don’t see them updated within a few weeks, I’ll manually submit them for removal and get the new, correct page re-indexed with the URL structure we use now.
Noindexing and Deleting Unwanted Pages:
Cleaning up the results doesn’t always mean fixing your mistakes. I’ve been running a small paid test campaign offering to make a free webpage in exchange for a review of our work. I don’t want this link to be found unless I send it to you, so let’s go ahead and remove it from the results and noindex the page.
We really only have three options when it comes to telling the spiders what to do with our page:
Noindex, Follow = Bots are still welcome to crawl the page and follow its links, but not to index it. It’s like saying, “Hey Google, this page is just here to help users, so don’t include it in the search results.”
Index, Nofollow = Bots may index the page, but should not follow the links on it.
Noindex, Nofollow = Bots should neither index the page nor follow its links. You are really just “asking,” though; if they wanted to, they could still crawl the page. It might not be Google or Bing, but a private company that is collecting data.
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
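To check which of these directives a page is actually sending, you can parse its HTML. This is a minimal sketch using Python’s standard `html.parser`. The class and function names are hypothetical, and the sketch only reads the robots meta tag, not the `X-Robots-Tag` HTTP header, which can carry the same directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names, but not their values.
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() != "robots":
            return
        for part in values.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return whether a page allows indexing and link following."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    # "none" is shorthand for "noindex, nofollow"; no tag at all means index, follow.
    blocked = "none" in found
    return {
        "index": not blocked and "noindex" not in found,
        "follow": not blocked and "nofollow" not in found,
    }


print(robots_directives('<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">'))
# {'index': False, 'follow': True}
```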
Rather than keep the tag pages on our site using up resources, let’s just go ahead and remove all of them.
As for the category pages, and the author pages if you plan on using multiple content writers, you want to keep these. They give us the URL structure we like and improve the user experience. They also help users find related topics so they can learn more. You just need to tell Google to ignore them when it comes to indexing and ranking.
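The plan for each archive type can be written down as a simple rule. This hypothetical helper assumes WordPress’s default `/tag/`, `/category/`, and `/author/` bases. If you changed those bases in your permalink settings, adjust the prefixes.

```python
def cleanup_action(path):
    """Hypothetical rule: what to do with a URL path during the cleanup."""
    # Assumes WordPress's default archive bases.
    if path.startswith("/tag/"):
        # Tag archives are being removed entirely.
        return "remove"
    if path.startswith(("/category/", "/author/")):
        # Keep for users, but ask search engines not to index it.
        return '<meta name="robots" content="noindex, follow">'
    return "keep indexed"


for path in ["/tag/seo/", "/category/seo/", "/author/admin/", "/blog/fix-404-errors/"]:
    print(path, "->", cleanup_action(path))
```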
Submitting URLs to be Removed:
If you have pages like our “test page” or “free website page” that you want live but don’t want people finding in search results, go ahead and temporarily remove the URLs in Google Search Console. Note: you will also want to noindex them if you never plan on having them in the SERP, since the removal is only temporary. I also added the tag and category pages, just to get the ball rolling on the cleanup.
Normally within a day or so you will see those URLs being processed and removed from the results.
Asking Google to Recrawl the Site:
If you have fixed a large number of errors or small problems in the meta tags, URL structure, or page content (it happens to the best of us), you can ask Google to re-crawl your website for updates.
It’s kind of hidden, but under the Crawl menu in Search Console you will find “Fetch as Google.”
Fetch your domain’s page as Google. Once you do, the page will be listed below under Path.
Click Submit to Index on the page you want Google to crawl.
The two choices:
Crawl only this URL
Crawl this URL and its direct links
If you only have one page to update, go ahead and select “Crawl only this URL.” The /blog/ folder holds all of our article pages, so we are going to submit /blog/ to be recrawled along with its direct links.
Note: There is a monthly limit on how many special requests like this you can make, so keep that in mind.
It’s slowly updating to what we want it to be. If you are a newer site, patience will pay off once the site has aged and Google crawls it more often for new updates and articles.
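Roughly speaking, the “direct links” of /blog/ are the links that appear on that page. This sketch approximates that list by collecting unique same-site `<a href>` links from the page’s HTML. It is an illustration only, not a description of exactly how Google expands the request. `example.com` and the sample HTML are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def direct_links(base_url, html):
    """Return unique same-site links on a page, in the order they appear."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique


page = (
    '<a href="/blog/audit-serp-results/">Audit</a>'
    '<a href="https://other.com/">External</a>'
    '<a href="wordpress-tips/">Tips</a>'
)
for link in direct_links("https://example.com/blog/", page):
    print(link)
```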
Note: Don’t do bulk deletions, like trying to remove 100+ links a day. With Google and links, always go slowly.
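To follow the “go slowly” advice, you can split a long removal list into small daily batches. The limit of 10 URLs per day below is an arbitrary assumption, not a number from Google, and the tag URLs are made up.

```python
def removal_batches(urls, per_day=10):
    """Split a removal list into small daily batches instead of one bulk request."""
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    return [urls[i:i + per_day] for i in range(0, len(urls), per_day)]


# Hypothetical list of old tag archives to clear out.
tag_pages = [f"https://example.com/tag/topic-{n}/" for n in range(1, 26)]
for day, batch in enumerate(removal_batches(tag_pages), start=1):
    print(f"Day {day}: {len(batch)} URLs")
# Day 1: 10 URLs
# Day 2: 10 URLs
# Day 3: 5 URLs
```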