NoIndex Header Tag Module

By Gary | March 2, 2012

Sometimes you might want Google and other (are there any others?) search engines not to show certain products on your site. This is where noindex comes in useful. Have a read of this Google Support page.

With that in mind, we know that any page with noindex will eventually be removed from the Google results pages. It might take some time, but it will happen when the Googlebot hits your page and reads the noindex tag.

It took me all of 10 minutes to code up a quick and easy mod that enables the shop owner to set each product to index or noindex. The default is, of course, index!

Admin side:

An extra red/green selector:
red = noindex
green = index

Shop side:

No outward change, but look at the underlying code of the page and you will see this added to the head of the page for any product that has the red selector: `<meta name="robots" content="noindex" />`

This is done using a Header Tag Module.

The benefit is that these modules are drop-in, so there is no need to amend the code of the product_info.php page.
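The module's job reduces to a per-product flag check when the page head is built. Here is a minimal sketch of that logic, written in Python for illustration only (the actual mod is PHP inside osCommerce, and the `noindex` field name below is hypothetical):

```python
def robots_header_tag(product: dict) -> str:
    """Return the meta tag to inject into the page head, or nothing.

    `product` stands in for a row from the products table; the
    'noindex' flag is the red selector in admin, defaulting to off.
    """
    if product.get("noindex", False):
        return '<meta name="robots" content="noindex" />'
    return ""  # default: let search engines index the page


# Red selector set -> the tag is emitted; green (default) -> nothing.
print(robots_header_tag({"id": 1, "noindex": True}))
print(robots_header_tag({"id": 2}))
```

The key design point is the default: a product with no flag set emits nothing at all, so existing pages keep being indexed unless the shop owner explicitly flips the selector to red.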

Be aware that some of the people giving advice at the osCommerce forum sell SEO services, and are therefore going to try to confuse the hell out of you by spouting shite.

5 thoughts on “NoIndex Header Tag Module”

  1. enigma1

    Well, to be honest, I read about the argument in the osC forum. So here is what I would do, because I don’t trust any of the methods proposed.

    Google has been known to make frequent algorithm changes to the way Googlebot perceives tags, and they may decide in the future to process noindex differently. Some other search engines may ignore it even today and still list the content, which in turn may end up as a link reference on another site. Although the bot theoretically should never index the original page, I am sure it will keep accessing it, because it will find links to it from the product listing pages, best sellers, sitemap etc.

    Now, a few years back – I think it was 2008 – I was monitoring the logs of an osC site because Googlebot was insisting on accessing the popup_image.php file; there wasn’t a physical image file behind it, and unfortunately the bot was indexing popup_image.php. I tried pretty much everything I could think of, including the noindex tag, but the bot was completely ignoring them back then. I even used nofollow, but that didn’t work either; I believe that was because it was seeing the link as part of the JavaScript line. Eventually I used some JavaScript with jQuery/Fancybox to get around it and retired popup_image.php, so the bot could no longer access those pages.

    I then came across something similar to what the poster mentioned, on another site. It wasn’t about hiding products, but about thin content due to many information pages. Basically the merchant was using extensive, detailed info pages (one or two text pages for each product), and I think the spiders were having trouble working out whether the site was about commerce, or a blog, or a journal etc., because of too much text. Remembering the previous problem, I used jQuery again with a lightbox and a JS link, so if the visitor wanted detailed info he would click the link, which in turn made an Ajax call fetching the data from the server. The hard-coded link Googlebot would see was just a hash ‘#’, so it couldn’t see anything, as I was hooking the click event and the actual link was encoded in another HTML property on the link (class, id, etc. can be used). Plus the Ajax call went via POST, so that even if the request was attempted by spiders (because they found a hard link externally) they would get an invalid-request response or a 301 redirect etc. I then removed the old PHP scripts that were exposed to spiders. It worked, in the sense that the main focus of the site’s content shifted towards the ecommerce products, which was what the merchant wanted.

  2. Gary Post author

    Hi Mark – long time buddy!

    We can only go by what the rules are now. If they change in the future, then we change to accommodate them. The point of this mod is to add the noindex meta tag, no more, no less. With that meta tag in place, Google will not index the page, assuming that they have seen the page. If they have not seen the page, then you can request inclusion or exclusion as required.

    Or, to put it more clearly:

    A page that Googlebot sees as having noindex will not show in the index after the next update. That’s the rule (for Google at least) as of today.
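    An easy way to confirm the mod is doing its job is to look for the robots meta tag in the rendered page. Here is a small checker sketched with Python’s standard library (the sample HTML below is made up for the demonstration):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content attribute of every <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.robots = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots.append(a.get("content", ""))

def page_is_noindexed(html: str) -> bool:
    """True if any robots meta tag on the page contains 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in c.lower() for c in finder.robots)

sample = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
print(page_is_noindexed(sample))  # True
```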

    I think that other poster is spreading uncertainty because he sells SEO services. I don’t.

  3. Gary Post author

    PS: what are your thoughts on serving a 410 instead of a noindex? Might a 410 be too strong?

  4. enigma1

    No, actually I think a 404 or 410 header will do well in this case. Spiders won’t index these pages, while browsers will render them normally.
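    That behaviour is easy to demonstrate: a response can carry a 410 Gone status and still ship a full HTML body for a browser to render; only the status line changes. A minimal sketch using Python’s standard http.server (the product URL below is made up):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class GoneHandler(BaseHTTPRequestHandler):
    """Serve a normal-looking page, but with a 410 Gone status,
    so spiders drop the URL while browsers still render the body."""
    def do_GET(self):
        body = b"<html><body><h1>Discontinued product</h1></body></html>"
        self.send_response(410)  # 410 Gone instead of 200 OK
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), GoneHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/product_info.php?products_id=42")
resp = conn.getresponse()
status = resp.status
body = resp.read()
server.shutdown()

print(status)  # 410 - what the spider acts on
print(body)    # the HTML still arrives for a browser to render
```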

  5. enigma1

    One thing to make sure of, though, is that no internal links point to 4xx pages, because the spiders will report them as site errors.
