Blocking WordPress Pages from Being Indexed by Google with Robots.txt

The robots.txt file is something every WordPress site comes with; it tells bots and search engines which pages you want them to index. It's a request rather than a demand, and the robots.txt file is publicly accessible at www.yourdomain.com/robots.txt, so be aware that disallowing a page won't hide it from the web. In fact, in some cases it makes the page even easier to find. It only means the page won't show up on major search engines.
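For context, the file itself is just a plain-text list of directives. A minimal example (the path here is made up for illustration) looks like this:

User-agent: *
Disallow: /some-private-page/

The User-agent line says which crawlers the rules apply to (* means all of them), and each Disallow line names a path they're asked not to crawl.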

On most WordPress sites, the robots.txt file is easily editable from the WP Dashboard's file editor. But not with Roots' Sage theme; that access is restricted. WordPress has generated robots.txt itself since version 3, so now the best way to edit it is by adding a robots_txt WordPress filter in the theme's functions.php, or preferably in the Sage/src/filters.php file.

For this particular use case, I didn’t want our AdWords landing pages showing up on Google when people search for our products. I don’t want anyone who isn’t coming from AdWords landing on these pages, because it can skew the data. So I’ve added this to our filters.php file:

/**
 * Append a Disallow rule for our AdWords landing pages to the
 * robots.txt output WordPress generates.
 */
add_filter('robots_txt', function ($content, $is_public) {
    // $content holds the default robots.txt body; $is_public reflects the
    // "Search engine visibility" setting.
    $content .= "Disallow: /*-gaw01\n";
    return $content;
}, 10, 2);
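Note that this filter only changes the virtual robots.txt that WordPress generates on the fly. If a physical robots.txt file already exists in the site root, the web server will typically serve that file directly and the filter will never run.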

Our AdWords page URLs are the same as the regular public product URLs, but end in -gaw01. So adding this filter tells crawlers to ignore any URL containing -gaw01. Now our robots.txt looks like this:

User-agent: *
Disallow: /
Disallow: /*-gaw01
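One caveat: the * wildcard in that Disallow rule isn't part of the original robots.txt standard. Major crawlers like Googlebot and Bingbot honor it, but smaller bots may not. As an example (using a made-up product slug), www.yourdomain.com/awesome-widget-gaw01 matches the rule and gets skipped, while www.yourdomain.com/awesome-widget is still crawled and indexed normally.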
