
robots.txt

Understand how the robots.txt file works on Access Volcanic websites and how to interpret SEO crawler warnings.

Written by Grace
Updated over a month ago

Overview

This article helps you understand how the robots.txt file works on your Access Volcanic website. The robots.txt file tells search engines which parts of your website they can access. It is managed centrally by Access Volcanic to protect sensitive areas and direct search engine crawl budget towards your most valuable content.


Key benefits

  • Protect sensitive and system areas from being crawled by search engines.

  • Prevent low-value URL variations from being indexed unnecessarily.

  • Help search engines focus on your valuable content such as job listings and website pages.

  • Reduce unnecessary server load from infinite URL variations.

  • Ensure consistent SEO behaviour across all Access Volcanic websites.

  • Maintain security by blocking access to administrative and authentication areas.


Before you start

Before reviewing robots.txt information, make sure you have:

  • Basic understanding of SEO and how search engines work.

  • Access to your website's robots.txt file at your-domain.com/robots.txt.

  • Knowledge of any SEO tools you're using that may flag blocked URLs.

⚠️ Important: The robots.txt file cannot be customised or changed on individual Access Volcanic sites. It is managed centrally and reviewed regularly to follow SEO best practices.


Understanding robots.txt file configuration

The robots.txt file is a publicly accessible file that tells search engines and other crawlers which parts of your website they are allowed or not allowed to access.

Robots.txt prevents sensitive or private areas from being crawled and protects system endpoints that should not be exposed to crawlers. It also reduces unnecessary server load from infinite URL variations and helps ensure that only relevant content appears in search engines.

Standard configuration used

Below is the standard robots.txt file used across Access Volcanic sites:

User-Agent: *

Disallow: /admin$
Disallow: /admin/*

Disallow: /sa$
Disallow: /sa/*

Disallow: /api/*

Disallow: /users/auth/*
Disallow: /sso/*

Disallow: /*?*

Disallow: /templates/*

Allow: /db_assets/production*?t=*

Disallow: /job/*/apply
Disallow: /job/*/save_job
Disallow: /job/*/unsave_job

Disallow: /jobs/*/*/*

Sitemap: [sitemap URL]

Key rules explained

  • Administrative areas (/admin, /sa) block administrative and super admin areas for security purposes.

  • System endpoints (/api, /users/auth, /sso) prevent sensitive functional endpoints and authentication flows from being crawled or indexed.

  • Query parameter URLs (/*?*) block URLs that contain query strings. The /*?* pattern matches any URL with a ? query parameter. This includes internal search result pages that could create infinite URL combinations (see the sketch after this list for how the pattern is matched).

  • Job action URLs block functional URLs like apply, save_job, and unsave_job. These perform actions rather than display content.

  • Asset allowance (Allow: /db_assets/...) explicitly allows important static assets to be crawled even when they include query parameters.

  • Sitemap reference points search engines to your automatically generated sitemap listing key pages for indexing.
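
To make the wildcard behaviour concrete, the simplified Python sketch below converts a subset of the rules above into regular expressions and tests a few sample paths. It illustrates Google-style matching, where * matches any run of characters, a trailing $ anchors the end of the path, the longest matching pattern wins, and Allow beats Disallow on ties. This is an illustration only, not the parser any search engine actually uses, and the sample paths are hypothetical.

import re

# Simplified, illustrative matcher for Google-style robots.txt wildcards.
# '*' matches any run of characters; a trailing '$' anchors the end of the path.
def pattern_to_regex(pattern):
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile(regex)

# A subset of the standard rules shown above.
RULES = [
    ("disallow", "/admin$"),
    ("disallow", "/*?*"),
    ("allow", "/db_assets/production*?t=*"),
    ("disallow", "/job/*/apply"),
]

def is_allowed(path):
    # The most specific (longest) matching pattern wins; on a tie,
    # Allow beats Disallow, mirroring how major crawlers resolve conflicts.
    best = None
    for verdict, pattern in RULES:
        if pattern_to_regex(pattern).match(path):
            if (best is None or len(pattern) > len(best[1])
                    or (len(pattern) == len(best[1]) and verdict == "allow")):
                best = (verdict, pattern)
    return best is None or best[0] == "allow"

for path in ["/jobs/developer-london", "/job/123/apply",
             "/search?q=engineer", "/db_assets/production/site.css?t=1700000000"]:
    print(f"{path}: {'allowed' if is_allowed(path) else 'blocked'}")

Running the sketch prints allowed for the job listing and the asset URL, and blocked for the apply action and the search query URL, which matches the intent of the standard configuration.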

Understanding SEO crawler warnings

When you run SEO crawls with third-party tools, you may see warnings or errors for URLs such as job action links or internal search URLs containing query parameters.

These are not real issues: on Access Volcanic sites, these URLs are blocked deliberately.

Functional URLs like apply, save_job, and unsave_job are actions for logged-in users. They are not pages for search engines.

Query-parameter URLs, including internal search pages, can create huge numbers of URL variations. These waste crawl budget and dilute the visibility of important pages.

Most automated SEO tools cannot interpret the purpose of each URL. They simply report that a URL exists and is blocked by robots.txt. This leads them to flag blocked URLs as warnings, even when the block is deliberate and beneficial for SEO.
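
If you want to triage a crawl export yourself, a hypothetical helper like the one below separates URLs that are blocked by design from URLs worth a closer look. The flagged_urls list, the prefix tuple, and the blocked_by_design function are illustrative simplifications of the rules above, not part of the platform or of any SEO tool's API.

from urllib.parse import urlsplit

# Hypothetical triage helper: flags crawl-report URLs that are blocked
# by design on Access Volcanic sites, leaving the rest for review.
EXPECTED_PREFIXES = ("/admin", "/sa", "/api/", "/users/auth/", "/sso/", "/templates/")
JOB_ACTIONS = ("/apply", "/save_job", "/unsave_job")

def blocked_by_design(url):
    parts = urlsplit(url)
    if parts.path.startswith(EXPECTED_PREFIXES):
        return True
    # Query-string URLs are blocked, except the explicitly allowed asset paths.
    if parts.query and not parts.path.startswith("/db_assets/production"):
        return True
    return parts.path.startswith("/job/") and parts.path.endswith(JOB_ACTIONS)

# flagged_urls stands in for an export from your SEO tool.
flagged_urls = [
    "https://example.com/job/123/apply",
    "https://example.com/search?q=engineer",
    "https://example.com/blog/hiring-trends",
]
for url in flagged_urls:
    status = "expected (blocked by design)" if blocked_by_design(url) else "review"
    print(f"{status}: {url}")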

πŸ“Œ Note: For more information about SEO optimisation, see the Improving your SEO help guide.


Best practices

  • Use your SEO tools to focus on genuine content issues such as thin content, broken links, or missing metadata rather than warnings about URLs blocked by design in robots.txt.

  • Treat blocked URLs under /admin, /sa, /api, /users/auth, /sso, and query-parameter URLs as expected behaviour when reviewing crawl reports.

  • Refer to your sitemap linked in robots.txt when checking which pages are intended to be indexed.

  • Contact Access Volcanic Support if you need help interpreting SEO crawler results or understanding how your site is indexed.

  • Include a sample of the URLs being flagged and a copy or screenshot of the relevant SEO tool report when requesting assistance.

⚠️ Important: The robots.txt configuration prevents unnecessary or harmful URLs from being indexed and preserves crawl budget for important content. Because robots.txt is managed at platform level, you do not need to make changes.


FAQs

Q1: Can I change or customise my robots.txt file?

  • Answer: No. On Access Volcanic, robots.txt is standardised and managed at platform level; individual sites cannot override or edit it. This ensures consistency and security across all client websites.

Q2: Why does my SEO tool show lots of blocked URLs as errors?

  • Answer: SEO tools often flag blocked URLs as warnings or errors. They don't understand that on Access Volcanic, URLs like job actions or internal search results are blocked intentionally to improve your site's SEO performance.

Q3: Are blocked job application URLs bad for my SEO?

  • Answer: No. Job actions such as apply, save, or unsave are not content pages, and blocking them helps search engines focus on the job detail pages and other meaningful content instead.

Q4: How do search engines know which pages to index?

  • Answer: The Sitemap line in robots.txt points to your automatically generated sitemap. Search engines use this to discover the main pages that should be indexed rather than relying on blocked URLs.
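
As a quick illustration, a few lines of Python can pull the Sitemap directive out of a robots.txt body. The abbreviated file contents and sitemap URL below are placeholders, not your site's real values.

robots_txt = """\
User-Agent: *
Disallow: /admin$
Sitemap: https://your-domain.com/sitemap.xml
"""

# Collect every Sitemap directive (a robots.txt file may list several).
sitemaps = [line.split(":", 1)[1].strip()
            for line in robots_txt.splitlines()
            if line.lower().startswith("sitemap:")]
print(sitemaps)  # ['https://your-domain.com/sitemap.xml']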

Q5: Will these blocks affect my search rankings?

  • Answer: No, the robots.txt configuration is designed to improve your search performance by directing search engines to your valuable content while protecting system areas and preventing indexing of low-value pages.

Q6: What should I do about crawler warnings for blocked URLs?

  • Answer: If your SEO crawler shows warnings for blocked administrative URLs, query-parameter URLs, and job action URLs, this is expected behaviour. These URLs are blocked by design to protect your site and improve search performance, so you can safely ignore these warnings.

Q7: Can I see my website's robots.txt file?

  • Answer: Yes, you can view your robots.txt file by visiting your-domain.com/robots.txt in any web browser. This shows the same standard configuration used across all Access Volcanic sites.
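
If you prefer a script to a browser, the minimal standard-library Python sketch below fetches the same file; your-domain.com is a placeholder for your actual domain.

from urllib.request import urlopen

# Fetch and print the live robots.txt file.
with urlopen("https://your-domain.com/robots.txt") as response:
    print(response.read().decode("utf-8"))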

Q8: Why are some assets allowed despite the query parameter block?

  • Answer: The Allow rule for /db_assets/production ensures that important static assets like stylesheets and scripts can be crawled by search engines even when they include version parameters, which is necessary for proper site indexing.
