The name is quite evocative: orphan pages. It points to the main feature of these resources, which have no inbound links from any other page of the site.
In other words, these are pages that receive no internal links and are practically isolated from the site structure and from the other pages. Even from this summary it is clear that this situation can be a problem for SEO, but finding and correcting orphan pages is not complicated, and there are various tools for doing so.
Definition of orphan pages
In SEO parlance, orphan pages are pages that exist on the site but have no links pointing to them from any other page. An orphan page can therefore be a URL or a subpage that is physically present but essentially invisible to browsing users, because it is absent from the site's internal linking structure.
Be careful not to confuse them with dead-end pages: those are pages that lead nowhere else, because they have no outgoing links.
SEO problems caused by orphan pages
Orphan pages are URLs that users cannot reach through normal navigation and that, unless they are listed in the sitemap, Googlebot may not reach either, since its job is to follow external and internal links and work out the structure and shape of the site.
Their presence causes various SEO problems, such as weak indexing, gaps in the internal link structure (if the orphan page itself links to other resources), and difficulties in keyword targeting.
Causes of orphan pages
There are several reasons why these URLs may appear: pages for products that are no longer in stock, old news content that has been disabled, or deleted videos.
Other causes of orphan pages are incorrect use of the CMS when creating pages, poor management of a migration, categories taken offline without a redirect, and failure to delete test pages (e.g. those used for A/B testing).
There are also two common technical causes of orphan pages that should be remedied immediately, because they essentially create duplicate pages that should automatically and consistently redirect to a single URL. These are the handling of non-canonical HTTPS / HTTP and www / non-www versions, and the handling of trailing slashes, the final slash at the end of an address.
Check the page variants
Ideally, every published page of the site should consistently use either HTTP or, preferably, HTTPS, and always the same www or non-www version.
To check for errors, you can run a simple test by typing the four variants of the site's home page into your browser:
- https://www.example.com
- http://www.example.com
- https://example.com
- http://example.com
Check that all four automatically redirect to the same URL, which, for consistency, should also declare itself as canonical.
If one of these variants does not redirect correctly, it may be a sign of similar problems on other pages, and you should check other URLs for the same issue.
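The manual test on the four home page variants can also be scripted. The following is a minimal sketch in Python using only the standard library; the helper names (`home_variants`, `final_url`, `redirects_agree`) are illustrative assumptions, and example.com is a placeholder domain.

```python
from urllib.request import urlopen


def home_variants(domain):
    """Return the four protocol/host variants of a home page.

    `domain` is the bare host without "www." (for example "example.com").
    """
    return [
        f"{scheme}://{prefix}{domain}"
        for scheme in ("https", "http")
        for prefix in ("www.", "")
    ]


def final_url(url, timeout=10):
    """Follow redirects and return the URL finally served (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def redirects_agree(resolved):
    """Return True when every variant ended up at one and the same URL.

    `resolved` maps each requested variant to the URL it resolved to.
    For a home page, an empty path and "/" are the same resource,
    so a trailing slash is ignored in the comparison.
    """
    return len({url.rstrip("/") for url in resolved.values()}) == 1
```

With network access, the check could be run as `redirects_agree({v: final_url(v) for v in home_variants("example.com")})`; a `False` result suggests that at least one variant is not redirected to the preferred URL.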
Check links with trailing slash
Another thing to watch out for is the consistent use of trailing slashes.
For example, an address that ends with a slash and the same address without it may return the same content, but the two URLs are not identical.
To find out whether the settings are correct, spot-check a few pages of the site, requesting each one with and without the trailing slash, and verify that both forms automatically redirect to the same URL and that the choice is consistent across pages.
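The spot check can be prepared in code. The sketch below, which assumes hypothetical helper names (`slash_variants`, `slash_policy`), builds the two forms of a URL and summarises whether a sample of pages settles on one convention. The final URLs passed to `slash_policy` are expected to have been collected separately, for example by following redirects for each form.

```python
from urllib.parse import urlsplit, urlunsplit


def slash_variants(url):
    """Return the (without slash, with slash) forms of a page URL.

    Only the path is changed; query string and fragment are kept.
    """
    parts = urlsplit(url)
    bare = parts.path.rstrip("/")
    without = urlunsplit(parts._replace(path=bare))
    with_slash = urlunsplit(parts._replace(path=bare + "/"))
    return without, with_slash


def slash_policy(resolved_pairs):
    """Summarise which form a sample of pages settles on.

    `resolved_pairs` holds, for each sampled page, the two final URLs
    obtained by requesting its without-slash and with-slash forms.
    Returns "slash", "no-slash", or "inconsistent".
    """
    outcomes = set()
    for from_bare, from_slash in resolved_pairs:
        if from_bare != from_slash:
            return "inconsistent"  # the two forms do not converge on one URL
        outcomes.add("slash" if urlsplit(from_bare).path.endswith("/") else "no-slash")
    return outcomes.pop() if len(outcomes) == 1 else "inconsistent"
```

An "inconsistent" result means either that the two forms of some page do not converge, or that different pages settle on different conventions.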
Negative effects of orphan pages for SEO
In general, the linking structure of a website should be organized consistently to serve two objectives: to channel internal link equity toward important pages and to ensure a good user experience.
Orphan pages have no value for the site and can even become harmful, especially if they are present in large numbers.
On the one hand, they create frustrating experiences for users, who cannot reach these pages through the natural structure of the site; if those pages contain important or useful information, it is wasted.
On the other hand, they can affect crawl budget optimization and the quality of visits and conversions: the crawler may collect little data about these pages and find a profile that is not indexing-friendly, which can affect rankings in the long run.
Having no internal links, they receive no link equity, and search engines have no semantic or structural context in which to evaluate them: there is no way to understand where the page fits into the overall site.
How crawlers find pages
Search engines, such as Google, usually find new pages in two ways:
- The crawler follows a link from another page.
- The crawler finds the URL listed in the XML sitemap.
For Google to crawl and index a page, it must first be able to find it through links; in the case of orphan pages this is not possible, so these URLs are often not indexed and may never appear in the search results.
Even if they are listed in the XML sitemap, orphan pages remain a problem for SEO and we need to try to identify and correct them.
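Since the XML sitemap is one of the two discovery routes, its URL list is a useful input when looking for orphan pages. As a minimal sketch, assuming a sitemap that follows the standard sitemaps.org schema, the listed URLs can be extracted like this (the function name `sitemap_urls` is illustrative):

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the URLs listed in the <loc> elements of an XML sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```

The function takes the sitemap content as text, so the file can be read from disk or downloaded beforehand by whatever means is convenient.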
How to find all the orphan pages of the site
The first step in solving the problem of orphan pages is to identify the crawlable pages, that is, to create a list of the URLs that can currently be reached by crawling the site.
It is important to have a list of all active URLs (i.e. those that can be crawled) and therefore to exclude pages that search engines cannot index, because they are marked noindex or blocked by robots.txt rules.
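The robots.txt part of this filter can be sketched with Python's standard `urllib.robotparser` module. The function name `crawlable` and the sample rules are assumptions for illustration; detecting noindex is not covered here, since it requires inspecting each page's robots meta tag or `X-Robots-Tag` response header.

```python
from urllib.robotparser import RobotFileParser


def crawlable(urls, robots_txt, user_agent="*"):
    """Keep only the URLs that the given robots.txt text allows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, a URL such as https://www.example.com/private/page would be dropped from the list, while https://www.example.com/blog would be kept.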
The crawl should always start from the home page of the site, using its canonical URL, that is, the correct HTTPS or HTTP version and the correct www or non-www version.
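To illustrate why a crawl that starts from the home page cannot reach orphan pages, here is a minimal sketch of link-following discovery. It works on an in-memory map of pages to their internal links, which is an assumption made for simplicity; a real crawler would fetch each page and extract its links instead.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first walk of internal links, starting from the home page.

    `links` maps each page URL to the list of internal URLs it links to.
    Returns the set of URLs a link-following crawler would reach.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

A page that links out to others but receives no inbound link never enters the queue, so it is missing from the result even though it exists on the site.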
Compare URL lists to find gaps
Once the crawl is complete, export the list of URLs to an Excel worksheet.
The next step is a gap analysis, which compares data from different sources and looks for discrepancies: for example, Google Analytics data, Search Console data, sitemap data or server log files.
What matters is having complete lists of URLs that can be checked for "missing" resources, which are precisely the gaps: using, for example, the MATCH formula in Excel, each URL from the other sources is looked up in the crawl list, and those with no match are the orphan URLs.
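The same comparison can be done outside a spreadsheet. The following sketch treats each source as a collection of URLs and returns those that are known from at least one other source but missing from the crawl; the function name `find_orphans` is illustrative, and the URLs are assumed to be already normalized to the same form (protocol, host and trailing slash).

```python
def find_orphans(crawled, *other_sources):
    """Return URLs known from other sources but absent from the crawl, sorted."""
    known = set().union(*other_sources)
    return sorted(known - set(crawled))
```

A call could look like `find_orphans(crawl_urls, sitemap_list, analytics_list, log_list)`, where the argument names stand for lists exported from each source.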
How to manage and resolve orphan pages
After going through these steps and finding all the orphaned pages, it is time to figure out what their fate should be based on some evaluation and reflection:
- Is the page relevant?
- Does it rank for some keywords, despite everything?
- Does it generate visits?
- Does it receive backlinks from authoritative external sources?
- Does its existence in the site’s taxonomy make sense?
- Is it optimized?
If the answers are positive, the page should be brought back into the internal link structure of the site, simply by linking to it from another page; to improve its performance, you can also update and improve the content where necessary.
On the other hand, if the page is useless and, in addition, has duplicate or near-duplicate content, the best option is to remove it and return an HTTP 404 or 410 status code, which can also bring benefits in terms of crawl budget efficiency.