When a search engine indexes your site, one of the many things it does is check that your content is original (ie: that you aren't just copying or plagiarizing another site). If it decides that you have duplicate content, either within your own site or copied from another site, it can hurt your search engine rankings or even remove that content from its search results entirely. To state the obvious: this is bad.
The good news is that there is a relatively simple mechanism for telling search engines where to send users who search for duplicated content, which keeps the duplication from hurting your rankings. It is called the canonical URL.
When you have the same content in multiple places, you must decide which location you want to be the "primary" (ie: canonical) location and then direct search engines to use that location for search results.
Technical Introduction
There are two main ways to tell search engines what your canonical URL is:
- On HTML web pages, the easiest way is to include a specific tag in your document head which looks like this:
<link rel="canonical" href="YOUR_CANONICAL_URL_HERE">
- For other content (eg: PDFs and other non-HTML files), you can still tell search engines what your canonical URL is by sending a special header in the HTTP response, which looks like this:
Link: <YOUR_CANONICAL_URL_HERE>; rel="canonical"
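To make the two forms concrete, here is a minimal Python sketch that produces each of them for a given URL. The function names and the example.com URL are hypothetical, and the sketch assumes the URL is already absolute and properly percent-encoded.

```python
from html import escape


def canonical_link_tag(url: str) -> str:
    """Build the <link> element that goes in an HTML document's <head>."""
    # quote=True also escapes '"' and '&', so the URL cannot break out of the attribute
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def canonical_link_header(url: str) -> tuple[str, str]:
    """Build the (name, value) pair for the HTTP Link response header."""
    # Assumes the URL is already percent-encoded, so it contains no '<' or '>'
    return ("Link", f'<{url}>; rel="canonical"')


if __name__ == "__main__":
    url = "https://www.example.com/whitepaper.pdf"
    print(canonical_link_tag(url))
    # prints: <link rel="canonical" href="https://www.example.com/whitepaper.pdf">
    print("%s: %s" % canonical_link_header(url))
    # prints: Link: <https://www.example.com/whitepaper.pdf>; rel="canonical"
```

Both forms carry the same information; which one you use depends only on whether the response is an HTML document.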
One additional note about canonical URLs: search engines treat them as "hints" that they honor strongly, but they may still choose a different primary location even if you follow all of their recommended guidelines. If you have implemented all of the proper suggestions but still experience issues, it may be time to consult an SEO expert.
Setting the Canonical URL
Thankfully it is very easy to set the canonical URL in Marketpath CMS. And you don't have to know anything about HTML or HTTP protocols in order to do it:
- Edit your page
- Navigate to Page -> Search Optimization (SEO) -> Canonical URL
- Enter the location of your canonical (ie: primary) content
- Save
- Marketpath CMS will automatically take care of the rest for you using the canonical link method described above
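Once the page is saved, you may want to confirm that the tag actually appears in the published HTML. Below is a minimal Python sketch using only the standard library's `html.parser`. The class and function names are hypothetical, and treating more than one canonical tag as "no usable canonical" is an assumption of this sketch, on the reasoning that conflicting hints are unlikely to help.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <link />
        # tags are routed here too by the default handle_startendtag
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before comparing
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html_text: str) -> Optional[str]:
    """Return the page's canonical URL, or None if there is not exactly one."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None
```

To check a live page, you could fetch it with `urllib.request.urlopen`, decode the response body, and pass the text to `find_canonical`.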
Setting the Canonical URL on non-HTML pages
For non-HTML pages served from Marketpath CMS, it is still possible to return the canonical URL in the HTTP headers, although this does require some additional markup. In your template, include the following or similar code:
{% if entity.canonical_url.is_valid %}
{% capture canonical_header %}<{{entity.canonical_url.value}}>; rel="canonical"{% endcapture %}{% set_header link:canonical_header %}
{% endif %}
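Outside of Marketpath CMS, the same idea looks roughly like the Python WSGI sketch below: serve the file, and add the Link header only when the canonical URL passes a validity check. The names are hypothetical, and `is_valid_canonical` is only a stand-in for the template's `is_valid` check; accepting just absolute http(s) URLs is an assumption, not a description of how Marketpath validates the value.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit


def is_valid_canonical(url: Optional[str]) -> bool:
    """Hypothetical stand-in for is_valid: accept only absolute http(s) URLs."""
    if not url:
        return False
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def make_file_app(body: bytes, content_type: str, canonical_url: Optional[str]) -> Callable:
    """Build a WSGI app that serves one non-HTML file, mirroring the Liquid template above."""

    def app(environ, start_response) -> Iterable[bytes]:
        headers = [
            ("Content-Type", content_type),
            ("Content-Length", str(len(body))),
        ]
        # Equivalent of {% if entity.canonical_url.is_valid %}: skip the header otherwise
        if is_valid_canonical(canonical_url):
            headers.append(("Link", f'<{canonical_url}>; rel="canonical"'))
        start_response("200 OK", headers)
        return [body]

    return app
```

For a quick local test you could serve the app with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and inspect the response headers with your browser's developer tools.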
Should you always set the Canonical URL?
Some developers and SEO experts recommend always setting the canonical URL. Their reasoning is that doing so does not hurt and may avoid some of the pain associated with unintentional content duplication. Alongside other mechanisms for telling search engines about your preferred URLs, setting the canonical URL is a pretty safe thing to do, but there are many cases where it is unnecessary or helps less than you might think. Ultimately this is a decision to make on a case-by-case basis, although for most content pages setting it is a safe choice. Consider:
- Do robots.txt, sitemap.xml, RSS/XML/JSON/etc. feeds, or other meta pages need canonical URLs? No (though I suppose it wouldn't hurt)
- Does a blog page which only serves as an index page to aggregate blog posts need a canonical URL? No
- Does a blog post or article need a canonical URL? Yes
- Does a home page or landing page need a canonical URL? Yes (in most cases)
- Does a page that is not publicly accessible need a canonical URL? No
- Does a page whose content changes depending on how it is loaded (eg: query parameters) need a canonical URL? No (a single canonical URL would tell search engines that all of those variations are duplicates, which is counterproductive)
Avoiding Duplicate Content
Although canonical URLs are relatively easy to use, the best solution in most cases is simply to avoid duplicate content in the first place. This is partly obvious and partly nuanced:
- Sometimes content is intentionally duplicated. Example: providing the same information in different formats, or placing it in multiple locations to improve usability and the user experience.
- Content is often duplicated unintentionally as a result of the technologies used to serve it. Examples: separate mobile and desktop versions of the same pages, index pages, pages that change based on query parameters, content that is reused on multiple pages of your site, pages served from both the www and non-www domains, etc.
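Several of these unintentional duplicates are simply different addresses for the same page. As a rough illustration, the Python sketch below reduces each URL in a list (for example, one exported from a crawl or from your analytics) to a comparison key and reports which addresses collapse to the same key. The set of ignorable query parameters and the trailing-slash rule are assumptions you would adjust for your own site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: these parameters only track visitors and never change the page content
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def duplicate_key(url: str) -> str:
    """Reduce a URL to a key that its likely duplicates share."""
    parts = urlsplit(url)
    # http vs https and www vs non-www usually serve the same page, so drop both
    host = parts.netloc.lower().removeprefix("www.")
    # Assumption: "/blog" and "/blog/" are the same page
    path = parts.path.rstrip("/") or "/"
    # Keep only parameters that may change the content, in a stable order
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    query = urlencode(params)
    return host + path + ("?" + query if query else "")


def group_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Return only the keys that more than one address maps to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[duplicate_key(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each group it reports is a candidate for a permanent redirect or a canonical URL pointing at the address you want to be primary.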
When Intentionally Copying and Moving Content
The obvious side of avoiding duplicate content is this: if you find yourself copying content from one publicly accessible location to another, consider linking to the original content instead. Alternatively, you could migrate the content from one location to the other and then link to it from the original location.
What you probably do NOT want to do is completely remove the original location, since you would lose any "SEO juice" it has built up.
Here is a summary of a few of your options:
- Instead of copying the content, simply link to the original location - optionally with a summary of what the user will find there. This has the added benefit of being easier to maintain since you will only need to update one location when the time comes to update your content.
- If you want to move a page completely, or combine multiple pages into one: set up a permanent redirect from the old location to the new one. Search engines will transfer the reputation from the old location to the new one, and visitors who follow links to the old location will be transparently sent to the new location.
- If you want to move the content from the old page to a new page but keep the old page active: Set up a Canonical URL on the old page that points to the new page. Search engines may transfer the reputation from the old location to the new location, but links to the old location will still work and direct traffic there.
- If you want to keep the content on BOTH pages, you should strongly consider adding a Canonical URL to BOTH pages pointing to the one that you want to be the primary location. It is up to you whether that should be the old location or the new location. Both locations will remain accessible and search engines will have a good idea of where to send users who search for your content.
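In Marketpath CMS these options are configured rather than coded, but it can help to see what each one looks like in an actual HTTP response. The following Python WSGI sketch is hypothetical (the paths, page bodies, and example.com domain are made up): `/old-article` has been moved with a permanent redirect, while `/copy-of-article` stays online and points its canonical URL at `/new-article`, which in turn points at itself.

```python
BASE = "https://www.example.com"  # hypothetical site

# Option: the page moved for good, so send a permanent redirect
REDIRECTS = {"/old-article": "/new-article"}
# Option: keep the copy online, but name the primary location as canonical
CANONICALS = {"/copy-of-article": "/new-article"}
PAGES = {"/new-article": "Full article", "/copy-of-article": "Full article"}


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # 301 = permanent: search engines should move reputation to the new URL
        start_response("301 Moved Permanently", [("Location", BASE + REDIRECTS[path])])
        return [b""]
    if path not in PAGES:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    # The primary page's canonical URL points at itself; the copy points at the primary
    canonical = CANONICALS.get(path, path)
    html = (
        f'<html><head><link rel="canonical" href="{BASE}{canonical}"></head>'
        f"<body>{PAGES[path]}</body></html>"
    )
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [html.encode("utf-8")]
```

Giving the primary page a canonical URL that points at itself mirrors the advice to set the canonical URL on BOTH pages.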
Avoiding Unintentionally Duplicated Content
Avoiding unintentional duplicates can be much more difficult - partially because so many of the factors related to duplicate content are outside of your control or require greater technical expertise to deal with. Here are a few simple guidelines and suggestions to aid with that:
- If possible, use technologies that give you the tools you need to control your own SEO factors, such as permanent redirects or canonical URLs, without having to write code. Even better if those technologies handle it automatically for you (as long as they don't take control completely out of your hands, of course; that can get messy).
- Wherever you can, redirect users to the proper location - ideally with permanent redirects. A few examples:
  - Redirect all requests from "yoursite.com" to "www.yoursite.com" (set up in Marketpath CMS as a domain-level redirect)
  - Redirect all requests from "http://www.yoursite.com" to "https://www.yoursite.com" (set up in Marketpath CMS as a domain setting)
  - Redirect all requests from "https://www.yoursite.com/oldpage.html" to "https://www.yoursite.com/newpage" (set up as a redirect in Marketpath CMS)
- Configure the robots meta tag on pages that should not be indexed by search engines, such as non-public pages (configured in the page's SEO properties in Marketpath CMS, and also editable via Liquid markup)
- Use semantic markup in your HTML to easily distinguish the primary vs supplementary content on each page of your site - which has the added benefit of making your content more SEO friendly in general.
- Where possible, limit your reliance on query parameters. Where that is not feasible, consider telling search engines how you would like those query parameters to be handled (eg: with canonical URLs, or through a search engine's webmaster tools where available; note that Google retired the URL Parameters tool in Google Search Console in 2022)
- When displaying lists of content (eg: blog posts, syndicated content, etc...), only display a summary and then link to the full content at the original source.
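Marketpath CMS handles redirects like the three examples above through its domain and redirect settings, so you shouldn't need code for them. Purely to illustrate the logic, here is a Python sketch that folds those three example redirects into a single decision. The host names and the `PAGE_REDIRECTS` table are the hypothetical values from the examples, and applying every rule in one hop is a design choice of this sketch, not a description of how Marketpath implements redirects.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings mirroring the three example redirects
PREFERRED_HOST = "www.yoursite.com"
BARE_HOST = "yoursite.com"
PAGE_REDIRECTS = {"/oldpage.html": "/newpage"}


def redirect_target(url: str) -> Optional[str]:
    """Return where a permanent (301) redirect should send this URL, or None to serve it as-is."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in (BARE_HOST, PREFERRED_HOST):
        return None  # not our site; leave it alone
    # Page-level redirect first, then force https and the www host
    path = PAGE_REDIRECTS.get(parts.path, parts.path)
    target = urlunsplit(("https", PREFERRED_HOST, path, parts.query, ""))
    # Applying all rules at once avoids chaining visitors through several redirects
    return target if target != url else None
```

Returning `None` for URLs that are already correct is what keeps the preferred address from redirecting to itself in a loop.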
For Further Consideration
This topic starts to come dangerously close to a full discussion of SEO best practices, because it is difficult to adequately address one aspect of SEO (canonical URLs) without a broader understanding of how SEO as a whole works.
If you have made it this far and want to know more about how to implement SEO best practices then you should really consult an SEO expert. On the other hand, if you are an SEO expert and are interested in helping your clients get the most value for their money with Marketpath CMS please talk to us today. We'd love the opportunity to help each other out!