Duplicate content can be a clear and present danger to the success of your rankings, and it can easily lead to a loss of traffic, which is why it needs to be addressed and remedied if you want to avoid unwanted search engine difficulties.
A good starting point would be to understand clearly what duplicate content is and, just as importantly, to gain an insight into why it can be such a problem.
You will no doubt grasp the basics of duplicate content: it is content that appears on the internet in more than one location. Each location has its own unique URL, and if the same content appears at more than one web address, that is a blatant example of duplicate content.
How does Google handle duplicate content?
It is worth looking at how Google interprets the issue of duplicate content: as the dominant search engine, its take on the matter is the one that counts most.
One thing to say straight away is that there are plenty of myths and misunderstandings surrounding this issue and what Google actually thinks and does about duplicate content.
The majority view is that if you are caught with duplicate content you will be subjected to a penalty from Google and that seems to scare webmasters to a greater degree than the situation actually warrants.
In reality, Google takes what you could class as a forgiving view of duplicate content, and they have reiterated on numerous occasions that they do not impose penalties when duplicate content is uncovered.
Google describes duplicate content as substantive blocks of content, within or across domains, that completely match or substantially mirror other content. The key takeaway is that they generally accept that this content is not deceptive in origin.
In other words, mistakes happen, and it is widely estimated that at least 25% of the web consists of duplicate content, so Google would face a gargantuan task if it tried to penalize all of it.
Another key point to consider is that Google has put a lot of thought into ways of handling duplicate content and has designed algorithms that are intended to shelter webmasters from any impact.
These algorithms gather the various versions of the same content that Google has found and put them into a prioritized cluster, where the URL that is considered the best makes the cut and is displayed in search results.
They also consolidate links from pages within that cluster so they are doing a bit of the admin and housekeeping for you to a certain extent, although you don’t have that element of control over what they deem to be the best.
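While you cannot dictate which version Google picks, you can suggest one. A widely used hint is the rel="canonical" link element, sketched below with a hypothetical example.com URL; it goes in the head of every duplicate variant and points at the version you would prefer to appear:

```html
<!-- In the <head> of each variant of the page (hypothetical URL) -->
<link rel="canonical" href="https://example.com/preferred-page/">
```

Google treats this as a strong hint rather than a binding directive, but in most cases it consolidates ranking signals onto the URL you specify.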
The only time that Google is likely to take a dim view of duplicate content and take action is when it decides that the content has been created solely for the purpose of manipulating search results.
The key takeaway here is that Google will attempt to identify the original source of the content in question and display that version in preference to others.
If you spot duplicate content yourself, the best thing to do is file a request to have it removed under the Digital Millennium Copyright Act. What you don’t want to do is take it upon yourself to block access to the duplicate content, as this will thwart Google’s consolidation methods.
What causes duplicate content?
Knowing how search engines such as Google handle duplicate content helps with your SEO strategy, and it is also useful to understand some of the underlying causes of duplicate content.
Here is a look at some of the typical scenarios that end up creating it. Many of them are technical rather than the result of human input, as the majority of us would not intentionally publish the same content in two different places.
One prominent issue revolves around a developer failing to fully appreciate the role of the URL. In a database-driven site, each article has a unique ID in the database, and the application uses that ID, rather than the URL, as its method of identification. A search engine, however, treats the URL as the unique identifier for a piece of content, so if the same article is reachable at several URLs, a duplication issue follows.
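A minimal sketch of that mismatch, using hypothetical URLs and a made-up article store: the application resolves every URL variant to the same database row, while a crawler counts each URL as a separate document.

```python
import re

# Hypothetical article store keyed by database ID, not by URL.
articles = {42: "How duplicate content happens"}

def resolve(url: str) -> str:
    """Serve an article by extracting its ID from whichever URL form arrives."""
    match = re.search(r"(?:id=|/article/)(\d+)", url)
    return articles[int(match.group(1))]

urls = [
    "https://example.com/?id=42",
    "https://example.com/article/42",
    "https://example.com/article/42?ref=homepage",
]

# One article as far as the application is concerned...
assert len({resolve(u) for u in urls}) == 1
# ...but three distinct documents as far as a search engine is concerned.
assert len(set(urls)) == 3
```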
Another URL issue to be aware of is when parameters are used for tracking and sorting, in tracking links, for example, but do not change the content of the page. Each parameter combination produces a new URL for the same content, and this setup can make it more challenging to achieve a good ranking.
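As a sketch of the problem, the snippet below strips parameters that only track or sort, mapping every variant back to a single URL. The parameter names are hypothetical and would need to match your own site; in practice you would pair this kind of normalization with a canonical link on the page itself.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that track or sort but never change the content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}

def canonicalize(url: str) -> str:
    """Drop tracking/sorting parameters so every variant maps to one URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonicalize("https://example.com/shoes?utm_source=news&sort=price&color=red"))
# -> https://example.com/shoes?color=red
```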
Also, it can often be the case that as a website surges in popularity it seems to encourage a greater number of scrapers. Content scrapers are automated programs that are designed to gather data across a multitude of different websites and they represent a big threat to any site that has worked hard to ensure it has predominantly unique or proprietary content.
The biggest impact of scraping is that it leads to duplicate content. If that is not enough, there is the problem that if your content is now being used on another site it could cause your rankings to drop.
These are just a few of the causes of duplicate content and we can help guide you on these technical challenges if you are experiencing issues with scrapers, URLs, and other problems that can lead to duplicate content.
Taking a proactive approach to resolving duplicate content issues
Firstly, it would be a smart move to try and maintain a level of consistency with your internal linking.
An example of internal linking that can cause duplicate content would be linking to the same page in several different forms, for instance www.example.com/page/, www.example.com/page and www.example.com/page/index.html.
This lack of consistency could easily create duplicate content issues, so pick one form and use it for every internal link.
Next, if you restructure your site it is essential that you use 301 redirects. On Apache servers this is a RedirectPermanent instruction in your .htaccess file, which executes the redirect with minimal delay. It also redirects Googlebot and other spiders, and this administrative step helps to head off what could otherwise become a duplicate content scenario.
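On an Apache server, a 301 redirect of this kind is a one-line entry in .htaccess; the paths below are hypothetical:

```apache
# .htaccess at the site root: permanently redirect the old URL to the new one.
RedirectPermanent /old-page https://example.com/new-page
```

Googlebot follows the redirect and, over time, transfers the old URL’s signals to the new one.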
You should also take care with any syndication arrangements you have in place. Google, for instance, is designed to automatically display the version it believes is the most appropriate for each user and their chosen search, but this might not be the version that you want them to see.
The way to counteract this problem would be to make sure that every syndicated site includes a link back to your original content. Additionally, you might want to take the added precaution of requesting that anyone using your syndicated content uses the noindex meta tag so that search engines are not able to index their version of your content.
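The noindex request amounts to a single meta tag in the head of the syndicated copy, shown here with hypothetical markup, alongside the link back to your original:

```html
<!-- In the <head> of the syndicating site's copy -->
<meta name="robots" content="noindex">

<!-- In the body: a link back to the original article (hypothetical URL) -->
<p>Originally published at <a href="https://example.com/original-article">example.com</a>.</p>
```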
If you are providing multilingual versions of your website, it is a good idea to use country-code top-level domains (for example, .de for German content), as this helps search engines identify and handle country-specific content more easily. This makes it easier for them to serve the most appropriate version of your page when returning a search result.
Another point to consider relates to the use of similar content on your website, which can cause some confusion and the perception of duplicate content. For example, you might have a couple of pages that make use of the same core information on each of them, which could be perceived as duplicated content.
The way to resolve that issue would be to consolidate the pages or expand each one so that it contains more unique text.
Other points to consider include:
Avoid publishing stubs – These placeholder pages get published with no actual content on them. If you must have them on your site, use the noindex meta tag to stop them from being indexed.
Avoid boilerplate repetition – A good example of this would be when you carry the same disclaimer message on the bottom of each page. A better approach would be to shorten that message to little more than a link to a page that contains more comprehensive details.
Familiarize yourself with the site’s content management system – Understanding how content is displayed on your site could be crucial in your battle to minimize the impact of duplicate content. The reason for this is that you could end up showing the same content in multiple formats and that could be viewed as duplicate content.
The bottom line is that duplicate content on your site is not going to cause you to feel the full force of Google’s wrath unless they decide that the primary purpose of your duplicate content is to try and deceive and manipulate search engine results in your favour.
If Google takes that view it will remove your site from its search results and you will then have to submit your site for reconsideration once you have made changes and checked that its format and content no longer violates its guidelines.
Clearly, duplicate content can be a potential danger to the success of your rankings and that’s why you need to be proactive with your efforts to avoid it happening and get outside help if you are unsure how to minimize the impact it can have.