How to Stop Website Content Scrapers

Content scraping is the practice of repurposing content taken from elsewhere on the web without permission. Content scrapers take your original content and republish it on their own websites. Instead of directing users to the source, content scrapers reproduce your work, depriving you of traffic to your site.

The legality of content scraping varies around the world. Nevertheless, in most jurisdictions a variety of laws may apply to content scraping, including laws surrounding copyright protection.

Copyright Protection Against Content Scraping

Copyright is the sole right to produce or reproduce a work or any substantial part thereof in any material form (subsection 3(1) of the Copyright Act). Copyright subsists in every original literary, dramatic, musical and artistic work. Copyright also subsists in compilations, i.e. works resulting from the selection or arrangement of literary, dramatic, musical or artistic works or from the selection or arrangement of data. In order for copyright to subsist in a work, the work must be of copyrightable subject matter, fixed, and original (i.e., not a mere copy of another’s work but a work derived from skill and judgement).

By reproducing content from your website, content scrapers may be infringing on your copyright in this content, even if they give you credit or reference your website in some way. Attributing credit to the author does not change the nature of the activity, nor does it serve to excuse the infringement. Note, however, that copyright protects the expression of ideas and not ideas themselves – copyright law may not protect your content if someone is scraping factual information or underlying ideas from your website. An example of factual information is historical facts.

How Else Can I Protect My Content?

So long as your content is available online, there is a risk that it will be scraped and republished elsewhere. However, there are active measures you can take to deter content scrapers from targeting you.

Terms of use can be a valuable tool to give notice to users of your website that content scraping is prohibited. In the event that your content is stolen, terms of use can also give you an additional ground on which to establish legal proceedings—namely, breach of contract. Carefully drafted terms of use can act as a binding legal contract between you and users of your site, which may enable you to pursue a remedy for content scraping under both the law of contracts and copyright.

How you provide your website’s terms of use can affect their enforceability under contract law. Courts in some jurisdictions have drawn distinctions between user agreements that require users to actively click an “accept” button (“click-wrap” agreements) and those formed from a user’s continued navigation or use of the site (“browse-wrap” agreements) – the former are more likely to be enforceable than the latter. A key consideration for browse-wrap agreements is whether the user is provided with sufficient notice of the terms and an opportunity to read them before proceeding further. Be aware that the enforceability of terms of use may also be subject to applicable consumer protection legislation.

Our professionals have experience drafting effective protections for websites directed to a wide range of industries and displaying all types of content. These terms are not only valuable in the context of content scraping, but also serve to limit your liability, set out ownership of your content, and help prevent other possible abuses of your website.

How Do I Know If My Content Is Being Stolen?

There are a number of steps you can take to identify instances of content scraping from your website. With the help of some handy automated tools, you can more easily detect resharing of your content online.

Trackbacks

A trackback is a feature common to WordPress sites: a notification that another website has linked to you. While not intended for the purposes of detecting content scraping, trackbacks can be a valuable part of a content monitoring strategy. By incorporating links into your content which are directed to other content on your site, you will receive an alert when these links are shared on other sites.

Google Alerts and Google Search Authorship

Speaking of alerts, Google Alerts is another tool that can be repurposed in the fight against content scraping. By creating Google Alerts for key phrases featured in your content, the titles of your posts, and your name (if you sign your posts), you will be notified when Google indexes new content featuring these phrases. Picking specific phrases cuts down on the number of alerts and “false positives”, so that you are notified only when appropriate. Google Alerts also allows you to indicate how frequently you wish to receive notifications. You may choose to receive a weekly summary of new content containing your selected phrases or be instantly notified when new content becomes available.

Google Search Authorship is useful for content creators who publish material online on a regular basis. By setting up Google Search Authorship, you will link your content to your Google profile, and should someone steal your content, Google will be able to identify you as the author and prioritize your domain over that of the person who stole your content.

Copyscape, etc.

Copyscape is a free service which can be used to identify duplicate content on the web. Simply copy and paste your URL into the field on the home page and click “Go”. Copyscape will display instances of your content (content available at the URL provided) found elsewhere on the web. A paid version of Copyscape is also available which offers additional tools and features. Plagspotter is another example of similar paid software for detecting content duplication.
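To illustrate the general idea behind duplicate-content detection, the following is a minimal Python sketch that estimates how much of your text reappears in the text of a suspected copy. It is not how Copyscape or any other service works internally; it simply compares overlapping runs of five words ("shingles"). The function names and the choice of five words are assumptions made for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word runs of length `size` in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single run.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Fraction of the original's word runs that also appear in the suspect text."""
    original_runs = shingles(original, size)
    if not original_runs:
        return 0.0
    return len(original_runs & shingles(suspect, size)) / len(original_runs)
```

A result close to 1.0 suggests that most of your text appears verbatim in the other page, while a result close to 0.0 suggests little or no verbatim copying. The result is only a rough indicator and says nothing about whether the copying amounts to infringement.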

Who Is Hosting This

Who Is Hosting This is a website that helps you determine which web hosting service hosts the website that has stolen your content. You may then choose to contact that hosting service and request that the entire website containing your stolen content be taken down.

Someone Has Stolen My Content – What Now?

If you discover your website has been targeted by content scrapers, you may want to get in touch with the owner of the website on which your content is republished. Our professionals can also send a letter to the owner of this website demanding that the stolen content be removed from their platform.

Regardless of the avenue you choose to assert your rights, you will need to identify the individual(s) involved. The best place to start is to search the Internet Corporation for Assigned Names and Numbers (ICANN) WHOIS database. Whenever someone registers a domain name, they are required to provide some basic information including a physical address, e-mail address, and phone number. WHOIS is a great place to start your search; however, website owners can elect to remain private in exchange for a fee, or by employing a company to register on their behalf. As a result, WHOIS may not provide all the information you are looking for.
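For readers comfortable with a little scripting, a WHOIS lookup can also be made directly, since the WHOIS protocol is a plain-text exchange over TCP port 43. The Python sketch below is one possible approach, not an official ICANN tool: the helper names are hypothetical, the default server (whois.iana.org) is only a starting point that typically refers you to the registry responsible for the domain's top-level domain, and the "key: value" response layout is an assumption that varies between registries.

```python
import socket


def whois_query(domain, server="whois.iana.org", port=43, timeout=10.0):
    """Send a WHOIS query for `domain` and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when finished
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(response):
    """Collect 'key: value' lines into a dict of lists (the layout is an assumption)."""
    fields = {}
    for line in response.splitlines():
        line = line.strip()
        if not line or line.startswith("%") or ":" not in line:
            continue  # skip blank lines, comments, and free-form text
        key, _, value = line.partition(":")
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


# Example use (requires network access):
#   record = parse_fields(whois_query("example.com"))
#   print(record.get("refer"))
```

As noted above, the registrant's details may be hidden behind a privacy service, in which case the response will show the privacy provider rather than the website owner.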

Another way to identify the owner of the offending website is to ask the website host. Most website hosts have pages on their own websites dedicated to reporting abuse or illegal activity by websites they host. The related policies and procedures will vary between services but reporting the offending webpage may help you to identify its owner and/or have the impugned content removed from the web.

A final option is to seek a Norwich order from a court. A Norwich order compels a third party to disclose information you need in order to commence a lawsuit. In determining whether to grant a Norwich order, courts consider a number of factors. These factors include:

  • whether you have provided sufficient evidence to raise a valid claim;
  • whether you have shown that the third party was somehow involved in the wrong;
  • whether that third party is the only practicable source of information;
  • whether that third party could be indemnified should any harm come of the order,
    if granted; and
  • whether the interests of justice favour the disclosure.

Where Can I Sue?

Jurisdiction is a key aspect of any legal proceeding. Jurisdiction refers not only to whether the court in which you initiate a proceeding has the authority to hear and make determinations in your matter, but also whether it should be the court to do so. Disputes related to the Internet can make the question of jurisdiction particularly complex and uncertain because of the nature of the Internet and its reach. You may run into a situation where the individual you want to sue is located in another country. The question, then, is where the lawsuit should be brought.

In a recent case, a Canadian court had the opportunity to consider jurisdiction in the context of a lawsuit related to online defamation. The court held that, while the matter was connected to Canada by virtue of the webpage being accessible in Canada, the matter should be tried in Israel, the country in which the defamation originated. The court found that fairness and efficiency favoured proceedings in Israel because witnesses were located there and the plaintiff had business interests and a reputation there. It remains to be seen how a Canadian court would decide jurisdiction if faced with a content scraping case involving similar jurisdictional issues.

What Remedies Can I Obtain?

The remedies available to you will depend on the successful cause of action. Often, your remedy may consist of monetary compensation (damages) and/or a court order (an injunction).

Damages

Damages are often awarded in copyright infringement cases. The Copyright Act entitles a copyright owner to damages and, if not already accounted for, any profits the infringer made from the infringement. The quantum of damages will depend on the nature of damages deemed appropriate and is often subject to the court’s discretion.

Under the Copyright Act, a plaintiff may also elect to seek statutory damages as an alternative to compensatory damages any time before final judgment. For statutory damages you do not have to prove your losses suffered as a result of the infringement. Rather, if you prove infringement, you are generally entitled to the following: (a) between $500 and $20,000 for each work infringed, if the infringement was for commercial purposes; and (b) between $100 and $5,000 for all works infringed, if the infringement was for non-commercial purposes.
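The practical difference between the two ranges is easiest to see with numbers. The short Python sketch below works through the arithmetic using only the figures stated above; the function name is hypothetical, and the result is the general statutory range only, which a court may adjust in its discretion.

```python
def statutory_damages_range(works_infringed, commercial):
    """Return the (minimum, maximum) general statutory damages range in dollars."""
    if works_infringed < 1:
        raise ValueError("at least one work must be infringed")
    if commercial:
        # Commercial purposes: $500 to $20,000 for EACH work infringed.
        return (500 * works_infringed, 20_000 * works_infringed)
    # Non-commercial purposes: $100 to $5,000 for ALL works infringed together.
    return (100, 5_000)
```

For example, if a scraper republished three of your articles for commercial purposes, the general range would be $1,500 to $60,000. If the same three articles were copied for non-commercial purposes, the range would remain $100 to $5,000 in total.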

Courts consider various factors in determining an award of statutory damages. These factors include: the good or bad faith of the defendant, the conduct of the parties before and during the proceedings, the need to deter other infringement of the copyright in question and, if the infringement was for non-commercial purposes, the need for an award to be proportionate to the infringement. In special cases, a court can exercise its discretion to award less than $500 for infringement for commercial purposes.

For breach of contract, damages are awarded to put the plaintiff in the position it would have been in had the contract not been breached. In a case of content scraping, these damages may often be difficult to establish, especially if there is minimal evidence provided by the plaintiff as to the injury sustained. However, nominal damages may be awarded where a wrong was committed but there was no established financial loss.

If the infringer’s conduct with respect to either cause of action was particularly egregious, a court may also award additional damages termed “punitive” damages. These damages are so named because they are intended as punishment rather than compensation. Punitive damages are awarded only in rare circumstances: even calculated, knowing and deliberate conduct on the part of a defendant may be insufficient to warrant them.

Injunctions

Courts can also compel a defendant to cease its infringement by issuing an injunction, i.e. an order prohibiting the affected party from doing something. In some cases, an injunction is granted to protect the plaintiff from harm that it might otherwise sustain while proceedings are still underway; this is termed an interlocutory injunction. Given that interlocutory injunctions restrain the defendant prior to determination of the case, they are not readily granted. A party seeking an interlocutory injunction must demonstrate that it will suffer irreparable harm, i.e. harm not compensable by damages, if the injunction is not granted. As such, permanent injunctions granted upon final disposition are a more common remedy. For further discussion, please see our article on injunctions.

There are active measures you can take to deter content scraping and enforce your rights against infringers – your work has value; let us help you protect it.

If you have been a victim of content scraping, contact us now for a confidential and complimentary phone appointment to discuss how we can help.