    Reef Writing Etiquette & Blogonomics: Stop the Scraping

    A growing trend on forums, aquarium club websites, and even the mothership that is MASNA is the unfortunate act of scraping. Scraping is when you copy and paste an article, or republish an RSS feed that you don’t own, onto another website. For writers like me who provide content, it’s quite annoying.

    Imagine you were a painter who took the time to create a painting, and at the first showing someone took a high-resolution photograph and distributed or sold copies to galleries and across the internet without giving you credit or any monetary consideration. While I don’t consider my ramblings on aquaria on par with Picasso, the feeling is the same. Scraping sucks.

    [Image: glassbox design]

    The original. Social media pro tip: Don’t steal stuff that’s not yours.

    Not only is your work being stolen, but scraping also hurts traffic coming from search engines. Some search engines penalize websites for duplicate content because it is seen as spam, meaning GBD may be ranked lower in search results or omitted completely.

    Our longtime RSS readers know that we switched to truncated RSS feeds in 2009 to prevent this behavior. While truncated feeds can be annoying, the amount of scraping and spamming at the site’s launch was absurd. It wasn’t until the always on-point Felix Salmon continued to shine light on the positives of full RSS feeds that I caved in and switched to a full feed in 2010. [For those unfamiliar with RSS, see here]

    Since then the scraping has been interesting. The main offenders are no longer spam-indexing websites looking to rank well on search engines (such as this one); instead, it’s coming from our own kind: Reefers!

    Arguably the most prominent scraper of GBD articles is the Marine Aquarium Society of North America, the same organization that I support by attending and covering MACNA every year. MASNA never asked permission to use my articles, nor did they ask the other writers and bloggers that I’ve talked with.

    UPDATE 6/21: MASNA President Steven Allen has addressed the issue, responding in the comments below and correcting the full-reproduction aggregation. As GBD, our contributors, and I support MASNA and believe in its mission, we’re pleased with this resolution and Steven’s swift action.

    [Image: masna stealing articles]

    Yes, MASNA is a non-profit; however, that does not excuse copyright violations, nor does it trump common decency, like sending a polite e-mail asking to use our content. As of pixel time, MASNA has republished 135 articles from GBD without permission. Ironically, their RSS scraping program will also publish this very article on their website.

    MASNA’s first stated goal is to “Educate our members with online and published material, the MACNA conference, and other sanctioned events”. It is extremely unfortunate that such a respected organization has set this example of “education”.

    [Image: reefchat copyright theft]

    MASNA is not the only offender. Smaller forums like ReefChat have done the same.

    Why is this an issue? Etiquette and search engine issues aside, economically it hurts the sustainability of websites like Glassbox Design that create original content. Sites that scrape, or steal, content gain traffic from the republished works, which in turn gives them an asset for selling advertising space. On the flip side, it takes traffic away from GBD, and potential revenue with it. Blogging about aquariums is not a moneymaker; however, without our sponsors advertising and supporting the site, this very article would not be possible.

    Lastly, and most importantly, scraping creates fragmented dialogues, which can create misinformation. Readers of websites that scrape are generally not aware that the material is being taken and republished. This was the case with our Vortech Giveaway in April, when readers of MASNA’s site posted comments thinking they were entered in the GBD contest. [Let me reiterate that MASNA is not the only offender, but they are the most visible in the aquarium industry.]

    Should a reader have questions, these website scrapers do not [and should not] answer for the original author. The best feature of blogs is that they facilitate this dialogue and interaction. While scrapers may think they are consolidating information, all they are doing is muddying the interwebz and creating asymmetric information.

    Forums and websites can take a cue from the likes of Matt Rogers at 3reef, who asked to publish our newest links on their site. In exchange we gladly offered 3reef a banner on GBD. On a similar note, our friends at the Washington Area Marine Aquarium Society (WAMAS) update their homepage with links to our most recent articles here on GBD, rather than illegally reproducing our work.

    On the internet, every click is a vote. Click wisely.

    • http://www.nanoreefblog.com Curvball

      You beat me to this article Eric! Very nicely done – interesting to see how the scrapers react when this appears on their sites.

      I for one dislike blogs that restrict their feeds (I unsubscribed from GBD when you restricted the feed, but I'm signing up again now :D). For that reason nanoreefblog.com's feed is unrestricted; I want readers to be able to read the info in whichever style they're comfortable with.

      I normally only ever catch up on all the blogs via RSS – that's what it is there for ;)

      We bloggers need to stand side by side and expose the illegal scrapers. If you must scrape, why not ask for permission first, before publishing others' content and claiming it as your own?

    • Jpetry

      I fully believe you guys. I feel that this is a better way to protect your information. However, I do feel that if you are a blogger and the other site is willing to name the original author/site, you would be happy to share the information even if they don't ask first.

    • http://sgraber.myopenid.com/ Shane Graber

      I’m a big fan of full articles in RSS feeds and I personally find I visit websites more often that publish full article contents in RSS feeds as opposed to those that don’t (we’d do it on Advanced Aquarist, but our articles are just too long).

      That being said, you run into this exact problem, where 3rd party websites use plugins for their forum/cms that scrape RSS feeds and post the contents as articles that look as though the site wrote them itself.

      Three thoughts come to mind on how to fix this via your server:

      1. Outright ban their domain from accessing your RSS feed via .htaccess so that they can’t read it at all. This is fairly draconian, and it gets the job done, but it removes you from visibility on their website and to their readers.

      2. Use an HTTP redirect (via .htaccess) for offending websites so that they are sent to a truncated version of your RSS feed; when they scrape your feed, they’re only scraping the truncated version. You still get visibility from their website, but they can no longer show 100% of your content.

      3. Ban hotlinking of images from outside domains, again using .htaccess: http://altlab.com/htaccess_tutorial.html Your full RSS feed may suffer the problem of no longer showing photos, but your text would still show up. You could limit the hotlinking restrictions to specific domains; that way your RSS feed would show images for everyone but offending websites. Google around on this a bit, as there’s a lot of info on how to do it. Make sure that you allow the various robots (googlebot, etc.) access, though, as we’ve found we get a lot of traffic to our site via Google Image searches.
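      As a rough illustration of the three options above, here is the same decision logic sketched in Python rather than .htaccess syntax. It is only a sketch under stated assumptions: the host names, user-agent strings, and the `/feed/summary` path are hypothetical, and identifying a scraper by its User-Agent is an assumption (a real setup might match on IP address instead).

```python
from urllib.parse import urlparse

# Hypothetical values for illustration only; none come from a real configuration.
BLOCKED_AGENTS = {"badscraper"}           # option 1: refuse outright
TRUNCATED_AGENTS = {"forumfeedbot"}       # option 2: send to the summary feed
HOTLINK_DENIED_HOSTS = {"scraper.example"}  # option 3: no images for these sites
ALLOWED_BOTS = ("googlebot", "bingbot")   # crawlers that may always fetch images
TRUNCATED_FEED_URL = "/feed/summary"      # hypothetical path


def feed_response(user_agent):
    """Return (status, location) for a request to the full RSS feed."""
    agent = user_agent.lower()
    if any(name in agent for name in BLOCKED_AGENTS):
        return 403, None                  # banned: no feed at all
    if any(name in agent for name in TRUNCATED_AGENTS):
        return 302, TRUNCATED_FEED_URL    # redirected to the truncated feed
    return 200, None                      # everyone else gets the full feed


def image_allowed(referer, user_agent):
    """Deny images only when the referring page is on a denied host."""
    if any(bot in user_agent.lower() for bot in ALLOWED_BOTS):
        return True                       # keep image-search traffic working
    host = urlparse(referer or "").hostname or ""
    return host not in HOTLINK_DENIED_HOSTS
```

      For example, `feed_response("ForumFeedBot 2.1")` yields a redirect to the summary feed, while an ordinary browser or feed reader still receives the full feed.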

      In looking at the MASNA website, it looks pretty plain that they’re using RSS feeds to harvest content. Your feed is a full feed and it shows up there as a complete article. Reefbuilders recently switched to truncated feeds and their articles are showing up as truncated posts. That’s what I’m basing my assumption on.

      One big gotcha is that you’re using Feedburner for all of your RSS needs. I’m not all that familiar with what you can do with Feedburner, but check around and see if you can ban/redirect for specific domains. The image restrictions via .htaccess will work regardless though.

      Have you contacted MASNA and ReefChat and asked them to stop posting your articles as their own?

    • http://www.facebook.com/people/Adrian-Thiessen/89904241 Adrian Thiessen

      I always wondered what you guys think about that as I see your articles all over the web.

      This is a good read for anyone. Especially those with blogs.

      Perhaps the standards of RSS should change to make it very apparent where the original content is coming from.

      Unfortunately it is the internet and everything is somewhere else for free at all times.

    • http://sgraber.myopenid.com/ Shane Graber

      re: apparent where the original content is coming from

      That's already in the RSS feed:

      <rss version="2.0" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
        <channel>
          <title>glassbox-design.com</title>
          <link>http://glassbox-design.com</link>
          <description>the modern reef blog</description>
          <lastBuildDate>Tue, 22 Jun 2010 13:04:35 +0000</lastBuildDate>
          <generator>http://wordpress.org/?v=2.9.2</generator>
          <language>en</language>
          <sy:updatePeriod>hourly</sy:updatePeriod>
          <sy:updateFrequency>1</sy:updateFrequency>
        </channel>
      </rss>

      What it boils down to is how the plugin for the forum/cms is written and how the site administrator decides the plugin should function.

      Outside of restricting access via .htaccess, contacting the offending websites or even their hosting providers, or using a cease/desist letter, there's not much else you can do.
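      To make the point about plugin behavior concrete, here is a minimal sketch of how an aggregator could read the channel title and item links from a feed and publish only an attributed teaser with a link back. The sample feed, the `summarize` function, and the eight-word limit are all made up for illustration; this is not how any particular forum/cms plugin works.

```python
import html
import re
import xml.etree.ElementTree as ET

# A made-up feed for illustration; not the real GBD feed.
SAMPLE_FEED = """<rss version="2.0">
<channel>
<title>example reef blog</title>
<link>http://blog.example</link>
<item>
<title>Stop the Scraping</title>
<link>http://blog.example/stop-the-scraping</link>
<description>&lt;p&gt;Scraping is when you copy and paste an article onto another website.&lt;/p&gt;</description>
</item>
</channel>
</rss>"""


def summarize(feed_xml, max_words=8):
    """Yield one attributed, truncated entry per feed item."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title", default="")
    for item in channel.findall("item"):
        # Strip markup, then keep only the first few words as a teaser.
        text = re.sub(r"<[^>]+>", " ", item.findtext("description", default=""))
        words = html.unescape(text).split()
        teaser = " ".join(words[:max_words])
        if len(words) > max_words:
            teaser += " [...]"
        yield {
            "source": source,                              # who wrote it
            "title": item.findtext("title", default=""),
            "teaser": teaser,                              # never the full text
            "link": item.findtext("link", default=""),     # link back to the original
        }


for entry in summarize(SAMPLE_FEED):
    print(f'{entry["title"]} (via {entry["source"]}): {entry["teaser"]} {entry["link"]}')
```

      The design choice here is that truncation happens on the aggregator's side, so a full feed from the publisher still ends up as a summary plus a link rather than a complete reproduction.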

    • Tkable3

      Terrible standard to set from MASNA. As a member of their organization they have a complaint letter coming.

      Tom

    • http://www.nanoreefblog.com Curvball

      Interesting points there Shane, thanks for sharing that. Being more of a designer, the 'tech' stuff does get past me at times, but this info is valuable to all bloggers.

      I like the redirect idea… you could have some interesting content show up on their pages, ha ha.

    • Joost
    • http://stonyreef.com LH

      Unacceptable. Can we really not hammer out a single paragraph summarizing the article, why we liked it, w/ a link to the original? Just plain lazy.

      Also, re: typo – should be “scraping content” not “scrapping”.

    • danger

      Have you personally asked these sites to stop publishing your content Eric?

    • Info

      It's pretty much the same as when Chinese companies copy protein skimmers from Deltec, Bubble King, etc. You are happy to promote the knock-offs; no etiquette in that.

    • http://glassbox-design.com/ eric michael

      @Shane,

      I appreciate the thorough response. I am aware of the restrictions you've mentioned, with the .htaccess being the only real 'solution'.

      I should have prefaced the article with my intentions. I have not yet contacted MASNA, ReefChat or the other offenders. My intention was to evaluate the responses of the GBD readers and community. It is now quite clear, and I will be following up with them on the matter.

      MASNA is now apparently well aware of the issue. They have since removed 38 articles and are now only providing links to our content. Note that old links still show the full unauthorized reproduction.

    • http://glassbox-design.com/ eric michael

      @”Info”

      A slippery slope. GBD has implicit copyright on its text and images (unless they are sourced with permission), which covers and protects our rights and interests.

      For products, additional protection can be gained through a patent.

      Additionally, as you are in the industry please be cognizant of disclosing your interests and affiliations to those reading. Please do not use this comment space to push products or agendas.

      Thank you.

    • Steve Allen

      Eric,
      I wanted to apologize for the full articles being published. That was never the intent. It was also never the intent of MASNA to look as though they authored the articles. This is the reason each post shows the originator's domain and the “Read the rest of this article” link sends readers to the original article.

      The feed template being used was supposed to post summary content with the links back. On the sites that had only summary information this worked as designed. I did not even realize that the full article was being published to our site for GBD. This is not an excuse, it is simply an explanation.

      It was announced last year at MACNA that we were providing this page as an aggregate of news articles regarding the reef industry and hobby. I have asked several times throughout the year for input on feeds to include. This is why your site was included on the list. As we are limited in staff and resources, RSS feeds of reef keeping news from around the world are the best way we can provide news to hobbyists, while also supporting the efforts of sites like yours.

      To be honest if it were not for Joost, I would not have even seen this article and known it was a problem. Since his post on our site, the template has been corrected and only summary information is listed with links back to the originator.

      If you still wish for me to remove the summary links to GBD, please feel free to contact me at the email below.

      Steven Allen
      MASNA President

    • http://glassbox-design.com/ eric michael

      Steven,

      I appreciate you taking the time to respond and quickly modify the aggregation format. We're more than happy to work with MASNA.

      Best,
      Eric

    • Brendon Cameron

      My suggestion would be to consider speaking privately to these sites before posting publicly….

    • http://www.nanoreefblog.com Curvball

      Why? People shouldn't be stealing content to start off with – it's just bad ethics.

      Good to see MASNA are on top of this now.

    • http://blog.captive-aquatics.com Michael Maddox

      Very good post Eric, and I agree with your statement, Curvball. The biggest offender I've found is tiny blogs like aquanerd, whom my lawyer had to speak with regarding outright theft of images and content!

      It's nice of Steve to have fixed the issues. Way to go!

    • http://blog.aquanerd.com AquaNerd

      lol @ mike maddox. you've got to be kidding me!

      if anyone wants to hear the story (and it's a long one), read on below. i would like to apologize in advance to eric and the rest of the glassbox crew. not only did mike air dirty laundry that is inaccurate, but i'm also apologizing for my response. please remove my comment if it's a distraction, but it might make for some entertainment. i doubt it, but you never know.

      mike and i are from the same area geographically. he and i both went to the same school (texas a & m galveston). we are also both members of a local reef club, where i saw him promoting his website. somehow, i was offered a chance to write for his blog. i wrote one article for him and found out how much i liked blogging. so, i started up my own free blog using google's blogger platform.

      being new to the whole blogging thing, i was using google images and images from other sites, and i wasn't completely informed of copyright law. to clarify, i wasn't using anyone's textual content, just images related to what i was writing about. additionally, i was not using anyone's rss/atom feeds to publish content on my site, which is what this gbd article is focusing on.

      back to the story…i was just blogging for fun, which i'm still doing to this day, so i didn't think much of image use or copyrights anyways. i had a handful of images from random places on the web filling up space in various articles just so people wouldn't see piles and piles of words. i admit i was doing something wrong, but had no idea what i was doing was wrong.

      a short time into my blogging experience, i get an email from mike. the email of course was a scathing one. this threw me off because i thought he and i were friends to some extent and i thought he should have handled it differently. i mean we went to the same school and he even bought various equipment from me at one point. i apologized to mike for using a small handful of his images, immediately removed all of the images he requested or had ownership of, and i apologized again. he then comes back at me and tries to say that my free hosted, free template-based website looked too much like his. that's when i resisted and started pushing back. the site i was using looked nothing like his (different color scheme, different layout, and hell even a different blogging platform). i told him that if he had an issue he needed to contact his attorney. that's the last i heard of him. his attorney never emailed me, unless for some reason the lawyer thought it would be more professional to use mike's personal email account.

      several months later i moved to a different blog platform (wordpress). then i started getting random negative comments on my blog under random names. i tracked the ip address and matched it to his screen name from our local forum, and immediately knew all those derogatory and childish comments came from him. mike made a handful of comments telling me how whatever idea i was talking about was stupid, or how wrong i was, or whatever. i just deleted the comments and moved on. i didn't retaliate. i didn't blast him publicly. i just walked away.

      then, i see this comment above. i thought about sitting quietly by, but it's time to take some stand. i admitted fault when i used images from his site. i apologized and dropped it. but he never moved on. and now he's calling aquanerd a tiny site. this is extremely laughable, since every traffic site that displays information about site traffic other than my own (alexa for example), puts my traffic flow higher than his. i'll admit, i'm not on the same level that glassbox design is, but i get quite a bit of traffic in my opinion. oh well, i guess mike just got frustrated that aquanerd is doing very well and has decided to try to tear me down.

      regarding the content on my site…i use digital media that i have either created, have permission to use, or are part of the public domain. there was one incident in the past, described above, in which i was unaware of copyright policy and used images i shouldn't have used. they were removed once the issue was brought to my attention and there have been no issues since.

      mike, if you would like to talk about this further, you know my email address.

    • http://www.captive-aquatics.com Mike Maddox

      Looks like we’ve misunderstood each other, Brandon. I admit I did react in a knee-jerk manner when I saw that my content had been used without my permission. I apologize for mentioning it publicly, and request that Eric delete my previous comment.

      I have taken you up on your offer of reconciliation, and sent you an email. :)
