Sunday, April 18, 2010

Online plagiarism... it can only get worse

As more and more people go online to look for content, more advertisers are also going online to capture their attention, and that means more advertising dollars, and more people competing for those dollars.

One way to get them is to write articles and such for Hubpages or Squidoo, which are similar to blogs but better organized and with more of a community. However, even in such communities, you will find content pirates who steal content from others, hoping for some of that advertising income.

And there are even auto-rewriters to "fix up" the pirated material so a simple plagiarism scanner won't catch it.

How to spot plagiarists, here or elsewhere

How I found a plagiarist here on Hubpages

Just the other day, I was hub hopping when I came across a printer review. I recognized the model, which is several years old, so I was surprised to see a review that had appeared just hours before. I was only going to rate it up or down, but somehow the wording didn't sound right for a normal user's review. The printer is no longer made, yet the review sounded as if the printer were brand new. Furthermore, in a personal review, the reviewer talks about his or her experience with the printer, and what s/he likes and dislikes. This review did none of that. Instead, it compared the printer to two other printers, and even provided timed comparisons for a standard test! (This printer took _______ long to complete the test, while printer B took ______ long...)
This sounded so suspicious that I decided to see if I could dig up more information. I took some sentences at random from the hub, put them in quotes, and plugged them into Google. No match.
I then took the three printer brands mentioned, plugged them into Google along with "review", and voila, found a PC Magazine review that referred to the exact same three printers.,2817,1970425,00.asp
I then compared a random paragraph and found a 90% match. Here's an excerpt from the review:
"On our business applications suite (timed with QualityLogic's hardware and software), the 1018's total time was 11 minutes 28 seconds, compared with roughly 8:50 for the Lexmark and Samsung printers. That's enough of a difference to be noticeable, but I wouldn't call it intolerable, and I'm pretty impatient about waiting for printers...."
Whereas the alleged plagiarist has a matching paragraph:
"On the commercial software suite (monitored with proper components and computer software), the 1018's overall time frame is eleven minutes twenty eight seconds, when compared with around 8:50 for a Lexmark and also Samsung printers. Which is enough of a variance to be notable, and fairly tolerable, in case you are concerned about printing speed. "
See the trick? Major words have been substituted and some details taken out, but the sentences are a one-to-one match. Even the timed figures, 11:28 and 8:50, are exactly the same, though the plagiarist spelled one of them out in words instead of digits, hoping to throw off exact-quote searches.
A quick check of the remainder confirms the suspicion: the entire hub was copied from the review, but reworded slightly.
The hub has been flagged for review. I left a comment to give the plagiarist notice, but the comment was denied, so it is clear that the plagiarist chose to ignore the warning.
UPDATE: the "offending" hub is no longer there. Apparently Hubpages takes this sort of plagiarism seriously.

Why plagiarize at all here?

At first, I was puzzled. Hubpages is not school or work. I can understand plagiarism when there is time pressure to produce something for work or school, but why would someone plagiarize an article just to pump up their hub count? Then I realized why: AdSense earnings.
It is perfectly possible to earn money by writing hubs, through AdSense. There are several articles over at TheKeywordAcademy that explain how. As this is not about that, I am not linking to them, but you can find them on Google easily. The point is, if you target the right keywords, you can make money through Hubpages and AdSense. Due to Hubpages' size, a hub ranks higher than the same article published on a blog or such.
However, you need content, and the quickest way to put content on Hubpages is to steal it.
Direct stealing is out, though, because there are plagiarism scanners out there. Most of them only run searches on random sentences, and computers are not very good at matching text when a lot of words have been substituted. A person doing all that substitution by hand, though, is just, well, doh. So, yes, there is software to help with this sort of cheating. It's known as a "rewriter". I am NOT going to link to one, but they are out there. If you feed an existing article to it, it will pull information from the web, substitute exact words with generic words, replace words with synonyms, and use other tricks to produce an article that still reads okay but will likely NOT trigger a plagiarism scanner, because it is not a straight copy and paste.
The hub referenced above is likely the result of a rewriter. 

So how *do* you spot a plagiarist?

Most of the time, it will just be a "gut instinct", depending on your field of expertise.
I happen to be an IT expert. I know computers inside out, and most operating systems and peripherals. I also subscribed to PC Magazine for many years. I also contribute many hubs, and write several blogs. Thus, I know what a personal hub should sound like, and what a professional review should sound like. The hub above sounded completely wrong. It wasn't something I could put a finger on at first, but the more I read it, the more I realized it was NOT a personal review. A personal review would NOT have comparisons with other printers, esp. timed ones, or that level of detail.
This led to further research, and discovery of the source.
Basically, the offense here is TOO MUCH detail, and the WRONG TYPES of details.
If you suspect that whatever you are reading is an auto-rewrite, you need to pick keywords from the article for research. The trick is picking the right ones. A rewriter is designed to defeat exact-quote searches, but it cannot substitute keywords without destroying the article. So you have to use that against it.
If it is a printer review, put in the brand name, plus "printer review". If you find proper nouns, use them. I found "Lexmark", "Samsung", and "HP", so I put in those, plus "printer review", and out came the result I was looking for. You can do something similar for your own research.
The numbers are also a give-away, as would any details, like address, phone number, date reference, location names, and so on. Those cannot be changed without messing up the article.
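The keyword-picking step can be roughed out in Python. This is only a minimal sketch of the idea, not a real plagiarism tool: the function names (extract_anchor_terms, build_query) are made up for illustration, and the "proper noun" guess is simply any capitalized word that does not start a sentence. The sample text is the first sentence of the rewritten paragraph quoted earlier.

```python
import re

# First sentence of the rewritten paragraph quoted in this post.
SAMPLE = (
    "On the commercial software suite (monitored with proper components and "
    "computer software), the 1018's overall time frame is eleven minutes "
    "twenty eight seconds, when compared with around 8:50 for a Lexmark and "
    "also Samsung printers."
)

def extract_anchor_terms(text):
    """Return (proper_nouns, numbers): details a rewriter is unlikely to change."""
    # Digits, including times such as 8:50 and model numbers such as 1018.
    numbers = re.findall(r"\d+(?::\d+)*", text)
    proper_nouns = []
    # Split into rough sentences so the sentence-initial capital can be skipped.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for word in words[1:]:
            # Crude assumption: a capitalized word in mid-sentence is a name.
            if word[0].isupper() and word not in proper_nouns:
                proper_nouns.append(word)
    return proper_nouns, numbers

def build_query(text, extra="printer review"):
    """Build a search string from the quoted proper nouns plus a topic phrase."""
    proper_nouns, _ = extract_anchor_terms(text)
    return " ".join(f'"{name}"' for name in proper_nouns) + " " + extra

print(extract_anchor_terms(SAMPLE))  # (['Lexmark', 'Samsung'], ['1018', '8:50'])
print(build_query(SAMPLE))           # "Lexmark" "Samsung" printer review
```

Note that the spelled-out time ("eleven minutes twenty eight seconds") slips past the number pattern, which is exactly the trick described above, while the brand names and the remaining digits survive.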
Once you have found the possible "donor", you then pick a paragraph at random and start matching manually.
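If you would rather have a rough number to go with the manual matching, Python's standard difflib module can compare two paragraphs word by word. This is a sketch under my own assumptions, not how I arrived at the match figure quoted earlier: the helper names are hypothetical, and the score depends heavily on how the text is split into words. The two strings are the first sentences of the excerpts quoted in this post.

```python
import re
from difflib import SequenceMatcher

ORIGINAL = (
    "On our business applications suite (timed with QualityLogic's hardware "
    "and software), the 1018's total time was 11 minutes 28 seconds, compared "
    "with roughly 8:50 for the Lexmark and Samsung printers."
)
REWRITE = (
    "On the commercial software suite (monitored with proper components and "
    "computer software), the 1018's overall time frame is eleven minutes "
    "twenty eight seconds, when compared with around 8:50 for a Lexmark and "
    "also Samsung printers."
)

def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9:']+", text.lower())

def similarity(a, b):
    """Word-level similarity between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, words(a), words(b)).ratio()

def shared_numbers(a, b):
    """Digits and times that appear in both texts."""
    pattern = r"\d+(?::\d+)*"
    return set(re.findall(pattern, a)) & set(re.findall(pattern, b))

print(round(similarity(ORIGINAL, REWRITE), 2))
print(sorted(shared_numbers(ORIGINAL, REWRITE)))  # ['1018', '8:50']
```

A synonym-swapped paragraph will score well below an exact copy, so treat the ratio as a hint only. The shared figures are usually the stronger evidence, and you still need to read both paragraphs side by side before flagging anything.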


Hubpages is a great little community that can be a good source of side income if you produce quality hubs. However, it can be ruined by some unscrupulous hub'ers who wish to pollute our community with their stolen goods.
If you run across a hub you believe to be pirated, or the result of an auto-rewrite, you should flag it, along with the suspected "donor article", and send it to Hubpages admin for review. If you are not sure, post the hub in the forums and perhaps we can study it together.
Together, we can keep the hubs clean of pirates.