Transcraping - Translation Scraping

Why settle for just English language content?

I want to introduce a new topic to you guys: Translation Scraping. Nowadays you see lots of scraper sites that scrape RSS feeds and republish the content on AdSense-laden sites. That's all well and good, but clearly we have other tools in our arsenal to monetize scraper splogs: we have the ability to translate on the fly.

 

Consider this: a simple script that takes a keyword, does a Google Blog Search for that keyword, collects all the URLs that come up as a match, passes each URL to an online translator, and then posts the translated content to a blog via XML-RPC.
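The steps above can be sketched as a small pipeline. This is only an outline under assumptions: `transcrape` and its three keyword arguments (`search`, `translate`, `publish`) are hypothetical names, not part of any library, and each step is passed in as a callable so the flow can be read and tried out without touching a real search engine, translator, or blog.

```ruby
# Hypothetical pipeline sketch: every name here is illustrative.
# search:    callable taking a keyword, returning an array of URLs
# translate: callable taking a URL, returning {'title' => ..., 'description' => ...}
# publish:   callable taking that hash and posting it somewhere
def transcrape(keyword, search:, translate:, publish:)
  # Drop duplicate URLs, then translate and publish each one in turn.
  search.call(keyword).uniq.map do |url|
    post = translate.call(url)
    publish.call(post)
    post
  end
end

# Stand-in steps so the flow can be exercised offline.
published = []
search    = ->(kw)   { ["http://example.com/#{kw}/1", "http://example.com/#{kw}/1", "http://example.com/#{kw}/2"] }
translate = ->(url)  { { 'title' => "Translated #{url}", 'description' => "Body of #{url}" } }
publish   = ->(post) { published << post['title'] }

posts = transcrape('seo', search: search, translate: translate, publish: publish)
```

Keeping the three steps as separate callables means the translator or the blog endpoint can be swapped without rewriting the loop.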

 

I mean, why not? If you are going to scrape sites in the same language, you might as well cover your bases and give'er in other languages too! Come on, show a little multiculturalism for Christ's sake...

 

Here is a little example I hacked up using a post from my good friend Eli over at Blue Hat SEO. From my experience, I happen to know that splogs scraped from his content convert really well with the Russian market... (sorry buddy :P)
This is written in Ruby and uses Mechanize and the XML-RPC library.



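# On newer Rubies the xmlrpc library may need to be installed separately (gem install xmlrpc).
require 'cgi'
require 'xmlrpc/client'
require 'mechanize'

# Thin wrapper around the MetaWeblog XML-RPC API.
module MetaWebLogAPI
  class Client
    def initialize(server, url_path, blogid, username, password)
      @client   = XMLRPC::Client.new(server, url_path)
      @blogid   = blogid
      @username = username
      @password = password
    end

    # content is a hash with 'title' and 'description' keys; publish is true or false.
    def newPost(content, publish)
      @client.call('metaWeblog.newPost', @blogid, @username,
                   @password, content, publish)
    end
  end
end

# Recent Mechanize versions expose Mechanize directly (older ones used WWW::Mechanize).
agent = Mechanize.new
agent.user_agent_alias = "Mac Safari"
# Assumes a local HTTP proxy is listening on port 8118; drop this line if you don't use one.
agent.set_proxy('localhost', 8118)

source = "http://www.bluehatseo.com/followup-seo-empire-part-1/"
# The source URL is escaped so it survives as a query parameter.
# direction=er is presumably English to Russian, matching the example.
url = "http://www.online-translator.com/url/tran_url.asp?lang=en&url=#{CGI.escape(source)}&direction=er&template=General&cp1=NO&cp2=NO&autotranslate=on&psubmit2.x=40&psubmit2.y=7"

# Fetch the translated page and pull out the post title and body.
doc = agent.get(url)
title = doc.search("p.post-info").inner_text
guts = doc.search("div.post-content").inner_text

# Post the translated content to the blog (blog id 1, as in the original example).
client = MetaWebLogAPI::Client.new('bingobango.wordpress.com', '/xmlrpc.php', 1, 'bingobango', 'password')
blogpost = { 'title' => title, 'description' => guts }
client.newPost(blogpost, true)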



And you can stroll on over to http://bingobango.wordpress.com/ to see the results of our handiwork.

 

The really cool thing about this is that you can create a spider that automates these procedures indefinitely. Create a script that monitors a group of keywords, set up a few blogs (depending on the size of the keyword niche and how much content you are dealing with), and have your spider automatically translate and post new content as it comes in.
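One piece a spider like that needs is a way to tell new content from content it has already handled. Here is a minimal sketch of that bookkeeping, assuming each item is identified by its URL; `SeenTracker` and `fresh` are hypothetical names made up for this example.

```ruby
require 'set'

# Hypothetical helper: remembers which URLs have already been handled so a
# repeated polling run only returns items it has not seen before.
class SeenTracker
  def initialize
    @seen = Set.new
  end

  # Returns only the URLs not seen on earlier calls, and remembers them.
  # Set#add? returns nil when the element is already present.
  def fresh(urls)
    urls.uniq.select { |url| @seen.add?(url) }
  end
end

tracker = SeenTracker.new
first_run  = tracker.fresh(["http://example.com/a", "http://example.com/b"])
second_run = tracker.fresh(["http://example.com/b", "http://example.com/c"])
```

In a long-running script you would call `fresh` on each polling pass and would probably want to save the set to disk, so a restart does not repost everything.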

 

 

--Rob



Comments:
Name: The Tools Guy
Website: http://tools.aosp.com
Comment: now to convert it to asp.net
Name: Rob
Comment: Good luck man! If you convert it and feel like posting up your script for others to see, feel free to leave it here in the comments!
Name: Jeff
Website: http://www.adeptmarketingconcepts.com
Comment: My mind is racing with ideas here. It's time to "recycle" some content making it fresh and clean for Google ;)
Name: Le secret
Website: http://blog.royalfx.ru
Comment: Good luck man!