Rapidminer Ninja wanted / Webscraping using Rapidminer

Bids: 2
Avg Bid: $49 USD (€39 EUR)
CLOSED
  • Project ID: 1457542
  • Project Type: Fixed
  • Budget: $47-$97 USD (Approx. €37-€78 EUR)

Project Description:

** Your knowledge/skills
Mandatory
- You are an experienced user of Rapidminer 5.2
- You already have experience of successful webscraping using Rapidminer 5.2

** Your work habits
Mandatory
- You respect the deadlines (you will proactively report any hurdles)
- You will answer emails within 24 hours
- You will not outsource the job, in whole or in part

** Your personality
- You don’t hesitate to provide input/ideas that could bring added value to the project
- You are interested in a long term collaboration on further webscraping projects.

** Your task will be
Your mission is to create a webscraping process in Rapidminer where the input is a set of keywords and the output is a single Excel spreadsheet (.xls or .xlsx).

- As an example, take the set of keywords: US “trade balance” (with trade balance in quotes)

- The process will search the following 9 websites for these keywords
http://www.reuters.com
http://www.bloomberg.com
http://www.businessweek.com
http://online.wsj.com
http://www.ft.com
http://www.nytimes.com
http://www.smh.com.au
http://www.guardian.co.uk
http://www.telegraph.co.uk

- For each website, the process will retrieve the 3 (default value) most recent articles. This number must be configurable per website, i.e. we may configure 5 articles for the NY Times but only 2 for the WSJ.
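
One way to picture the per-website setting is a default value plus a table of overrides. The sketch below uses Python purely for illustration (the deliverable itself is a Rapidminer process); the names DEFAULT_ARTICLE_COUNT, ARTICLE_COUNT_OVERRIDES, article_count and most_recent are hypothetical, and the sample articles are made up.

```python
from datetime import datetime

DEFAULT_ARTICLE_COUNT = 3  # default value stated in the description

# Hypothetical override table; the two entries mirror the example
# given in the description (5 for the NY Times, 2 for the WSJ).
ARTICLE_COUNT_OVERRIDES = {
    "http://www.nytimes.com": 5,
    "http://online.wsj.com": 2,
}


def article_count(site):
    """Return how many articles to keep for one website."""
    return ARTICLE_COUNT_OVERRIDES.get(site, DEFAULT_ARTICLE_COUNT)


def most_recent(articles, site):
    """Keep the newest articles for a site.

    articles is a list of (published, url) tuples, where published
    is a datetime.
    """
    newest_first = sorted(articles, key=lambda a: a[0], reverse=True)
    return newest_first[:article_count(site)]


# Made-up sample: five articles published on consecutive days.
sample = [(datetime(2011, 9, day), f"article-{day}") for day in range(1, 6)]
wsj_latest = most_recent(sample, "http://online.wsj.com")
```

With this layout, a website that is not listed in the override table simply falls back to the default of 3.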

- The process will save the content of each article (only the article, not the full webpage) in an Excel spreadsheet where the columns are ordered as follows:

+ Column 1: publishing date of the article
The date format differs from one website to another. For example:
On Reuters : Tue Sep 20, 2011 11:40pm EDT
On Bloomberg : Sep 18, 2011 9:00 PM GMT+0200
On Businessweek : August 04, 2011, 4:45 PM EDT
On WSJ : September 27, 2011, 7:30 PM IST
On FT : September 11, 2011 4:24 pm
Etc.
+ Column 2: direct link to the article on the website (the source webpage that has been processed)
+ Column 3: title of the article (without HTML tags)
+ Column 4: content of the article (without HTML tags)
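
The four columns above could be assembled roughly as in the following sketch, which covers only the five date formats quoted in the list and uses Python for illustration rather than Rapidminer. The names parse_published, strip_tags and build_row are hypothetical. It also makes two assumptions: month and weekday names are parsed with an English locale, and the timezone text (EDT, IST, GMT+0200) is kept as plain text rather than converted.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Splits "<date and time> <am/pm> <optional timezone text>".
_STAMP = re.compile(
    r"^(?P<stamp>.*?\d)\s*(?P<ampm>[ap]m)\b\s*(?P<tz>.*)$", re.IGNORECASE
)

# Layouts that remain once commas are dropped and am/pm is normalised.
_FORMATS = (
    "%a %b %d %Y %I:%M %p",  # Reuters
    "%b %d %Y %I:%M %p",     # Bloomberg
    "%B %d %Y %I:%M %p",     # Businessweek, WSJ, FT
)


def parse_published(raw):
    """Return (naive datetime, timezone text) for one raw date string."""
    match = _STAMP.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised date: {raw!r}")
    stamp = " ".join(match["stamp"].replace(",", " ").split())
    text = f"{stamp} {match['ampm'].upper()}"
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt), match["tz"].strip()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def build_row(raw_date, url, title_html, body_html):
    """Build one row in the column order: date, link, title, content."""
    when, tz = parse_published(raw_date)
    date_cell = f"{when:%Y-%m-%d %H:%M} {tz}".strip()
    return [date_cell, url, strip_tags(title_html), strip_tags(body_html)]
```

Normalising every date to one layout before it reaches the spreadsheet keeps column 1 sortable, whichever website the article came from. Any date format not listed above would need its own entry in _FORMATS.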

- The file “result.xls” will be saved under c:\rapidminer\

** You will deliver
Mandatory
- You will test the process before delivery in order to ensure it works as described
- You will provide the .RMP file.

Skills required:

Web Scraping

Project posted by:

emonnier Switzerland
0.0 (0 Reviews)

Last seen: Apr 30, 2012 10:33 AM CEST

All Bids

ggpplusfacebbff
From Bangladesh        Offline
$50 in 1 day 
$25 Milestone Requested
3 months ago
Please check PMB
sjwka89
       Offline
$47 in 1 day 
$24 Milestone Requested
3 months ago
Higher quality & good reputation service always!!!
Account creation: $3 per 1k Hotmail accounts (non-verified), $3 per 1k Yahoo accounts.
Bulk email sending service: we are highly experienced in email sending, so if you would like to send your email, see our prices: 10k emails sent $15, 20k emails sent $25, 100k emails sent $100.
Craigslist email scraper: all or any country & personal.
Search engine email scraper: Yahoo, Gmail, AOL search engines to find emails.
Virtual bank account: now you can easily verify PayPal & AlertPay.
Proxy list for sale: 1000 proxies daily, monthly price $30; 2000 proxies daily, monthly price $50; 4000 proxies daily, monthly price $80; 10000 proxies daily, monthly price $150. We send you the daily updated proxy list by email.
With many more!!! www.bulkservice.webs.com check & see!!!!!