I would like a Python script written using Scrapy that scrapes every post on [login to view URL] and parses the contents into a JSON file that matches this structure for each post:
{
'post_type' : "blog_post",
'url': '[login to view URL]',
'post_author_twitter1': '@johnbiggs',
'post_author1': 'John Biggs',
'post_author_twitter2': '',
'post_author2': '',
'post_date': '2007-06-21',
'post_subject': 'Writers Write "B-Logs," Get Money',
'post_content': 'USA Today, that bastion of hard news, is covering a new fad popular....',
}
Some posts have multiple authors, possibly with matching Twitter profiles, and each author needs to be parsed into its own numbered fields.
Hello! Although I am new to Freelancer.com, I am an experienced programmer/web scraper with a Master's degree in Computer Science. I can create the blog-to-JSON scraper you have requested. I have created similar web scraping software in the past using Python (which I would recommend for its third-party libraries such as Scrapy, BeautifulSoup and Mechanize), and will gladly provide code and previously scraped data as an example. Thank you for your consideration, and I hope to work with you soon.
I am a Python/Scrapy expert and am interested in your project. Please contact me to discuss more details. Thanks.
Hi. I'm an experienced Python programmer and have experience with Scrapy. I am interested in taking up this job. We can discuss further details on chat. Thanks.
This is Nitin. I have extensive experience scraping large volumes of data in the least amount of time.
I code in PHP, Python and Perl, and scrapers I have written are being used to scrape more than 30 million pages per day without being blocked.
I would like to help you in getting all the data you are looking for.
Please PM me if you find my bid suitable.
And don't forget to check my reviews here:
http://www.freelancer.com/users/1303125.html
Cheers,
Nitin
Hi Sir,
I have developed more than 70 scrapers using Scrapy and Node.js.
For multiple authors it would be better to use another format.
....
'authors' :[ {'post_author_twitter': '...', 'post_author': '...'}, {....}],
....
This format will work out of the box.
If you still want the flat format, I can create a new exporter that converts to your desired format.
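To illustrate the conversion mentioned above, here is a minimal sketch in plain Python rather than an actual Scrapy exporter. The function name `flatten_authors` and the `max_authors` parameter are hypothetical; the field names come from the structure in the job description, and the sample record is trimmed for brevity.

```python
import json


def flatten_authors(post, max_authors=2):
    """Convert a nested 'authors' list into numbered flat fields.

    Hypothetical helper: assumes each author is a dict with the keys
    'post_author' and 'post_author_twitter', as in the nested format.
    """
    # Copy every field except the nested list.
    flat = {key: value for key, value in post.items() if key != "authors"}
    authors = post.get("authors", [])
    for index in range(max_authors):
        # Missing authors become empty strings, as in the requested structure.
        author = authors[index] if index < len(authors) else {}
        number = index + 1
        flat[f"post_author{number}"] = author.get("post_author", "")
        flat[f"post_author_twitter{number}"] = author.get("post_author_twitter", "")
    return flat


nested = {
    "post_type": "blog_post",
    "post_subject": 'Writers Write "B-Logs," Get Money',
    "authors": [
        {"post_author": "John Biggs", "post_author_twitter": "@johnbiggs"},
    ],
}

flat_post = flatten_authors(nested)
print(json.dumps(flat_post, indent=2))
```

Authors beyond `max_authors` are silently dropped in this sketch, so the limit would need to match the largest number of authors the flat format should hold.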
Regards
Ilshat
Hello sir,
I have experience implementing scrapers for different types of content in Python.
**How can I help you?**
First, since TechCrunch supports RSS, I will fetch URLs and titles from the RSS feed.
Second, using the Python requests library, I'll fetch the content and authors of each article. This is easy to parse with the BeautifulSoup library.
Finally, I will write the JSON file using Python's standard library.
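As a rough sketch of the first and last steps, the snippet below reads titles and links from an RSS document and serializes them as JSON using only the standard library. The feed text, the example.com URLs and the function name `items_from_rss` are made up for illustration. Downloading the real feed and parsing each article page for content and authors are left out here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real run would download the site's RSS instead.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second-post</link>
    </item>
  </channel>
</rss>"""


def items_from_rss(rss_text):
    """Return one partial post record per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "post_type": "blog_post",
            "url": item.findtext("link", default=""),
            "post_subject": item.findtext("title", default=""),
        })
    return posts


posts = items_from_rss(SAMPLE_RSS)
json_text = json.dumps(posts, indent=2)
print(json_text)
```

An RSS feed usually lists only the most recent entries, so this approach alone would probably not reach every post on the site.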
You just need to answer a few questions:
1) An article may contain images or some kind of formatting. Do you want to save text only?
2) How many of the latest articles should the script fetch?
Once I receive answers to these questions, I can start working on the grabber.
Best,
Vyacheslav
Hello,
Can your JSON structure be adjusted in any way? We could use a JSON array for the authors when a post has more than one. If the structure can't be changed, that's fine. Also, do I need to use Scrapy? That's OK too, but I have completed similar projects before without using this framework.
Thanks,
Bogdan
Hi.
I checked TechCrunch and it seems quite possible to scrape all their blog posts. Their search can be used to list all blog posts (there are fewer than 10 000 posts in total), and the rest from there is a piece of cake.
This task shouldn't be very difficult, as I have successfully scraped data from websites with over 100 000 pages.
The project shouldn't take long, but to be safe I bid 6 days. It will probably be done in 2 days.
Waiting for your response so I can start working.
Hello there,
Thank you for this opportunity. I am really interested in this Scrapy job.
I've just placed my initial bid. If you are serious, I can provide you with a demo. Please reply if you are interested too :)
Regards,
Dolek