Tutor me in JSON, Fiddler, and anything else relevant to web scraping
$30-5000 USD
Closed
Posted about 13 years ago
Paid on delivery
Ok, so I want to learn more about web scraping. In particular, I have already formulated two tasks illustrating stuff I ought to learn:
1. Parsing a page like http://www.amazon.com/gp/bestsellers/hpc/ref=sv_hpc_1 without using a browser control to access the DOM. What I want to learn is how to scrape it using a PHP script or C# app that would work with JavaScript, send direct requests to the server, use JSON (if any JSON is involved) and, well, basically do anything whatsoever so long as it does not involve the DOM. If you know 10 ways to scrape this page that I am not even aware of, please tell me about them, and you will be that much more likely to get the job :-)
2. Downloading a file from an HTTPS server (well, sort of like Gmail) via a hyperlink. I suspect that this task is a lot more complex than the previous one, and here I will definitely need expert help even to properly formulate a task that is actually doable. But here is my reasoning. First of all, I know how to log in to an HTTPS site using a browser control. The app that I want to use for the download will host the browser control and have access to all of its data. So maybe it could learn enough about the HTTPS session to spoof the browser's file download request? Again, I have a poor grasp of this stuff, so if what I just said is totally off, correct me. The underlying reason for trying to do this is to bypass the browser control's various confirmation dialogs on file download. I am aware of various hacks that avoid the dialog problem in some cases, but I want it eliminated completely by doing the download outside of the browser.
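To make the first task concrete, here is a minimal sketch in Python (PHP or C# would follow the same shape) of reading a page without a browser control or DOM. It assumes the HTML or JSON body has already been fetched with a direct HTTP request. The names `LinkCollector`, `extract_links`, and `parse_json_payload` are made up for illustration, and the streaming parser is just one possible approach, not a statement about how the Amazon page is actually structured.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> tags as tokens stream by (no DOM tree is built)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, link text) pair found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def parse_json_payload(body):
    """Decode a JSON response body, for pages that load their data via a JSON endpoint."""
    return json.loads(body)
```

For example, `extract_links('<a href="/item/1">First</a>')` yields a list with one `("/item/1", "First")` pair. Whether a given page exposes its data as plain HTML or through a JSON request is something a proxy such as Fiddler can reveal by showing the traffic the browser generates.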
Note, btw, that I am fully aware of downloading files from Gmail via POP, and that's not what I am looking for. POP is great, but not all sites support it, and I want to learn how to do it in the broadest possible case.
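The idea in the second task, reusing the browser's session for a download made outside the browser, might look roughly like the Python sketch below. It assumes the host app has already pulled the session cookies and the user agent string out of the browser control; how that is done is outside this sketch, and cookies flagged HttpOnly may not be readable from page script at all. The function names and the cookie names in the usage note are hypothetical.

```python
import urllib.request


def build_download_request(url, cookies, user_agent):
    """Build a GET request that presents the same cookies and user agent as the browser session."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header)
    request.add_header("User-Agent", user_agent)
    return request


def download(request, destination):
    """Send the request and stream the response body to a local file in 64 KiB chunks."""
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
```

A call would look like `download(build_download_request(link_url, {"SID": "..."}, browser_user_agent), "report.pdf")`. Some servers also check other headers, such as `Referer`, so comparing this request against the browser's real one in Fiddler is a reasonable way to find out what is missing.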
While I am OK with limiting this second task purely to the browser control case, I would also very much like to learn how to make an HTTPS login app/script/whatever that does not rely on a browser control on Windows. So, e.g., if you can walk me through setting up some popular PHP script (I think there are a few of those) that can log in to HTTPS sites, that would be great.
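A login that does not rely on a browser control usually comes down to posting the login form directly and keeping whatever cookies the server sends back. The Python sketch below shows that shape under loud assumptions: the form field names `username` and `password` are hypothetical, and real sites often require extra hidden fields (for example anti-forgery tokens) that have to be read from the login page first.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Create an opener that stores cookies from responses and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a form-encoded POST; the field names here are assumptions, not any site's real ones."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("utf-8"), method="POST")
```

In use, `opener.open(build_login_request(...))` would perform the login, and later `opener.open(protected_url)` calls would carry the session cookies automatically. PHP's cURL extension can be used in much the same way with a cookie file.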
CONTINUED BELOW IN "LEGAL PROTECTIONS"