Website Crawler and Software Identifier

Cancelled. Posted Sep 1, 2010. Paid on delivery.

I need a solution that crawls the web and identifies sites that use the open-source packages Joomla, WordPress, Drupal, Mambo, Alfresco, and Plone.

For each site it identifies, I want it to create a row in a MySQL database with:

-the URL of the site

-the software it's using

-the version of the software

-the title tag

-the description meta tag

-the keywords meta tag
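
To make the last three fields concrete, here is a minimal sketch of pulling the title tag and the description and keywords meta tags out of a fetched page. It is written in Python for brevity rather than the PHP 5 we would prefer, and the names `MetadataParser` and `extract_metadata` are hypothetical, chosen only for this illustration. The URL, software, and version columns would come from other parts of the tool.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and the description/keywords meta tags."""

    WANTED = ("description", "keywords")

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta names are matched case-insensitively; first occurrence wins.
            name = (attrs.get("name") or "").lower()
            if name in self.WANTED and name not in self.meta:
                self.meta[name] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return the three page-level fields as a dict (empty string if absent)."""
    parser = MetadataParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
    }
```

Missing tags come back as empty strings, so every crawled site can still produce a complete database row.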

We want to point the program at a directory (like [url removed, login to view]) to start, then have it discover more URLs as it goes. In other words, the solution needs to find websites on its own, the way a search engine does.

NOTE: YOU must define how the tool determines the software and the version that each website is using for these 6 packages: Joomla, WordPress, Drupal, Mambo, Alfresco, and Plone.
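
One possible starting point, offered only as a sketch: many CMS installs announce themselves in a `<meta name="generator">` tag, so a first-pass detector could look for a package name in that string and take a version number that directly follows it. This is an assumption, not a full answer. The tag can be removed or altered by site owners, not every package or version emits one, and it often carries no version at all, so a real tool would need additional per-package signatures. The function name `identify_software` is hypothetical, and the code is Python rather than the preferred PHP 5.

```python
import re

# The six packages named in the brief.
PACKAGES = ("Joomla", "WordPress", "Drupal", "Mambo", "Alfresco", "Plone")
CANONICAL = {name.lower(): name for name in PACKAGES}

# Package name, an optional "!" (as in "Joomla!"), then an optional
# dotted version number immediately after it.
SIGNATURE_RE = re.compile(
    r"\b(" + "|".join(CANONICAL) + r")\b!?\s*(\d+(?:\.\d+)*)?",
    re.IGNORECASE,
)


def identify_software(generator):
    """Return (software, version) from a generator string.

    Either element is None when it cannot be determined.
    """
    match = SIGNATURE_RE.search(generator or "")
    if not match:
        return None, None
    return CANONICAL[match.group(1).lower()], match.group(2)
```

The version is only accepted when it sits right after the package name, so unrelated numbers later in the string (a copyright year, for example) are not mistaken for a version.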

The solution must run on Linux or Mac OS X and store its data in MySQL. NO WINDOWS. We'd prefer something written in PHP 5.

The logic goes like this:

1. Go to the list of sites to be crawled at a future time and take the next one on the list.

2. Go to URL.

3. Does this site run one of the 6 open-source packages? If yes, go on to step 4. If not, go to step 5.

4. What version?

5. Write all info about website to database.

6. Are there any links from this website to other sites? If yes, go to step 7. If not, go to step 8.

7. Write links to list of sites to be crawled at a future time.

8. Go to Step 1.

If the list is empty, go to [url removed, login to view] and crawl around until you find some more links.
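
The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the required implementation: it is in Python rather than PHP 5, the queue lives in memory instead of a MySQL table, and `fetch` and `record` are hypothetical callbacks standing in for the HTTP download and for steps 3 to 5 (detect software, detect version, write to the database). It stops when the list is empty or after `max_pages` pages, so the re-seeding fallback is left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_sites(page_url, html):
    """Return root URLs of other hosts linked from the page, in page order."""
    parser = LinkParser()
    parser.feed(html)
    own_host = urlsplit(page_url).netloc.lower()
    found = []
    for href in parser.hrefs:
        parts = urlsplit(urljoin(page_url, href))
        host = parts.netloc.lower()
        if parts.scheme in ("http", "https") and host and host != own_host:
            root = f"{parts.scheme}://{host}/"
            if root not in found:
                found.append(root)
    return found


def crawl(seeds, fetch, record, max_pages=100):
    """Run the crawl loop; return the number of pages fetched."""
    frontier = deque(seeds)   # list of sites to be crawled at a future time
    seen = set(seeds)         # avoids queueing the same site twice
    visited = 0
    while frontier and visited < max_pages:
        url = frontier.popleft()                  # step 1
        html = fetch(url)                         # step 2
        visited += 1
        if html is None:                          # download failed; skip it
            continue
        record(url, html)                         # steps 3-5
        for site in external_sites(url, html):    # step 6
            if site not in seen:
                seen.add(site)
                frontier.append(site)             # step 7
    return visited                                # loop back is step 8
```

Passing `fetch` and `record` in as parameters keeps the loop itself free of network and database code, which makes it easy to try out against a handful of canned pages before pointing it at the live web.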

OR, offer me a better solution!

I know that this is a solvable problem and that lots of people have created web crawlers before. I'm hoping that there is an out of the box open-source solution somewhere that we can just tweak so it wouldn't take very long to get this running.

Bonus feature: How much extra would it cost to also grab the contact information if that's easily available on the website? (Domain name registration information is NOT acceptable. It has to be the published contact-us phone number from the website in question.)
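
For a sense of what the bonus feature might involve, here is a rough Python sketch that pulls phone-number-like strings out of page text. The pattern assumes North American style numbers (three digits, three digits, four digits), which is an assumption of this example and not something the brief specifies. International formats would need their own patterns, and the hypothetical `extract_phone_numbers` helper makes no attempt to confirm that a match is the site's contact-us number.

```python
import re

# Assumed format: optional parentheses around a 3-digit area code, then
# 3 digits and 4 digits separated by a space, dot, or hyphen.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}\b")


def extract_phone_numbers(text):
    """Return phone-number-like strings in the order they appear."""
    return PHONE_RE.findall(text)
```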

Skills: JavaScript, MySQL, PHP, Script Install, Web Scraping

Project ID: #784424

About the project

11 proposals. Remote project. Active Sep 20, 2010.

11 freelancers are bidding an average of $600 for this job

SigmaVisual

We can help with your project. Please check PMB and our ratings/reviews to get an idea of our experience.

$750 USD in 20 days
(280 reviews)
8.1
MuktoSoftware

Please check inbox. Thanks.

$750 USD in 14 days
(422 reviews)
7.4
dsendra

Hi. What you need is a crawler. We are highly experienced with these; we specialize in data-mining software that extracts thousands of e-mails, addresses, and other information from several sites, yellow pages, Google, Craigslist, ...

$590 USD in 20 days
(32 reviews)
5.1
ankitfrenz

Please refer to PMB.

$750 USD in 12 days
(8 reviews)
5.1
orion300

I can do this job.

$350 USD in 2 days
(14 reviews)
4.3
artpar001

Hi, hire me and I will make you a fully flexible PHP solution based on the requirements. See PM for more info.

$500 USD in 3 days
(2 reviews)
2.7
klycoder

Dear Superior5, please check PM

$750 USD in 25 days
(0 reviews)
0.0
dipti123

-- Sir please check your PM --

$430 USD in 6 days
(0 reviews)
0.0
softcoder00

Please check PMB

$750 USD in 15 days
(0 reviews)
0.0
zongwonil

Hi. Our team has developed website crawler programs, and we now have seven programmers. Development tools: VC, PHP, JSP, ASP, IDA, Java decompiler, etc. I wish your project success. Thank you! My Skyp ...

$550 USD in 5 days
(0 reviews)
0.0
goplan

Please check your PM

$500 USD in 7 days
(0 reviews)
0.0
css1

I have completed this project in the specified time period.

$280 USD in 7 days
(0 reviews)
0.0