Archive

Posts Tagged ‘screen scraper’

Screen Scraper

July 1st, 2009

Need someone to do a real-time screen scrape of an eBay auction.

If you have experience doing this, contact me for full details.

Screen Scraper For Dating Site

June 23rd, 2009

Application to interface with match.com and automate wink clicks and email sending.
Summary:
To create an application that will automate the wink and email functions of match.com. The application will work by capturing the user IDs returned by web searches and storing them in a database. Each user will then be emailed a custom message based on variables captured from their profile.
For instance, if a profile is found with the variable eyes=blue, then the sentence

PHP Spider / Screen Scraper

April 21st, 2009

I need a spider / screen scraper written in PHP which will save results in a MySQL database.

The goal is to find HTML-only websites which have been consistently updated over the last 2 years according to archive.org.

First, it needs to display an HTML form which will accept the starting variables, such as the search query, prohibited keywords, years to check, update threshold, and any other parameters.

Once the search query is submitted, it will query a search engine (specify in your bid which search engine you will use and whether you will use an API) and retrieve the results. It will then visit each link and scan the site to see whether every page of the site has the extension .htm or .html. It will also test each page against a list of prohibited words and phrases, and if it finds any it will skip that site.
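The two filters described above (extension check and prohibited-phrase check) could be sketched as follows. This is only an illustration in Python rather than the PHP requested; the function names are hypothetical, matching is assumed to be case-insensitive, and directory URLs ending in "/" are assumed to be acceptable index pages.

```python
import posixpath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".htm", ".html"}


def is_html_only(page_urls):
    """Return True if every page URL ends in .htm or .html.

    Assumption: a path that is empty or ends in "/" is a directory
    index and is allowed through.
    """
    for url in page_urls:
        path = urlparse(url).path
        if path == "" or path.endswith("/"):
            continue
        extension = posixpath.splitext(path)[1].lower()
        if extension not in ALLOWED_EXTENSIONS:
            return False
    return True


def contains_prohibited(text, prohibited):
    """Return True if any prohibited word or phrase occurs in the text."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in prohibited)


def site_passes(pages, prohibited):
    """pages maps each page URL of one site to its text content."""
    if not is_html_only(pages.keys()):
        return False
    return not any(contains_prohibited(text, prohibited)
                   for text in pages.values())
```

A site is skipped as soon as either check fails, which matches the "skip that site" rule in the description.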

If the site is all .htm or .html pages and does not have any of the prohibited keywords, it will then check each page of the site on archive.org to see the change history. On archive.org, an update is noted with a *. The script should check how many updates have been made to each page in the years specified at the beginning.
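Counting the starred updates for the chosen years could look roughly like this. It is a Python sketch, not the requested PHP, and it assumes the archive.org listing has already been reduced to one text line per capture, such as "Jan 05, 2008 *". That line format and the function name are assumptions, not the actual archive.org markup.

```python
import re

# Matches a four-digit year such as 1999 or 2008.
YEAR_RE = re.compile(r"\b((?:19|20)\d{2})\b")


def count_updates(listing_lines, years):
    """Count captures marked with a trailing * whose year is in `years`."""
    wanted = set(years)
    total = 0
    for line in listing_lines:
        text = line.strip()
        if not text.endswith("*"):
            continue  # capture without a recorded change
        match = YEAR_RE.search(text)
        if match and int(match.group(1)) in wanted:
            total += 1
    return total
```

Summing the result over every page of a site gives the per-site total that is compared against the threshold.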

If the total count of all updates for the site in the years specified is above the specified threshold, it should do a whois lookup and add the following to the database:

Main site link (from search engine)
Site name (from search engine)
Site description (from search engine)
Number of pages (from site scan)
Number of updates (for all pages)
Years of updates (from initial form entry)
Whois data
List of links (in HTML format) to any archive.org pages with updates
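The threshold test and the record listed above could be stored along these lines. This is a Python sketch that uses an SQLite table as a stand-in for the MySQL database requested; the table name, column names, and the `save_if_active` helper are hypothetical, and the whois text is assumed to have been fetched elsewhere.

```python
import html
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sites (
    link TEXT PRIMARY KEY,
    name TEXT,
    description TEXT,
    page_count INTEGER,
    update_count INTEGER,
    years TEXT,
    whois TEXT,
    archive_links_html TEXT
)
"""


def save_if_active(conn, site, threshold):
    """Store the site only if its update count is above the threshold."""
    if site["update_count"] <= threshold:
        return False
    # Render the archive.org URLs as HTML anchors, one per line.
    links_html = "\n".join(
        '<a href="{0}">{0}</a>'.format(html.escape(url, quote=True))
        for url in site["archive_links"]
    )
    conn.execute(
        "INSERT OR REPLACE INTO sites VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (
            site["link"],
            site["name"],
            site["description"],
            site["page_count"],
            site["update_count"],
            ",".join(str(year) for year in site["years"]),
            site["whois"],
            links_html,
        ),
    )
    conn.commit()
    return True
```

"Above the threshold" is read here as strictly greater than; change the comparison if a count equal to the threshold should also qualify.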

Please let me know if you have any recommendations for changes to this project.

Time frame is within the next 2 weeks.

Bear