
Spider

October 30th, 2009

I need an extra feature for the Sphider script (sphider.eu). I want to be able to configure it to crawl and index only URLs from a specific TLD. For example, one day I want it to index only .com domains, and the next day only .net domains, and so on.
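To make the TLD requirement concrete, here is a minimal sketch of the check. Sphider itself is written in PHP; this is Python purely for illustration, and `host_has_tld` is a hypothetical helper name, not an existing Sphider function. It assumes URLs are absolute (with a scheme such as http://).

```python
from urllib.parse import urlsplit


def host_has_tld(url: str, tld: str) -> bool:
    """Return True if the URL's hostname ends with the given TLD.

    `tld` may be given as "com", ".com" or ".COM" (assumption: the
    setting is stored as plain text and normalised here).
    """
    # hostname is None for URLs without a scheme, so fall back to "".
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    suffix = "." + tld.lower().lstrip(".")
    return host.endswith(suffix)
```

A crawler would call this before queueing a URL and skip anything that returns False. Switching from .com to .net the next day is then only a change of the `tld` setting.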

I also want it to be able to crawl only URLs that contain a specific word, for example only those URLs containing the word "net". It would then index only domains such as internet.com, netvibes.com, etc.
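The word filter could look roughly like this. It is a sketch under one assumption: because the examples above are domain names, the word is matched against the hostname only, not the path. `host_contains` is a hypothetical name, and the code is Python for illustration rather than Sphider's PHP.

```python
from urllib.parse import urlsplit


def host_contains(url: str, word: str) -> bool:
    """Return True if `word` occurs anywhere in the URL's hostname.

    Matching is case-insensitive. Assumption: only the hostname is
    checked, so a word appearing in the path alone does not count.
    """
    host = (urlsplit(url).hostname or "").lower()
    return word.lower() in host
```

Note that with the word "net" this simple check also matches every .net hostname, because the TLD is part of the hostname. If that is not wanted, the TLD would have to be stripped before matching.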

It should be possible to set a pause between crawled URLs, for example crawl one URL, then wait 3 seconds. The crawler should not use too much of my server's resources, nor those of the server it is crawling.
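The pause can be sketched as a simple loop that sleeps between requests. This is an illustration in Python, not Sphider code: `crawl_politely`, `fetch` and `delay_seconds` are hypothetical names, and `fetch` stands in for whatever function actually downloads and indexes a page.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Fetch URLs one at a time, pausing between consecutive requests.

    `sleep` is injectable so the delay can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

Fetching one URL at a time with a fixed delay keeps the load low on both servers, at the cost of a slower crawl.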

It has to index the site title, as well as the description and keywords from the meta tags.
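As a rough sketch of what should be extracted from each page, the following uses Python's standard `html.parser` module. `PageMetaParser` and `extract_meta` are hypothetical names for illustration; Sphider's own PHP parsing may work differently.

```python
from html.parser import HTMLParser


class PageMetaParser(HTMLParser):
    """Collect the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> dict:
    """Return the title, description and keywords found in an HTML page."""
    parser = PageMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
    }
```

Missing tags come back as empty strings, so a page without meta tags can still be indexed by its title alone.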



Categories: MySQL, PHP