Posts Tagged ‘webcrawler’

Webcrawler That Returns WordPress Xml

November 18th, 2011 Comments off

I want a piece of software that, given the URL of any web site, crawls the site and creates a valid WordPress eXtended RSS (WXR) file. This way, any web site can easily be converted to a WordPress site.

This should run on Windows, but could be an AIR app or Java or anything cross-platform.
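
The WXR output step described above could be sketched roughly as follows, in Python. This is only an illustration under assumptions: the function name `build_wxr` and the `pages` structure (a list of dicts with `title`, `url` and `html` keys) are hypothetical, the crawling itself is left out, and a file that WordPress imports cleanly may need more fields (authors, dates, slugs) than shown here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by WordPress export files (WXR version 1.2).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.2/",
}


def build_wxr(site_title, site_url, pages):
    """Return a minimal WXR-style XML string for already-crawled pages.

    `pages` is a hypothetical structure: a list of dicts with the keys
    "title", "url" and "html".
    """
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)

    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "{%s}wxr_version" % NS["wp"]).text = "1.2"

    for post_id, page in enumerate(pages, start=1):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = page["title"]
        ET.SubElement(item, "link").text = page["url"]
        # ElementTree escapes the HTML instead of wrapping it in CDATA;
        # both forms carry the same content for an XML parser.
        ET.SubElement(item, "{%s}encoded" % NS["content"]).text = page["html"]
        ET.SubElement(item, "{%s}post_id" % NS["wp"]).text = str(post_id)
        ET.SubElement(item, "{%s}post_type" % NS["wp"]).text = "page"
        ET.SubElement(item, "{%s}status" % NS["wp"]).text = "publish"

    return ET.tostring(rss, encoding="unicode")
```

Each crawled page becomes one `item` with the post type `page`; a real converter would also have to decide which pages should become posts and how to carry over images.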

Webcrawler : Screen Scraping

October 27th, 2009 Comments off

Need a Python web crawler (screen scraping) for hotel reviews on Priceline.com.
The results are to be stored in two text files per city, using a predefined format.
The crawler can take a start city URL and a cutoff date.
The bid winner will also run the crawler to collect all data to confirm that it works correctly.
The format of the files (two of them) will be provided to the bid winner.

Webcrawler – Screen Scraping

October 26th, 2009 Comments off

Need a Python web crawler (screen scraping) for hotel data on Orbitz.com.
The results are to be stored in two text files per city, using a predefined format.
The crawler can take a start city URL and a date.
The bid winner will also run the crawler to collect the data to confirm that it works correctly.
The format of the files (two of them) will be provided to the bid winner.

Webcrawler Run

October 22nd, 2009 Comments off

I need someone with Python skills who can modify and run an existing Python web crawler for Yahoo! Travel, which extracts content from web pages and stores it in two text files using a predefined format. Yahoo blocks the calling IP, so this may involve IP hiding.
The script is written in Python. If things go well, I will need more Python scripts for web scraping for other sites, using the same file format.
More details will be given to the developer who wins the bid.

Automated Webcrawler Spider

September 29th, 2009 Comments off

I need an automated web crawler for dynamic websites that collects information such as:

name,
address,
Email,
age,
phone,
services offered by each person (203 possible services),
pictures (between 1 and 30 pictures per person),

I have a list; see the .xls file. The content will come from 5 specific websites; see the .pdf files. The crawler must be extensible to further websites with similar content in the future.

The web crawler has to run on a Windows XP PC. The result has to be a .csv file; see the example .xls file.

As the result of your work, I expect an .exe file containing the software, configured and ready to run for the 5 websites,
and a .csv file with the results of your test run.
The target is to extract 90% of the existing content.

To show that you have read the description, answer the following questions:
How many data sets do I want to copy? Look at the XLS file.
What kind of content is it? Look at the 5 PDF files.

Automated Webcrawler Spider

September 23rd, 2009 Comments off

I need an automated web crawler for dynamic websites that collects information such as:

name,
address,
Email,
age,
phone,
services offered by each person (203 possible services),
pictures (between 1 and 30 pictures per person),

I have a list; see the .xls file. The content will come from 5 specific websites; see the .pdf files. The crawler must be extensible to further websites with similar content in the future.

The web crawler has to run on a Windows XP PC. The result has to be a .csv file; see the example .xls file.

As the result of your work, I expect an .exe file containing the software, configured and ready to run for the 5 websites,
and a .csv file with the results of your test run.
The target is to extract 90% of the existing content.

To show that you have read the description, answer the following question: How many data sets do I want to copy? Look at cell A3 of the XLS file.

Torsten

Webcrawler (java Or Whatever)

September 5th, 2009 Comments off

Hi,

First: I need a web crawler. It does not have to be written in Java; I only want to use it on Windows, and I want the crawler to be really fast, really really fast!

I have a list of required features (I even have a sketch of the desired interface):

- The crawler should extract all links from a website
-> Required option: If I want, the crawler should extract only links that contain a term entered in a text field (if I write “web” in the field, it should extract only links containing the word “web”)
-> Required option: If I want, the crawler should extract only internal links (links within this site, no external links)

- If I want, the crawler should extract the text of HTML tags (each activated by a checkbox)
-> Checkbox: title (extracts the text of the title tag)
-> Checkbox: description (extracts the text of the meta tag description)
-> Checkbox: keywords (extracts the text of the meta tag keywords)
-> Checkbox: Body (extracts all text between the body tags)
-> Checkbox: Body only text (extracts ONLY text, all html tags will be stripped)
-> Checkbox: H1 Tag
-> Checkbox: H2 Tag
-> Checkbox: H3 Tag
-> Checkbox: b (for bold) tag

- The crawler should get its start links from:
-> A text field in the crawler interface
-> or a txt file
- The crawler should write the extracted links, and optionally the tag contents (title, description, keywords, h1, ...), to:
-> A txt file
-> or an html file

- I want to set the link depth via a text field
- I want to add my own HTML tags for extraction (besides h1, h2, h3, title, description), such as p, tr, div and so on, so that the crawler extracts the text of these tags; I want to add these tags manually via a text field
- I want statistics in the crawler showing:
-> Runtime (how long has the crawler been running since I pressed the start button?)
-> URLs indexed
-> How many pages does the crawler index per second?

- The crawler should also:
-> read the status codes of a website (like 400, 404, 200, ...)

I want two more features, but I don’t want to describe them now; more details if you get the job.

I hope someone can help me!
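
For illustration, the link-extraction feature from the list above (all links, optionally only those containing a given term, optionally only internal ones) might look something like this in Python. It is a sketch, not a finished crawler: the names `LinkCollector`, `extract_links`, `must_contain` and `internal_only` are made up for this example, "internal" is assumed to mean "same host as the start URL", and fetching pages, link depth and the user interface are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, must_contain="", internal_only=False):
    """Return absolute links found in `html`, in document order.

    must_contain  -- keep only links containing this text (the text field)
    internal_only -- keep only links on the same host as `base_url`
    """
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc

    links = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if must_contain and must_contain not in url:
            continue
        if internal_only and urlparse(url).netloc != base_host:
            continue
        if url not in links:  # skip duplicates
            links.append(url)
    return links
```

Called as `extract_links(page_html, "http://example.com/", must_contain="web", internal_only=True)`, it would return only the links on example.com whose URL contains "web".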

Java Webcrawler

August 25th, 2009 Comments off

I need an application which can

1. Receive a query term

2. Locate the relevant Wikipedia page for the query term

3. Extract URLs from the corresponding Wikipedia page

4. Visit each of the extracted URLs and show the title and opening paragraph of each page
4b. Extract URLs from the visited pages (based on the location of the link and the anchor term – details will be given later)

5. Open a browser when any of the links is clicked and display the page
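
Step 4 above (showing the title and opening paragraph of a visited page) could be approached roughly as below. The posting asks for Java; this sketch uses Python only to illustrate the idea, and the names `SummaryParser` and `summarize` are hypothetical. It works on HTML that has already been downloaded and treats the first non-empty `<p>` element as the opening paragraph, which is an assumption that will not hold for every page.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Captures the <title> text and the first non-empty <p> text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.first_paragraph = ""
        self._in_title = False
        self._in_p = False
        self._p_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            if self._in_p and self.first_paragraph.strip():
                # A new <p> started before the previous one was closed.
                self._in_p = False
                self._p_done = True
            elif not self._p_done:
                self._in_p = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Skip empty paragraphs; stop after the first one with text.
            if self.first_paragraph.strip():
                self._p_done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.first_paragraph += data


def summarize(html):
    """Return (title, opening paragraph) with whitespace normalized."""
    parser = SummaryParser()
    parser.feed(html)
    return parser.title.strip(), " ".join(parser.first_paragraph.split())
```

Text inside nested tags such as `<b>` or `<a>` is kept, since only the surrounding `<p>` decides whether data is collected.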

Webcrawler To Db Transfer

August 19th, 2009 Comments off

This project is a transfer of webcrawler data into a single DB with 7 tables.

Java Webcrawler

August 2nd, 2009 Comments off

I need an application which can

1. Receive a query term

2. Locate the relevant Wikipedia page for the query term

3. Extract URLs from the corresponding Wikipedia page

4. Visit each of the extracted URLs and show the title and opening paragraph of each page
4b. Extract URLs from the visited pages (based on the location of the link and the anchor term – details will be given later)

5. Open a browser when any of the links is clicked and display the page

Webcrawler

March 16th, 2009 No comments

Requirements: PHP, HTML, regular expressions a plus.

Description: A PHP program or class library to read an HTML page, extract data from within the HTML and return the data in a data structure. The data itself will consist of strings, numbers, or times. There will be 3 different HTML pages, each with its own specific data that will need to be extracted.
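
The posting asks for PHP; purely to illustrate the idea of "extract with regular expressions, return a data structure", here is a small Python sketch. Everything about the page layout is invented for the example: the table cells with the classes `name`, `price` and `opens`, the `Record` type and the function `extract_records` are assumptions, since the three real pages are not described.

```python
import re
from dataclasses import dataclass
from datetime import time


@dataclass
class Record:
    """One extracted row: a string, a number and a time."""
    name: str
    price: float
    opens_at: time


# Hypothetical row layout; a real page needs its own pattern.
ROW_PATTERN = re.compile(
    r'<td class="name">(?P<name>[^<]+)</td>\s*'
    r'<td class="price">(?P<price>\d+(?:\.\d+)?)</td>\s*'
    r'<td class="opens">(?P<hour>\d{1,2}):(?P<minute>\d{2})</td>'
)


def extract_records(html):
    """Return a list of Record objects found in `html`."""
    records = []
    for match in ROW_PATTERN.finditer(html):
        records.append(
            Record(
                name=match.group("name").strip(),
                price=float(match.group("price")),
                # time() raises ValueError for values such as 25:00.
                opens_at=time(int(match.group("hour")), int(match.group("minute"))),
            )
        )
    return records
```

With three different pages, one pattern and one record type per page would probably be the simplest arrangement.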

Categories: MySQL, PHP