Is it possible to grab addresses from the following website:
goudengids.be (goldenpages.be)?
Click “Select a category” (you may have to switch between Dutch and English, and simple and advanced search).
I need the addresses from the following categories:
Category                          Total
Advocaten                         19447
Apothekers                         5690
Architecten                        9632
Artsen                            26782
Bandagisten                         462
Boekhouders & fiscalisten          8053
Dierenartsen                       4014
Gerechtsdeurwaarders                507
Interieurarchitecten               1794
Kinesitherapeuten                 14771
Kunstschilders                      744
Logopedisten                       2496
Notarissen                         1381
Osteopaten                         1778
Podologen                          1029
Psychologen                        1863
Tandartsen                         7932
Tuin- & landschapsarchitecten      1416
Vertalers                          2312
Total                            112103
The format is as follows:
Christoffels AdvocatenKoning Albertlaan 53 3620 Lanaken
TelefoonKaart & RouteNaar Mijn Gids
089 71 57 84 089 71 71 27
Website
E-mail
Deskundig raadgevers in alle rechtzaken sinds 1976
Rubriek: Advocaten
On the Dutch website it will say Rubriek, on the English website it will say Category.
I need an Excel file with the following fields:
Name | Address | Zip | City | Phone 1 | Phone 2 | Website | E-mail | Rubriek | Description
Christoffels Advocaten | Koning Albertlaan 53 | 3620 | Lanaken | 089 71 57 84 | 089 71 71 27 | http://www.christoffels-law.be | ludo(at)christoffels-law.be | Advocaten | Deskundig raadgevers in alle rechtzaken sinds 1976
Witters Patrick | Paalsesteenweg 296 Bus 1 | 3583 | Paal (Beringen)
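To make the target layout concrete, here is a minimal sketch in Python that writes the fields above to a CSV file, which Excel can open. It is only an illustration: the function name `write_rows` and the choice of CSV instead of a native .xlsx file are assumptions, not requirements.

```python
import csv
import io

# Column order taken from the field list above.
FIELDS = ["Name", "Address", "Zip", "City", "Phone 1", "Phone 2",
          "Website", "E-mail", "Rubriek", "Description"]

def write_rows(rows, out):
    """Write listing dicts as CSV; fields missing from a dict become empty cells."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)

rows = [
    {
        "Name": "Christoffels Advocaten",
        "Address": "Koning Albertlaan 53",
        "Zip": "3620",
        "City": "Lanaken",
        "Phone 1": "089 71 57 84",
        "Phone 2": "089 71 71 27",
        "Website": "http://www.christoffels-law.be",
        "E-mail": "ludo(at)christoffels-law.be",
        "Rubriek": "Advocaten",
        "Description": "Deskundig raadgevers in alle rechtzaken sinds 1976",
    },
    # A listing with only the first four fields known.
    {
        "Name": "Witters Patrick",
        "Address": "Paalsesteenweg 296 Bus 1",
        "Zip": "3583",
        "City": "Paal (Beringen)",
    },
]

# Written to an in-memory buffer here; a real run would pass a file opened
# with open("addresses.csv", "w", newline="", encoding="utf-8").
buf = io.StringIO()
write_rows(rows, buf)
csv_text = buf.getvalue()
```
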
Note:
* sometimes there is additional data in the address field, such as “Bus 1”:
Witters PatrickPaalsesteenweg 296 Bus 1 3583 Paal (Beringen)
This belongs to the Address, not to the ZIP field.
* the zip field is always 4 digits in Belgium; you can download a list of all Belgian zip codes here:
http://www.post.be/site/nl/residential/customerservice/search/postal_codes.html
* the Phone 1 and Phone 2 fields seem to be encoded. Is it possible to decode them?
If it is really not possible, we can just skip them. Please let me know.
* there is no rush. After grabbing an address, I want your script to insert a random pause before grabbing the next one.
For example: grab an address, wait 3 seconds, grab the next address, wait 5 seconds, grab another address, wait 4 seconds, etc.
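Two of the notes above (the 4-digit zip with extras like "Bus 1" staying in the address, and the random pause) could be sketched roughly as follows in Python. The names `split_address` and `polite_pause` are made up for illustration, and the sketch assumes the address text has already been separated from the name; the 3 to 5 second range is taken from the example above.

```python
import random
import re
import time

# Everything before the last standalone 4-digit number is the address,
# the number is the zip, and the rest is the city.
ADDRESS_RE = re.compile(r"^(?P<address>.+)\s(?P<zip>\d{4})\s+(?P<city>\D.*)$")

def split_address(line):
    """Return (address, zip, city), or None if no 4-digit zip is found."""
    match = ADDRESS_RE.match(line.strip())
    if match is None:
        return None
    return match.group("address"), match.group("zip"), match.group("city")

def polite_pause(min_s=3.0, max_s=5.0, sleep=time.sleep):
    """Sleep for a random time between min_s and max_s seconds; return the delay."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

For example, `split_address("Paalsesteenweg 296 Bus 1 3583 Paal (Beringen)")` keeps "Bus 1" in the address part. Calling `polite_pause()` between two requests gives the random wait; the `sleep` parameter only exists so the pause can be replaced when testing.
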
What I need from you is an Excel file as per the above example with approximately 112103 addresses.
Is it possible?