ASP.NET Screen Scrape/Parse
I have a main real estate site (on Windows Server with ASP.NET available) with its own primary domain. Property searches are handled on a subdomain of the primary domain by a third-party IDX (MLS property search) site.
To have Google pick up the property listings from some pre-defined searches and ‘credit’ them to my domain, as if they were located on the main site, I want an .aspx page under the primary domain that uses the ASP.NET WebRequest class to:
1. Retrieve a specified URL (which already has the property search parameters in it)
2. Parse out the section of the retrieved HTML that contains the property listing data. (I’ve already placed <!-- Startcomment --> and <!-- Endcomment --> comments in the HTML at the beginning and end of the section to be pulled out of the body of the retrieved page.)
3. Place the parsed html into the appropriate section of the aspx page requesting the URL.
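To make step 2 concrete, here is a minimal sketch of the marker-based extraction, written in C# 3.0-compatible syntax. The class and method names (`ListingExtractor`, `ExtractBetween`) are hypothetical and used only for illustration. The sketch assumes the markers appear in the retrieved HTML exactly as typed (same spacing) and returns an empty string when either one is missing.

```csharp
using System;

// Hypothetical helper: the names here are illustrative, not part of any library.
public static class ListingExtractor
{
    // Returns the HTML between the first startMarker and the next endMarker,
    // or an empty string when the input is empty or either marker is not found.
    public static string ExtractBetween(string html, string startMarker, string endMarker)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        int start = html.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return string.Empty;
        }

        // Skip past the start marker so it is not included in the result.
        int contentStart = start + startMarker.Length;

        int end = html.IndexOf(endMarker, contentStart, StringComparison.OrdinalIgnoreCase);
        if (end < 0)
        {
            return string.Empty;
        }

        return html.Substring(contentStart, end - contentStart);
    }
}
```

A call such as `ListingExtractor.ExtractBetween(html, "<!-- Startcomment -->", "<!-- Endcomment -->")` would then return only the listing section. Plain string searching is enough here because the markers are under my control; a full HTML parser should not be needed.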
In effect, the aspx page on www.mymaindomain.com will be a duplicate of the page being requested at subdomain.mydomain.com.
The URL to retrieve and parse, e.g. http://subdomain.mydomain.com/newpropertylistings.html, should be hard-coded in the script code in the head section of the requesting .aspx page. That way I can save copies of the page under other names, e.g. propertypull2.aspx, propertypull3.aspx, etc., and change the URL in the head section of each new page. (I can do that in Dreamweaver.) I want the URL to be requested and parsed to live in each page, not in some ASP.NET project variable that is pulled into the page.
I do NOT need to store the retrieved HTML (or associated photos) on the main site (e.g. parse the HTML into a database). Each time the requesting aspx page is loaded, it can run the code to retrieve the latest data from the page on the subdomain site. Also, at this point I am not looking for a function to rewrite relative links in the retrieved HTML. (Most of the links coming over are fully qualified.)
There is no page design involved, just the C# ASP.NET coding on the page to make it work. I expect this to be fairly simple. My webhost is Webhost4life, with ASP.NET 3.5 available. I only need the one page done. I’ll duplicate that page as needed when I want to bring new groups of properties over to the main site. I have the aspx page “designed” and ready for the code to be plugged in.
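As a rough sketch of the kind of single-file page I have in mind, the following puts the hard-coded URL and the C# code in a server-side script block in the head, so each copy of the page only needs one line changed. The control ID `Listings`, the constant names, and the fallback message are placeholders of my own, and the sketch assumes the retrieved page can be read with the default UTF-8 handling of `StreamReader`.

```aspx
<%@ Page Language="C#" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Net" %>
<html>
<head>
<title>Property Listings</title>
<script runat="server">
    // Hard-coded per page: change this one line in each copy (propertypull2.aspx, etc.).
    private const string SourceUrl = "http://subdomain.mydomain.com/newpropertylistings.html";
    private const string StartMarker = "<!-- Startcomment -->";
    private const string EndMarker = "<!-- Endcomment -->";

    protected void Page_Load(object sender, EventArgs e)
    {
        // Placeholder fallback text, shown if the fetch fails or the markers are missing.
        string section = "<p>Listings are temporarily unavailable.</p>";
        try
        {
            // Step 1: retrieve the page from the subdomain.
            WebRequest request = WebRequest.Create(SourceUrl);
            string html;
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                html = reader.ReadToEnd();
            }

            // Step 2: keep only the HTML between the two marker comments.
            int start = html.IndexOf(StartMarker, StringComparison.OrdinalIgnoreCase);
            if (start >= 0)
            {
                int contentStart = start + StartMarker.Length;
                int end = html.IndexOf(EndMarker, contentStart, StringComparison.OrdinalIgnoreCase);
                if (end >= 0)
                {
                    section = html.Substring(contentStart, end - contentStart);
                }
            }
        }
        catch (WebException)
        {
            // Leave the fallback text in place if the subdomain cannot be reached.
        }

        // Step 3: write the extracted HTML into the page.
        Listings.Text = section;
    }
</script>
</head>
<body>
    <asp:Literal ID="Listings" runat="server" />
</body>
</html>
```

The `asp:Literal` control would sit wherever the listings belong in the designed page. Because the HTML is fetched on every request, nothing is stored on the main site.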



