Website Extraction To Database
September 14th, 2009
We need a script that extracts all text, images, and tables from the Electronic Code of Federal Regulations and loads them into a database. A database that stores the raw HTML code would be fine.
An example of the data to be extracted can be found at
http://ecfr.gpoaccess.gov/cgi/t/text/text-idx?c=ecfr&rgn=div6&view=text&node=29:5.1.1.1.8.1&idno=29
We are open to XML as well.
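As a rough illustration of the extraction step, here is a minimal sketch using only Python's standard library. It assumes the page HTML has already been downloaded into a string; the class name `SectionExtractor` and the function `extract` are hypothetical names chosen for this example, and the real pages may need extra handling (navigation links, scripts, nested layout tables).

```python
from html import escape
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect visible text, image URLs, and table markup from one page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []    # text fragments found outside tables
        self.images = []        # values of <img src="...">
        self.tables = []        # rebuilt HTML of each top-level <table>
        self._table_depth = 0   # greater than 0 while inside a table
        self._table_buf = []    # markup of the table currently being read

    def _open_tag(self, tag, attrs):
        # Shared by normal and self-closing tags.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        if tag == "table":
            self._table_depth += 1
        if self._table_depth:
            self._table_buf.append(self.get_starttag_text())

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img .../> have no separate end tag.
        self._open_tag(tag, attrs)

    def handle_endtag(self, tag):
        if not self._table_depth:
            return
        self._table_buf.append(f"</{tag}>")
        if tag == "table":
            self._table_depth -= 1
            if self._table_depth == 0:
                self.tables.append("".join(self._table_buf))
                self._table_buf = []

    def handle_data(self, data):
        if self._table_depth:
            # Re-escape so the rebuilt table markup stays valid HTML.
            self._table_buf.append(escape(data, quote=False))
        elif data.strip():
            self.text_parts.append(data.strip())


def extract(html_text):
    """Return the text, image URLs, and tables found in html_text."""
    parser = SectionExtractor()
    parser.feed(html_text)
    parser.close()
    return {
        "text": parser.text_parts,
        "images": parser.images,
        "tables": parser.tables,
    }
```

Calling `extract(page_html)` on a downloaded page returns a dictionary with the three kinds of content, which can then be stored alongside the original HTML.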
Right now, we are interested in 29 CFR 1910. We would need to have fields for:
1. Title
2. Volume
3. Part
4. Subpart
5. Section
6. Subsection
7. HTML code
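The field list above could map onto a single table along these lines. This is only a sketch: it uses SQLite from Python's standard library as a stand-in for MySQL or MS Access, and the table name `ecfr_section`, the column types, and the helper names `create_schema` and `store_section` are assumptions made for the example. All identifiers are stored as text because CFR subparts and sections are not always plain numbers.

```python
import sqlite3

# Column names follow the seven fields requested in the posting.
FIELDS = ("title", "volume", "part", "subpart", "section", "subsection", "html")


def create_schema(conn):
    """Create the (hypothetical) ecfr_section table if it is missing."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ecfr_section (
            id         INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            volume     TEXT,
            part       TEXT NOT NULL,
            subpart    TEXT,
            section    TEXT,
            subsection TEXT,
            html       TEXT NOT NULL
        )
        """
    )


def store_section(conn, record):
    """Insert one record (a dict keyed by FIELDS) and return its row id."""
    values = [record.get(name) for name in FIELDS]
    placeholders = ", ".join("?" for _ in FIELDS)
    cur = conn.execute(
        f"INSERT INTO ecfr_section ({', '.join(FIELDS)}) VALUES ({placeholders})",
        values,
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Only title 29 and part 1910 come from the posting; the rest are placeholders.
    row_id = store_section(
        conn,
        {"title": "29", "part": "1910", "html": "<p>placeholder markup</p>"},
    )
    print("stored row", row_id)
```

Fields that are not known for a given page (for example the subsection) are simply left empty in this sketch.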

Categories: MS Access, MySQL, Programming Code, Database, database programming mysql, extraction, text, Website, website extraction


