
Extract Names And Emails

November 13th, 2009

I need a list of names, titles, emails and organizations from all project pages at http://projectreporter.nih.gov.

The job will require writing a script that can access each project page on this site, extract this information, compile it into a spreadsheet, and delete redundant entries.

To see an example of the information that is needed:

1) Go to: http://projectreporter.nih.gov
2) Using the defaults (make sure “Current Projects” is selected under “Fiscal Year”), click “Submit Query”.
3) On 11/13/09, this query returned 114,404 results.
4) The first entry was project number 1DP2OD001500-01.
5) Click the project number to bring up the Project Information page.
6) For this project (1DP2OD001500-01), the page lists the following under “Contact PI Information”:
Name: AAGAARD-TILLERY, KJERSTI MARIE
Email: aagaardt(at)bcm.tmc.edu
Title: ASSISTANT PROFESSOR
Organization: BAYLOR COLLEGE OF MEDICINE, 1 BAYLOR PLAZA, HOUSTON, TX 77030-3498
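The Contact PI fields in step 6 follow a simple "Label: value" layout, so turning one block into the requested spreadsheet columns can be sketched as below. This is a minimal sketch, not a scraper: it assumes a page's Contact PI block has already been fetched and reduced to plain-text lines like those above, that names are always written "LAST, FIRST MIDDLE", and that emails are shown with "(at)" in place of "@". The `Contact` class and `parse_contact_block` function are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    last_name: str
    first_name: str
    email: str
    title: str
    organization: str


# Matches lines such as "Title: ASSISTANT PROFESSOR".
FIELD_RE = re.compile(r"^\s*(Name|Email|Title|Organization):\s*(.*?)\s*$")


def parse_contact_block(text: str) -> Contact:
    """Parse 'Label: value' lines copied from a Contact PI Information block."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    # Assumes names look like "LAST, FIRST MIDDLE"; split on the first comma only,
    # so any middle names stay in the first-name column.
    last, _, first = fields.get("Name", "").partition(",")
    # Assumes the page writes "(at)" instead of "@" in email addresses.
    email = fields.get("Email", "").replace("(at)", "@")
    return Contact(
        last_name=last.strip(),
        first_name=first.strip(),
        email=email,
        title=fields.get("Title", ""),
        organization=fields.get("Organization", ""),
    )
```

Applied to the example above, this yields last name "AAGAARD-TILLERY", first name "KJERSTI MARIE", and email "aagaardt@bcm.tmc.edu". Splitting on the first comma keeps hyphenated surnames intact.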

I need this information for each of the 114,404 projects returned by the query. The project must deliver a spreadsheet with Last Name, First Name, Email, Title and Organization columns covering all 114,404 projects. All redundant entries should be deleted. The final results can be split into two spreadsheets if they exceed the maximum number of rows.
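Removing redundant entries and splitting the output across files can be sketched as follows. This assumes each row is a list in the column order named above (Last Name, First Name, Email, Title, Organization), and that two rows are "redundant" when their emails match case-insensitively, falling back to comparing the whole row when no email is listed; that dedup rule is an assumption, not part of the request. The row limit assumes the older Excel 97-2003 format, whose worksheets hold at most 65,536 rows (Excel 2007 and later allow 1,048,576). CSV output and the names `dedupe` and `write_in_chunks` are hypothetical choices.

```python
import csv
from pathlib import Path

HEADER = ["Last Name", "First Name", "Email", "Title", "Organization"]
# Excel 97-2003 worksheets hold at most 65,536 rows; one is used by the header.
MAX_DATA_ROWS = 65_536 - 1


def dedupe(rows):
    """Drop repeated contacts, keeping the first occurrence.

    Assumption: rows are redundant when their emails match case-insensitively;
    rows without an email are compared on all of their (stripped) cells.
    """
    seen = set()
    unique = []
    for row in rows:
        email = row[2].strip().lower()
        key = email if email else tuple(cell.strip() for cell in row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def write_in_chunks(rows, out_dir, stem="contacts", max_rows=MAX_DATA_ROWS):
    """Write rows to one or more CSV files, each small enough for one worksheet."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(rows), max_rows):
        path = out_dir / f"{stem}_{start // max_rows + 1}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(HEADER)
            writer.writerows(rows[start:start + max_rows])
        paths.append(path)
    return paths
```

With the default limit, 114,404 rows (or fewer after deduplication) fit in at most two files, which matches the two-spreadsheet allowance above. Keying on email rather than name avoids merging different people who share a name, at the cost of keeping one person twice if they are listed under two addresses.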


Bear