Copyright Infringement Script

July 20th, 2009

I would like a program/script written in PHP and/or Java (or ANY other language) that will help an author locate copies of his or her documents that have been posted on the Internet without permission (copyright infringement).

This project would be divided into THREE PHASES.

In PHASE ONE of this project, you would develop the core program code. The program would do the following:

1. Search through a specified folder (on a web server’s hard drive) for any Microsoft Word, RTF, PDF, or plain text documents in that folder (and, optionally, any sub-folders). The software would then take a user-defined number of random samples of contiguous text (i.e., a user-defined number of consecutive words) from EACH document in that folder/sub-folder.
a. The user should have the ability to specify the amount of “distance” (measured in words) between the samples that are taken, i.e., the user can specify that the samples are to be taken every 250 words, 500 words, 750 words, etc.
b. The user should be able to select the document types that are searched for (Microsoft Word, RTF, PDF, or plain ASCII text documents).

2. The software would then take those samples (strings of text) and query Google.com and/or Docstoc.com (whichever the user has pre-selected) to find any matches (instances of copyright infringement).

3. For any matches that are found, the software would then log:
a. the URL of the page on which the match is found
b. the TITLE of the html page on which the match is found
c. the file name of the source document from the user’s local hard drive
d. the date and time of the query in which the match was found on the offending web site
e. whether the match was found on Google.com or Docstoc.com
f. the exact string of text that was discovered on Google.com or Docstoc.com.

4. The log generated in #3 above should then be exportable as a comma-delimited text file (CSV file). It should also be viewable on screen. The user will select whether to view the report on screen or export it to CSV.
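
To make steps 1 through 4 a little more concrete, here is a rough Python sketch of the pieces that do not depend on any outside service. It is only an illustration under stated assumptions, not a finished design: the function names, the `EXTENSIONS` mapping, the `Match` field names, and the `SEARCH_URL` endpoint are all hypothetical. Samples are taken at fixed intervals rather than at random positions, which is one possible reading of 1a. Extracting text from Word, RTF, and PDF files and actually fetching search results are left out.

```python
import csv
import io
from dataclasses import dataclass, fields
from datetime import datetime
from pathlib import Path
from urllib.parse import urlencode

# Hypothetical mapping from the document types in step 1b to file extensions.
EXTENSIONS = {"word": {".doc", ".docx"}, "rtf": {".rtf"}, "pdf": {".pdf"}, "text": {".txt"}}

# Assumed search endpoint, used only to show how a query string is built.
SEARCH_URL = "https://www.google.com/search"


def find_documents(folder, doc_types, recursive=False):
    """Step 1: list files in `folder` whose extension matches the selected types."""
    wanted = set().union(*(EXTENSIONS[t] for t in doc_types))
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in Path(folder).glob(pattern)
                  if p.is_file() and p.suffix.lower() in wanted)


def take_samples(text, sample_length, distance, max_samples):
    """Step 1a: take up to `max_samples` runs of `sample_length` consecutive
    words, starting a new run every `distance` words."""
    words = text.split()
    samples = []
    for start in range(0, len(words), distance):
        chunk = words[start:start + sample_length]
        if len(chunk) < sample_length or len(samples) >= max_samples:
            break
        samples.append(" ".join(chunk))
    return samples


def build_query_url(sample, base_url=SEARCH_URL):
    """Step 2: wrap the sample in double quotes so it is searched as an exact phrase."""
    cleaned = " ".join(sample.replace('"', " ").split())
    return base_url + "?" + urlencode({"q": f'"{cleaned}"'})


@dataclass
class Match:
    """Step 3: one logged match, with fields a through f in order."""
    url: str
    page_title: str
    source_file: str
    queried_at: datetime
    found_on: str
    matched_text: str


def to_csv(matches):
    """Step 4: render the log as comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f.name for f in fields(Match)])
    for m in matches:
        writer.writerow([m.url, m.page_title, m.source_file,
                         m.queried_at.isoformat(sep=" "), m.found_on, m.matched_text])
    return buffer.getvalue()
```

For example, `take_samples(text, 8, 250, 5)` would return up to five 8-word samples, one starting every 250 words. The same list of `Match` records could feed both the on-screen report and the CSV export.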

*** REQUESTED PMB COMMENTS: I do not really care what programming language you use, although I tend to prefer PHP and Java. If you think another programming language would be better than PHP, please specify in the PMB what language you would use and WHY that would be better than PHP. NOTE: Please state in the PMB how many DAYS it would take you to complete PHASE ONE of this project. ***

In the SECOND PHASE of this project, I would like you to develop a Windows compatible stand-alone application that the user could install/execute on his computer that will do the same thing as 1-4 above, except that the initial queries and sampling of the documents would take place ONLY on the user’s LOCAL hard drive. No text would be relayed back to the web server. The program would still search for the matches on the Internet in the same way and create a CSV report and screen report.

In the THIRD PHASE of the project, I would like for you to add the capability to the web site version so that the program/script can query the user’s local hard drive and retrieve the samples of the documents on the user’s local computer, and then relay them back to the web server for processing and querying on the Internet. The CSV file and screen report would then be accessible from the web server to the user.

NOTE: I am very price sensitive for this project, so please bid accordingly. If someone bids the right amount and can commit to finishing the job quickly, I will likely end the bidding process early and select that bidder. Please write the word “Excel” in your bid comments so that I know you have read these specs carefully and that you understand English. Thank you for your interest, and I look forward to working with you!

Gather Search Engine Results

April 30th, 2009

I need a program that will automatically save the “view source” output of search engine results pages, for phrases that would normally be typed into the search engines by hand, into a .dat file.

I currently have a program that drops multiple requested phrases into a .dat file. The search engine names are dropped into another .dat file.

I need an add-on program that will run on the client’s side. It will grab the information from the .dat files and automatically get the results from the search engines without any input or interaction from the client. The program would run in the background, gather the results for each phrase, and drop the results into another .dat file.

As an example, when the program is run, if one of the phrases were “buying and selling widgets” and Google were one of the search engines, everything you would normally see if you manually clicked on “view source” would automatically be dropped into a file called /reports/google/buying_and_selling_widgets.dat
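
The naming rule in that example could be sketched in Python roughly as follows. This is only an assumption about how the file name is derived (lower-case the phrase and join the words with underscores). The function name `report_path` and the `root` parameter are hypothetical.

```python
import re
from pathlib import PurePosixPath


def report_path(engine, phrase, root="/reports"):
    """Build the output path for one search engine and one phrase."""
    # Replace every run of non-alphanumeric characters with a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", phrase.lower()).strip("_")
    return str(PurePosixPath(root) / engine.lower() / f"{slug}.dat")
```

With these assumptions, `report_path("Google", "buying and selling widgets")` gives `/reports/google/buying_and_selling_widgets.dat`, which matches the example above.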

I would also need the source code to the program.

Bear