Hello,
I need a content synonymizer script that works as follows:
1. Hold the “Synonymizer” data in one folder on the server. This data is the base the script works from, and the user can change it. It consists of words along with their synonyms in a particular format. Sample:
started|jumped
started|initiated
started|commenced
tough|bad
tough|difficult
tough|negative
tough|serious
tough|problematic
tough|troublesome
track|course
track|path
track|give chase
conflicting|opposed
conflicting|inconsistent
conflicting|incompatible
conflicting|self-contradictory
information|data
information|info
there|on that point
contradictory|inconsistent
This particular format has advantages. Each line has two entries.
It looks like this:
Column A|Column B
Some “words” are present only in column A, some only in column B, and some in both columns.
(The script may include a step that creates a database and stores the data taken from this text file.)
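To illustrate the data format, here is a rough sketch of how the file could be read into a lookup table. The function name `parse_synonyms` is only a placeholder, and lowercasing the column A word is my assumption so that matching can ignore case; a database could replace the dictionary.

```python
from collections import defaultdict


def parse_synonyms(lines):
    """Build {column A word: [column B synonyms]} from 'word|synonym' lines."""
    table = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if "|" not in line:
            continue  # skip blank or malformed lines
        word, synonym = (part.strip() for part in line.split("|", 1))
        word = word.lower()  # assumption: lookups are case-insensitive
        if word and synonym and synonym not in table[word]:
            table[word].append(synonym)  # keep file order, drop duplicates
    return dict(table)


sample = ["started|jumped", "started|initiated", "tough|bad", "track|give chase"]
table = parse_synonyms(sample)
print(table["started"])
```

The same function accepts an open file object, since it only iterates over lines.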
2. The script should let the user upload the file whose content is to be synonymized. It will usually be a paragraph of sentences or a list of sentences.
3. The user will be given two options:
a). Maximum number of words to be replaced/synonymized in a sentence –
b). Maximum number of synonyms to be used –
Option a). Words in a sentence that match the data in Part 1 should be selected by the script and replaced using a ["..|..|.."] syntax. When more words match than the maximum given in this option, the script should choose at random which of them to replace.
E.g., take this sentence:
It is tough to get started.
With option a) set to 1,
the processed output will be:
—————————————-
It is ["tough|bad|challenging|problematic|terrible"] to get started.
OR
It is tough to get ["started|initiated|set out"].
Note that when the maximum number of words to replace is 1, the script chooses randomly between ‘tough’ and ‘started’.
——————————————-
With option a) set to 2, 3, 4, etc.,
the processed output will be:
—————————————
It is ["tough|bad|challenging|problematic|terrible"] to get ["started|initiated|set out"].
—————————————–
This assumes that the other words in the original sentence are not in the synonymizer database.
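As a rough sketch of option a), the selection could work something like this. The name `mark_sentence`, the word pattern, and the `rng` parameter are placeholders of mine, not requirements. The sketch assumes the lookup table maps lowercase single words to lists of synonyms and that only single words from column A are matched.

```python
import random
import re

# Assumption: a "word" is a run of letters, allowing inner hyphens/apostrophes.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def mark_sentence(sentence, table, max_words, rng=random):
    """Wrap up to max_words randomly chosen matching words in ["..|.."]."""
    matches = [m for m in WORD_RE.finditer(sentence)
               if m.group().lower() in table]
    count = max(0, min(max_words, len(matches)))
    chosen = rng.sample(matches, count)
    # Replace from right to left so earlier character offsets stay valid.
    for m in sorted(chosen, key=lambda m: m.start(), reverse=True):
        options = [m.group()] + table[m.group().lower()]
        marked = '["' + "|".join(options) + '"]'
        sentence = sentence[:m.start()] + marked + sentence[m.end():]
    return sentence


table = {"tough": ["bad", "difficult"], "started": ["initiated", "commenced"]}
print(mark_sentence("It is tough to get started.", table, 1))
```

With a maximum of 1 the output marks either "tough" or "started"; with 2 or more it marks both.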
Option b). This controls the maximum number of synonyms that will appear inside ["..|.."]. For each selected word in the sentence, the script will randomly pick up to that many of its available synonyms.
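A rough sketch of option b) might look like the following. `pick_synonyms` is a placeholder name, and the sketch rests on two assumptions of mine: the original word always stays first inside the brackets and does not count toward the maximum, and a word is left unchanged if no synonyms are picked.

```python
import random


def pick_synonyms(word, synonyms, max_synonyms, rng=random):
    """Return ["word|syn|..."] with at most max_synonyms random synonyms."""
    count = max(0, min(max_synonyms, len(synonyms)))
    picked = rng.sample(synonyms, count)  # random choice without repeats
    if not picked:
        return word  # nothing to offer, leave the word as it is
    return '["' + "|".join([word] + picked) + '"]'


all_synonyms = ["bad", "difficult", "negative", "serious", "problematic"]
print(pick_synonyms("tough", all_synonyms, 2))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which helps when testing.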
Output: Ideally the script does the processing in the background, which is especially useful for larger files. It should be bug-free.
The user should be given a download link to the final output file.
Best Bid Wins!