I need a regex script that marks the positions of duplicates (line numbers, field) in a CSV file and writes the result to a new, updated file.
EXAMPLE INPUT
1 First Second Third
2 First 2 Third
3 First Second Third
4 First 2 Third
5 Different New Third
I need to find duplicates by field, and I want to be able to specify which fields to check. The separator (TS) and the quote delimiter around fields (e.g. “”) should also be configurable.
For example, if the search is for duplicates in field 2, the results are added at the front of each line, after the line number. If a checked field (2, 3, etc.) has no duplicates on a line, its result column is left blank.
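Because both the separator and the quote characters are meant to be configurable, and “” uses different opening and closing characters (which Python's `csv` module cannot express, since its `quotechar` is a single character used on both sides), a small hand-written splitter is one option. This is a sketch under stated assumptions: the separator is one character, quote characters are dropped from the field value, and quotes cannot be escaped inside a field. The name `split_fields` is hypothetical.

```python
def split_fields(line, sep=" ", open_quote="“", close_quote="”"):
    """Split one line on `sep`, treating text between open_quote and
    close_quote as part of a single field. The quote characters themselves
    are dropped. Assumes a one-character separator and no escaped quotes."""
    fields = []
    current = []
    in_quotes = False
    for ch in line.rstrip("\n"):
        if not in_quotes and ch == open_quote:
            in_quotes = True                  # entering a quoted field
        elif in_quotes and ch == close_quote:
            in_quotes = False                 # leaving a quoted field
        elif not in_quotes and ch == sep:
            fields.append("".join(current))   # separator ends the field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))           # last field on the line
    return fields


# Straight double quotes also work: pass the same character twice.
print(split_fields('1,"First, Name",Third', sep=",", open_quote='"', close_quote='"'))
# → ['1', 'First, Name', 'Third']
```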
EXAMPLE: checking three fields (2, 3, 4)
OUTPUT:
Columns: line number, lines where the 2nd field is duplicated, lines where the 3rd field is duplicated, lines where the 4th field is duplicated (etc.), then the original 2nd, 3rd, etc. fields.
1 1,2,3,4 1,3 1,2,3,4,5 First Second Third
2 1,2,3,4 2,4 1,2,3,4,5 First 2 Third
3 1,2,3,4 1,3 1,2,3,4,5 First Second Third
4 1,2,3,4 2,4 1,2,3,4,5 First 2 Third
5 1,2,3,4,5 Different New Third
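The duplicate marking can be sketched in Python. Note that this swaps in a plain dictionary lookup (value to list of line numbers) instead of a regular expression, because comparing field values across lines is awkward to do with a regex alone. Assumptions: fields are separated by a single space (the `sep` parameter), field 1 is the line number shown in the example, quoted fields are not handled, and the names `mark_duplicates`, `write_marked` and `mark_dups.py` are hypothetical.

```python
import sys
from collections import defaultdict


def mark_duplicates(lines, check_fields, sep=" "):
    """Build output lines: line number, one duplicate column per checked
    field, then the original fields after field 1 (the line number).

    check_fields are 1-based field numbers, e.g. [2, 3, 4]. A duplicate
    column lists every line that shares this line's value in that field,
    or is left blank if the value occurs only once."""
    rows = [line.rstrip("\n").split(sep) for line in lines]

    def value(fields, f):
        # Missing fields are treated as empty strings.
        return fields[f - 1] if f <= len(fields) else ""

    # First pass: for each checked field, map value -> line numbers.
    positions = {f: defaultdict(list) for f in check_fields}
    for line_no, fields in enumerate(rows, start=1):
        for f in check_fields:
            positions[f][value(fields, f)].append(line_no)

    # Second pass: put the duplicate columns in front of the original fields.
    out = []
    for line_no, fields in enumerate(rows, start=1):
        cols = [str(line_no)]
        for f in check_fields:
            hits = positions[f][value(fields, f)]
            cols.append(",".join(map(str, hits)) if len(hits) > 1 else "")
        cols.extend(fields[1:])
        out.append(sep.join(cols))
    return out


def write_marked(src, dst, check_fields, sep=" "):
    """Read src and write the marked lines to the new file dst."""
    with open(src, encoding="utf-8") as fin:
        result = mark_duplicates(fin, check_fields, sep)
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write("\n".join(result) + "\n")


if __name__ == "__main__":
    # Usage: python mark_dups.py INPUT OUTPUT 2 3 4
    if len(sys.argv) < 4:
        print("usage: mark_dups.py INPUT OUTPUT FIELD [FIELD ...]", file=sys.stderr)
    else:
        write_marked(sys.argv[1], sys.argv[2], [int(a) for a in sys.argv[3:]])
```

With the five example lines and fields 2, 3, 4, line 1 comes out as `1 1,2,3,4 1,3 1,2,3,4,5 First Second Third`. Line 5 gets two empty columns, so with a space separator it is written as `5`, three spaces, then `1,2,3,4,5 Different New Third`; the empty columns are kept so the column positions stay aligned.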
SECOND PROJECT
Linux web page search (local or server script)
Start the search and download the pages
Convert the character encoding
Remove HTML tags and other code
Extract strings (runs of consecutive characters) within a given range of Unicode code points into files, with the page's web URL in the header line.
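The download, encoding, tag-removal and extraction steps above can be sketched with the Python standard library. This is a minimal sketch, not a complete solution: it assumes the page URLs are already known (the search step itself is not covered), takes the charset only from the HTTP `Content-Type` header (falling back to UTF-8 and ignoring any `<meta charset>` tag), writes every output file as UTF-8, and treats the code-point range as an inclusive `lo`/`hi` pair. The names `page_runs.py`, `fetch_text`, `extract_runs`, `save_runs` and the `page_N.txt` file names are hypothetical.

```python
import re
import sys
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def fetch_text(url):
    """Download a page and decode it to str, using the charset from the
    HTTP Content-Type header (assumed UTF-8 if none is declared)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def strip_tags(html):
    """Remove tags (and script/style code), keeping only text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_runs(text, lo, hi, min_len=1):
    """Return runs of consecutive characters with code points in [lo, hi]."""
    char_class = "[%s-%s]" % (re.escape(chr(lo)), re.escape(chr(hi)))
    return re.findall("%s{%d,}" % (char_class, min_len), text)


def save_runs(url, runs, path):
    """Write the URL as the header line, then one run per line, as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(url + "\n")
        for run in runs:
            f.write(run + "\n")


if __name__ == "__main__":
    # Usage: python page_runs.py 0x4E00 0x9FFF URL [URL ...]
    if len(sys.argv) < 4:
        print("usage: page_runs.py LO HI URL [URL ...]", file=sys.stderr)
    else:
        lo, hi = int(sys.argv[1], 0), int(sys.argv[2], 0)
        for i, url in enumerate(sys.argv[3:], start=1):
            runs = extract_runs(strip_tags(fetch_text(url)), lo, hi)
            save_runs(url, runs, "page_%d.txt" % i)
```

The code-point range check is done with a regex character class built from `chr(lo)` and `chr(hi)`, which keeps the extraction step close to the regex approach the request asks for; `min_len` can be raised to skip very short runs.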