I have a list of HTML links, most of which are repeated, as in the following example:
> http://example.com/some/a-test-link.html
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
> http://example.com/some/again-link.html
I don't need the same link twice, so I need to remove the duplicates and keep only one copy of each link. How can I do this with regular expressions, or with sed/awk (I'm not sure which technique is best)? I am using Ubuntu, and my text editor is Sublime Text 3.
Thank you
Using awk is very simple:
awk '!seen[$0]++' file
This basically means:
awk '!($0 in seen) {seen[$0]; print}' file
So if the line is not yet in the array, awk adds it and prints it. Every later occurrence of that line is skipped, because it already exists in the array.
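The same logic can be written out with one step per line, which may make it easier to follow. This is only a sketch: the function name `dedupe_lines` is made up for illustration, and the sample input is taken from the question.

```shell
# Hypothetical helper: print each distinct line once, keeping first-seen order.
dedupe_lines() {
  awk '
    # seen[] holds one entry for every line that was already printed.
    !($0 in seen) {   # true only the first time this exact line shows up
      seen[$0]        # referencing the element creates it in the array
      print           # emit the line
    }
  ' "$@"
}

# Sample input from the question: every link appears twice.
result=$(printf '%s\n' \
  '> http://example.com/some/a-test-link.html' \
  '> http://example.com/some/a-test-link.html' \
  '> http://example.com/some/another-link.html' \
  '> http://example.com/some/another-link.html' \
  '> http://example.com/some/again-link.html' \
  '> http://example.com/some/again-link.html' | dedupe_lines)

printf '%s\n' "$result"
```

Because the comparison is on the whole line (`$0`), two links count as duplicates only if they match character for character, including trailing spaces.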
$ cat file
> http://example.com/some/a-test-link.html
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
> http://example.com/some/again-link.html
$ awk '!seen[$0]++' file
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
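The command above only prints the de-duplicated list to the terminal. If the goal is to update the file itself, one possible approach is to write to a temporary file first and then move it over the original. The following is a rough sketch under that assumption; the file name `links.txt` and the scratch directory are placeholders, not something from the question.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
input="$workdir/links.txt"   # hypothetical file name

# Build a small sample file with repeated links.
printf '%s\n' \
  '> http://example.com/some/a-test-link.html' \
  '> http://example.com/some/a-test-link.html' \
  '> http://example.com/some/another-link.html' \
  '> http://example.com/some/another-link.html' \
  '> http://example.com/some/again-link.html' \
  '> http://example.com/some/again-link.html' > "$input"

# Redirecting straight back into "$input" would truncate it before awk reads
# it, so write to a temporary file and replace the original only on success.
tmp=$(mktemp)
awk '!seen[$0]++' "$input" > "$tmp" && mv "$tmp" "$input"

cat "$input"
```

The `&&` means the original file is replaced only if awk exits successfully.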