Use regular expressions to delete duplicate lines

I have a list of HTML links, most of which are repeated, as in the following example:

> http://example.com/some/a-test-link.html
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
> http://example.com/some/again-link.html

I don’t need the same link twice, so I need to delete the duplicates and keep only one copy of each link. How can I do this with a regular expression? Or with sed/awk (I’m not sure which technique is best)? I am using Ubuntu, and my text editor is Sublime Text 3.

Thank you

Using awk is very simple:

awk '!seen[$0]++' file

This basically means:

awk '!($0 in seen) {seen[$0]; print}' file

So if the line is not yet in the array, awk adds it and prints it. All later copies of that line are skipped, because they already exist in the array.

> $ cat file
> http://example.com/some/a-test-link.html
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
> http://example.com/some/again-link.html
> $ awk '!seen[$0]++' file
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
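
To make the mechanics of `!seen[$0]++` more concrete, here is a small sketch that spells the one-liner out step by step. The function name `dedupe_lines` and the sample file name `links.txt` are made up for illustration. It assumes only standard awk behaviour, where an unset array element compares equal to 0.

```shell
#!/bin/sh
# dedupe_lines: print each distinct line of a file once, in first-seen order.
# Hypothetical helper name; intended to behave like: awk '!seen[$0]++' file
dedupe_lines() {
    awk '
    {
        # seen[$0] is unset (compares equal to 0) the first time a line appears
        if (seen[$0] == 0) {
            print
        }
        # count this occurrence so that later copies are skipped
        seen[$0]++
    }' "$1"
}

# Sample data (hypothetical file name), using the links from the question
printf '%s\n' \
    'http://example.com/some/a-test-link.html' \
    'http://example.com/some/a-test-link.html' \
    'http://example.com/some/another-link.html' \
    'http://example.com/some/another-link.html' \
    'http://example.com/some/again-link.html' \
    'http://example.com/some/again-link.html' > links.txt

dedupe_lines links.txt
```

Unlike approaches that sort the input first, this keeps the lines in their original order and also catches duplicates that are not adjacent. The trade-off is that awk holds every distinct line in memory.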

