Computing.Net > Forums > Unix > How to get Duplicate rows in a file


How to get Duplicate rows in a file


Name: raghu.iv85
Date: April 2, 2009 at 06:01:03 Pacific
OS: HP-UX
Subcategory: General
Comment:

Hi all,

I have written a shell script whose output file contains SQL query output.

In that file, I want to extract the rows that have multiple entries (duplicate rows).
For example, the output file looks like this:

===============================================================
<SH12_MC30_CE_VS_NY_HIST_T>
===============================================================
397 44847
400 33653
401 46455
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T_BKP>
===============================================================
397 44847
398 40107
399 39338
400 33653


From this output I want only the numeric duplicate rows. The file also contains separator lines, and those would be counted as duplicate rows too. So I want only the lines that appear more than once and contain numbers.

Raghunadh




Response Number 1
Name: James Boothe
Date: April 2, 2009 at 14:56:10 Pacific
Reply:

This awk code requires a line to start with a digit; any other line is ignored. If valid lines might have leading white space, you would need a slight adjustment to the code.

This awk code tallies the distinct lines in an array. A huge number of distinct lines could exhaust memory, so in that case you would need a different approach, such as sorting all the lines first.

This solution prints the lines found to be duplicated, each followed by the number of times it was found. If you want just the lines without the count, change the print command to just "print i".

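awk '/^[0-9]/ {lsum[$0]++}      # count each line that starts with a digit
END {
   # "for (i in lsum)" visits the keys in no particular order
   for (i in lsum)
      if (lsum[i] > 1)           # report only lines seen more than once
         print i, "(" lsum[i] ")"
}' myfile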

397 44847 (2)
400 33653 (2)
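
About the leading-whitespace caveat above: the thread does not spell out the "slight adjustment", so here is one possible version, wrapped in a hypothetical shell function named dup_numeric_rows (the name is made up for illustration). It strips leading blanks and tabs before testing for a digit, so an indented row and an unindented copy of it are counted as the same row.

```shell
# dup_numeric_rows is a hypothetical helper; pass the file to scan as $1.
# Leading blanks/tabs are removed before counting, so "  397 44847"
# and "397 44847" are treated as the same row.
dup_numeric_rows() {
  awk '{
    line = $0
    sub(/^[ \t]+/, "", line)     # strip leading whitespace
    if (line ~ /^[0-9]/)         # keep only rows that start with a digit
      lsum[line]++
  }
  END {
    for (i in lsum)
      if (lsum[i] > 1)
        print i, "(" lsum[i] ")"
  }' "$1"
}
```

Usage: dup_numeric_rows myfile. Trailing white space is not stripped, so rows that differ only in trailing blanks are still counted separately.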

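For the memory caveat, here is a sketch of the sorting approach mentioned above, again as a hypothetical function (dup_numeric_rows_sorted). grep drops the non-numeric lines, sort brings identical rows together, uniq -c counts each run of adjacent identical lines, and a final awk step keeps counts above 1 and prints them in the same "row (count)" form. Because uniq only compares neighbouring lines, and sort implementations typically fall back on temporary files for large inputs, no step needs to keep every distinct row in memory.

```shell
# dup_numeric_rows_sorted is a hypothetical helper; pass the file as $1.
# Output comes out in sorted order rather than file order.
dup_numeric_rows_sorted() {
  grep '^[0-9]' "$1" |
    sort |
    uniq -c |
    awk '$1 > 1 {
      count = $1
      sub(/^[ \t]*[0-9]+[ \t]/, "")   # remove the count prefix added by uniq -c
      print $0, "(" count ")"
    }'
}
```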

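If you want only the duplicated rows, without counts and in the order they appear in the file, a common awk idiom does it in one pass: it prints a row at the moment its second copy is read, so each duplicate appears exactly once. The function name dup_numeric_rows_in_order is hypothetical. It still keeps one array entry per distinct numeric row, so the memory caveat above still applies.

```shell
# dup_numeric_rows_in_order is a hypothetical helper; pass the file as $1.
# seen[$0] is incremented only for rows starting with a digit (&& short-circuits),
# and the pattern is true exactly when a row is read for the second time.
dup_numeric_rows_in_order() {
  awk '/^[0-9]/ && ++seen[$0] == 2' "$1"
}
```
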

Response Number 2
Name: ghostdog
Date: April 2, 2009 at 21:38:12 Pacific
Reply:

# sort file | uniq -d | grep '^[0-9]'
397 44847
400 33653


