delete duplicate rows in four columns

Microsoft Excel 2003 (full product)
May 5, 2010 at 08:17:14
Specs: Windows XP
Hi,

I have four columns of data with thousands of rows and I need a macro to check of 4 columns and delete duplicate rows.

Thanks very much.


See More: delete duplicate rows in four columns

Report •


#1
May 5, 2010 at 10:08:44
re: check of 4 columns and delete duplicate rows.

Do you mean that all 4 columns have to match exactly?

This is a match:

Red Blue Green Yellow
Red Blue Green Yellow

But this isn't, right?

Red Blue Green Yellow
Red Green BlueYellow


Report •

#2
May 5, 2010 at 12:30:59
Each row of data should be unique. Some rows are repeated 4 or five times so I need only the unique rows of data.

It has to be a macro because the data on the sheet changes all the time but comes with duplicate rows repeating the same thing for some rows.

I have a macro that looks into the first column and deletes if repeated but that doesnt work due to the fact that the first cell can be repeated 1000 times but the rest of the row changes.

So what I need is a unique set of data for each row.

Red Blue Green
Red Blue Green
Red Green Blue
Blue Red Green

I want just the unique rows pls (Red Blue Green) is a duplicate that should be deleted in my macro.

Thanks very much!


Report •

#3
May 5, 2010 at 13:26:31
Why not post the macro that checks the first column?

It should only take some minor modifications to make it look at 4 columns.

For example, this code first displays the contents of just Column A, then displays the contents of Columns A, B, C and D.

Sub Show4Cells()
 For rw = 1 To 10
  MsgBox Cells(rw, 1)
  MsgBox Cells(rw, 1) & Cells(rw, 2) & _
         Cells(rw, 3) & Cells(rw, 4)
 Next
End Sub

I'm sure something similar can be done to compare 4 columns in the code that already compares and deletes just 1.

Post the code and we'll see what we can do.

BTW...your example only shows 3 columns, not 4. What's up with that?


Report •

Related Solutions

#4
May 5, 2010 at 13:36:34
The code below will delete all three that start wih Red but as you can see there are two unique records with Red in the 1st column.

Red Blue Green Black
Red Blue Green Black
Red Green Blue Black
Blue Red Green Black

Public Sub DeleteDuplicateRows()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' DeleteDuplicateRows
' This will delete duplicate records, based on the Active Column. That is,
' if the same value is found more than once in the Active Column, all but
' the first (lowest row number) will be deleted.
'
' To run the macro, select the entire column you wish to scan for
' duplicates, and run this procedure.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Dim R As Long
Dim N As Long
Dim V As Variant
Dim Rng As Range

On Error GoTo EndMacro
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual


Set Rng = Application.Intersect(ActiveSheet.UsedRange, _
ActiveSheet.Columns(ActiveCell.Column))

Application.StatusBar = "Processing Row: " & Format(Rng.Row, "#,##0")

N = 0
For R = Rng.Rows.Count To 2 Step -1
If R Mod 500 = 0 Then
Application.StatusBar = "Processing Row: " & Format(R, "#,##0")
End If

V = Rng.Cells(R, 1).Value
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Note that COUNTIF works oddly with a Variant that is equal to vbNullString.
' Rather than pass in the variant, you need to pass in vbNullString explicitly.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
If V = vbNullString Then
If Application.WorksheetFunction.CountIf(Rng.Columns(1), vbNullString) > 1 Then
Rng.Rows(R).EntireRow.Delete
N = N + 1
End If
Else
If Application.WorksheetFunction.CountIf(Rng.Columns(1), V) > 1 Then
Rng.Rows(R).EntireRow.Delete
N = N + 1
End If
End If
Next R

EndMacro:

Application.StatusBar = False
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
MsgBox "Duplicate Rows Deleted: " & CStr(N)

End Sub


Report •

#5
May 5, 2010 at 15:48:37
It's not as "fancy" as the code you posted, but it does the trick. I could modify the code you posted, but it was just as easy to write my own.

It assumes that the 4 columns you are comparing are A, B, C and D. If that's not the case, then let me know and I'll make adjustments if you don't know how.

What the code does is insert a "temporary" column A and then concatenate the 4 columns into one string.

Red Blue Green Black
Red Blue Green Black
Red Green Blue Black
Blue Red Green Black

becomes

RedBlueGreenBlack
RedBlueGreenBlack
RedGreenBlueBlack
BlueRedGreenBlack.

The code then searches for duplicates in the temporary Column A, deletes the duplicate rows and then deletes the temporary column.

Let me know what you think.

Option Explicit
Sub DelDup4Cells()
Dim lstRw, nxtRw As Integer
'Determine Last Row in Column A
    lstRw = Range("A" & Rows.Count).End(xlUp).Row
'Insert a Temp Column A
    Columns("A:A").Insert Shift:=xlToRight
'Concatenate the 4 values into Columns A
    Range("A1:A" & lstRw).FormulaR1C1 = "=RC[1]&RC[2]&RC[3]&RC[4]"
'Loop from Last Row to top
  For nxtRw = lstRw To 2 Step -1
'Check for Duplicates
    If WorksheetFunction.CountIf(Range("A1:A" & lstRw), Range("A" & nxtRw).Value) > 1 Then
'Delete the Row if it appears more than once
      Range("A" & nxtRw).EntireRow.Delete
    End If
  Next
'Delete temp Column A
    Columns("A:A").Delete
End Sub


Report •

#6
May 6, 2010 at 07:51:21
Yes it works. Thanks very much for your help. I really, really appreciate it.

Report •

#7
May 6, 2010 at 08:34:15
Hi, Just tested the above macro on about 2000 rows of data and it takes about 12 to 15 mins for it to finish. Any ideas on how to spped up the macro?

Report •

#8
May 6, 2010 at 10:25:20
12 to 15 minutes for 2000 lines?

I just did 20,000 lines, with only 8 "unique" sets data and it took just over 2 minutes to delete 19,992 lines.

2,000 lines took about 2 seconds.

You can try adding this line right after the DIM line, but it didn't make much of difference on my system. This line will prevent Excel from showing you what it is doing until the code has finished deleting all of lines. It should eliminate the screen flickering that you might be seeing.

Application.ScreenUpdating = False

BTW...what are your system specs?

I tested it on a 3 Ghz machine with 3.5 Gb RAM and a bunch of other apps open at the time.


Report •

#9
May 6, 2010 at 12:25:56
Hi,

The thing is I have three macros including your's that run on one sheet.

Since I added all three in my workbook its become really slow. Running all 3 one after another takes 30 minutes but if you do it one at a time closing the work book each time, they all run fast. Do you have any idea why? Please see below, if you can help. Thanks very much!

NB(The sheet is a drop down list)


One deletes blank rows
Another duplicates(your macro)
finally and insert query that takes the same data to insert into another sheet starting from A15 to every 9 rows.

Duplicate

Option Explicit

Sub DelDup4Cells()

Dim lstRw, nxtRw As Integer

'Determine Last Row in Column A

lstRw = Range("A" & Rows.Count).End(xlUp).Row

'Insert a Temp Column A

Columns("A:A").Insert Shift:=xlToRight

'Concatenate the 4 values into Columns A

Range("A1:A" & lstRw).FormulaR1C1 = "=RC[1]&RC[2]&RC[3]&RC[4]"

'Loop from Last Row to top

For nxtRw = lstRw To 2 Step -1

'Check for Duplicates

If WorksheetFunction.CountIf(Range("A1:A" & lstRw), Range("A" & nxtRw).Value) > 1 Then

'Delete the Row if it appears more than once

Range("A" & nxtRw).EntireRow.dELETE

End If

Next

'Delete temp Column A

Columns("A:A").dELETE

End Sub


Insert query


Sub DemandPPCAINS()

Dim rngDest As Range

Dim rngSrcStart As Range

Dim rngSrcEnd As Range

Dim rngCell As Range

Dim intOffset As Integer

On Error GoTo ErrHnd

'set start of source range

Set rngSrcStart = Worksheets("PPCAINS").Range("A3")

'set end of source range

Set rngSrcEnd = Worksheets("PPCAINS"). _

Range("A" & CStr(Application.Rows.Count)).End(xlUp)

'set destination range start

Set rngDest = Worksheets("Demand").Range("A13")

'set destination offset

intOffset = 0

'get each row of source data

For Each rngCell In Range(rngSrcStart, rngSrcEnd)

'copy five columns to destination row

rngCell.Resize(1, 4).Copy Destination:=rngDest.Offset(intOffset, 0)

'jump a row

intOffset = intOffset + 9

Next rngCell

Exit Sub

'error handler

ErrHnd:

Err.Clear

End Sub


Deletes Blanks


Sub DeleteBlankS()

Dim R As Long

Dim C As Range

Dim N As Long

Dim Rng As Range

On Error GoTo skip

Application.ScreenUpdating = False

Application.Calculation = xlCalculationManual

If Selection.Rows.Count > 1 Then

Set Rng = Selection

Else

Set Rng = ActiveSheet.UsedRange.Rows

End If

N = 0

For R = Rng.Rows.Count To 1 Step -1

If Application.WorksheetFunction.CountA(Rng.Rows(R).EntireRow) = 0 Then

Rng.Rows(R).EntireRow.dELETE

N = N + 1

End If

Next R

skip:

Application.ScreenUpdating = True

Application.Calculation = xlCalculationAutomatic

End Sub


Report •

#10
May 7, 2010 at 06:53:39
First, a posting tip:

If you click on the pre symbol above the Comments box, it will place the pre tags in your post. If you then paste your code between the pre tags, it will retain it's formatting and be much easier to read. Look at the code I posted vs the code you posted.

Second, you can combine 2 of the macros to make your life a little easier.

By adding one line to my code, you can delete the blank lines as well as the duplicates.

I am assuming that a "blank line" means that the four columns that we are checking are all empty. Therefore, if the Concatenation formula evaluates to an empty cell then that line is blank.

Note the new line below the comment 'Check for Blanks

Option Explicit
Sub DelDup4Cells()
Dim lstRw, nxtRw As Integer
'Determine Last Row in Column A
    lstRw = Range("A" & Rows.Count).End(xlUp).Row
'Insert a Temp Column A
    Columns("A:A").Insert Shift:=xlToRight
'Concatenate the 4 values into Columns A
    Range("A1:A" & lstRw).FormulaR1C1 = "=RC[1]&RC[2]&RC[3]&RC[4]"
'Loop from Last Row to top
  For nxtRw = lstRw To 2 Step -1
'Check for Blanks and Delete the Row
    If Range("A" & nxtRw) = "" Then Range("A" & nxtRw).EntireRow.Delete
'Check for Duplicates
    If WorksheetFunction.CountIf(Range("A1:A" & lstRw), Range("A" & nxtRw).Value) > 1 Then
'Delete the Row if it appears more than once
      Range("A" & nxtRw).EntireRow.Delete
    End If
  Next
'Delete temp Column A
    Columns("A:A").Delete
End Sub

Third, I see no reason why all 3 macros couldn't be combined into one, but I don't know if you will always want to run all three, or in what order, etc. so I won't address that at this time.

Finally, I have no clue why running the code(s) should slow your system down. None of the macros are doing anything "sophisticated" like web queries or system calls or object creation, so I can't explain the need to save the file each time to speed things up. It certainly doesn't happen when I run the 3 macros one at a time.

I'll ask some of my compadres to see what they have to say.


Report •

#11
May 7, 2010 at 07:21:41
Hi again,

Thank you sooooo much! You are star :-)
I really, really really appreciate your help!!!!!

I think I found something earlier, because my cells are also drop down lists, I think the macros pick up data while running, then hide it.

This will slow down the insert query or confuse it I guess. Hmm not too sure??????

After running the blanks and duplicates, I discovered it shows a filter at the end and it holds data I never pasted there in the first place. Is it possible the macros are picking up data from empty cells in the drop down list while running?

The cells have to be a drop down to get accurate data but mostly copy and paste. What do you think?


Report •

#12
May 7, 2010 at 08:03:27
Thanks again. It works fine but just tested the new macro above, I believe it goes into filter mode and hides the data rather than delete.

This will definitely knock over the macro inserting into another sheet. Any ideas on how to solve this please?


Report •

#13
May 10, 2010 at 07:13:05
re: I believe it goes into filter mode and hides the data rather than delete.

I have no idea what this means. Nothing in any of the code above should "Hide" any data. It specifically says .EntireRow.Delete.

VBA is not going to interpret that as .EntireRow.Hidden = True.

You'll need to explain this further, assuming the following code doesn't help.

Back to the original problem...

I asked around and learned a much more efficient way to delete the Duplicate and empty rows.

I created a sheet with 2000 rows of 4 columns, with a DropDown in every cell. Sprinkled throughout these 2000 rows were some sets of 4 empty cells. Within these 2000 rows, there were 7 unique sets of 4 values.

The following code took about 2 seconds to delete the Duplicate and empty sets of data.


Option Explicit
Sub DelDup4Cells()
Dim lstRw, nxtRw As Integer
'Determine Last Row in Column A
    lstRw = Range("A" & Rows.Count).End(xlUp).Row
'Insert Temp Columns A & B
    Columns("A:B").Insert Shift:=xlToRight
'Concatenate the 4 values into Columns B
    Range("B1:B" & lstRw).FormulaR1C1 = "=RC[1]&RC[2]&RC[3]&RC[4]"
'Place TRUE or FALSE in Column A using COUNTIF
    With Sheet1.Range("A1:A" & lstRw)
      .Formula = "=IF(COUNTIF($B$1:$B$" & lstRw & ",B1)=1,FALSE,NOT(COUNTIF($B$1:B1,B1)=1))"
      .Calculate
      .Value = Sheet1.Range("A1:A" & lstRw).Value
'Replace all TRUE with empty cells
      .Replace "TRUE", ""
'Delete rows with Empty Cells (the Duplicates)
      .SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    End With
'Delete temp Columns A & B
    Columns("A:B").Delete
'Delete the one remaining blank row
  lstRw = Range("A" & Rows.Count).End(xlUp).Row
    With Sheet1.Range("A1:A" & lstRw)
     .SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    End With
End Sub


Report •

#14
May 10, 2010 at 12:13:44
Thanks very much for all your help. I really appreciate and yes it works perfectly now.

Report •


Ask Question