Remove Duplicate Domains, Excel VBA

Individual software Individual professor...
October 25, 2010 at 03:54:01
Specs: Windows 7
I need a VBA script for Excel that can remove duplicate domains in a column. So if I have and after 3 rows in the same column have: and after 2 rows, I want the script to just leave and delete and

See More: Remove Duplicate Domains, Excel VBA

October 25, 2010 at 06:58:29
Thanks for letting us know what you need and what you want.

If you need or want any help with that, feel free to ask.

Report •

October 25, 2010 at 07:26:05
I actually need someone to make this script, if you can do it would be great. Looking from algoritmic perspective it would be something like:

Take this text up to the third "/" (because"), the domain name is till the third "/" and if you find the text till the third "/" is the same then delete the whole row.

Report •

October 25, 2010 at 09:26:59
I know exactly what you want to do, but it appears that you missed my point.

We are all volunteers here, offering our assistance for free. I assume that you wouldn't walk into a co-worker's office and just blurt out what you need and want. I assume you would ask nicely, and maybe even throw in a "Please" and/or a "Thank You" along with your requirements.

It should be no different when asking for help in an online forum.

That said...

You can do it without VBA:

I'm going to assume that your URLs are in Column A and that you do not have Column Labels in Row 1. If that's not the case, modify these instructions as required.

1 - Insert a new Column A
2 - Insert a new Row 1
3 - In A1 enter Domain
4 - In B1 enter URL
5 - In A2 enter this formula and drag it down:


This should put a list of Domain Names in Column A, which will include duplicates.

6 - Select the list of domain names, including A1
7 - Data...Filter...Advanced Filter
- a - Check "Copy to another location"
- b - Copy to: C1
- c - Check "Unique records only

This should put a list of unique Domain Names in Column C.

8 - In D2 enter this formula and drag it down:


This will find the first occurrence of each Domain Name in Column A and return the corresponding URL from column B.

Report •

Related Solutions

October 25, 2010 at 14:41:30
Thanks. The problem is, I have over 300 files, each having over 5000 URLs in column A so a script would be a lot more practical really. Is this possible with a script?

Thanks :)

about your answer:

3 - In A1 enter Domain
4 - In B1 enter URL

How do you mean? The domain/URL are in the SAME column. I don't know which domains are duplicate, I only know I have 5k+ URLs, and some of them are from same domains, and what would be great if there's a script that can identify whether the URL contains the domain again and remove it.

Report •

October 25, 2010 at 15:05:50
1 - Insert a new Column A
2 - Insert a new Row 1
3 - In A1 enter "Domain" as in D-o-m-a-i-n
4 - In B1 enter "URL" as in U-R-L

These are nothing more than column labels, which are needed (at least in Column A) in order to produce the Filtered list in Column C.

I'll work on macro when I get a chance.

Report •

October 25, 2010 at 16:43:09
I suggest that you try this code in a backup copy of your workbook because macros can not be undone.

This code assumes that your list resides in Column A, starting in A1.

This code will sort your list of domains in ascending order and the delete the duplicates.

If you need the domains put back in the original order, then a few additional lines of code will need to be added.

Option Explicit
Sub CleanDomainNames()
Dim lastDomain, nxtDomain As Integer
Dim curDomain, curDomainUp1 As String
'Stop screen flickering
 Application.ScreenUpdating = False
'Sort Column A
  Columns("A:A").Sort Key1:=Range("A1"), _
                      Order1:=xlAscending, _
'Find last row with data in COlumn A
 lastDomain = Range("A" & Rows.Count).End(xlUp).Row
'Loop through Column A in reverse order
  For nxtDomain = lastDomain To 2 Step -1
'Extract Domain name from current cell
   curDomain = _
     Mid(Range("A" & nxtDomain), 1, _
     InStr(InStr(1, Range("A" & nxtDomain), "//") + 3, _
     Range("A" & nxtDomain), "/") - 1)
'Extract Domain name from cell above current cell
   curDomainUp1 = _
     Mid(Range("A" & nxtDomain - 1), 1, _
     InStr(InStr(1, Range("A" & nxtDomain - 1), "//") + 3, _
     Range("A" & nxtDomain - 1), "/") - 1)
'Compare the Domain names, if they match
'Delete the current domain name
    If curDomain = curDomainUp1 Then
      Range("A" & nxtDomain).EntireRow.Delete
    End If
'Repeat until only unique domain names are left
'Loop through unique naems and strip off last "/"
  lastDomain = Range("A" & Rows.Count).End(xlUp).Row
    For nxtDomain = 1 To lastDomain
      Range("A" & nxtDomain) = _
      Left(Range("A" & nxtDomain), Len(Range("A" & nxtDomain)) - 1)
'Show results
 Application.ScreenUpdating = True
End Sub

Report •

October 26, 2010 at 01:15:20
Hey man, thanks a lot. The first URL is actually starting at A2, I believe that wouldn't be hard to change in the VBA. The thing is, I need them in ORIGINAL order, I already found a way to remove duplicates and sort them in ascending order, that was the reason why I am looking for sorting by original order. So I have A1=URL, B1= something else associated with the URL, that's why I need the whole row to be deleted, not just the A(something) cell.

So if you can make it start from A2 + make the original order kept (so just the first URL is kept and the other are deleted) that would be amazing

Report •

April 1, 2011 at 10:49:25
I've been trying to run the script above but it comes back with a runtime 5 error, invalid procedure call. That said it's the first time that I have tried a VBA script and so I am probably making a mistake. I'm on a Mac.
Could you tell me how to run the script?
Sorry if the question is a bit basic :(
Thanks in advance...

Report •

Ask Question