Delete Rows in FileB FileA list

May 1, 2010 at 07:51:48
Specs: Windows XP
I have two text files. FileA.txt and FileB.txt
FileA contains approximately 150 numbers in a list i.e.


FileB has approximately 66,000 lines of text in a csv format. I need to delete all the rows that do not contain any of my numbers from FileA. The numbers would always be the third column in.

The lines in FileB look like:

"0","CE7721","034111","Barton Coll","Y","FE Coll","L","
"0","CE0047","025698","Ketchley Coll","Y","FE Coll","L","
"0","CE5599","025999","Westway Coll","Y","FE Coll","L","

In the above example line 2 has one of FileA.txt listed numbers i.e. 025698. Lines 1 and 3 do not so they would be deleted. At the end of the process I need to save FileB.txt which should now only contain rows whose third column matches FileA.txt. Can anyone help with a batch file that can do this?

See More: Delete Rows in FileB FileA list

May 1, 2010 at 08:17:06
Are you willing to copy the data into Excel and use a VBA Macro to delete the lines?

Report •

May 1, 2010 at 08:59:04

I have tried Excel and found it took too long to delete the unwanted data i.e. I have tried...

Loading the file onto two worksheets in a workbook to get around XL maximum # rows

I then tried some VBA array code to remove the lines without any luck while all the data resided in column 1.

I then applied TxtToColumns which took an age on 65,000 rows and again ran some VBA array code which did delete the rows but it took so long to do it I could have cut and pasted 150+ times to a new worksheet.

If you do a have a quick way to do this I would be very interested. Thanks.

Report •

May 1, 2010 at 11:21:26
One more question:

You say that line 2 should remain since it contains 025698.

Will any other of the 66,000 lines contain 025698 or will there only be one occurrence of a given number from the 150 we're searching for?

Report •

Related Solutions

May 1, 2010 at 12:06:16
Good question

The answer is yes. While the rest of the csv row will vary to a degree there will be a lot of instances where the number in column 3 will be the same and I need each row that has a corresponding number with FileA.txt.

Report •

May 1, 2010 at 14:11:14
I set up a workbook as follows:

Sheet1 contains 150 numbers in A1:A150, all of which can be found in the data strings Sheet2 Column A

Sheet2 contains 65,100 rows of data, similar to yours, all in Column A. In addition, Sheet2 was set up such that all of the 150 numbers from the list would be found multiple times - for a total of 21,700 "found data strings".

The following code copies the found strings to Sheet3, leaving a list of data strings that contain the numbers from Sheet1 A1:A150.

Copying the 21,700 data strings took about 5 1/2 minutes.

Option Explicit
Sub DelRowNoMatch()
Dim myNum, c, firstAddress, nxtRow
'Record Start time
  Sheets(3).Range("A1") = Now()
'Loop through numbers in Sheet1 A1:A150
   For Each myNum In Sheets(1).Range("A1:A150")
'Find each numer in Sheet 2's data strings
    With Sheets(2).Range("A1:A65100")
     Set c = .Find("""" & myNum & """", lookat:=xlPart)
'If found...
''Store address of first string found
      If Not c Is Nothing Then
       firstAddress = c.Address
''Find next emtpy row in Sheet 3
         nxtRow = Sheets(3).Range("A" & Rows.Count).End(xlUp).Row + 1
''Copy string to Sheet 3
         c.EntireRow.Copy Destination:=Sheets(3).Range("A" & nxtRow)
''Find next string with the same number
          Set c = .FindNext(c)
''Loop until all strings containing that number exists
        Loop While Not c Is Nothing And c.Address <> firstAddress
      End If
    End With
'Repeat for next number in Sheet 1 A1:A150
'Record End Time
  Sheets(3).Range("B1") = Now()
End Sub

Report •

May 2, 2010 at 03:34:55

I set up my worksheets as suggested and ran the code. It took 55 minutes to run and extracted 22, 769 rows to Sheet3.

It might be down to the processor on my pc but I do need to find a method that speeds up the process.

A big thanks for your contribution.

Report •

May 2, 2010 at 07:42:47
55 minutes? Wow!

What speed is your processor and how much RAM do you have installed?

I ran the macro on 3 different machines.

All machines have either a 2.7 or 2.8 Ghz CPU's. 2 machines have 2 Gb of RAM, one has 512Mb.

The "beefiest" of these machines ran the code (21K results) in under 2 minutes. Even the one with 512 RAM ran it in under 3.

Check My Computer...System Information to see what your specs are.

Report •

May 2, 2010 at 11:51:45
The PC I am testing this on has 1.09GHz Processor with 1.50GB RAM.

When the code was running I noticed that every now and then the top of the worksheet would flash a message after Microsoft Excel - Sheet1 saying (Not Responding) but the code kept running. I have tried this on smaller worksheets and it ran very well. I am sure its down to the PC spec. Will try this next week on another PC to see how quickly it runs. Again, a big thanks for your help on this.


Report •

May 3, 2010 at 05:58:53

I ran DerbyDad03's code on data I created based on your sample data - 65,535 rows and numbers in 150 cells.

I ran this on my laptop which has an AMD processor running at 1.58Ghz and 2GB Ram

It took 2 minutes 56 seconds to move 24,545 matching records to sheet3.

This suggests that there is something else affecting the performance of your PC.

Check to see if other programs are using significant resources (Ctrl-Shift-Esc to show the Task manager)
Look at processes to see which programs are using processor time - also check to see that there is 'Available' memory' (see Physical memory section)


Report •

May 3, 2010 at 11:46:47
Hi Humar

I agree. I decided to reformat the PC today. Hopefully this will sort out my time issue. Thanks for the timings.

Report •

Ask Question