Solved Sorting of data using multiple (2) conditions Excel 2007

July 17, 2012 at 08:28:43
Specs: Windows XP
Problem: how to sort large amounts of data based on 2 conditions.
-----
I have sample data collected from a electronics device. It has 50 channels, each is sampled 0-2 Sample/s, but normally 1 Sample/s. The data for ALL channels is saved in the same 4 columns (ChNo, ValueIn, ValueOut, TimeStamp) file.

Size of data is ~400k samples. Each channel between 8k-12k samples.

Note:
Not all channels will be sampled the same amount of times during the entire data collection.
The order order in which the channels are sampled also varies (can be 1,2,3,4 or 4,2,3,1).
The Start and Stop time is always the same for ALL channels.

I am interested in 20 of the 50 channels. Channels are numbered in format 34XX.

=================
*** The raw data looks like this:
ChNo ValueIn ValueOut TimeStamp
3466 0 2012-07-11 14:37:40
3465 0 2012-07-11 14:37:40
3464 0 2012-07-11 14:37:40
3463 0 2012-07-11 14:37:40
3462 0 2012-07-11 14:37:41
3461 0 2012-07-11 14:37:41
3460 -0,01235 2012-07-11 14:37:41
3459 0,000215 2012-07-11 14:37:41
3458 -0,000165 2012-07-11 14:37
...
=================

*** I want to re-sort my data in this way:

TimeStamp ValueIn(3466) ValueIn(3465) ValueIn(3464) ... ValueIn(3460)
14:37:40 0 0 0 ... 0
14:37:41 -0,01235 -0,000165 0 ... 0,000215
...
=================

Ideally I would import the sample data in one worksheet WS1. In WS2 I would specify what time range I want, and list which channels, and WS3 would sort the data in the way specified above.


I have already done this manually, by sorting all the data after TimeStamp, and then using filters to go through and and copy/paste the samples for each channel into a new worksheet. However, since the channels have different number of samples I need to keep ChNo together with TimeStamp for _each_ channel. This is very cumbersome when plotting, I would rather have only a single TimeStamp and all channel samples for that time. The Start and Stop time for all channels are the same, it is just the number of samples taken from start to stop that varies.


My current idea:

Use some conditional search. Create a list with all the TimeStamps I am interested in. For each of those TimeStamps I would then lookup the correct channel and print its value. However, this runs into problems since I can get between 0-2 returns for a specific Channel&TimeStamp. Also Vlookup can only search for one condition (in my case TimeStamp). I cannot use "helper columns" as alot of people use, since I have such large amount of data.

What I need is some kind of lookup that
1: Finds all samples with the correct TimeStamp
2: Finds the specified channel nr
3: Prints the value in correct cell.


I am not hugely experienced with Excel 2007, but willing to learn so all help or pointers are appreciated! Just a general idea of how to go about with the sorting is appreciated, and I will read up on it.


See More: Sorting of data using multiple (2) conditions Excel 2007

Report •

✔ Best Answer
July 19, 2012 at 18:13:13
OK, I'm jumping in a little late here, but I'll give this a shot.

First, the example output data that you provided doesn't seem to have come from the
example input data that you provided.

For example, your input does not show a value for Channel 3466 at 14:37:41 yet your
output does.

I hope I can assume that you posted what you posted for brevity.

If that is correct, then here is what I did:

1 - I copied your input data into Sheet1, so it looks like your raw data from
Response #1.

2 - I created this table on Sheet2 by first using an Advanced Filter to get the
unique Time Stamps into Column A, then using Copy...PasteSpecial...Transpose
to get the Channel numbers into Row 1.

Sheet2:

          A                B     C     D     E     F     G     H     I     J
1  TimeStamp             3466  3465  3464  3463  3462  3461  3460  3459  3458
2  07/11/2012 14:37:40                                                                     
3  07/11/2012 14:37:41   

4 - I then ran this code:

Sub SortBy2Criteria()
'Determine last Row with Channel in Sheet1!Column A (Source)
  lastSourceChannelRow = Sheets(1).Cells(Rows.Count, "A").End(xlUp).Row
'Determine last Row with Time Stamp Sheet2!Column A (Destination)
  lastDstTimeRow = Sheets(2).Cells(Rows.Count, "A").End(xlUp).Row
'Determine last Column with Channel in Sheet2!Row 1 (Destination)
  lastDstChannelColumn = Sheets(2).Cells(1, Columns.Count).End(xlToLeft).Column
'Loop through Channel Rows on Sheet1
    For chanRow = 2 To lastSourceChannelRow
'Find each Source Channel from Sheet1!Column A in Sheet2!Row 1
'Set Column Variable based on result
     With Sheets(2).Range("B1:J" & lastDstChannelColumn)
       Set c = .Find(Sheets(1).Range("A" & chanRow))
         cCol = c.Column
     End With
'Find corresponding Time Stamp in Sheet2!Column A
'Set Row Variable based on result
     With Sheets(2).Range("A2:A" & lastDstTimeRow)
       Set t = .Find(Sheets(1).Range("D" & chanRow))
         tRow = t.Row
     End With
'Use Row and Column variables to copy ValueIn from Sheet1 to Sheet2
       Sheets(2).Cells(tRow, cCol) = Sheets(1).Cells(chanRow, 2)
    Next
End Sub


I ended up with this on Sheet2:

          A               B         C         D         E         F         G         H         I         J
1  TimeStamp            3466      3465      3464      3463      3462      3461      3460      3459      3458
2  07/11/12 14:37:40  0,000162  0,000139  0,000146  0,000098					
3  07/11/12 14:37:41					      0,000161  0,000098  -0,01235  0,000215  -0,000165

Obviously I don't know how this will work with your entire input dataset, but I think
it will do what you want it to.

Click Here Before Posting Data or VBA Code ---> How To Post Data or Code.



#1
July 18, 2012 at 05:40:13
My apologizes for not posting with PRE tag. This is my data.

Raw data:

ChNo	ValueIn	ValueOut	TimeStamp
3466	0,000162		2012-07-11 14:37:40
3465	0,000139		2012-07-11 14:37:40
3464	0,000146		2012-07-11 14:37:40
3463	0,000098		2012-07-11 14:37:40
3462	0,000161		2012-07-11 14:37:41
3461	0,000098		2012-07-11 14:37:41
3460	-0,01235		2012-07-11 14:37:41
3459	0,000215		2012-07-11 14:37:41
3458	-0,000165		2012-07-11 14:37:41
...
<PRE>

Wanted formatted data:
<PRE>
TimeStamp		ValueIn_ChNo3466	ValueIn_ChNo3465	ValueIn_ChNo3464	...	ValueIn_ChNo3458
2012-07-11 14:37:40	0,000161		0,000161		0,000161		...	-0,01235
2012-07-11 14:37:41	-0,01221		-0,00231		-0,000141		...	
2012-07-11 14:37:42	0,000098		0,000098		0,000161		...	-0,01235
2012-07-11 14:37:43	0,000161		0,000161		0,000161		...	-0,01235
2012-07-11 14:37:44	0,000162		-0,000165		0,000098		...	-0,01235
2012-07-11 14:37:45	0,000098		0,000161		0,000161		...	-0,01235

How I would do it manually:
(0) List all wanted channels ChList
(1) Find firstTimeStamp and lastTimeStamp
(2) Make a list of all TimeStamps between first and last from (1)
(3) Run through each of the TimeStamps from (2) and find ValueIn for all channels in ChList
(4) Copy the value to the correct cell from Raw data to the Sorted Data worksheet

It is (3) I need help with.

If I only had 1 channel I could do (3) with VLOOKUP on the TimeStamp. However I have multiple values per TimeStamp


Report •

#2
July 18, 2012 at 10:02:44
How is your Value_In number formatted?

It appears to be TEXT, unless your using a Custom Format.

Is the comma in your numbers supposed to be a decimal point?


MIKE

http://www.skeptic.com/


Report •

#3
July 18, 2012 at 13:29:08
ValueIn is formatted as a decimal number. The comma is the default decimal mark, since I am swedish :)

ValueOut is empty.

The TimeStamp however is a custom format, I couldn't find any default formats with YYYY-MM-DD hh:mm:ss.


Report •

Related Solutions

#4
July 18, 2012 at 13:46:09
Ok, try this:

With your data looking like:

     A         B            C              D
 1) ChNo    ValueIn      ValueOut        TimeStamp
 2) 3466    0.000162                07/11/2012 14:37:40
 3) 3465    0.000139                07/11/2012 14:37:40
 4) 3464    0.000146                07/11/2012 14:37:40
 5) 3463    0.000098                07/11/2012 14:37:40
 6) 3462    0.000161                07/11/2012 14:37:41
 7) 3461    0.000098                07/11/2012 14:37:41
 8) 3460    -0.012350               07/11/2012 14:37:41
 9) 3459    0.000215                07/11/2012 14:37:41
10) 3458    -0.000165               07/11/2012 14:37:41

To begin,
Select Column A, all your Channel Nmbrs
do not include the heading. IE: A2:A10
Select Copy
Put your Cursor on Cell 1 Column G
Select Paste / Special / Transpose / Click OK
All your selected Channel Nmbrs should now be in Row 1
If there are more then one instance of your channel numbers,
see the next section on how to delete duplicates.

Next
Select Column D, all your Time Stamps
including the heading. IE D1:D10
Select Copy
Put your Cursor on Cell 1 Column F
Select Paste
All your selected Time Stamps should not be in COLUMN F
Your selected cells should still be highlighted.
Select, on the Ribbon, Data
Select Remove Duplicates,
A popup window should appear, Click OK
Another popup should appear, telling you how many Duplicates were removed.

Your data should now look like:

          F                G     H     I     J     K     L     M     N     O
1) TimeStamp             3466  3465  3464  3463  3462  3461  3460  3459  3458
2) 07/11/2012 14:37:40                                                                     
3) 07/11/2012 14:37:41                                                                     

Next
In cell G2 enter the formula:

=SUMPRODUCT(--($A$2:$A$10=G$1),--($D$2:$D$10=$F2),$B$2:$B$10)

Drag down & across as many Row/Columns as needed.

If there is no match, then 0.000000 is displayed.

If you want something other than zeros, we will need to
do a bit more.

MIKE

http://www.skeptic.com/


Report •

#5
July 19, 2012 at 08:35:20
Thank you for the reply and suggestion! Very clever use of PRODSUM and the arrays with conditions. I had the idea to use PRODSUM but never thought of nesting a comparison inside it.

This method works, but it is somewhat slow.

My data is 400 000 samples, and 20 channels wanted. In each cell we do Prodsum. That executes a comparison to a number, calculates the product and sum of three 400k long arrays, in each cell of a 400000x20 matrice. That is something like 10^13 operations.

Just doing the first 500x20 cells took around 20-30 minutes.


So, some faster method is needed.

------

I had an idea about using the ChNo and TimeStamp as indices for a matrix.

The TimeStamp (Col D) would represent the row index and the ChNo would represent Col index.

Then just loop through all the rows in A1:D10 (using the example above), and copy the ValueIn to corresponding cell in the sorted data matrice. I imagine this is possible with VBA macro.

The benefit of this is that each row only needs to be read once.

     A         B            C              D
 1) ChNo    ValueIn      ValueOut        TimeStamp
 2) 3466    0.000162                07/11/2012 14:37:40
 3) 3465    0.000139                07/11/2012 14:37:40
 4) 3464    0.000146                07/11/2012 14:37:40
 5) 3463    0.000098                07/11/2012 14:37:40
 6) 3462    0.000161                07/11/2012 14:37:41
 7) 3461    0.000098                07/11/2012 14:37:41
 8) 3460    -0.012350               07/11/2012 14:37:41
 9) 3459    0.000215                07/11/2012 14:37:41
10) 3458    -0.000165               07/11/2012 14:37:41

          F                G     H     I     J     K     L     M     N     O
1) TimeStamp             3466  3465  3464  3463  3462  3461  3460  3459  3458
2) 07/11/2012 14:37:40                                                                     
3) 07/11/2012 14:37:41   

I don't know VBA code, so Ill post my idea in some made up pseudo code. I assume ValueIn, TimeStamp, ChNo are arrays that I can get values from with a index.

The code would be something like

FOR rowIdx = 1:10
ValueToCopy = ValueIn(rowIdx)
CopyToRow = TimeStamp(rowIdx)
CopyToCol = ChNo(rowIdx)
COPY ValueToCopy -> CopyToRow,CopyToCol
END

Some corrections:
CopyToRow needs to lookup the row that specific TimeStamp has, and not the time stamp itself
CopyToCol needs to lookup the column index of the channels (in this case G, H, I, J... O)

Im currently trying to figure out how I do VBA scripts. I think this is the best bet for getting a resonably fast script.

Thanks for any further help! At least now I know I probably cannot do this sorting of data without scripts.


Report •

#6
July 19, 2012 at 11:38:02
Sorry I can't offer more assistance, but my VBA skills are just above nil.

Hopefully someone will offer a quicker solution.

Good luck.

MIKE

http://www.skeptic.com/


Report •

#7
July 19, 2012 at 18:13:13
✔ Best Answer
OK, I'm jumping in a little late here, but I'll give this a shot.

First, the example output data that you provided doesn't seem to have come from the
example input data that you provided.

For example, your input does not show a value for Channel 3466 at 14:37:41 yet your
output does.

I hope I can assume that you posted what you posted for brevity.

If that is correct, then here is what I did:

1 - I copied your input data into Sheet1, so it looks like your raw data from
Response #1.

2 - I created this table on Sheet2 by first using an Advanced Filter to get the
unique Time Stamps into Column A, then using Copy...PasteSpecial...Transpose
to get the Channel numbers into Row 1.

Sheet2:

          A                B     C     D     E     F     G     H     I     J
1  TimeStamp             3466  3465  3464  3463  3462  3461  3460  3459  3458
2  07/11/2012 14:37:40                                                                     
3  07/11/2012 14:37:41   

4 - I then ran this code:

Sub SortBy2Criteria()
'Determine last Row with Channel in Sheet1!Column A (Source)
  lastSourceChannelRow = Sheets(1).Cells(Rows.Count, "A").End(xlUp).Row
'Determine last Row with Time Stamp Sheet2!Column A (Destination)
  lastDstTimeRow = Sheets(2).Cells(Rows.Count, "A").End(xlUp).Row
'Determine last Column with Channel in Sheet2!Row 1 (Destination)
  lastDstChannelColumn = Sheets(2).Cells(1, Columns.Count).End(xlToLeft).Column
'Loop through Channel Rows on Sheet1
    For chanRow = 2 To lastSourceChannelRow
'Find each Source Channel from Sheet1!Column A in Sheet2!Row 1
'Set Column Variable based on result
     With Sheets(2).Range("B1:J" & lastDstChannelColumn)
       Set c = .Find(Sheets(1).Range("A" & chanRow))
         cCol = c.Column
     End With
'Find corresponding Time Stamp in Sheet2!Column A
'Set Row Variable based on result
     With Sheets(2).Range("A2:A" & lastDstTimeRow)
       Set t = .Find(Sheets(1).Range("D" & chanRow))
         tRow = t.Row
     End With
'Use Row and Column variables to copy ValueIn from Sheet1 to Sheet2
       Sheets(2).Cells(tRow, cCol) = Sheets(1).Cells(chanRow, 2)
    Next
End Sub


I ended up with this on Sheet2:

          A               B         C         D         E         F         G         H         I         J
1  TimeStamp            3466      3465      3464      3463      3462      3461      3460      3459      3458
2  07/11/12 14:37:40  0,000162  0,000139  0,000146  0,000098					
3  07/11/12 14:37:41					      0,000161  0,000098  -0,01235  0,000215  -0,000165

Obviously I don't know how this will work with your entire input dataset, but I think
it will do what you want it to.

Click Here Before Posting Data or VBA Code ---> How To Post Data or Code.


Report •

#8
July 20, 2012 at 08:09:59
Hi DerbyDad! Thanks for the help, very appreciated.

You are correct that the data provided is just an example of how I want it, and not the actual result from the sorting.


I followed your method, but ran into an error. If you dont want to read wall of text, the error is:

Step 1-3 worked for me, step 4e was problem. I get "Run-time error '91'" for line "cCol = c.Column".


*****Long description of what I did:

0 - Started a new macro enabled Excel 2007 workbook

1 - Imported my data from a (MS Access file) into Sheet1.

a) Marked the entire dataset in sheet 1 (A2:D346402). Headers A1:D1 omitted.
b) Chose Custom SOrt
-Sort by: TimeStamp, Order: Oldest to Newest
-Then by: ChNo, Order: Smallest to Largest

2 - Used the Advanced Filter and found all unique ChNo from smallest to largest. I then copied the first 20 out of 50. Copy Paste Special and transpose in Sheet2!B1 to U1

3 - Created a list of all TimeStamps from start to stop in Sheet2
a) Copied Sheet1!A2 to Sheet2!A2 and autoFilled so there exists a time stamp for every second between Start and Stop.
b) Using the function to remove non-unique entries is VERY time consuming, that is why I used auto-fill instead. Also I want TimeStamps for all seconds, the Raw data may have skipped a few seconds.

4 - To run the code, I went into Developer -> Design Mode.
a) Insert -> Command Button (ActiveX Control) into Sheet1
b) Right click, View Code.
c) Pasted your code. The button already contained some code.

Private Sub CommandButton1_Click()
End Sub

I pasted the code between the 2 existing lines, and I commented out the first and last line in your code.
d) Saved and closed the VBA editor. Exited Design Mode.
e) Clicked on the button to run the script.
==========
ERROR:
Microsoft Visual Basic
Run-time error '91':
Object variable or With block variable not set.
==========
When I click "Debug" I go to the code for the button, and line with cCol = c.Column is marked with yellow.


So, very lengthy description. I tried not to omit any info that might be important :)

This is the exact code I have on my ActiveX Button:

 
Private Sub CommandButton1_Click()
    ''Sub SortBy2Criteria()
    'Determine last Row with Channel in Sheet1!Column A (Source)
      lastSourceChannelRow = Sheets(1).Cells(Rows.Count, "A").End(xlUp).Row
    'Determine last Row with Time Stamp Sheet2!Column A (Destination)
      lastDstTimeRow = Sheets(2).Cells(Rows.Count, "A").End(xlUp).Row
    'Determine last Column with Channel in Sheet2!Row 1 (Destination)
      lastDstChannelColumn = Sheets(2).Cells(1, Columns.Count).End(xlToLeft).Column
    'Loop through Channel Rows on Sheet1
        For chanRow = 2 To lastSourceChannelRow
    'Find each Source Channel from Sheet1!Column A in Sheet2!Row 1
    'Set Column Variable based on result
         With Sheets(2).Range("B1:U" & lastDstChannelColumn)
           Set c = .Find(Sheets(1).Range("A" & chanRow))
             cCol = c.Column
         End With
    'Find corresponding Time Stamp in Sheet2!Column A
    'Set Row Variable based on result
         With Sheets(2).Range("A2:A" & lastDstTimeRow)
           Set t = .Find(Sheets(1).Range("D" & chanRow))
             tRow = t.Row
         End With
    'Use Row and Column variables to copy ValueIn from Sheet1 to Sheet2
           Sheets(2).Cells(tRow, cCol) = Sheets(1).Cells(chanRow, 2)
        Next
    ''End Sub
End Sub

======

So the questions:
* How do I fix Error 91? I tried searching but since I am a complete VBA novice it didnt find any solution yet.
* Does the data in Sheet1 need to be sorted in any way? If not: can I skip step 2b above?
* Is there a better way to run the script then via ActiveX button? This was the way I learned when doing some VBA tutorials.

Thank you again if you have time to reply!

/Klars


Report •

#9
July 20, 2012 at 10:19:54
I can't test anything right now, but try this:

First, comment out this line:

lastDstChannelColumn = Sheets(2).Cells(1, Columns.Count).End(xlToLeft).Column

then replace this line:

With Sheets(2).Range("B1:U" & lastDstChannelColumn)

with:

With Sheets(2).Range("B1:U1")

I'm not sure why I did it that way, but it's not needed since all of your channel numbers are in Sheet2!Row1. Might have been a late night thing.

However, I don't think that that is your problem.

I assumed that the data set in Response #1 started in Column A. The code is searching for the values in A2 through the bottom of the list and trying to find matching values in Sheet2!Row 1.

Note the Column designation of "A"

Set c = .Find(Sheets(1).Range("A" & chanRow))

If your Channel numbers in Sheet1 are in a different column, you'll need to change the A to match.

In addition, the 2 in this line refers to Column B.

Sheets(2).Cells(tRow, cCol) = Sheets(1).Cells(chanRow, 2)

If your Channel numbers aren't in Column A, I gotta guess that the values you want to copy aren't in Column B.

Change that "2" to be the number of the column which holds the values you want to copy.

A = 1, B = 2, C = 3, etc.

As I said, I can't test this, so you're on your own for now.

As far as using a CommandButton1_Click, I don't see why it wouldn't work. I don't use them, I just assign my own buttons or run the code directly from the VBA editor.

If you are going to be using VBA, you might want to read this How To:

http://www.computing.net/howtos/sho...

Since you'll probably need to maintain this code, you should understand the basics of how to troubleshoot VBA code.

Click Here Before Posting Data or VBA Code ---> How To Post Data or Code.


Report •

#10
July 26, 2012 at 08:56:42
Hi! Sorry I was slow to reply.

I found the reason for the problem for error 91.

I have the items in exactly those colums in the post. However, Sheet2 row 1 does not (and should not) contain all the possible channels. Since I have 50 different channels but only want 20. However as the program was written it gave an error when it tried to find a column for a channel that wasnt in Sheet2.

It would need some conditions that skips to the next iteration if the channel from Sheet1 is not found in Sheet2.

The channels not specified in Sheet2 A1:U1 should not be sorted, those values can just be disregarded.

I worked around this by adding all the 50 channels numbers in A1:AY1.

With this fix, and the fix you wrote above, the program works fine. I ran the program on all my data (346402 rows). It worked and sorted the data up until A254593, where it crashed. The program runs somewhat slow, I left it running overnight.

The resulting matrix of sorted data is much more sparse then I first thought. Each channel only has a value every 30s, sometimes every 60s.

I will need to rething how I want my data structure. Since excel only can plot up to 32000 measurement points and I have 354951 time stamps.

I might need to go with a TimeStamp array for each array of measurement values.


Report •

#11
July 26, 2012 at 09:01:18
I marked this as solved, as I now know _how_ to sort the data. The remaining problems are:

1) speed of sorting
2) limit of 32000 rows in a plot in excel. Might need to re-think how the data is sorted. I wish I didnt have to use excel at all :/

Thanks everyone for helping! That VBA troubleshooting guide DerbyDad03 posted really helps! http://www.computing.net/howtos/sho...

Using "Watches" for the variables and stepping through the code helped me find the reason for the errors.


Report •

#12
July 26, 2012 at 09:30:58
Thanks for the validation of my How-To instructions.

Much appreciated!

Click Here Before Posting Data or VBA Code ---> How To Post Data or Code.


Report •

Ask Question