How to count multiple unique values across multiple columns?

June 29, 2020 at 09:54:48
Specs: Windows 10
I have a large excel spreadsheet with all of the drug/alcohol tests that our department completed in 2019. Because a client can be tested for multiple things using multiple equipment on a given day, they can appear multiple times on the sheet for the same day. I needed a way to count how instances we completed any test, not just how many times they appeared on the spreadsheet. I wrote the following expression,

=SUMPRODUCT(1/COUNTIFS('2019Testing_Qry'!A2:$A$4961,'2019Testing_Qry'!A2:$A$4961,
'2019Testing_Qry'!$G$2:$G$4961,'2019Testing_Qry'!$G$2:$G$4961))

This counts the number of times any test was completed, 3164. I want to take this one step further to see how many of these were "Negative" and how many were "Positive. I modified my expression as follows,

=SUMPRODUCT(1/COUNTIFS('2019Testing_Qry'!A2:$A$4961,'2019Testing_Qry'!A2:$A$4961,
'2019Testing_Qry'!$G$2:$G$4961,'2019Testing_Qry'!$G$2:$G$4961,
'2019Testing_Qry'!$H$2:$H$4961,"=Negative"))

I keep getting a, #DIV/0! error and cannot figure it out. Any assistance you can provide would be greatly appreciated. Thank you for your assistance!


See More: How to count multiple unique values across multiple columns?

Reply ↓  Report •

#1
June 29, 2020 at 20:05:17
Not quite sure of your formula,
but why are you Dividing 1 by the result of your CONTIFS()?

=SUMPRODUCT( 1/C OUNTIFS('2019Testing_Qry'!A2:$A$4961,'2019Testing_Qry'!A2:$A$4961,
'2019Testing_Qry'!$G$2:$G$4961,'2019Testing_Qry'!$G$2:$G$4961))

Also, without knowing what your data consist of, is it numerical or alphabetical or a combination, we have no way of knowing what else is happening, please post an example of your data.

MIKE

http://www.skeptic.com/


Reply ↓  Report •

#2
June 30, 2020 at 06:39:05
Thanks for asking for clarification, I can see now that I need to provide more information. In our office we complete hundreds of drug and alcohol test on our clients per month. Each time a client is tested, the officer can use different tools to complete the tests depending on the substances the officer is testing for. For example, we can test John Doe on 06/01/2020 for basic Drugs using one product and test him for alcohol specifically using another product resulting in two tests on one client on the same day. I used the expression:

=SUMPRODUCT(1/COUNTIFS('2019Testing_Qry'!A2:$A$4961,'2019Testing_Qry'!A2:$A$4961,
'2019Testing_Qry'!$G$2:$G$4961,'2019Testing_Qry'!$G$2:$G$4961))

to get an accurate number reflecting the number of testing events and NOT the number of products used to test John Doe, See example of the data below. The above formula works great to calculate the testing events, +3100 in 2019.
I need to now calculate the number of "Negative" Tests and the number "Positive" and when I include this conditions, I'm getting the #DIV/0! error. This commonly occurs when dividing by "0" or there are "Blank" spaces within the range. I have doubled and triple checked and this does not apply.

Example Data:

ID	LastNameFirstNameSexDateOfBirth	TestingTypeDrugTestAdministeredDate	Drug Test Results
14	Smith	Jane	Female	11/1/1984	Drug	8/28/2019	Negative
14	Smith	Jane	Female	11/1/1984	ETG	8/28/2019	Positive
21	Doe	John	Male	11/5/1973	Drug	6/6/2019	Positive
21	Doe	John	Male	11/5/1973	Drug	6/6/2019	Positive
21	Doe	John	Male	11/5/1973	Drug	9/25/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	4/29/2019	Positive
21	Doe	John	Male	11/5/1973	Drug	10/16/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	9/12/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	10/30/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	9/19/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	10/3/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	12/31/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	10/9/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	11/21/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	11/27/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	12/3/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	12/18/2019	Negative
21	Doe	John	Male	11/5/1973	Drug	12/26/2019	Negative

Th columns are A-H.

As you can see in the above sample data, Jane Smith was tested on 08/28/2019 and she appears twice because the Officer used two types of testing products, Drug & ETG. The first expression only counts her "ONCE" reflecting the testing event on 08/28/2019 and NOT the number of testing products used. The expression is looking in column A "ID", and Column G, "Drug Test Administered Date" to count the unique values only. I hope this explains the first expression well enough.

From here I intended to expand the same expression to not only count the unique values in Columns A "ID" and Column G "Drug Test Administered Date", but also only those in Column H "Drug Test Results" that are Negative.

I hope this better explains what I am attempting, any assistance you can provide would be hugely helpful. Thank you!


Reply ↓  Report •

#3
June 30, 2020 at 13:25:06
Thanks for posting the data.

So, for the data in the example you posted, your original formula return the number 16.
Is this correct?

IE
Jane Smith was tested twice on 8/28/20 so we have 1
John Doe was tested twice on 6/6/20 so we have 1
John Doe was then tested for the remainder of the 14 rows
So we get a total of 16.

Aren't we just counting the number of individual dates a test was given?

MIKE

http://www.skeptic.com/


Reply ↓  Report •

Related Solutions

#4
July 1, 2020 at 11:30:35
Yes, the expression returned 16 testing events for these two clients. The above data example is just a sample, the Data itself is 6000+ records. In the above example it does appear as those counting the unique dates would provide an accurate number. However, we often have 30+ testing event per day and supervise more that 1800 clients. As a result of this, we must not only count the unique dates, but also the unique ID that is assigned to each client. Doing this give us the accurate number.

The expression I wrote, =SUMPRODUCT(1/COUNTIFS('2019Testing_Qry'!A2:$A$4961,'2019Testing_Qry'!A2:$A$4961,
'2019Testing_Qry'!$G$2:$G$4961,'2019Testing_Qry'!$G$2:$G$4961,'2019Testing_Qry'!$H$2:$H$4961,
"=Negative"))

should work. It counts the unique text in column A and in column G, if column H is "negative". At least i think it should work, it obviously does not. Thank you in advance for your assistance. I'm looking forward to your feedback.


Reply ↓  Report •

#5
July 1, 2020 at 13:22:42
Perhaps if you posted some sample data that was not so "straight forward", along with the expected result of your formula and an explanation as to why you expect that result, we'd have a better idea of your ultimate goal.

Something like:

"Based on the following data I expect to get a value of X because there are A unique ID's that correspond to B unique dates and show C Negative test results."

Please keep in mind that your work process and understanding of the data is ingrained in your brain. We are walking into this with no prior knowledge of what to expect or ultimately what you are looking for in terms of a result.


message edited by DerbyDad03


Reply ↓  Report •

#6
July 2, 2020 at 06:16:20
First, let me thank you all for taking the time to help me. You are not required to do so and I appreciate your efforts. I'm sorry that I'm struggling to accurately describe what I'm trying to do. I think I've come up with an example using the sample data that I've provided previously.

So far, we all seem to be on the same page with the first expression. I am counting the unique text in column A "ID" and G "DrugTestAdministeredDate". Considering the sample data, we get a total of 16 testing events.

Example Data:

ID	LastNameFirstNameSexDateOfBirth	TestingTypeDrugTestAdministeredDate	Drug Test Results
<s>14	Smith	Jane	Female	11/1/1984	Drug	8/28/2019	Negative</s>
14	Smith	Jane	Female	11/1/1984	ETG	8/28/2019	Positive
21	Doe	John	Male	11/5/1973	Drug	6/6/2019	Positive
21	Doe	John	Male	11/5/1973	Drug	6/6/2019	Positive
<s>21	Doe	John	Male	11/5/1973	Drug	9/25/2019	Negative</s>
21	Doe	John	Male	11/5/1973	Drug	4/29/2019	Positive
<s>21	Doe	John	Male	11/5/1973	Drug	10/16/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	9/12/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	10/30/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	9/19/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	10/3/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	12/31/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	10/9/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	11/21/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	11/27/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	12/3/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	12/18/2019	Negative</s>
<s>21	Doe	John	Male	11/5/1973	Drug	12/26/2019	Negative</s>

In the above sample data, I've "strike through" the rows that I'm intending to only count. The goal of the expression is to count the number of negative testing events of the 16 total testing events in this sample data. Within the sample data, the result of the expression should be 14. The testing events for John Doe on 04/29/2020 & 06/06/2019 were Positive, With Jane Smith, she had one testing event on 08/28/2019 and two testing products were used, "Drug" & "ETG", and one of those products was negative and the other positive. In the original expression, we counted her once as there was one event on 08/28/2019. In this expression, she would be counted once because one of the testing products came back negative.

Another way of thinking about would be: Imagine that Jane Smith was tested on 08/28/2019 and 4 testing products were used. She would appear on the list 4 times and this would be one testing event. 3 of the 4 testing products used were negative, and one positive. In this example, there is one testing event on 08/28/2019 (first expression), and there is one negative test (second expression), the one I'm struggling with.

I hope this better explains what I'm trying to accomplish, I can see how this is confusing. It would seem to be easier to just count the number Positive and Negative. But, I'm trying to get an overall Positive and Negative. From there, I will break it down further by testing product, i.e. How many "Drug" tests were Negative and how many positive, etc...

Thanks again for all the assistance and taking the time.


Reply ↓  Report •

#7
July 2, 2020 at 09:34:01
I've been playing with this and in the SUMPRODUCT() section you are dividing 1 by the results of the COUNTIFS(), but when you try to add the

'2019Testing_Qry'!$H$2:$H$4961,"=Negative"

you are asking for a TRUE / FALSE comparison, and since
TRUE = 1 and FALSE = 0
the first time you hit a FALSE return, you get into a DIV/0 situation.

I'm still working on this, but I think you may be "over thinking" this.

To get the number of Positive or Negative results within the range specified in the spreadsheet, just do a

=COUNTIF(H2:H19,"Negative")

If your interest in a specific person, IE Jane Smith
you will need to add the ID

=COUNTIFS(H2:H19,"Negative",A2:A19,14)

MIKE

http://www.skeptic.com/


Reply ↓  Report •

#8
July 2, 2020 at 11:22:16
A posting tip...

Once you apply the pre tags to align data in this forum, no other tags will work. Thus, we see your attempt to strike-through the Negative events, but that format was not actually applied.

message edited by DerbyDad03


Reply ↓  Report •

#9
July 2, 2020 at 12:17:08
Mike:

re: '2019Testing_Qry'!$H$2:$H$4961,"=Negative"

you are asking for a TRUE / FALSE comparison, and since TRUE = 1 and FALSE = 0 the first time you hit a FALSE return, you get into a DIV/0 situation.

I had the same thought and was trying a few things. I know that you can use an expression within a COUNTIFS criteria (e.g. "=" & TODAY()+1)

I wonder if there is a way to add a minuscule number to the FALSE to force it to be just slightly larger than 0 in order to eliminate the #DIV/0 error. I've tried a bunch of expressions within the criteria argument, but none of them have worked so far.

message edited by DerbyDad03


Reply ↓  Report •

#10
July 2, 2020 at 14:32:17
DerbyDad03

I understand what you mean, but don't think it can be done.
True is 1 & False is 0
If it's not 0 then it's not False. Correct?

Haus

I can see how this is confusing. It would seem to be easier to just count the number Positive and Negative. But, I'm trying to get an overall Positive and Negative.

Isn't getting a Total Count of Positive or Negative the same as getting an overall picture? I don't see the distinction.

MIKE

http://www.skeptic.com/


Reply ↓  Report •

Ask Question