Investigate MDX DistinctCount discrepancy - sql-server

When I run this:
WITH MEMBER MEASURES.SETDISTINCTCOUNT AS
DISTINCTCOUNT([Student Term].[Student ID].MEMBERS)
SELECT {MEASURES.SETDISTINCTCOUNT} ON 0 ,
[Student Term].[Term Code].&[1151] on 1
FROM [Enrollment]
I get a student count that agrees with the following SQL (I'll refer to this correct count as "count A"):
SELECT COUNT(DISTINCT([student_id]))
FROM dbo.Fact_Enrollments
WHERE Term_Code = '1151'
but unfortunately when I run this MDX (using a different but similar dimension) I get a different count that is less than "count A":
WITH MEMBER MEASURES.SETDISTINCTCOUNT AS
DISTINCTCOUNT([Student Term].[Student ID].MEMBERS)
SELECT {MEASURES.SETDISTINCTCOUNT} ON 0 ,
[Term].[Term Type].[Academic Term].&[1151] ON 1
FROM [Enrollment]
I am not sure how to figure out what is going wrong in the second MDX query (more directly, what is going wrong in that "Term" dimension). At first I thought maybe the Dim_Term table wasn't completely joining to the fact table (Fact_Enrollments), but this query, which joins the two, does return "count A" (the correct count):
SELECT COUNT(DISTINCT([student_id]))
FROM dbo.Fact_Enrollments
INNER JOIN dbo.Dim_Term ON Acad_Term_Cd=Term_Code
WHERE Acad_Term_Cd = '1151'
I thought that maybe the best way to see what is going on is to find a list of all the distinct Student IDs that went into the first count, do the same for the second count, and then take a deeper look at those in the first list but not the second. However, I do not know how to determine which student IDs lead to the DistinctCount results that I am seeing.
I have tried a couple things beyond what I am writing here (left out because this is already pretty long) but I keep coming up with "count A" as my total result.
How can I find the list of distinct members that are counted by a DistinctCount call?
(or, alternatively what is the best way to discover the cause of this discrepancy)

I am pretty sure your relationships are messed up.
There is an important thing you should know: DISTINCTCOUNT actually counts distinct NON-EMPTY tuples (see the DistinctCount documentation).
Instead of the second query, try the below:
WITH MEMBER MEASURES.SETDISTINCTCOUNT AS
COUNT(DISTINCT([Student Term].[Student ID].MEMBERS))
SELECT {MEASURES.SETDISTINCTCOUNT} ON 0 ,
[Term].[Term Type].[Academic Term].&[1151] ON 1
FROM [Enrollment]
This counts the empty cells too and thus should return a bigger value.

As a slight tweak to Sourav's script please add the EXISTING keyword:
WITH MEMBER MEASURES.SETDISTINCTCOUNT AS
COUNT(DISTINCT(EXISTING [Student Term].[Student ID].MEMBERS))
SELECT
{MEASURES.SETDISTINCTCOUNT} ON 0 ,
[Term].[Term Type].[Academic Term].&[1151] ON 1
FROM [Enrollment];

Related

MSSQL select query with prioritized OR

I need to build one MSSQL query that selects one row that is the best match.
Ideally, we have a match on street, zip code and house number.
Only if that does not deliver any results, a match on just street and zip code is sufficient
I have this query so far:
SELECT TOP 1 * FROM realestates
WHERE
(Address_Street = '[Street]'
AND Address_ZipCode = '1200'
AND Address_Number = '160')
OR
(Address_Street = '[Street]'
AND Address_ZipCode = '1200')
MSSQL currently gives me the result where the Address_Number is NOT 160, so it seems like the 2nd clause (where only street and zipcode have to match) is taking precedence over the 1st. If I switch around the two OR clauses, same result :)
How could I prioritize the first OR clause, so that MSSQL stops looking for other results if we found a match where the three fields are present?
The problem here isn't the WHERE (though it is a "problem"), it's the lack of an ORDER BY. You have a TOP (1), but you have nothing that tells the data engine which row is the "top" row, so an arbitrary row is returned. You need to provide logic in the ORDER BY to tell the data engine which is the "first" row. With the rudimentary logic you have in your question, this would likely be:
SELECT TOP (1)
{Explicit Column List}
FROM realestates
WHERE Address_Street = '[Street]'
AND Address_ZipCode = '1200'
ORDER BY CASE Address_Number WHEN '160' THEN 1 ELSE 2 END;
You can't prioritize anything in the WHERE clause. It always results in ALL the matching rows. What you can do is use TOP or FETCH to limit how many results you will see.
However, in order for this to be effective, you MUST have an ORDER BY clause. SQL tables are unordered sets by definition. This means without an ORDER BY clause the database is free to return rows in any order it finds convenient. Mostly this will be the order of the primary key, but there are plenty of things that can change this.
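The ORDER BY approach above can be tried out with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server, so `LIMIT 1` takes the place of `TOP (1)`. The sample rows, the street name and the `best_match` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE realestates ("
    "id INTEGER PRIMARY KEY, Address_Street TEXT, "
    "Address_ZipCode TEXT, Address_Number TEXT)"
)
# Hypothetical sample data: three houses on the same street and zip code.
conn.executemany(
    "INSERT INTO realestates (Address_Street, Address_ZipCode, Address_Number) "
    "VALUES (?, ?, ?)",
    [
        ("Main Street", "1200", "12"),
        ("Main Street", "1200", "160"),
        ("Main Street", "1200", "7"),
    ],
)


def best_match(street, zip_code, number):
    """Return one row, preferring an exact house-number match."""
    # The WHERE clause filters on street and zip code only.
    # The ORDER BY ranks an exact house-number match first.
    # SQLite has no TOP, so LIMIT 1 plays the role of TOP (1).
    return conn.execute(
        "SELECT Address_Street, Address_ZipCode, Address_Number "
        "FROM realestates "
        "WHERE Address_Street = ? AND Address_ZipCode = ? "
        "ORDER BY CASE Address_Number WHEN ? THEN 1 ELSE 2 END "
        "LIMIT 1",
        (street, zip_code, number),
    ).fetchone()


exact = best_match("Main Street", "1200", "160")      # house number 160 exists
fallback = best_match("Main Street", "1200", "999")   # no such number: any street/zip match
nothing = best_match("Other Street", "1200", "160")   # no street/zip match at all
print(exact, fallback, nothing)
```

When no row has the requested house number, every candidate ties on the CASE expression, so which street/zip match comes back is still arbitrary unless a further sort key is added.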

COUNT, IIF usage for counting records that also have a specific field value matched

Using MS Access and I have two tables, one is categories and the other is content.
My initial SQL statement, included below, takes a count of each content associated to a category and returns the count associated with each category.
For each CATEGORY, I'm simply trying to return another count of the CONTENT rows that have a specific user level and are not deleted.
Below is what I am struggling with, as I am not certain you can actually use COUNT like this.
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2)) AS userLevelCount
This is the full select statement with my addition, which is not working.
SELECT
CATEGORY.categoryId,
CATEGORY.categoryTitle,
CATEGORY.categoryDate,
CATEGORY.userLevel,
Last(CONTENT.contentDate) AS contentDate,
CATEGORY.isDeleted AS categoryDeleted,
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) AS countTotal,
COUNT(IIf([CONTENT.isDeleted]=1,[CONTENT.contentID],Null)) AS countDeleted,
COUNT([CONTENT.categoryId]) - COUNT(IIf([CONTENT.isDeleted]=1,[CONTENT.contentID],Null)) AS countDifference,
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2)) AS userLevelCount
FROM CATEGORY
LEFT JOIN CONTENT ON
CATEGORY.categoryId = CONTENT.categoryId
GROUP BY
CATEGORY.categoryId,
CATEGORY.categoryTitle,
CATEGORY.categoryDate,
CATEGORY.userLevel,
CATEGORY.isDeleted
HAVING (((CATEGORY.isDeleted)=0))
ORDER BY
CATEGORY.categoryTitle
You should be able to use the following:
SUM(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2,1,NULL)) AS userLevelCount
COUNT will not count NULL, but it will count zero. SUM will calculate the sum of all 1's - that's a second way of achieving the same.
IIF is built into Access, and it is also available in SQL Server 2012 and later.
I believe I found the solution:
Count(IIf([CONTENT.userLevel]=2,[CONTENT.contentID],Null)) AS countDifference2
This will return the count difference for CONTENT for each CATEGORY that isn't deleted and has a specific user level.
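To see the COUNT versus SUM behaviour described above, here is a minimal sketch using Python's `sqlite3` as a stand-in for Access. `CASE WHEN` is used in place of `IIf` so that it runs on any SQLite version. The four sample rows and the output column names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CONTENT (contentID INTEGER, isDeleted INTEGER, userLevel INTEGER)"
)
# Hypothetical rows: three not deleted, two of those at userLevel 2.
conn.executemany(
    "INSERT INTO CONTENT VALUES (?, ?, ?)",
    [(1, 0, 2), (2, 0, 1), (3, 1, 2), (4, 0, 2)],
)

row = conn.execute(
    """
    SELECT
        -- COUNT counts every non-NULL value, and 0 is not NULL
        COUNT(CASE WHEN isDeleted = 0 THEN 1 ELSE 0 END)    AS count_with_zero,
        -- SUM adds up the 1s and ignores the 0s
        SUM(CASE WHEN isDeleted = 0 THEN 1 ELSE 0 END)      AS sum_with_zero,
        -- COUNT skips NULL, so only matching rows are counted
        COUNT(CASE WHEN isDeleted = 0 THEN 1 ELSE NULL END) AS count_with_null,
        -- both conditions in one expression: not deleted AND userLevel 2
        COUNT(CASE WHEN isDeleted = 0 AND userLevel = 2
                   THEN contentID ELSE NULL END)            AS not_deleted_level_2
    FROM CONTENT
    """
).fetchone()

count_with_zero, sum_with_zero, count_with_null, not_deleted_level_2 = row
print(row)
```

With this data, `count_with_zero` covers all four rows, while the SUM and the NULL-based COUNT both yield the three non-deleted rows. That difference is why `COUNT(IIf(..., 1, 0))` does not filter anything.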

Cypher returns exponential count result on 'join'

I am learning Cypher on Neo4j and I am having trouble in understanding how to perform an efficient 'join' equivalent in Cypher.
I am using the standard Matrix character example and I have added some nodes to the mix called 'Gun' with a relation of ':GIVEN_TO'. You can see the console with my query result here:
http://console.neo4j.org/r/rog2hv
The query I am using is:
MATCH (Neo:Crew { name: 'Neo' })-[:KNOWS*..]->(other:Crew),(other)<-[:GIVEN_TO]-(g:Gun),(Neo)<-[:GIVEN_TO]-(g2:Gun)
RETURN count(g2);
I have given Neo 4 guns, but when I perform the above I get a count of '12'. This seems to be the case because there are 3 'others' and 3*4 = 12. So I get some exponential result.
What should my query look like to get the correct count ('4') from the example?
Edit:
The reason I am not querying through Guns directly as suggested by @ceej is because in my real use case I have to do this traversal as described above. Adding DISTINCT does not do anything for my result.
The reason you get 12 guns instead of 4 is that your query produces a cartesian product. This is because you have asked for items in the same MATCH statement without joining them. @ceej rightly pointed out that if you want to find Neo's guns, you would do as he suggested in his first query.
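The row multiplication can be modelled outside Neo4j. The sketch below is plain Python, not Cypher, and the crew and gun names are hypothetical stand-ins for the console data. It pairs every (other, gun) row with each of Neo's four guns, as two unconnected patterns would, then compares a plain count with a count of distinct values.

```python
from itertools import product

# Hypothetical stand-ins for the graph data: three crew members reachable
# from Neo who each hold a gun, and the four guns given to Neo.
others_with_gun = [("Morpheus", "g5"), ("Trinity", "g6"), ("Cypher", "g7")]
neo_guns = ["g1", "g2", "g3", "g4"]

# The two patterns share no variable that restricts g2, so every
# (other, g) row is combined with every g2 row.
rows = [(other, g, g2) for (other, g), g2 in product(others_with_gun, neo_guns)]

count_rows = len(rows)                            # analogous to count(g2)
count_distinct = len({g2 for _, _, g2 in rows})   # analogous to count(DISTINCT g2)

print(count_rows, count_distinct)
```

In this simplified model the plain count is 3 * 4 = 12, while counting distinct `g2` values collapses the repeats back to 4.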
If you wanted to get a list of the crew members and their guns then you could do something like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN crew.name, collect(g.name)
Which finds all of the crew members with guns and returns their name and the guns that they were given.
If you wanted to invert it and get a list of the guns and the respective crew members they were given to, you could do the following...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN g.name, collect(crew.name)
If you wanted to find all of the crew that knew Neo multiple levels deep that were given a gun you could write the query like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
WITH crew, g
MATCH (neo:Crew {name: 'Neo'})-[:KNOWS*0..]->(crew)
RETURN crew.name, collect(g.name)
That finds all the crew that were given guns and then determines which of them have a :KNOWS path to Neo.
Forgive me, but I am unclear why you have the initial MATCH in your query. From your explanation it would appear that you are trying to get the number of :Gun nodes linked to Neo by the :GIVEN_TO relationship. In that case all you need is the latter part of your query, which would give you something like
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count(g)
Furthermore, to make sure that you are only counting distinct :Gun nodes you can add DISTINCT to the RETURN statement.
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count( DISTINCT g )
This is possibly unnecessary in your case but can be helpful when the pattern that you are matching on can arrive at the same node by different traversals.
Have I misunderstood your requirement?

Equivalent of "IN" that uses AND instead of OR logic?

I know I'm breaking some rules here with dynamic SQL but I still need to ask. I've inherited a table that contains a series of tags for each ticket that I need to pull records from.
Simple example... I have an array that contains "'Apples','Oranges','Grapes'" and I am trying to retrieve all records that contain ALL items contained within the array.
My SQL looks like this:
SELECT * FROM table WHERE basket IN ( " + fruitArray + " )
Which of course would be the equivalent of:
SELECT * FROM table WHERE basket = 'Apples' OR basket = 'Oranges' OR basket = 'Grapes'
I'm curious if there is a function that works the same as IN ( array ) except that it uses AND instead of OR so that I can obtain the same results as:
SELECT * FROM table WHERE basket LIKE '%Apples%' AND basket LIKE '%Oranges%' AND basket LIKE '%Grapes%'
I could probably just generate the entire string manually, but would like a more elegant solution if at all possible. Any help would be appreciated.
This is a very common problem in SQL. There are basically two solutions:
Match all rows in your list, group by a column that has a common value on all those rows, and make sure the count of distinct values in the group is the number of elements in your array.
SELECT basket_id FROM baskets
WHERE basket IN ('Apples','Oranges','Grapes')
GROUP BY basket_id
HAVING COUNT(DISTINCT basket) = 3
Do a self-join for each distinct value in your array; only then can you compare values from multiple rows in one WHERE expression.
SELECT b1.basket_id
FROM baskets b1
INNER JOIN baskets b2 USING (basket_id)
INNER JOIN baskets b3 USING (basket_id)
WHERE (b1.basket, b2.basket, b3.basket) = ('Apples','Oranges','Grapes')
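The first of the two solutions above (GROUP BY with a HAVING count) can be sketched as follows, again with Python's `sqlite3` as a stand-in database. It assumes, as the answer's queries do, a normalized `baskets` table with one tag per row. The sample rows and the `baskets_with_all` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (basket_id INTEGER, basket TEXT)")
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?)",
    [
        (1, "Apples"), (1, "Oranges"), (1, "Grapes"),
        (2, "Apples"), (2, "Oranges"),                      # missing Grapes
        (3, "Apples"), (3, "Apples"), (3, "Grapes"),
        (3, "Oranges"), (3, "Pears"),                       # duplicate and extra tag
    ],
)


def baskets_with_all(tags):
    """Return the basket_ids that contain every tag in `tags`."""
    unique = list(dict.fromkeys(tags))  # drop duplicate tags, keep order
    if not unique:
        raise ValueError("at least one tag is required")
    placeholders = ", ".join("?" for _ in unique)
    sql = (
        "SELECT basket_id FROM baskets "
        f"WHERE basket IN ({placeholders}) "
        "GROUP BY basket_id "
        # DISTINCT guards against a tag stored twice for the same basket
        "HAVING COUNT(DISTINCT basket) = ? "
        "ORDER BY basket_id"
    )
    return [row[0] for row in conn.execute(sql, (*unique, len(unique)))]


result = baskets_with_all(["Apples", "Oranges", "Grapes"])
print(result)
```

Only the placeholders are built dynamically and the tag values are bound as parameters, which sidesteps the string concatenation the question worries about.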
There may be something like that in full text search, but in general, I sincerely doubt such an operator would be very useful, outside the conjunction with LIKE.
Consider:
SELECT * FROM table WHERE basket ='Apples' AND basket = 'Oranges'
it would always match zero rows.
If basket is a string, like your example suggests, then the closest you could get would be to use LIKE '%apples%oranges%grapes%', which could be built easily with '%'.implode('%', $tags).'%'
The issue with this is if some of 'tags' might be contained in other tags, e.g. 'supercalifragilisticexpialidocious' LIKE '%super%' will be true.
If you need to do LIKE comparisons, I think you're out of luck. If you are doing exact comparisons involving matching sets in arbitrary order, you should look into the INTERSECT and EXCEPT options of the SELECT statement. They're a bit confusing, but can be quite powerful. (You'd have to parse your delimited strings into tabular format, but of course you're doing that anyway, aren't you?)
Are the items you're searching for always in the same order within the basket? If yes, a single LIKE should suffice:
SELECT * FROM table WHERE basket LIKE '%Apples%Oranges%Grapes%';
And concatenating your array into a string with % separators should be trivial.
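For the case where the tags really are stored in one delimited string, the two LIKE variants discussed in these answers can be compared with a rough sketch. It uses Python's `sqlite3` with a hypothetical `tickets` table, and the helper names are made up. It assumes the tags themselves contain no `%` or `_` wildcard characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, basket TEXT)")
conn.executemany(
    "INSERT INTO tickets (basket) VALUES (?)",
    [
        ("Apples,Oranges,Grapes",),       # id 1: all tags, in order
        ("Grapes,Apples,Oranges",),       # id 2: all tags, different order
        ("Apples,Oranges",),              # id 3: one tag missing
        ("Apples,Oranges,Grapeseeds",),   # id 4: "Grapeseeds" merely contains "Grapes"
    ],
)


def match_all(tags):
    """One LIKE per tag, joined with AND; order of tags does not matter."""
    if not tags:
        raise ValueError("at least one tag is required")
    where = " AND ".join("basket LIKE ?" for _ in tags)
    params = [f"%{tag}%" for tag in tags]
    sql = f"SELECT id FROM tickets WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


def match_in_order(tags):
    """A single LIKE pattern; only matches when tags appear in this order."""
    pattern = "%" + "%".join(tags) + "%"
    sql = "SELECT id FROM tickets WHERE basket LIKE ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (pattern,))]


tags = ["Apples", "Oranges", "Grapes"]
any_order = match_all(tags)
fixed_order = match_in_order(tags)
print(any_order, fixed_order)
```

Ticket 4 shows up in both results even though it has no "Grapes" tag, which is the substring problem pointed out above. Ticket 2 is found only by the AND-ed version, because its tags are stored in a different order.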

Query for sorting by name

I have an array of strings, and I want to write a query that works as follows:
First, it counts the IDs in a particular table, say products, that match the first product name in the array.
If the count is more than one, it selects the top one ID from the list at random.
Otherwise, if the count is one or zero, it performs the same query with the next value from the string array.
Can anybody suggest a suitable query for this condition?
I assume that the array data are in TMP_ARRAY.
You have two sets, products and the array, and you want to retrieve from products only those entries that are in the array. For this you should use an INNER JOIN with a proper ON clause. Next, you want to put those results into groups, so GROUP BY the name column. As you want to operate only on groups whose size is bigger than one, a HAVING clause needs to be added at the end. Finally, you want to pick the 'top' ID from each group. And here we have a problem: what do you mean by 'top'? In a database the data don't have any specific order. I assume it means the largest; the other case in this situation means only one result, so for both cases we select MAX(ID).
As a result I came up with something like this:
SELECT MAX(p.ID) FROM PRODUCTS p INNER JOIN TMP_ARRAY t ON t.NAME = p.NAME GROUP BY p.NAME HAVING COUNT(p.ID) > 1;
But I'm not sure that I have understood everything correctly.

Resources