Getting wrong results with specific "WHERE" condition - sql-server

Gurus,
I'm stuck on a problem and would appreciate any help or suggestions. Please check this pic.
I don't understand why I'm getting the wrong result in the bottom query. As you can see, the only difference from the previous query is in the "WHERE" clause, but that difference should lead to the same results since it's a one-to-one join.
An important detail is that v_last_part_info is a view, and I changed it recently. I thought this was due to the cached query execution plan, but I tried OPTION (RECOMPILE) and even the solution described here. The result is still the same.
Please help! What am I missing?
P.S.: [OBJECT_ID] is a column name, not a built-in function.
P.P.S.: ANOTHER_DB has a different collation; that's the reason I need collate database_default.
select Tracking
, SoItem
from v_last_part_info
where Tracking = '4170664293'
Tracking SoItem
4170664293 20
--================================================================================--
select
lpi.Tracking
, lpi.SoItem
from v_last_part_info lpi
join ANOTHER_DB..SO_HEADER h on lpi.Tracking = h.[OBJECT_ID] collate database_default
where Tracking = '4170664293'
Tracking SoItem
4170664293 20
--================================================================================--
select
lpi.Tracking
, lpi.SoItem
from v_last_part_info lpi
join ANOTHER_DB..SO_HEADER h on lpi.Tracking = h.[OBJECT_ID] collate database_default
where [OBJECT_ID] = '4170664293'
Tracking SoItem
4170664293 10

Thanks to GarethD I found out the reason. This happened because of the row_number() function inside v_last_part_info. According to the definition at MSDN:
There is no guarantee that the rows returned by a query using ROW_NUMBER() will be ordered exactly the same with each execution unless the following conditions are true.
Values of the partitioned column are unique.
Values of the ORDER BY columns are unique.
Combinations of values of the partition column and ORDER BY columns are unique.
In my case condition 2 was not satisfied.
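A minimal sketch of the usual fix, assuming the view ranks rows per Tracking value: add a unique tie-breaker to the ORDER BY so that the numbering no longer depends on the execution plan. The table variable @part_info and the column ChangedOn are hypothetical stand-ins, since the definition of v_last_part_info is not shown.

```sql
-- Hypothetical data standing in for whatever v_last_part_info reads.
DECLARE @part_info TABLE (
    Tracking  varchar(20) NOT NULL,
    SoItem    int         NOT NULL,
    ChangedOn date        NOT NULL,
    PRIMARY KEY (Tracking, SoItem)
);

INSERT INTO @part_info (Tracking, SoItem, ChangedOn)
VALUES ('4170664293', 10, '2015-01-01'),
       ('4170664293', 20, '2015-01-01');  -- same date: a tie on ChangedOn

-- ORDER BY ChangedOn DESC alone would leave the winner of the tie undefined.
-- Adding SoItem (unique within a Tracking value) makes rn deterministic.
SELECT Tracking, SoItem
FROM (
    SELECT Tracking,
           SoItem,
           ROW_NUMBER() OVER (PARTITION BY Tracking
                              ORDER BY ChangedOn DESC, SoItem DESC) AS rn
    FROM @part_info
) AS ranked
WHERE rn = 1;
```

With the tie-breaker in place, the same row gets rn = 1 regardless of which join or filter the optimizer applies first.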

Related

No results with DISTINCT - Multiple tables

I am trying to run the following query which gets data from two tables but I get no results:
SELECT DISTINCT([dbo].[SF].Assignment), [X].Code
FROM [dbo].[SF], [dbo].[X]
WHERE (CHARINDEX([X].Code, [dbo].[SF].Assignment COLLATE SQL_Latin1_General_CP1_CI_AS) > 0)
AND [dbo].[SF].Code = 'NULL'
When I remove the DISTINCT, I get way too many results because of my big data set that causes an out of memory exception.
Removing the parentheses around the DISTINCT column and changing to an INNER JOIN fixed my issue. @Tab Alleman and @squillman can add an answer here so I can mark it as correct.
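The final query was not posted, so the following is only a reconstruction of what that fix might look like. The table variables @SF and @X and their sample rows are hypothetical stand-ins for [dbo].[SF] and [dbo].[X].

```sql
-- Hypothetical stand-ins for [dbo].[SF] and [dbo].[X].
DECLARE @SF TABLE (Assignment varchar(100), Code varchar(10));
DECLARE @X  TABLE (Code varchar(10));

INSERT INTO @SF (Assignment, Code)
VALUES ('Team ABC-1', 'NULL'),
       ('Team ABC-1', 'NULL'),   -- duplicate row, collapsed by DISTINCT
       ('Team XYZ',   'A1');
INSERT INTO @X (Code) VALUES ('ABC'), ('XYZ');

-- DISTINCT applies to the whole select list, so no parentheses are needed.
-- The match condition moves from WHERE into the INNER JOIN.
SELECT DISTINCT sf.Assignment, x.Code
FROM @SF AS sf
INNER JOIN @X AS x
    ON CHARINDEX(x.Code, sf.Assignment COLLATE SQL_Latin1_General_CP1_CI_AS) > 0
WHERE sf.Code = 'NULL';  -- matches the literal string 'NULL', as in the question
```

If the intent was to match missing values rather than the four-character string, the filter would be sf.Code IS NULL instead.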

Using CAST, CONCAT and COLLATE with a LEFT OUTER JOIN

I'm trying to CONCAT two columns and also use CAST and COLLATE but keep getting a host of different errors when I try to fix them in a way I think would work.
Basically I am trying to CONCAT two columns together but I get a collation conflict. So then, I try and COLLATE the two columns and I then get a datatype is invalid for COLLATE error. After this I try to CAST the column giving me the error to change it to a varchar but it doesn't work. I'm just unsure how to make all 3 work together.
SELECT TransactionHeader.TransactionType,
TransactionHeader.TicketStub,
CAST ( TransactionHeader.TransactionNumber AS nvarchar(8)) AS [TN],
TransactionHeader.ActualAmount,
Currencies.SwiftCode,
TransactionHeader.CurrencyID,
Divisions.ShortName,
DealHeader.StartDateNumber,
DealHeader.EndDateNumber,
CONCAT (TransactionHeader.TicketStub,
TransactionHeader.TransactionNumber) AS [DealRef]
FROM Company.dbo.TransactionHeader TransactionHeader
LEFT OUTER JOIN Company.dbo.DealHeader DealHeader
ON TransactionHeader.THDealID=DealHeader.DHDealID
LEFT OUTER JOIN Company.dbo.Currencies Currencies
ON TransactionHeader.CurrencyID=Currencies.CRRecordID
LEFT OUTER JOIN Company.dbo.Divisions Divisions
ON TransactionHeader.PrimaryPartyID=Divisions.DVRecordID
WHERE TransactionHeader.TicketStub COLLATE DATABASE_DEFAULT
= TransactionHeader.TransactionNumber COLLATE DATABASE_DEFAULT
All in all, I just want to CONCAT the TicketStub and TransactionNumber columns, but I am not sure how to get past the errors I'm getting. As far as COLLATE goes, I'm still kind of unsure how it even works; I just know that I need it to fix the collation error. I am very new to T-SQL and have only been writing it for the past month and a half, so please, any advice at all would be very helpful. Thank you!
Collation is a setting that determines how a DB should treat character data at the server, database, or column level. There's a really good blog on this at red-gate. Each server and database will have a collation. It's common for the databases and server to match, since by default a database will inherit this setting from the model database. It is uncommon to see column-level collation, but that seems to be what you have here, since all of your tables are coming from the same database.
You will need to figure out what the collation is on those columns. Pinal Dave has a good write-up on this on his blog. You can also do this a few other ways. See the docs for that.
Once you have your collation, you can then collate the CONCAT. It will look something like the below. Here I just use DATABASE_DEFAULT, which would probably work in your case:
CONCAT(TransactionHeader.TicketStub COLLATE DATABASE_DEFAULT,TransactionHeader.TransactionNumber COLLATE DATABASE_DEFAULT) AS [DealRef]
You can find more examples of COLLATE WITH CONCAT in this answer and this one
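One way to look up the column collations is the sys.columns catalog view. This is a sketch that assumes the Company database and TransactionHeader table from the question and that you have permission to read the metadata.

```sql
-- collation_name is NULL for columns that are not character-based.
SELECT c.name AS column_name,
       c.collation_name
FROM Company.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'Company.dbo.TransactionHeader')
  AND c.name IN (N'TicketStub', N'TransactionNumber');
```

If TransactionNumber comes back with a NULL collation, it is a non-character column, which would explain the "invalid data type for COLLATE" error: in that case COLLATE can only be applied after the value has been CAST to a character type.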

How to determine which row caused an error - ERROR CHECKING

I saw that another poster (WolfiG) had asked a very similar question, but I don't see the answer.
(I understand the error message and am NOT looking for how to fix the error; I AM trying to determine whether there is any type of error checking (or debugging technique) to use for other errors.)
I found similar code and added to it, to get each table in each database on a server...and to show the PKs and rowcounts. (Working on an inventory and then a data dictionary - but this is not the issue, I am just explaining how the code came to be.)
So in this case I had already narrowed down the data somewhat (meaning I know which database I was querying when I encountered the error), so there were "only" 81 tables (rows) that could have been in error. But of course that would require too much manual checking to find the problem, so I was hoping that there was a way to see which "t.name" was being read when the sub-query got the error. (I narrowed it down to the first sub-query by commenting out a line at a time; there were 10 and I just did not copy all of them here.)
Again the question is (and it looks like I am not the only person asking this type of question) -
Is there a way to determine which row (data) caused an error in a query?
and in the example below,
Is there a way to display the t.name (or other column's data) that was most recently read when an error occurred?
Having worked in Mainframe - you can look at a dump or buffer. But I guess I don't want to expect too much since I keep seeing cryptic messages, where obviously the system has the info but doesn't display it.
Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
use [DBName]
SELECT '[DBName]' as DBName, t.NAME, it.xtype,i.rowcnt,
(select c.name from syscolumns c inner join sysindexkeys k on k.indid=i.indid and c.colid = k.colid and c.id=t.id and k.keyno=1 and k.id=t.id)as 'column1',
(select c.name from syscolumns c inner join sysindexkeys k on k.indid=i.indid and c.colid = k.colid and c.id=t.id and k.keyno=2 and k.id=t.id)as 'column2'
from sysobjects t inner join sysindexes i on i.id=t.id
LEFT OUTER JOIN sysobjects it on it.parent_obj=t.id and it.name = i.name
WHERE i.indid < 2 AND OBJECTPROPERTY(t.ID, 'IsMSShipped') = 0
Thanks!
By the way, how I was able to find the row causing this error (in case this would help anyone else) is that I changed the first sub-query to "select count(c.name) from syscolumns" and then looked for any rows where the number was > 1.
Sebastian Meine, thank you, but now I've hit another problem. (I have to say that I sadly dispute your comment that SQL is better than other DBs for the dictionary. GRRRR)
I was using "i.indid < 2" because initially I was trying to get rowcnt, and honestly the samples all use it. But I see that the query you pasted returned far more rows, which the original missed, apparently because of the check i.indid < 2.
HOWEVER, with your query I am now getting many more occurrences of the same table and I cannot see how it would be possible to "summarize" them. ARGHHH, while I am typing this up I am seeing that the COLS are not only primary keys and the same table name shows up (with different row counts; in many cases there was a row for every column in the table, and NOT in other cases). So, looking for a pattern, I found that MAYBE sysindexes.status = might limit my results, but while that was close there still were duplicate rows. Again ARGGGH
SQL Server does not provide the ability to do row by row debugging. Everything is handled as a set. If one element does not conform to the rules the whole query fails.
So, to find the offending row you need to write a separate query, as you already did.
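As a sketch of such a separate query for Msg 512: turn the failing scalar subquery into a join and count how many rows each outer row would receive. This version mirrors the first subquery from the question and uses the same compatibility views; the column alias matches is made up for illustration.

```sql
-- Any row returned here is an outer row for which the scalar subquery
-- in the question would have produced more than one value.
SELECT t.id,
       t.name   AS table_name,
       i.indid,
       COUNT(*) AS matches
FROM sysobjects AS t
JOIN sysindexes AS i
    ON i.id = t.id
JOIN sysindexkeys AS k
    ON k.id = t.id
   AND k.indid = i.indid
   AND k.keyno = 1
JOIN syscolumns AS c
    ON c.id = t.id
   AND c.colid = k.colid
WHERE i.indid < 2
  AND OBJECTPROPERTY(t.id, 'IsMSShipped') = 0
GROUP BY t.id, t.name, i.indid
HAVING COUNT(*) > 1;
```

The same pattern works for any "subquery returned more than 1 value" error: join on the subquery's correlation columns, group by the outer key, and keep the groups with a count above one.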
The catalog views in SQL 2000 were not that great, but they have since been greatly improved and provide a lot more detail than most other RDBMSs out there. But since you seem to have the requirement to write one query that runs on all, try this:
SELECT DB_NAME() AS DBName,
o.name,
i.rowcnt,
i.keycnt,
cols.column1,
cols.column2,
cols.column3,
cols.column4
FROM sysobjects o
JOIN sysindexes i
ON o.id = i.id
JOIN (
SELECT k.id,
k.indid,
MAX(CASE WHEN k.keyno = 1 THEN c.name END) AS column1,
MAX(CASE WHEN k.keyno = 2 THEN c.name END) AS column2,
MAX(CASE WHEN k.keyno = 3 THEN c.name END) AS column3,
MAX(CASE WHEN k.keyno = 4 THEN c.name END) AS column4
FROM sysindexkeys k
JOIN syscolumns c
ON k.id = c.id
AND k.colid = c.colid
GROUP BY k.id, k.indid
) cols
ON i.id = cols.id
AND i.indid = cols.indid
I do not have a SQL 2000 version running anymore to try this on, but I believe it will run.
Two things to be aware of:
The sysindexes.rowcnt value was very unreliable in SQL 2000. The value you find in sys.partitions in later versions is reliable but not ACID compliant.
There can be up to 16 columns in an index. I added 4 to the query and also added the keycnt column, so you know how many you are missing.
What I find useful is restructuring the SQL to make it easier to troubleshoot. I would avoid sub-queries in general unless they are absolutely needed.
Otherwise, if you do use sub-queries, you also need to account for the case where they return more than one result in a scalar context.
Typically most sub-queries like this can be re-written as joins. Then, you can also directly see what data is duplicated -- you can join the table to itself on the duplicated field such that the id's are not equal.

Order Of Execution of the SQL query

I am confused about the order of execution of this query; please explain it to me.
I am confused about when the join is applied, when the function is called, when the new column is added by the CASE, and when the serial number is added. Please explain the order of execution of all of this.
select Row_number() OVER(ORDER BY (SELECT 1)) AS 'Serial Number',
EP.FirstName,Ep.LastName,[dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole) as RoleName,
(select top 1 convert(varchar(10),eventDate,103)from [3rdi_EventDates] where EventId=13) as EventDate,
(CASE [dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole)
WHEN '90 Day Client' THEN 'DC'
WHEN 'Association Client' THEN 'DC'
WHEN 'Autism Whisperer' THEN 'DC'
WHEN 'CampII' THEN 'AD'
WHEN 'Captain' THEN 'AD'
WHEN 'Chiropractic Assistant' THEN 'AD'
WHEN 'Coaches' THEN 'AD'
END) as Category from [3rdi_EventParticipants] as EP
inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId
where EP.EventId = 13
and userid in (
select distinct userid from userroles
--where roleid not in(6,7,61,64) and roleid not in(1,2))
where roleid not in(19, 20, 21, 22) and roleid not in(1,2))
This is the function which is called from the above query.
CREATE function [dbo].[GetBookingRoleName]
(
@UserId as integer,
@BookingId as integer
)
RETURNS varchar(20)
as
begin
declare @RoleName varchar(20)
if @BookingId = -1
Select Top 1 @RoleName=R.RoleName From UserRoles UR inner join Roles R on UR.RoleId=R.RoleId Where UR.UserId=@UserId and R.RoleId not in(1,2)
else
Select @RoleName= RoleName From Roles where RoleId = @BookingId
return @RoleName
end
Queries are generally processed in the following order (SQL Server). I have no idea if other RDBMSs do it this way.
FROM [MyTable]
ON [MyCondition]
JOIN [MyJoinedTable]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]
SQL is a declarative language. The result of a query must be what you would get if you evaluated as follows (from Microsoft):
Logical Processing Order of the SELECT statement
The following steps show the logical
processing order, or binding order,
for a SELECT statement. This order
determines when the objects defined in
one step are made available to the
clauses in subsequent steps. For
example, if the query processor can
bind to (access) the tables or views
defined in the FROM clause, these
objects and their columns are made
available to all subsequent steps.
Conversely, because the SELECT clause
is step 8, any column aliases or
derived columns defined in that clause
cannot be referenced by preceding
clauses. However, they can be
referenced by subsequent clauses such
as the ORDER BY clause. Note that the
actual physical execution of the
statement is determined by the query
processor and the order may vary from
this list.
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
The optimizer is free to choose any order it feels appropriate to produce the best execution time. Given any SQL query, it is basically impossible for anybody to claim to know the execution order. If you add detailed information about the schema involved (exact table and index definitions) and the estimated cardinalities (size of data and selectivity of keys), then one can take a guess at the probable execution order.
Ultimately, the only correct 'order' is the one described in the actual execution plan. See Displaying Execution Plans by Using SQL Server Profiler Event Classes and Displaying Graphical Execution Plans (SQL Server Management Studio).
A completely different thing though is how do queries, subqueries and expressions project themselves into 'validity'. For instance if you have an aliased expression in the SELECT projection list, can you use the alias in the WHERE clause? Like this:
SELECT a+b as c
FROM t
WHERE c=...;
Is the use of the c alias valid in the WHERE clause? The answer is NO. Queries form a syntax tree, and a lower branch of the tree cannot reference something defined higher in the tree. This is not necessarily an order of 'execution'; it is more of a syntax parsing issue. It is equivalent to writing this code in C#:
void Select (int a, int b)
{
if (c = ...) then {...}
int c = a+b;
}
Just as in C# this code won't compile because the variable c is used before it is defined, the SELECT above won't compile properly because the alias c is referenced lower in the tree than where it is actually defined.
Unfortunately, unlike the well-known rules of C/C# language parsing, the SQL rules for how the query tree is built are somewhat esoteric. There is a brief mention of them in Single SQL Statement Processing, but I don't know of any source with a detailed discussion of how they are created and what order is valid and what is not. I'm not saying there aren't good sources; I'm sure some of the good SQL books out there cover this topic.
Note that the syntax tree order does not match the visual order of the SQL text. For example, the ORDER BY clause is usually the last in the SQL text, but as a syntax tree it sits above everything else (it sorts the output of the SELECT, so it sits above the SELECTed columns, so to speak), and as such it is valid to reference the c alias:
SELECT a+b as c
FROM t
ORDER BY c;
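When the alias is needed in a WHERE clause, two common workarounds are to define it one level lower in the tree, either in a derived table or with CROSS APPLY. A small sketch, using a hypothetical table variable @t in place of t:

```sql
-- Hypothetical sample data standing in for table t.
DECLARE @t TABLE (a int, b int);
INSERT INTO @t (a, b) VALUES (1, 2), (3, 4), (5, 6);

-- Derived table: c is defined in the inner query, so the outer WHERE can see it.
SELECT d.c
FROM (SELECT a + b AS c FROM @t) AS d
WHERE d.c > 5
ORDER BY d.c;

-- CROSS APPLY: the expression is named in the FROM clause,
-- which is bound before WHERE.
SELECT v.c
FROM @t AS t
CROSS APPLY (VALUES (t.a + t.b)) AS v(c)
WHERE v.c > 5
ORDER BY v.c;
```

Both forms only change where the name is bound; the optimizer is still free to evaluate the expression and the filter in whatever physical order it chooses.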
SQL is declarative rather than imperative, so you cannot know in advance which part of a statement is executed first. However, SQL is evaluated by query engines, and most engines follow a similar process to obtain the results, so it helps to understand how the query engine works internally in order to understand some SQL execution behavior.
Julia Evans has a great post explaining this; it is worth checking out:
https://jvns.ca/blog/2019/10/03/sql-queries-don-t-start-with-select/
SQL is a declarative language, meaning that it tells the SQL engine what to do, not how. This is in contrast to an imperative language such as C, in which how to do something is clearly laid out.
This means that not all statements will execute as expected. Of particular note are boolean expressions, which may not evaluate from left-to-right as written. For example, the following code is not guaranteed to execute without a divide by zero error:
SELECT 'null' WHERE 1 = 1 OR 1 / 0 = 0
The reason for this is that the query optimizer chooses the best (most efficient) way to execute a statement. This means that, for example, a value may be loaded and filtered before a transforming predicate is applied, causing an error. See the second link above for an example.
See: here and here.
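Because predicate order is not guaranteed, a common defensive pattern is to make the risky expression safe by itself instead of relying on an earlier condition to guard it. A minimal sketch for the divide-by-zero case, using NULLIF and a hypothetical table variable @ratios:

```sql
-- Hypothetical sample data.
DECLARE @ratios TABLE (numerator int, denominator int);
INSERT INTO @ratios (numerator, denominator) VALUES (10, 2), (5, 0);

-- Risky form (not run here): nothing guarantees that the first condition
-- is evaluated before the division.
--   WHERE denominator <> 0 AND numerator / denominator > 1

-- Safe form: NULLIF turns a zero denominator into NULL, so the division
-- yields NULL instead of raising an error, and the row is filtered out.
SELECT numerator, denominator
FROM @ratios
WHERE numerator / NULLIF(denominator, 0) > 1;
```

The query returns only the (10, 2) row, whatever order the engine evaluates the expression in.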
"Order of execution" is probably a bad mental model for SQL queries. It's hard to write a single query that actually depends on order of execution (this is a good thing). Instead, you should think of all join and where clauses as happening simultaneously (almost like a template).
That said, you could display the execution plan, which should give you insight into it.
However, since it's not clear why you want to know the order of execution, I'm guessing you're trying to get a mental model for this query so you can fix it in some way. This is how I would "translate" your query; although I've done well with this kind of analysis, there's some grey area in how precise it is.
FROM AND WHERE CLAUSE
Give me all the Event Participants rows. from [3rdi_EventParticipants]
Also give me all the Event Signup rows that match the Event Participants rows on SignUpID inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId
But Only for Event 13 EP.EventId = 13
And only if the user id has a record in the user roles table where the role id is not in 1,2,19,20,21,22
userid in (
select distinct userid from userroles
--where roleid not in(6,7,61,64) and roleid not in(1,2))
where roleid not in(19, 20, 21, 22) and roleid not in(1,2))
SELECT CLAUSE
For each of the rows give me a unique ID
Row_number() OVER(ORDER BY (SELECT 1)) AS 'Serial Number',
The participants First Name EP.FirstName
The participants Last Name Ep.LastName
The Booking Role name GetBookingRoleName
Go look in the Event Dates and find out what the first eventDate where the EventId = 13 that you find
(select top 1 convert(varchar(10),eventDate,103)from [3rdi_EventDates] where EventId=13) as EventDate
Finally translate the GetBookingRoleName in Category. I don't have a table for this so I'll map it manually (CASE [dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole)
WHEN '90 Day Client' THEN 'DC'
WHEN 'Association Client' THEN 'DC'
WHEN 'Autism Whisperer' THEN 'DC'
WHEN 'CampII' THEN 'AD'
WHEN 'Captain' THEN 'AD'
WHEN 'Chiropractic Assistant' THEN 'AD'
WHEN 'Coaches' THEN 'AD'
END) as Category
So a couple of notes here. You're not ordering by anything when you select TOP. You should probably have an ORDER BY there. You could also just as easily put that in your from clause (the derived table's column needs an alias), e.g.
from [3rdi_EventParticipants] as EP
inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId,
(select top 1 convert(varchar(10),ed.eventDate,103) as EventDate
from [3rdi_EventDates] as ed where ed.EventId=13
Order by ed.eventDate) dates
There is a logical order to the evaluation of the query text, but the database engine can choose in what order to execute the query components based on what is most optimal. The logical text parsing order is listed below. That is, for example, why you can't use an alias from the SELECT clause in a WHERE clause. As far as the query parsing process is concerned, the alias doesn't exist yet.
FROM
ON
OUTER
WHERE
GROUP BY
CUBE | ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
See the Microsoft documentation (see "Logical Processing Order of the SELECT statement") for more information on this.
Simplified order for T-SQL -> SELECT statement:
1) FROM
2) Cartesian product
3) ON
4) Outer rows
5) WHERE
6) GROUP BY
7) HAVING
8) SELECT
9) Evaluation phase in SELECT
10) DISTINCT
11) ORDER BY
12) TOP
As far as I have seen, the same order applies in SQLite.
Source => SELECT (Transact-SQL)
... of course there are (rare) exceptions.

Count of Distinct Rows Without Using Subquery

Say I have Table1 which has duplicate rows (forget the fact that it has no primary key...) Is it possible to rewrite the following without using a JOIN, subquery or CTE and also without having to spell out the columns in something like a GROUP BY?
SELECT COUNT(*)
FROM (
SELECT DISTINCT * FROM Table1
) T1
You can do something like this.
SELECT Count(DISTINCT ProductName) FROM Products
but if you want a count of completely distinct records then you will have to use one of the other options you mentioned.
If you wanted to do something like you suggested in the question, then that would imply you have duplicate records in your table.
If you didn't have duplicate records SELECT DISTINCT * from table would be the same without the distinct.
No, it's not possible.
If you are limited by your framework/query tool/whatever, can't use a subquery, and can't spell out each column name in the GROUP BY, you are SOL.
If you are not limited by your framework/query tool/whatever, there's no reason not to use a subquery.
if you really really want to do that you can just "SELECT COUNT(*) FROM table1 GROUP BY all,columns,here" and take the size of the result set as your count.
But it would be dailywtf worthy code ;)
I just wanted to refine the answer by saying that you need to check that the datatype of the columns is comparable - otherwise you will get an error trying to make them DISTINCT:
e.g.
com.microsoft.sqlserver.jdbc.SQLServerException: The ntext data type cannot be selected as DISTINCT because it is not comparable.
This is true for large binary, xml columns and others depending on your RDBMS - rtm. The solution for SQLServer for example is to cast it from an ntext to an nvarchar(MAX) from SQLServer 2005 onwards.
If you stick to the PK columns then you should be OK (I haven't verified this myself but I'd have thought logically that PK columns would have to be comparable)
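A short sketch of the cast mentioned above, for SQL Server 2005 or later. The temp table #Table1 and its columns are hypothetical; the point is only that the ntext column is cast to nvarchar(max) before DISTINCT is applied.

```sql
-- Hypothetical table with a non-comparable ntext column.
CREATE TABLE #Table1 (Id int, Notes ntext);
INSERT INTO #Table1 (Id, Notes) VALUES (1, N'a'), (1, N'a'), (2, N'b');

-- SELECT DISTINCT * would fail here because ntext is not comparable.
-- Casting to nvarchar(max) makes the column comparable.
SELECT COUNT(*) AS distinct_rows
FROM (
    SELECT DISTINCT Id, CAST(Notes AS nvarchar(max)) AS Notes
    FROM #Table1
) AS t;

DROP TABLE #Table1;
```

This does mean spelling out the columns, so it does not satisfy the original "no column list" constraint; it only avoids the comparability error.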

Resources