Primary Key on a temp-table messes up the results - sql-server

This is a "Why does this happen??? - Question"
I have the following script:
DECLARE @sql_stmt nvarchar(max)
SET @sql_stmt = '
select top 100000 id as id
from dat.sev_sales_event
order by id
'
DECLARE @preResult TABLE ( sales_event_id INT NOT NULL PRIMARY KEY)
INSERT INTO @preResult(sales_event_id)
EXEC sp_executesql @sql_stmt
SELECT * FROM @preResult
If I run this script, the results may vary each time it is executed.
Simply removing "PRIMARY KEY" from the temporary table makes the results stable.
Can someone explain the theory behind this behaviour?
Kind regards
Jürgen

The order of data in a database has no meaning.
If you want your results to be ordered then you must specify an ORDER BY clause.
This is irrespective of having a PRIMARY key or not.
The following scripts illustrate the issue nicely
Expecting order without ORDER BY (1).sql - gvee.co.uk
Expecting order without ORDER BY (2).sql - gvee.co.uk
Expecting order without ORDER BY (3).sql - gvee.co.uk
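As a minimal sketch of the point above (the sample values are made up for illustration, and the table variable is only a stand-in for the one in the question), the final SELECT gets a dependable order only when it asks for one:

```sql
-- Hypothetical stand-in for the table variable in the question.
DECLARE @preResult TABLE (sales_event_id INT NOT NULL PRIMARY KEY);

INSERT INTO @preResult (sales_event_id)
VALUES (3), (1), (2);

-- Without ORDER BY the engine is free to return rows in any order,
-- even though the primary key is backed by an index.
SELECT sales_event_id
FROM @preResult;

-- With ORDER BY the order of the result set is guaranteed.
SELECT sales_event_id
FROM @preResult
ORDER BY sales_event_id;
```

On a table this small both queries will most likely look identical, which is exactly why the missing ORDER BY tends to go unnoticed until the data grows or the plan changes.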

Are you sure the result set is different or just in a different order?
Adding a primary key to the temporary table should result in the contents of the table being ordered numerically ascending, and so appear 'stable'. Removing this will remove the inherent ordering.

Related

Order of XML nodes from document preserved in insert?

If I do:
INSERT INTO dst
SELECT blah
FROM src
CROSS APPLY xmlcolumn.nodes('blah')
where dst has an identity column, can one say for certain that the identity column order matches the order of the nodes from the original xml document?
I think the answer is no: there are no guarantees, and to be sure the ordering is retained, some ordering information also needs to be extracted from the XML at the same time as the nodes are enumerated.
There's no way to see it explicitly in an execution plan, but the id column returned by the nodes() method is a varbinary(900) OrdPath, which does encapsulate the original xml document order.
The solution offered by Mikael Eriksson on the related question Does the `nodes()` method keep the document order? relies on the OrdPath to provide an ORDER BY clause necessary to determine how identity values are assigned for the INSERT.
A slightly more compact usage follows:
CREATE TABLE #T
(
    ID integer IDENTITY,
    Fruit nvarchar(10) NOT NULL
);
DECLARE @xml xml =
N'
<Fruits>
    <Apple />
    <Banana />
    <Orange />
    <Pear />
</Fruits>
';
INSERT #T
    (Fruit)
SELECT
    N.n.value('local-name(.)', 'nvarchar(10)')
FROM @xml.nodes('/Fruits/*') AS N (n)
ORDER BY
    ROW_NUMBER() OVER (ORDER BY N.n);
SELECT
    T.ID,
    T.Fruit
FROM #T AS T
ORDER BY
    T.ID;
db<>fiddle
Using the OrdPath this way is presently undocumented, but the technique is sound in principle:
The OrdPath reflects document order.
The ROW_NUMBER computes sequence values ordered by OrdPath*.
The ORDER BY clause uses the row number sequence.
Identity values are assigned to rows as per the ORDER BY.
To be clear, this holds even if parallelism is employed. As Mikael says, the dubious aspect is using id in the ROW_NUMBER since id is not documented to be the OrdPath.
* The ordering is not shown in plans, but optimizer output using TF 8607 contains:
ScaOp_SeqFunc row_number order[CALC:QCOL: XML Reader with XPath filter.id ASC]
Under the current implementation of .nodes, the XML nodes are generated in document order. The result is always joined to the original data using a nested loops join, which also preserves that order.
Furthermore, inserts are generally serial (they go parallel only under very specific circumstances, usually when the target table is empty, and never when an IDENTITY value is being generated).
Therefore there is no reason why the server would ever return rows in an order other than document order. You can see from this fiddle that that is what happens.
That being said, there is no guarantee that the implementation of .nodes won't change, or that inserts won't go parallel in future, as neither behaviour is documented as guaranteed. So I wouldn't rely on it without an explicit ORDER BY, and you do not have a column to order by.
Using an ORDER BY would guarantee it. The docs state: "INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted."
Even using ROW_NUMBER as some have recommended is also not guaranteed. The only real solution is to get the document order directly from XQuery.
The problem is that SQL Server's version of XQuery does not allow using position(.) as a result, only as a predicate. Instead, you can use a hack involving the << positional operator.
For example:
SELECT T.X.value('text()[1]', 'nvarchar(100)') as RowLabel,
T.X.value('let $i := . return count(../*[. << $i]) + 1', 'int') as RowNumber
FROM src
CROSS APPLY xmlcolumn.nodes('blah') as T(X);
What this does is:
Assigns the current node . to the variable $i
Takes all the nodes in ../* i.e. all children of the parent of this node
... [. << $i] that come before $i in document order
and counts them
Then adds 1 to make it one-based
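A self-contained version of the same idea may make it easier to try out. This is only a sketch: the @xml variable and the Fruit and NodePosition column names are made up for illustration, and the expression is the same << counting trick described above.

```sql
-- Hypothetical sample document; any element list would do.
DECLARE @xml xml =
N'<Fruits>
    <Apple />
    <Banana />
    <Orange />
    <Pear />
</Fruits>';

SELECT
    N.n.value('local-name(.)', 'nvarchar(10)') AS Fruit,
    -- Count the siblings that precede this node, then add 1.
    N.n.value('let $i := . return count(../*[. << $i]) + 1', 'int') AS NodePosition
FROM @xml.nodes('/Fruits/*') AS N (n)
ORDER BY
    NodePosition;
```

Because NodePosition is an ordinary integer column, it can be stored alongside the data or used in the ORDER BY of an INSERT ... SELECT, so nothing depends on the order in which nodes() happens to return rows. Note that the count is repeated for every node, so this may become slow on documents with very many siblings.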

A big 'like' matching query

I've got 2 tables,
[Item] with field [name] nvarchar(255)
[Transaction] with field [short_description] nvarchar(3999)
And I need to do this:
Select [Transaction].id, [Item].id
From [Transaction] inner join [Item]
on [Transaction].[short_description] like ('%' + [Item].[name] + '%')
The above works if limited to a handful of items, but unfiltered it runs for over 20 minutes before I cancel it.
I have a nonclustered index on [name], but I cannot index [short_description] due to its length.
[Transaction] has 320,000 rows
[Item] has 42,000.
That's 13,440,000,000 combinations.
Is there a better way to perform this query ?
I did poke at full-text, but I'm not really that familiar, the answer was not jumping out at me there.
Any advice appreciated !!
A LIKE pattern that starts with a wildcard (% or _) can never use an index seek, and will typically be disastrous for performance. Your query will need to scan indexes rather than seek through them, so indexing won't help.
Ideally, you should have a third table that would allow a many-to-many relationship between Transaction and Item based on IDs. The design is the issue here.
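A rough sketch of what such a design could look like follows. All table and column names here are hypothetical (the Demo prefix is only there to avoid clashing with real objects), and how the junction table gets populated is a separate problem.

```sql
-- Hypothetical stand-ins for [Item] and [Transaction].
CREATE TABLE dbo.DemoItem
(
    id int NOT NULL PRIMARY KEY,
    name nvarchar(255) NOT NULL
);

CREATE TABLE dbo.DemoTransaction
(
    id int NOT NULL PRIMARY KEY,
    short_description nvarchar(3999) NOT NULL
);

-- Junction table: one row per (transaction, item) pair.
CREATE TABLE dbo.DemoTransactionItem
(
    transaction_id int NOT NULL REFERENCES dbo.DemoTransaction (id),
    item_id int NOT NULL REFERENCES dbo.DemoItem (id),
    PRIMARY KEY (transaction_id, item_id)
);

-- The matching query becomes a read of integer keys
-- instead of a wildcard comparison of every pair.
SELECT ti.transaction_id, ti.item_id
FROM dbo.DemoTransactionItem AS ti;
```

With this shape the expensive text matching happens once, when a transaction is recorded or imported, rather than on every query.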
After some more sleuthing I have utilized some Fulltext features.
sp_fulltext_keymappings
gives me my transaction table id, along with the FT docID
(I found out that 'doc' = text field)
sys.dm_fts_index_keywords_by_document
gives me FT documentId along with the individual keywords within it
Once I had that, the rest was simple.
Although, I do have to look into the term 'keyword' a bit more... seems that definition can be variable.
This only works because the text I am searching for has no white space.
I believe that you could tweak the FTI configuration to work with other scenarios... but I couldn't promise.
I need to look more into Fulltext.
My current 'beta' code below.
CREATE TABLE #keyMap
(
    docid INT PRIMARY KEY,
    [key] varchar(32) NOT NULL
);
DECLARE @db_id int = db_id(N'<database name>');
DECLARE @table_id int = OBJECT_ID(N'Transactions');
INSERT INTO #keyMap
EXEC sp_fulltext_keymappings @table_id;
select km.[key] as transaction_id, i.[id] as item_id
from
    sys.dm_fts_index_keywords_by_document ( @db_id, @table_id ) kbd
INNER JOIN
    #keyMap km ON km.[docid]=kbd.document_id
inner join [items] i
    on kbd.[display_term] = i.name
;
My actual version of the code includes inserting the data into a final table.
Execution time is coming in at 30 seconds, which serves my needs for now.

INSERT+SELECT with a unique key

The following T-SQL statement does not work because [key] has to be unique and the MAX call in the SELECT statement only seems to be evaluated once. In other words, it increments the key value only once and then tries to insert that same value over and over. Does anyone have a solution?
INSERT INTO [searchOC].[dbo].[searchTable]
([key],
dataVaultType,
dataVaultKey,
searchTerm)
SELECT (SELECT MAX([key]) + 1 FROM [searchOC].[dbo].[searchTable]) AS [key]
,'PERSON' as dataVaultType
,[student_id] as dataVaultKey
,[email] as searchTerm
FROM [JACOB].[myoc4Data].[dbo].[users]
WHERE [email] != '' AND [active] = '1'
AND [student_id] IN (SELECT [userID] FROM [JACOB].[myoc4Data].[dbo].[userRoles]
WHERE ([role] = 'STUDENT' OR [role] = 'FACUTLY' OR [role] = 'STAFF'))
If you can make the key column an IDENTITY column that would probably be the easiest. That allows SQL Server to generate incremental values.
Alternatively, if you are set on generating the key yourself, a blog post I wrote last month may help. Although it uses a composite key, it shows how to safely generate a new value for each row when a single INSERT statement inserts multiple rows, and it is also safe across many simultaneous writers (which many examples don't deal with).
http://colinmackay.co.uk/2012/12/29/composite-primary-keys-including-identity-like-column/
Incidentally, the reason that you get the same value for MAX([key]) on each row in your SELECT is that it is computed when the table is read, before any of the new rows have been inserted. So for all the rows that the SELECT statement returns, MAX([key]) will always be the same. Unless the subquery is correlated with the outer row or grouped in some way, an aggregate such as MAX(columnName) returns the same value for each row returned.
Also, all aggregate functions are deterministic, so for each equivalent set of input it will always have the same output. So if your set of keys was 1, 5, 9 then it will always return 9.
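If an IDENTITY column is not an option, one way to get a distinct value per row in a single statement is to add ROW_NUMBER() to the current maximum. The following is only a sketch: the table variables and sample rows are hypothetical stand-ins for searchTable and users, and it does nothing to protect against other sessions inserting at the same time, which is the harder problem the blog post above is about.

```sql
-- Hypothetical stand-ins for the tables in the question.
DECLARE @searchTable TABLE
(
    [key] int NOT NULL PRIMARY KEY,
    dataVaultType varchar(20) NOT NULL,
    dataVaultKey int NOT NULL,
    searchTerm nvarchar(255) NOT NULL
);
DECLARE @users TABLE
(
    student_id int NOT NULL,
    email nvarchar(255) NOT NULL
);

INSERT INTO @searchTable ([key], dataVaultType, dataVaultKey, searchTerm)
VALUES (1, 'PERSON', 10, N'a@example.com');

INSERT INTO @users (student_id, email)
VALUES (11, N'b@example.com'), (12, N'c@example.com');

-- MAX([key]) is still read only once, but each new row is offset
-- by its own row number, so the generated keys are distinct.
INSERT INTO @searchTable ([key], dataVaultType, dataVaultKey, searchTerm)
SELECT
    m.max_key + ROW_NUMBER() OVER (ORDER BY u.student_id),
    'PERSON',
    u.student_id,
    u.email
FROM @users AS u
CROSS JOIN (SELECT ISNULL(MAX([key]), 0) AS max_key FROM @searchTable) AS m;

SELECT [key], dataVaultKey, searchTerm
FROM @searchTable
ORDER BY [key];
```

The ISNULL covers the case where the target table is empty and MAX returns NULL.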

SQL Server: select without order

I am using a WHERE ... IN condition in SQL Server. I want to get the result without ordering, because I gave a list in the 'where in' condition.
For example
select * from blabla where column in ('03.01.KO61.01410',
'03.02.A081.15002',
'03.02.A081.15016',
'03.02.A081.15003',
'02.03.A081.57105')
How can I do this?
If you want the rows returned such that they're in the same order as the items in your IN, you need to find some way to specify that in an ORDER BY clause - the only way to get SQL Server to define an order. E.g.:
select * from blabla where column in ('03.01.KO61.01410',
'03.02.A081.15002',
'03.02.A081.15016',
'03.02.A081.15003',
'02.03.A081.57105')
order by
CASE column
when '03.01.KO61.01410' then 1
when '03.02.A081.15002' then 2
when '03.02.A081.15016' then 3
when '03.02.A081.15003' then 4
when '02.03.A081.57105' then 5
end
In my experience, SQL Server orders the result set of a WHERE ... IN query arbitrarily if you do not specify how to order it.
So, if you want the rows in the order of your IN list, you must supply some data item to order by that reflects the order you passed. Otherwise, SQL Server may return the result set in any order.
You're already doing it - if you don't explicitly specify an order by using ORDER BY, then there is no implied order.
If you want to totally randomize the output, you could add an ORDER BY NEWID() clause:
SELECT (list of columns)
FROM dbo.blabla
WHERE column IN ('03.01.KO61.01410', '03.02.A081.15002',
'03.02.A081.15016', '03.02.A081.15003', '02.03.A081.57105')
ORDER BY NEWID()
If you have an autoincrement id in your table, use it in an order clause. And if you don't, consider adding one...
Try this:
CREATE TYPE varchar20_list_type AS TABLE (
    id INT IDENTITY PRIMARY KEY,
    val VARCHAR(20) NOT NULL UNIQUE
)
GO
DECLARE @mylist varchar20_list_type
INSERT @mylist (val) VALUES
    ('03.01.KO61.01410'),
    ('03.02.A081.15002'),
    ('03.02.A081.15016'),
    ('03.02.A081.15003'),
    ('02.03.A081.57105')
SELECT
    *
FROM
    blabla
    JOIN @mylist AS t
ON
    blabla.col = t.val
ORDER BY
    t.id
More information from http://www.sommarskog.se/arrays-in-sql-2008.html
By the way, this can be easily done in PostgreSQL with VALUES: http://www.postgresql.org/docs/9.0/static/queries-values.html
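For what it's worth, SQL Server 2008 and later also accept a VALUES list as a derived table, so a similar approach works without creating a type. A minimal sketch, assuming a hypothetical table variable @blabla with a col column standing in for the real table, and an explicit ord column carrying the list position:

```sql
-- Hypothetical stand-in for the table in the question.
DECLARE @blabla TABLE (col varchar(20) NOT NULL);

INSERT INTO @blabla (col)
VALUES ('02.03.A081.57105'), ('03.02.A081.15002'), ('03.01.KO61.01410');

-- Each list item carries its own position, which drives the ORDER BY.
SELECT b.col
FROM @blabla AS b
JOIN (VALUES
        ('03.01.KO61.01410', 1),
        ('03.02.A081.15002', 2),
        ('03.02.A081.15016', 3),
        ('03.02.A081.15003', 4),
        ('02.03.A081.57105', 5)
     ) AS t (val, ord)
    ON b.col = t.val
ORDER BY
    t.ord;
```

Writing the position explicitly avoids relying on the order in which IDENTITY values are handed out.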

adding an index to sql server

I have a query that gets run often. It's a dynamic SQL query because the sort column changes.
SELECT userID, ROW_NUMBER() OVER (ORDER BY created) as rownumber
from users
where
divisionID = @divisionID and isenrolled=1
The ORDER BY column in the OVER clause can be:
userid
created
Should I create an index for:
divisionID + isenrolled
divisionID + isenrolled + each_sort_by_option ?
Where should I put indexes for this table?
I'd start with
CREATE INDEX IX_SOQuestion ON dbo.users (divisionID, isenrolled) INCLUDE (userID, created)
The created ranking is unrelated to the WHERE clause, so may as well just INCLUDE it (so it's covered) rather than in the key columns. An internal sort would be needed anyway, so why make the key bigger?
Other sort columns could be included too
userid is only needed for output, so INCLUDE it
perhaps take isenrolled into INCLUDE if it's bit. Only 2 states (OK, 3 with NULL), so kinda pointless to add to the key columns
Start with divisionID + isenrolled + userID as it will always be used
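As a sketch of that last suggestion (the temp table, its column types and the index name are assumptions made up so the example runs on its own), the index could look like this:

```sql
-- Hypothetical stand-in for the users table; column types are guesses.
CREATE TABLE #users
(
    userID int NOT NULL PRIMARY KEY,
    divisionID int NOT NULL,
    isenrolled bit NOT NULL,
    created datetime NOT NULL
);

-- Equality columns first, then the column that is always needed;
-- created is carried along so the index covers both sort options.
CREATE INDEX IX_users_division_enrolled_user
    ON #users (divisionID, isenrolled, userID)
    INCLUDE (created);

DECLARE @divisionID int = 1;

SELECT userID, ROW_NUMBER() OVER (ORDER BY userID) AS rownumber
FROM #users
WHERE divisionID = @divisionID AND isenrolled = 1;
```

Which of the suggested indexes works best will depend on the data and on how often each sort option is used, so it is worth comparing the actual execution plans before settling on one.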

Resources