INSERT+SELECT with a unique key - sql-server

The following T-SQL statement does not work because [key] has to be unique and the MAX call in the SELECT statement only seems to be evaluated once. In other words it is only incrementing the key value once and trying to insert that value over and over. Does anyone have a solution?
INSERT INTO [searchOC].[dbo].[searchTable]
([key],
dataVaultType,
dataVaultKey,
searchTerm)
SELECT (SELECT MAX([key]) + 1 FROM [searchOC].[dbo].[searchTable]) AS [key]
,'PERSON' as dataVaultType
,[student_id] as dataVaultKey
,[email] as searchTerm
FROM [JACOB].[myoc4Data].[dbo].[users]
WHERE [email] != '' AND [active] = '1'
AND [student_id] IN (SELECT [userID] FROM [JACOB].[myoc4Data].[dbo].[userRoles]
WHERE ([role] = 'STUDENT' OR [role] = 'FACULTY' OR [role] = 'STAFF'))

If you can make the key column an IDENTITY column, that would probably be the easiest fix: it lets SQL Server generate the incremental values for you.
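For illustration, a minimal sketch of that approach; the column types here are assumptions, and since an existing column cannot simply be switched to IDENTITY this assumes the table can be rebuilt:

CREATE TABLE [searchOC].[dbo].[searchTable]
(
    [key]         INT IDENTITY(1,1) PRIMARY KEY,
    dataVaultType VARCHAR(50),
    dataVaultKey  VARCHAR(50),
    searchTerm    VARCHAR(255)
)

-- [key] is then omitted from the INSERT and SQL Server numbers each row itself
INSERT INTO [searchOC].[dbo].[searchTable] (dataVaultType, dataVaultKey, searchTerm)
SELECT 'PERSON', [student_id], [email]
FROM [JACOB].[myoc4Data].[dbo].[users]
WHERE [email] != '' AND [active] = '1'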
Alternatively, if you are set on generating the key yourself, a blog post I wrote last month may help. Although it uses a composite key, it shows what you need to do so that a single INSERT statement inserting multiple rows safely generates a new value for each row, and it is also safe across many simultaneous writers (which many examples don't deal with).
http://colinmackay.co.uk/2012/12/29/composite-primary-keys-including-identity-like-column/
Incidentally, the reason you get the same value for MAX([key]) on each row of your SELECT is that the aggregate is evaluated at the time the table is read. So for all the rows the SELECT statement returns, MAX([key]) will be the same. Unless you add some sort of GROUP BY clause, a MAX(columnName) in a SELECT statement will return the same value for every row returned.
Also, aggregate functions are deterministic, so for the same set of input values they always produce the same output. If your set of keys is 1, 5, 9 then MAX will always return 9.
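If you do want to keep a single INSERT ... SELECT without IDENTITY, one rough sketch (this is not the approach from the linked post, and it is still not safe against concurrent writers) is to read MAX([key]) once and add a per-row ROW_NUMBER so every inserted row gets a distinct value:

INSERT INTO [searchOC].[dbo].[searchTable] ([key], dataVaultType, dataVaultKey, searchTerm)
SELECT (SELECT ISNULL(MAX([key]), 0) FROM [searchOC].[dbo].[searchTable])
       + ROW_NUMBER() OVER (ORDER BY u.[student_id]) AS [key],
       'PERSON',
       u.[student_id],
       u.[email]
FROM [JACOB].[myoc4Data].[dbo].[users] AS u
WHERE u.[email] != '' AND u.[active] = '1'
  AND u.[student_id] IN (SELECT [userID]
                         FROM [JACOB].[myoc4Data].[dbo].[userRoles]
                         WHERE [role] IN ('STUDENT', 'FACULTY', 'STAFF'))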

Related

select rows on condition + one row on different condition

I'm sorry for the bad title, but I don't know how to explain it briefly.
Consider a table with an 'ID' column that has a clustered index on ID descending. It contains another column in which NULL values only exist in the uppermost entries. Let's call this column 'NewValue'; it does not have an index.
Now I'm pulling a specific resultset and I can do so in two different ways:
First way:
Select Top (Select Count(*) + 1 from [testtable] where [NewValue] is Null) * from [testtable]
Second way:
Select * from [testtable] where [NewValue] is Null Union Select Top 1 * from [testtable] where [NewValue] is not Null
I don't understand enough about execution plans to judge this myself, but I can see there is quite a difference: the second way does a sort that takes 53% of the plan, which doesn't happen in the first method. I'm using MS SQL Server.
So can you tell me which is better? I need all rows where 'NewValue' is NULL plus the first row where it is not NULL.

How can this expression reach the NULL expression?

I'm trying to randomly populate a column with values from another table using this statement:
UPDATE dbo.SERVICE_TICKET
SET Vehicle_Type = (SELECT TOP 1 [text]
                    FROM dbo.vehicle_typ
                    WHERE id = abs(checksum(NewID())) % 21)
It seems to work fine, however the value NULL is inserted into the column. How can I get rid of the NULL and only insert the values from the table?
This can happen when you don't have an appropriate index on the ID column of your vehicle_typ table. Here's a smaller query that exhibits the same problem:
create table T (ID int null)
insert into T(ID) values (0),(1),(2),(3)
select top 1 * from T where ID = abs(checksum(NewID()))%3
Because there's no index on T, SQL Server performs a table scan and then, for each row, attempts to satisfy the WHERE clause. That means it evaluates abs(checksum(NewID()))%3 anew for each row. You'll only get a result if, by chance, that expression happens to produce 1 when it's evaluated for the row with ID 1 (and so on for the other rows).
If possible (I don't know your table structure) I would first populate a column in SERVICE_TICKET with a random number between 0 and 20 and then perform this update using the already generated number. Otherwise, with the current query structure, you're relying on SQL Server being clever enough to evaluate abs(checksum(NewID()))%21 only once per outer row, which it may not always do (as you've already found out).
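A sketch of that suggestion, assuming vehicle_typ ids run 0 to 20 as in the question (the helper column name is made up here):

-- Hypothetical helper column; the name is illustrative
ALTER TABLE dbo.SERVICE_TICKET ADD RandomTypeId INT NULL
GO

-- NEWID() is evaluated per row in an UPDATE, so every ticket gets its own number
UPDATE dbo.SERVICE_TICKET
SET RandomTypeId = ABS(CHECKSUM(NEWID())) % 21

-- The join then uses an already-materialised value instead of re-rolling it per probe
UPDATE st
SET st.Vehicle_Type = vt.[text]
FROM dbo.SERVICE_TICKET AS st
JOIN dbo.vehicle_typ AS vt ON vt.id = st.RandomTypeId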
@Damien_The_Unbeliever explained why your query fails.
My first variant was not correct, because I didn't understand the problem in full.
You want to set each row in SERVICE_TICKET to a different random value from vehicle_typ.
To fix it, simply order by a random number rather than comparing a random number with ID, like this (and then you don't care how many rows are in vehicle_typ, as long as there is at least one row there).
WITH
CTE
AS
(
SELECT
dbo.SERVICE_TICKET.Vehicle_Type,
CA.[text]
FROM
dbo.SERVICE_TICKET
CROSS APPLY
(
SELECT TOP 1 [text]
FROM dbo.vehicle_typ
ORDER BY NewID()
) AS CA
)
UPDATE CTE
SET Vehicle_Type = [text];
First we build a Common Table Expression; you can think of it as a temporary result set. For each row in SERVICE_TICKET we pick one random row from vehicle_typ using CROSS APPLY. Then we UPDATE the original table through the CTE with the chosen values.

determine order of operations in query

Say I have a query like this:
SELECT *
FROM Foo
WHERE Name IN ('name1', 'name2')
AND (Date<'2013-01-01' AND Date>'2010-01-01')
AND Type = 1
Is there a way to force SQL Server to evaluate the expressions in the order I determine, rather than in the order the query optimizer chooses? For example, I want the IN clause evaluated first, that output then filtered by Type = 1, and finally the dates, in EXACTLY that order.
Yes, it is largely possible (though there are some caveats and counter-examples discussed in the answers here):
SELECT *
FROM   Foo
WHERE  1 = CASE
             WHEN Name IN ( 'name1', 'name2' ) THEN
               CASE
                 WHEN Type = 1 THEN
                   CASE
                     WHEN ( Date < '2013-01-01'
                            AND Date > '2010-01-01' ) THEN 1
                   END
               END
           END
But why bother? There are only very limited circumstances in which I can see this would be useful (e.g. preventing divide by zero if an earlier predicate evaluated to 0).
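A minimal sketch of that divide-by-zero scenario, reusing the Foo table from the question (the division itself is just an example expression):

-- The outer CASE only lets the division be evaluated once Type is known to be non-zero
SELECT *
FROM Foo
WHERE 1 = CASE
            WHEN [Type] <> 0 THEN
              CASE WHEN 100 / [Type] > 5 THEN 1 END
          END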
Wrapping the predicates up like this makes the query completely unsargable and prevents index usage for any of the three (otherwise sargable) predicates. It guarantees a full scan reading all rows.
To see an example of this
CREATE TABLE Foo
(
Id INT IDENTITY PRIMARY KEY,
Name VARCHAR(10),
[Date] DATE,
[Type] TINYINT,
Filler CHAR(8000) NULL
)
CREATE NONCLUSTERED INDEX IX_Name
ON Foo(Name)
CREATE NONCLUSTERED INDEX IX_Date
ON Foo(Date)
CREATE NONCLUSTERED INDEX IX_Type
ON Foo(Type)
INSERT INTO Foo
(Name,
[Date],
[Type])
SELECT TOP (100000) 'name' + CAST(0 + CRYPT_GEN_RANDOM(1) AS VARCHAR),
DATEADD(DAY, 7 * CRYPT_GEN_RANDOM(1), '2012-01-01'),
0 + CRYPT_GEN_RANDOM(1)
FROM master..spt_values v1,
master..spt_values v2
Running the original query in the question versus this query and comparing the execution plans, the second query is costed as being 100% of the cost of the batch.
The query optimizer, left to its own devices, first seeks into the 414 rows matching the Type predicate and uses those as the build input for a hash table. It then seeks into the 728 rows matching the Name, checks each against the hash table, and for the 4 that match it performs a key lookup for the other columns and evaluates the Date predicate against those. Finally it returns the single matching row.
The second query just ploughs through all the rows in the table and evaluates the predicates in the desired order. The difference in number of pages read is pretty significant.
Original Query
Table 'Foo'. Scan count 3, logical reads 23
Table 'Worktable'. Scan count 0, logical reads 0
Nested case
Table 'Foo'. Scan count 1, logical reads 100373
Short answer: NO!
You can try using brackets, hints, studying the query plan, etc.
But is it wise to mess with the engine/optimizer that way?
You will need a lot of study and experience to outsmart the optimizer; that said, please let the engine take care of those details for you.

SEQUENCE in SQL Server 2008 R2

I need to know if there is any way to have a SEQUENCE or something like it, as we have in Oracle. The idea is to get one number and then use it as a key to save some records in a table. Each time we need to save data in that table, we first get the next number from the sequence and then use it to save the records. It is not an IDENTITY column.
For example:
[ID]  [SEQUENCE ID]  [Code]  [Value]
  1          1          A       232
  2          1          B       454
  3          1          C       565
Next time someone needs to add records, the next SEQUENCE ID should be 2. Is there any way to do this? The sequence could just as well be a GUID for my purposes.
As Guillelon points out, the best way to do this in SQL Server is with an identity column.
You can simply define a column as being identity. When a new row is inserted, the identity is automatically incremented.
The difference is that the identity is updated on every row, not just some rows. To be honest, I think this is a much better approach. Your example suggests that you are storing both an entity and its detail in the same table.
The SequenceId should be the primary identity key in another table. This value can then be used for insertion into this table.
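A sketch of that two-table shape, using the columns from your example (the table names here are made up):

-- Header table: its IDENTITY value plays the role of the "sequence"
CREATE TABLE dbo.RecordBatch
(
    SequenceId INT IDENTITY(1,1) PRIMARY KEY,
    CreatedAt  DATETIME NOT NULL DEFAULT GETDATE()
)

CREATE TABLE dbo.RecordDetail
(
    ID         INT IDENTITY(1,1) PRIMARY KEY,
    SequenceId INT NOT NULL REFERENCES dbo.RecordBatch (SequenceId),
    Code       CHAR(1),
    Value      INT
)

-- Each time records are saved, create one header row and reuse its id
DECLARE @SequenceId INT
INSERT INTO dbo.RecordBatch DEFAULT VALUES
SET @SequenceId = SCOPE_IDENTITY()

INSERT INTO dbo.RecordDetail (SequenceId, Code, Value)
VALUES (@SequenceId, 'A', 232),
       (@SequenceId, 'B', 454),
       (@SequenceId, 'C', 565)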
This can be done in multiple ways. The following is what I can think of:
Creating a trigger and thereby computing the possible value
Adding a computed column along with a function that retrieves the next value of the sequence
Here is an article that presents various solutions.
One possible way is to do something like this:
-- Example 1
DECLARE @Var INT
SET @Var = (SELECT MAX(ID) + 1 FROM tbl);
INSERT INTO tbl VALUES (@Var, 'Record 1')
INSERT INTO tbl VALUES (@Var, 'Record 2')
INSERT INTO tbl VALUES (@Var, 'Record 3')
-- Example 2
CREATE TABLE #temp (col1 INT, col2 INT)
INSERT INTO #temp VALUES (1, 2)
INSERT INTO #temp VALUES (1, 2)
INSERT INTO ActualTable (col1, col2, sequence)
SELECT temp.*, (SELECT MAX(ID) + 1 FROM ActualTable)
FROM #temp temp
-- Example 3
DECLARE @seq TABLE (sequence INT)
INSERT INTO ActualTable (col1, col2, sequence)
OUTPUT inserted.sequence INTO @seq
SELECT 1, 2, MAX(ID) + 1 FROM ActualTable
The first two examples rely on inserting in batches. But based on your comment, I have added example 3, which does a single insert initially; you can then read the sequence value that was captured and use it to insert the rest of the records. If you have never used an OUTPUT clause, please reply in the comments and I will expand further.
I would isolate all of the above inside a transaction.
If you were using SQL Server 2012, you could use a SEQUENCE object, as shown here.
Forgive any syntax errors; I don't have SSMS installed.
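For completeness, a minimal sketch of that SQL Server 2012+ syntax (it is not available on 2008 R2); the sequence name is made up:

CREATE SEQUENCE dbo.BatchSequence
    START WITH 1
    INCREMENT BY 1

-- Grab the next value once, then reuse it for every row in the batch
DECLARE @SequenceId INT
SET @SequenceId = NEXT VALUE FOR dbo.BatchSequence

INSERT INTO ActualTable (col1, col2, sequence)
VALUES (1, 2, @SequenceId),
       (3, 4, @SequenceId)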

Primary Key on a temp-table messes up the results

This is a "Why does this happen??? - Question"
I have the following script:
DECLARE @sql_stmt nvarchar(max)
SET @sql_stmt = '
select top 100000 id as id
from dat.sev_sales_event
order by id
'
DECLARE @preResult TABLE ( sales_event_id INT NOT NULL PRIMARY KEY )
INSERT INTO @preResult(sales_event_id)
EXEC sp_executesql @sql_stmt
SELECT * FROM @preResult
If I run this script, the results may vary each time it's executed.
By simply removing PRIMARY KEY from the table variable, the results stay stable.
Can someone tell me the theory behind this behaviour?
Kind regards
Jürgen
The order of data in a database has no meaning.
If you want your results to be ordered then you must specify an ORDER BY clause.
This is true irrespective of whether there is a PRIMARY KEY or not.
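In the script above that simply means ordering the final read explicitly, for example:

-- The final SELECT, with an explicit ORDER BY instead of relying on the PRIMARY KEY
SELECT sales_event_id
FROM @preResult
ORDER BY sales_event_id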
The following scripts illustrate the issue nicely
Expecting order without ORDER BY (1).sql - gvee.co.uk
Expecting order without ORDER BY (2).sql - gvee.co.uk
Expecting order without ORDER BY (3).sql - gvee.co.uk
Are you sure the result set is different or just in a different order?
Adding a primary key to the temporary table should result in the contents of the table being ordered numerically ascending, and so appear 'stable'. Removing this will remove the inherent ordering.
