SQL - Insert into table if not exists - with existing NULL values - sql-server

I need to import data into SQL from Excel via a .NET user application. I need to avoid duplication. However, some records might have NULLs in certain columns.
I'm using stored procedures to implement the imports, but I can't seem to come up with a "universal" solution that matches on the data when it exists, or on NULL when it doesn't.
Note that my Part table uses an Identity PK, but the import records won't include it.
Below is an example (I did not include all the columns for brevity):
CREATE PROCEDURE [dbo].[spInsertPart]
(@PartNo NCHAR(50),
 @PartName NCHAR(50) = NULL,
 @PartVariance NCHAR(30) = NULL)
AS
BEGIN
    SET NOCOUNT OFF;
    IF NOT EXISTS (SELECT PartNo, PartVariance
                   FROM Part
                   WHERE PartNo = @PartNo AND PartVariance = @PartVariance)
    BEGIN
        INSERT INTO Part (PartNo, PartName, PartVariance)
        VALUES (@PartNo, @PartName, @PartVariance)
    END
END
The import data may or may not include a PartVariance, and the existing records may (or may not) also have NULL as the PartVariance.
If both are NULL, then I get a duplicate record - which I don't want.
How can I re-write the procedure to not duplicate, but to treat the NULL value like any other value? (That is, add a record if either contains NULL, but not both).

I think you need to clarify the following before this question can be answered correctly:
Which columns are used for 'matching' an incoming record against the rows of the Part table? That is, matching values in which columns should cause the remaining columns of an existing Part row to be updated with the incoming values, versus a new record being inserted into the Part table?
Assuming that only the PartNo and PartVariance columns are used for matching (as in your query), and that only PartVariance can be NULL, here is a solution:
CREATE PROCEDURE [dbo].[spInsertPart]
(@PartNo NCHAR(50),
 @PartName NCHAR(50) = NULL,
 @PartVariance NCHAR(30) = NULL)
AS
BEGIN
    SET NOCOUNT OFF;
    IF NOT EXISTS (
        SELECT 1
        FROM Part
        WHERE PartNo = @PartNo
          AND COALESCE(PartVariance, '') = COALESCE(@PartVariance, '')
    )
    BEGIN
        INSERT INTO Part (PartNo, PartName, PartVariance)
        VALUES (@PartNo, @PartName, @PartVariance)
    END
END
Note: You have mentioned that only PartVariance can be NULL. If the same is true of PartNo, then COALESCE can be used for matching the PartNo column as well.
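One caveat with the COALESCE approach is that it treats a NULL PartVariance and an empty string as the same value. If that distinction matters, a minimal alternative sketch is to let INTERSECT do the comparison, since INTERSECT treats two NULLs as equal without collapsing NULL and '':
IF NOT EXISTS (
    SELECT 1
    FROM Part
    -- the inner EXISTS is true only when this row's (PartNo, PartVariance)
    -- pair matches the parameters, with NULL matching NULL
    WHERE EXISTS (SELECT PartNo, PartVariance
                  INTERSECT
                  SELECT @PartNo, @PartVariance)
)
BEGIN
    INSERT INTO Part (PartNo, PartName, PartVariance)
    VALUES (@PartNo, @PartName, @PartVariance)
END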

Well, NULL is a problem when it comes to SQL Server. You can't use equality checks on it (=, <>), since both will return UNKNOWN, which is treated as false.
However, you can use a combination of IS NULL, OR and AND to get the desired result.
With SQL Server 2012 or higher (in older versions, change IIF to a CASE expression), you can do this:
IF NOT EXISTS (SELECT PartNo, PartVariance
FROM Part
WHERE IIF((PartNo IS NULL AND @PartNo IS NULL) OR (PartNo = @PartNo), 0, 1) = 0
AND IIF((PartVariance IS NULL AND @PartVariance IS NULL) OR (PartVariance = @PartVariance), 0, 1) = 0)
If both PartNo and @PartNo are NULL, or they contain the same value (remember, NULL = any other value evaluates as false), the IIF returns 0; otherwise (meaning the column and the variable contain different values, even if one of them is NULL) it returns 1. Requiring both IIFs to equal 0 therefore finds an existing matching row.
The second IIF does the same thing for the other column/variable combination.
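For completeness, a sketch of how that predicate might sit inside the original procedure (same parameters and column list as in the question):
CREATE PROCEDURE [dbo].[spInsertPart]
(@PartNo NCHAR(50),
 @PartName NCHAR(50) = NULL,
 @PartVariance NCHAR(30) = NULL)
AS
BEGIN
    SET NOCOUNT OFF;
    -- insert only when no row matches on both columns, treating NULL = NULL as a match
    IF NOT EXISTS (SELECT 1
                   FROM Part
                   WHERE IIF((PartNo IS NULL AND @PartNo IS NULL) OR (PartNo = @PartNo), 0, 1) = 0
                     AND IIF((PartVariance IS NULL AND @PartVariance IS NULL) OR (PartVariance = @PartVariance), 0, 1) = 0)
    BEGIN
        INSERT INTO Part (PartNo, PartName, PartVariance)
        VALUES (@PartNo, @PartName, @PartVariance)
    END
END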

Related

sql merge question - updating not matched by source

I have a stored procedure that merges a local temp table into an existing table.
ALTER PROCEDURE [dbo].[SyncProductVariantsFromServices]
@Items ProductVariantsTable readonly
AS
BEGIN
CREATE TABLE #ProductVariantsTemp
(
ItemCode nvarchar(10) collate SQL_Latin1_General_CP1_CI_AS,
VariantCode nvarchar(10) collate SQL_Latin1_General_CP1_CI_AS,
VariantDescriptionBG nvarchar(100) collate SQL_Latin1_General_CP1_CI_AS,
VariantDescriptionEN nvarchar(100) collate SQL_Latin1_General_CP1_CI_AS
)
insert into #ProductVariantsTemp
select ItemCode, VariantCode, VariantDescriptionBG, VariantDescriptionEN
from @Items
MERGE ProductVariants AS TARGET
USING #ProductVariantsTemp AS SOURCE
ON (TARGET.ItemCode = SOURCE.ItemCode AND TARGET.VariantCode= SOURCE.VariantCode)
WHEN NOT MATCHED BY TARGET THEN
INSERT (ItemCode, VariantCode, VariantDescriptionBG, VariantDescriptionEN)
VALUES (SOURCE.ItemCode, SOURCE.VariantCode, SOURCE.VariantDescriptionBG, SOURCE.VariantDescriptionEN)
OUTPUT INSERTED.ItemCode, INSERTED.VariantCode, GETDATE() INTO SyncLog;
The problem is: I know that in the OUTPUT clause I have access to the inserted or deleted records. But in the 'not matched by source' case I want to update:
Update ProductVariants Set Active = 0
-- when not matched by source
What is the most efficient way to do this?
Normally you use WHEN NOT MATCHED BY SOURCE when you want to delete a target record that is missing from the source table. If you want to 'inactivate' a record instead, you have to handle that yourself by adding the appropriate conditions to the clause.
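As a rough sketch of the delete form being referred to (using the table names from the question):
MERGE ProductVariants AS TARGET
USING #ProductVariantsTemp AS SOURCE
    ON (TARGET.ItemCode = SOURCE.ItemCode AND TARGET.VariantCode = SOURCE.VariantCode)
WHEN NOT MATCHED BY SOURCE THEN
    -- remove target rows that no longer exist in the source
    DELETE;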
If you want to keep a history of the records, consider using "Slowly Changing Dimensions". Here are some examples Kimball uses for this treatment of historical data:
Slowly Changing Dimensions - Part 1
Slowly Changing Dimensions - Part 2
Use the WHEN NOT MATCHED BY SOURCE clause of the MERGE with an UPDATE statement.
MERGE
ProductVariants AS TARGET
USING
#ProductVariantsTemp AS SOURCE ON (TARGET.ItemCode = SOURCE.ItemCode AND TARGET.VariantCode= SOURCE.VariantCode)
WHEN
NOT MATCHED BY TARGET THEN
INSERT (ItemCode, VariantCode, VariantDescriptionBG, VariantDescriptionEN)
VALUES (SOURCE.ItemCode, SOURCE.VariantCode, SOURCE.VariantDescriptionBG, SOURCE.VariantDescriptionEN)
WHEN
NOT MATCHED BY SOURCE THEN
UPDATE SET Active = 0
OUTPUT
INSERTED.ItemCode, INSERTED.VariantCode, GETDATE() INTO SyncLog;
Since the OUTPUT clause for the INSERTED table might now return either inserted or updated records, you can add the special column $action, which will tell you whether the original operation was an INSERT or an UPDATE. You will have to change the SyncLog table to receive this value, though.
OUTPUT
INSERTED.ItemCode, INSERTED.VariantCode, GETDATE(), $action
INTO SyncLog;
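A minimal sketch of that SyncLog change, assuming the table currently has just the three columns in the original OUTPUT list; the column name SyncAction is hypothetical:
-- hypothetical column to hold the $action value ('INSERT' or 'UPDATE')
ALTER TABLE SyncLog ADD SyncAction nvarchar(10) NULL;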

MSSQL Procedure - Performance considerations when updating a massive table

I have two tables
existing_bacteria (may contain millions of rows)
new_bacteria (may contain millions of rows)
sample tables:
CREATE TABLE [dbo].[existing_bacteria](
[bacteria_name] [nchar](10) NULL,
[bacteria_type] [nchar](10) NULL,
[bacteria_sub_type] [nchar](10) NULL,
[bacteria_size] [nchar](10) NULL,
[bacteria_family] [nchar](10) NULL,
[bacteria_discovery_year] [date] NOT NULL
)
CREATE TABLE [dbo].[new_bacteria](
[existing_bacteria_name] [nchar](10) NULL,
[bacteria_type] [nchar](10) NULL,
[bacteria_sub_type] [nchar](10) NULL,
[bacteria_size] [nchar](10) NULL,
[bacteria_family] [nchar](10) NULL,
[bacteria_discovery_year] [date] NOT NULL
)
I need to create a stored proc to update the new_bacteria table with a possible match from existing_bacteria (updating the field new_bacteria.existing_bacteria_name)
by finding a match on the other fields from [existing_bacteria] (assuming only a single matching record exists in existing_bacteria).
Since the tables are massive (millions of records each), I would like your opinion on how to go about the solution. Here is what I have so far:
Solution 1:
The obvious solution is to fetch everything into a cursor, iterate over the results, and update the matching rows.
But since there are millions of records, it's not an optimal solution.
-- pseudo code
db_cursor as select * from new_bacteria
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @row
WHILE @@FETCH_STATUS = 0
BEGIN
IF EXISTS (
SELECT
@bacteria_name = [bacteria_name]
,@bacteria_type = [bacteria_type]
,@bacteria_size = [bacteria_size]
FROM [dbo].[existing_bacteria]
where [bacteria_type] = @row.[bacteria_type] and @row.[bacteria_size] = [bacteria_size]
)
BEGIN
PRINT 'update new_bacteria.existing_bacteria_name with [bacteria_name] we found.';
END
-- go to next record
FETCH NEXT FROM db_cursor INTO @row
END
Solution 2:
Solution 2 is to join both tables in the procedure
and iterate over the results, but this is also not optimal:
-- pseudo code
select * from [new_bacteria]
inner join [existing_bacteria]
on [new_bacteria].bacteria_size = [existing_bacteria].bacteria_size
and [new_bacteria].bacteria_family = [existing_bacteria].bacteria_family
for each result update [existing_bacteria]
I am sure this is not optimal because of the table size and the iteration.
Solution 3:
Solution 3 is to let the database handle the data and update the table directly using an inner join:
-- pseudo code
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
inner join [existing_bacteria] P
on R.bacteria_size = P.bacteria_size
and R.bacteria_family = P.bacteria_family
I am not sure about this solution.
Based on your pseudo code, I'd go with solution 3 because it is a set based operation and should be much quicker than using a cursor or other loop.
If you are having issues with performance with solution 3...
and you don't have indexes on those tables, particularly those columns you are using to join the two tables, creating those would help.
create unique index uix_new_bacteria_bacteria_size_bacteria_family
on [new_bacteria] (bacteria_size,bacteria_family);
create unique index uix_existing_bacteria_bacteria_size_bacteria_family
on [existing_bacteria] (bacteria_size,bacteria_family) include (bacteria_name);
and then try:
update r
set r.existing_bacteria_name = p.[bacteria_name]
from [new_bacteria] AS R
inner join [existing_bacteria] P on R.bacteria_size = P.bacteria_size
and R.bacteria_family = P.bacteria_family;
Updating a few million rows should not be a problem with the right indexes.
This section is no longer relevant after an update to the question
Another issue possibly exists in that if bacteria_size and bacteria_family are not unique sets, you could have multiple matches.
(since they are nullable I would imagine they aren't unique unless you're using a filtered index)
In that case, before moving forward, I'd create a table to investigate multiple matches like this:
create table [dbo].[new_and_existing_bacteria_matches](
[existing_bacteria_name] [nchar](10) not null,
rn int not null,
[bacteria_type] [nchar](10) null,
[bacteria_sub_type] [nchar](10) null,
[bacteria_size] [nchar](10) null,
[bacteria_family] [nchar](10) null,
[bacteria_discovery_year] [date] not null,
constraint pk_new_and_existing primary key clustered ([existing_bacteria_name], rn)
);
insert into [new_and_existing_bacteria_matches]
([existing_bacteria_name],rn,[bacteria_type],[bacteria_sub_type],[bacteria_size],[bacteria_family],[bacteria_discovery_year])
select
e.[existing_bacteria_name]
, rn = row_number() over (partition by e.[existing_bacteria_name] order by n.[bacteria_type], n.[bacteria_sub_type])
, n.[bacteria_type]
, n.[bacteria_sub_type]
, n.[bacteria_size]
, n.[bacteria_family]
, n.[bacteria_discovery_year]
from [new_bacteria] as n
inner join [existing_bacteria] e on n.bacteria_size = e.bacteria_size
and n.bacteria_family = e.bacteria_family;
-- and query multiple matches with something like this:
select *
from [new_and_existing_bacteria_matches] n
where exists (
select 1
from [new_and_existing_bacteria_matches] i
where i.[existing_bacteria_name]=n.[existing_bacteria_name]
and rn>1
);
On the subject of performance I'd look at:
The "Recovery Model" of the database, if your DBA says you can have it in "simple mode" then do it, you want to have as little logging as possible.
Consider Disabling some Indexes on the TARGET table, and then rebuilding them when you've finished. On large scale operations the modifications to the index will lead to extra logging, and the manipulation of the index will take up space in your Buffer Pool.
Can you convert the NCHAR to CHAR, it will require less storage space consequently reducing IO, freeing up buffer space and reducing Logging.
If your target table has no Clustered index then try activating 'TraceFlag 610' (warning this is an Instance-wide setting so talk to your DBA)
If your environment allows it, the use of the TABLOCKX hint can remove locking overhead and also help meet the criteria for reduced logging.
For anyone who has to perform Bulk Inserts or Large scale updates, this white paper from Microsoft is a valuable read:
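As a rough illustration of the disable/rebuild idea above (the index name here is a hypothetical placeholder; apply it to the nonclustered indexes on your target table):
-- disable a nonclustered index on the target before the bulk operation
ALTER INDEX ix_new_bacteria_example ON dbo.new_bacteria DISABLE;

-- ... run the large UPDATE / MERGE here ...

-- rebuild the index once the data load has finished
ALTER INDEX ix_new_bacteria_example ON dbo.new_bacteria REBUILD;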
You can try a MERGE statement. It will perform the operation in a single pass of the data. (The problem with MERGE is that it tries to do everything in one transaction, and you can end up with an unwanted spool in the execution plan. In that case I'd move towards a batch process looping through maybe 100,000 records at a time.)
(It will need some minor changes to suit your column matching/update requirements)
MERGE [dbo].[new_bacteria] T --TARGET TABLE
USING [dbo].[existing_bacteria] S --SOURCE TABLE
ON
S.[bacteria_name] = T.[existing_bacteria_name] --FIELDS TO MATCH ON
AND S.[bacteria_type] = T.[bacteria_type]
WHEN MATCHED
AND
ISNULL(T.[bacteria_sub_type],'') <> ISNULL(S.[bacteria_sub_type],'') --FIELDS WHERE YOU'RE LOOKING FOR A CHANGE
OR ISNULL(T.[bacteria_size],'') <> ISNULL(S.[bacteria_size],'')
THEN --UPDATE RECORDS THAT HAVE CHANGED
UPDATE
SET T.[bacteria_sub_type] = S.[bacteria_sub_type]
WHEN NOT MATCHED BY TARGET THEN --ANY NEW RECORDS IN THE SOURCE TABLE WILL BE INSERTED
INSERT(
[existing_bacteria_name],
[bacteria_type],
[bacteria_sub_type],
[bacteria_size],
[bacteria_family],
[bacteria_discovery_year]
)
VALUES(
s.[bacteria_name],
s.[bacteria_type],
s.[bacteria_sub_type],
s.[bacteria_size],
s.[bacteria_family],
s.[bacteria_discovery_year]
);
If the single MERGE is too much for your system to handle, here's a method for embedding it in a loop that updates large batches. You can modify the batch size to match your server's capabilities.
It works by using a couple of staging tables that ensure that if anything goes wrong (e.g. a server agent restart), the process can continue from where it left off. (If you have any questions please ask.)
--CAPTURE WHAT HAS CHANGED SINCE THE LAST TIME THE SP WAS RUN
--EXCEPT is a useful command because it can compare NULLs; this removes the need for ISNULL or COALESCE
INSERT INTO [dbo].[existing_bacteria_changes]
SELECT
*
FROM
[dbo].[existing_bacteria]
EXCEPT
SELECT
*
FROM
[dbo].[new_bacteria]
--RUN FROM THIS POINT IN THE EVENT OF A FAILURE
DECLARE @R INT = 1
DECLARE @Batch INT = 100000
WHILE @R > 0
BEGIN
BEGIN TRAN --CARRY OUT A TRANSACTION WITH A SUBSET OF DATA
--USE DELETE WITH OUTPUT TO MOVE A BATCH OF RECORDS INTO A HOLDING AREA.
--The holding area will provide a rollback point so if the job fails at any point it will restart from where it last was.
DELETE TOP (@Batch)
FROM [dbo].[existing_bacteria_changes]
OUTPUT DELETED.* INTO [dbo].[existing_bacteria_Batch]
--LOG THE NUMBER OF RECORDS IN THE BATCH; THIS DETERMINES WHETHER THE LOOP RUNS ANOTHER ITERATION
SET @R = ISNULL(@@ROWCOUNT,0)
--RUN THE MERGE STATEMENT WITH THE SUBSET OF UPDATES
MERGE [dbo].[new_bacteria] T --TARGET TABLE
USING [dbo].[existing_bacteria_Batch] S --SOURCE TABLE
ON
S.[bacteria_name] = T.[existing_bacteria_name] --FIELDS TO MATCH ON
AND S.[bacteria_type] = T.[bacteria_type]
WHEN MATCHED
AND
ISNULL(T.[bacteria_sub_type],'') <> ISNULL(S.[bacteria_sub_type],'') --FIELDS WHERE YOU'RE LOOKING FOR A CHANGE
OR ISNULL(T.[bacteria_size],'') <> ISNULL(S.[bacteria_size],'')
THEN --UPDATE RECORDS THAT HAVE CHANGED
UPDATE
SET T.[bacteria_sub_type] = S.[bacteria_sub_type]
WHEN NOT MATCHED BY TARGET THEN --ANY NEW RECORDS IN THE SOURCE TABLE WILL BE INSERTED
INSERT(
[existing_bacteria_name],
[bacteria_type],
[bacteria_sub_type],
[bacteria_size],
[bacteria_family],
[bacteria_discovery_year]
)
VALUES(
s.[bacteria_name],
s.[bacteria_type],
s.[bacteria_sub_type],
s.[bacteria_size],
s.[bacteria_family],
s.[bacteria_discovery_year]
);
COMMIT;
--No point in logging this action
TRUNCATE TABLE [dbo].[existing_bacteria_Batch]
END
Definitely option 3. Set-based always wins over anything loopy.
That said, the biggest 'risk' might be that the amount of updated data 'overwhelms' your machine. More specifically, it could happen that the transaction becomes so big that the system takes forever to finish it. To avoid this you could try splitting the one big UPDATE into multiple smaller UPDATEs while still working mostly set-based. Good indexing and knowing your data is key here.
For instance, starting from
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
INNER JOIN [existing_bacteria] P
ON R.bacteria_size = P.bacteria_size
AND R.bacteria_family = P.bacteria_family
You might try to 'chunk' the (target) table into smaller parts, e.g. by looping over the bacteria_discovery_year field, assuming that column splits the table into, say, 50 more or less equally sized parts. (BTW: I'm no biologist so I might be totally wrong there =)
You'd then get something along the lines of:
DECLARE @c_bacteria_discovery_year date
DECLARE year_loop CURSOR LOCAL STATIC
FOR SELECT DISTINCT bacteria_discovery_year
FROM [new_bacteria]
ORDER BY bacteria_discovery_year
OPEN year_loop
FETCH NEXT FROM year_loop INTO @c_bacteria_discovery_year
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
INNER JOIN [existing_bacteria] P
ON R.bacteria_size = P.bacteria_size
AND R.bacteria_family = P.bacteria_family
WHERE R.bacteria_discovery_year = @c_bacteria_discovery_year
FETCH NEXT FROM year_loop INTO @c_bacteria_discovery_year
END
CLOSE year_loop
DEALLOCATE year_loop
Some remarks:
Like I said, I don't know the distribution of the bacteria_discovery_year values, if 3 years make up 95% of the data it might not be such a great choice.
This will only work if there is an index on the bacteria_discovery_year column, preferably with bacteria_size and bacteria_family included.
You could add some PRINT inside the loop to see the progress and rows affected... it won't speed up anything, but it feels better if you know it's doing something =)
All in all, don't overdo it, if you split it into too many small chunks you'll end up with something that takes forever too.
PS: in any case you'll also need an index on the 'source' table covering the bacteria_size and bacteria_family columns, preferably including bacteria_name if the latter is not the (clustered) PK of the table.

SQL Server - pattern matching a string

First post here. What a great resource. Hoping someone can help....
I have a character field that contains mostly numeric values, but not all. The field, let's call it diag, is formatted as varchar(8). It contains diagnosis codes and they have been entered inconsistently at times. So I might see 29001 in the diag field, or I might see 290.001. Sometimes people will code it as 290.00, other times 29000, and yet other times 290. To make it more complicated, I may have alpha characters in that field, so it could contain something like V700.00 or H601. These are just examples, but they're indicative of what's in the field.
I am trying to find a range of values, for instance diagnosis codes between 29001 and 29999. Taking into account the inconsistencies in coding entry, I also want to return any records that have a diag value of 290.01 to 299.99. I am just at a loss. I searched here for hours and found a lot of info, but couldn't seem to answer my question. I am somewhat new to SQL and can't figure out how to return records that match the range of values I am looking for. There are 40-some million records, so it is a lot of data, and I'm trying to pare it down to something I can work with. I am using an older version of SQL Server - 2005, in case it matters.
Any help would be most appreciated. I really don't even know where to start.
Thank you!
You can use this T-SQL to remove all unwanted characters from your numbers.
declare @strText varchar(50)
--set @strText = '23,112'
--set @strText = '23Ass112'
set @strText = '2.3.1.1.2'
WHILE PATINDEX('%[^0-9]%', @strText) > 0
BEGIN
SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
END
select @strText
In your case I suggest you create a function:
CREATE FUNCTION dbo.CleanNumbers(@strText VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^0-9]%', @strText) > 0
BEGIN
SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
END
RETURN @strText
END
Then you'll have to create a normal query calling the function.
WITH CTE as
(
SELECT dbo.CleanNumbers(yourtable.YourFakeNumber) as Number, yourtable.*
FROM yourtable
WHERE YourCriteria = 1
)
Select * from CTE where CAST(Number as int) between 29001 and 29999
Or easier
Select * from yourtable where CAST(dbo.CleanNumbers(YourFakeNumber) as int) between 29001 and 29999
I hope I haven't made any spelling mistakes ;)
It sounds like you have a bit of a mess. If you knew the rules for the variances, then you could build an automated script to fix them. But it sounds like it's pretty loose, so you might want to start by deciding what the valid values for the field are, making a table of them to validate against, and then identifying and classifying the invalid data.
First step, you need to get a list of valid diagnosis codes and get them into a table. Something like:
CREATE TABLE [dbo].[DiagnosticCodes](
[DiagnosticCode] [varchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DiagnosticDescription] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_DiagnosticCodes] PRIMARY KEY CLUSTERED
(
[DiagnosticCode] ASC
)
)
Then get a list of the valid codes and import them into this table.
Then you need to find data in your table that is invalid. Something like this query will give you all the invalid codes in your database:
CREATE TABLE [dbo].[DiagnosticCodesMapping](
[Diag] [varchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NewCode] [varchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_DiagnosticCodesMapping] PRIMARY KEY CLUSTERED
(
[Diag] ASC
)
)
insert into [dbo].[DiagnosticCodesMapping]
Select distinct MyDataTable.Diag, null NewCode
from MyDataTable
left join DiagnosticCodes
on MyDataTable.Diag = DiagnosticCodes.DiagnosticCode
where DiagnosticCodes.DiagnosticCode is null
This creates a table of all the invalid codes and also includes a field called NewCode, which you will populate with a mapping from the invalid code to a new valid code. Hopefully this list will not be ridiculously long. Then you hand it over to someone to review and to fill in the NewCode field with one of the valid codes. Once you have your DiagnosticCodesMapping table completely filled in, you can then do an update to get all your fields to have valid codes:
update MyDataTable
set Diag=NewCode
from MyDataTable
join DiagnosticCodesMapping
on MyDataTable.Diag = DiagnosticCodesMapping.Diag
Doing it this way has the added advantage that you can now start validating all data entry in the future and you'll never have to do this cleanup again. You can create a constraint that ensures only valid codes from the DiagnosticCode table can be entered into the Diag field of your data table. You should check your interface to use the new lookup table as well. You'll also have to create a data maintenance interface to the DiagnosticCode table if you need to have super users with the ability to add new codes.
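For example, a foreign-key constraint along these lines might work (table and column names are taken from the examples above; adjust to your actual schema):
-- only codes present in DiagnosticCodes may be entered into Diag
ALTER TABLE MyDataTable
ADD CONSTRAINT FK_MyDataTable_DiagnosticCodes
FOREIGN KEY (Diag) REFERENCES dbo.DiagnosticCodes (DiagnosticCode);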

In Oracle PL/SQL script, set all records' [FieldX] value to the same value?

I've written an Oracle DB conversion script that transfers data from a previous single table into a new DB with a main table and several child/reference/maintenance tables. Naturally, this more standardized layout has more fields than the old table (which could have, say, Bob/Storage Room/Ceiling as the [Location] value) and thus cannot be converted over exactly.
For the moment, I have inserted a placeholder record (ex.) [NO_CONVERSION_DATA] into each of my child tables. For my main table, I need to set (ex.) [Color_ID] to 22 and [Type_ID] to 57, since there is no explicit conversion for these new fields (annually, all of these records are updated, and after the next update all records will exist with proper field values, whereupon the placeholder value/record [NO_CONVERSION_DATA] will be removed from the child tables).
I also similarly need to set [Status_Id], something like the following (not working):
INSERT INTO TABLE1 (STATUS_ID)
VALUES
-- Status was not set as Recycled, Disposed, etc. during Conversion
IF STATUS_ID IS NULL THEN
(CASE
-- [Owner] field has a value, set ID to 2 (Assigned)
WHEN RTRIM(LTRIM(OWNER)) IS NOT NULL THEN 2
-- [Owner] field has no value, set ID to 1 (Available)
WHEN RTRIM(LTRIM(OWNER)) IS NULL THEN 1
END as Status)
Can anyone more experienced with Oracle & PL/SQL assist with the syntax/layout for what I'm trying to do here?
Ok, I figured out how to set the 2 specific columns to the same value for all rows:
UPDATE TABLE1
SET COLOR_ID = 24;
UPDATE INV_ASSETSTEST
SET TYPE_ID = 20;
I'm still trying to figure out setting the STATUS_ID based upon the value in the [OWNER] field being NULL/NOT NULL. Coco's solution below looked good at first glance (regarding his comment, not the posted solution itself), but the following causes each of my NON-NULLABLE columns to flag and the statement will not execute:
INSERT INTO TABLE1(STATUS_ID)
SELECT CASE
WHEN STATUS_ID IS NULL THEN
CASE
WHEN TRIM(OWNER) IS NULL THEN 1
WHEN TRIM(OWNER) IS NOT NULL THEN 2
END
END FROM TABLE1;
I've tried piecing a similar UPDATE statement together, but so far no luck.
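A sketch of the UPDATE form of that logic, assuming the goal is to fill in STATUS_ID only where it is currently NULL, based on whether [OWNER] has a value (table and column names as above):
UPDATE TABLE1
SET STATUS_ID = CASE
                  WHEN TRIM(OWNER) IS NOT NULL THEN 2 -- [Owner] has a value: Assigned
                  ELSE 1                              -- [Owner] is NULL: Available
                END
WHERE STATUS_ID IS NULL;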
Try with this
INSERT INTO TABLE1 (STATUS_ID)
VALUES
(
case
when STATUS_ID IS NULL THEN
(CASE
-- [Owner] field has a value, set ID to 2 (Assigned)
WHEN RTRIM(LTRIM(OWNER)) IS NOT NULL THEN 2
-- [Owner] field has no value, set ID to 1 (Available)
WHEN RTRIM(LTRIM(OWNER)) IS NULL THEN 1
END )
end);

INSERT+SELECT with a unique key

The following T-SQL statement does not work because [key] has to be unique and the MAX call in the SELECT statement only seems to be evaluated once. In other words, it is only incrementing the key value once and trying to insert that value over and over. Does anyone have a solution?
INSERT INTO [searchOC].[dbo].[searchTable]
([key],
dataVaultType,
dataVaultKey,
searchTerm)
SELECT (SELECT MAX([key]) + 1 FROM [searchOC].[dbo].[searchTable]) AS [key]
,'PERSON' as dataVaultType
,[student_id] as dataVaultKey
,[email] as searchTerm
FROM [JACOB].[myoc4Data].[dbo].[users]
WHERE [email] != '' AND [active] = '1'
AND [student_id] IN (SELECT [userID] FROM [JACOB].[myoc4Data].[dbo].[userRoles]
WHERE ([role] = 'STUDENT' OR [role] = 'FACUTLY' OR [role] = 'STAFF'))
If you can make the key column an IDENTITY column that would probably be the easiest. That allows SQL Server to generate incremental values.
Alternatively, if you are set on generating the key yourself, then a blog post I wrote last month may help. Although it uses a composite key, it shows how to insert multiple rows in a single INSERT statement while safely generating a new value for each new row, and it is also safe across many simultaneous writers (which many examples don't deal with).
http://colinmackay.co.uk/2012/12/29/composite-primary-keys-including-identity-like-column/
Incidentally, the reason you get the same value for MAX([key]) on each row in your SELECT is that it is evaluated at the time the table is read. So for all the rows the SELECT statement returns, MAX([key]) will always be the same. Unless you add some sort of GROUP BY clause, any MAX(columnName) in a SELECT will return the same value for each row returned.
Also, all aggregate functions are deterministic, so for the same set of inputs they will always produce the same output. So if your set of keys was 1, 5, 9 then MAX will always return 9.
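If changing [key] to an IDENTITY column is not an option, one common workaround is to combine the current MAX with ROW_NUMBER so each inserted row gets a distinct value. A sketch only, and note it is not safe against concurrent writers without extra locking (as discussed in the linked post); the FROM/WHERE clauses are copied verbatim from the question:
INSERT INTO [searchOC].[dbo].[searchTable]
([key], dataVaultType, dataVaultKey, searchTerm)
SELECT ISNULL((SELECT MAX([key]) FROM [searchOC].[dbo].[searchTable]), 0)
       + ROW_NUMBER() OVER (ORDER BY u.[student_id]) AS [key]
,'PERSON' as dataVaultType
,u.[student_id] as dataVaultKey
,u.[email] as searchTerm
FROM [JACOB].[myoc4Data].[dbo].[users] AS u
WHERE u.[email] != '' AND u.[active] = '1'
AND u.[student_id] IN (SELECT [userID] FROM [JACOB].[myoc4Data].[dbo].[userRoles]
WHERE ([role] = 'STUDENT' OR [role] = 'FACUTLY' OR [role] = 'STAFF'))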
