SQL Server - pattern matching a string

First post here. What a great resource. Hoping someone can help....
I have a character field that contains mostly numeric values, but not all. The field, let's call it diag, is formatted as varchar(8). It contains diagnosis codes, and they have been entered inconsistently at times. So I might see 29001 in the diag field. Or I might see 290.001. Sometimes people will code it as 290.00, other times 29000, and yet other times 290. To make it more complicated, I may have alpha characters in that field, so the field could contain something like V700.00 or H601. These are just examples, but they're indicative of what's in the field.
I am trying to find a range of values... for instance, diagnosis codes between 29001 and 29999. Taking into account the inconsistencies in coding entry, I also want to return any records that have a diag value of 290.01 to 299.99. I am just at a loss. Searched here for hours and found a lot of info... but couldn't seem to answer my question. I am somewhat new to SQL and can't figure out how to return records that match the range of values I am looking for. There are 40-some million records, so it is a lot of data. Trying to pare it down to something I can work with. I am using an older version of SQL Server... 2005, in case it matters.
Any help would be most appreciated. I really don't even know where to start.
Thank you!

You can use this T-SQL to remove all unwanted characters from your numbers:
declare @strText varchar(50)
--set @strText = '23,112'
--set @strText = '23Ass112'
set @strText = '2.3.1.1.2'
WHILE PATINDEX('%[^0-9]%', @strText) > 0
BEGIN
SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
END
select @strText
In your case, I suggest you create a function:
CREATE FUNCTION CleanNumbers(@strText VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^0-9]%', @strText) > 0
BEGIN
SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
END
RETURN @strText
END
Then you'll have to create a normal query calling the function.
WITH CTE as
(
SELECT dbo.CleanNumbers(yourtable.YourFakeNumber) as Number, yourtable.*
FROM yourtable
WHERE YourCriteria = 1
)
Select * from CTE where CAST(Number as int) between 29001 and 29999
Or, more simply:
Select * from yourtable where CAST(dbo.CleanNumbers(YourFakeNumber) as int) between 29001 and 29999
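Applied to the sample values from the question, the cleaning behaves like this (a quick sanity check, assuming the function above has been created):
SELECT dbo.CleanNumbers('290.01')  -- 29001
SELECT dbo.CleanNumbers('V700.00') -- 70000
SELECT dbo.CleanNumbers('H601')    -- 601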
I hope I haven't made any spelling mistakes ;)

It sounds like you have a little bit of a mess. If you know the rules for the variances, then you could build an automated script to update them. But it sounds like it's pretty loose, so you might want to start by deciding what the valid values for the field are, making a table of them to validate against, and then identifying and classifying the invalid data.
As a first step, you need to get a list of valid diagnosis codes and load them into a table. Something like:
CREATE TABLE [dbo].[DiagnosticCodes](
[DiagnosticCode] [varchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DiagnosticDescription] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_DiagnosticCodes] PRIMARY KEY CLUSTERED
(
[DiagnosticCode] ASC
)
)
Then get a list of the valid codes and import them into this table.
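For example (the codes and descriptions below are hypothetical placeholders; substitute your real diagnosis code list, and note that single-row INSERTs are used because the question mentions SQL Server 2005, which predates multi-row VALUES):
INSERT INTO [dbo].[DiagnosticCodes] ([DiagnosticCode], [DiagnosticDescription]) VALUES ('29001', 'Sample description 1')
INSERT INTO [dbo].[DiagnosticCodes] ([DiagnosticCode], [DiagnosticDescription]) VALUES ('29999', 'Sample description 2')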
Then you need to find data in your table that is invalid. Something like this query will give you all the invalid codes in your database:
CREATE TABLE [dbo].[DiagnosticCodesMapping](
[Diag] [varchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NewCode] [varchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_DiagnosticCodesMapping] PRIMARY KEY CLUSTERED
(
[Diag] ASC
)
)
insert into [dbo].[DiagnosticCodesMapping]
Select distinct MyDataTable.Diag, null NewCode
from MyDataTable
left join DiagnosticCodes
on MyDataTable.Diag = DiagnosticCodes.DiagnosticCode
where DiagnosticCodes.DiagnosticCode is null
This creates a table of all the invalid codes and also includes a field called NewCode, which you will populate with a mapping from each invalid code to a new valid code. Hopefully this list will not be ridiculously long. Then you hand it over to someone for review and to set the NewCode field to one of the valid codes. Once you have your DiagnosticCodesMapping table completely filled in, you can then do an update to get all your fields to have valid codes:
update MyDataTable
set Diag = NewCode
from MyDataTable
join DiagnosticCodesMapping
on MyDataTable.Diag = DiagnosticCodesMapping.Diag
Doing it this way has the added advantage that you can now start validating all data entry in the future and you'll never have to do this cleanup again. You can create a constraint that ensures only valid codes from the DiagnosticCode table can be entered into the Diag field of your data table. You should check your interface to use the new lookup table as well. You'll also have to create a data maintenance interface to the DiagnosticCode table if you need to have super users with the ability to add new codes.
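A sketch of that validation constraint, assuming your data table is called MyDataTable as in the queries above:
ALTER TABLE MyDataTable
ADD CONSTRAINT FK_MyDataTable_DiagnosticCodes
FOREIGN KEY (Diag) REFERENCES [dbo].[DiagnosticCodes] ([DiagnosticCode])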

Convert CHAR to NUMERIC in SQL

I am trying to set up a database that has the following columns: "number", "full names", "card no", "status". In the last column, status, there are many types of status, which I would like converted to 0 or 1 in another column within the table, depending on the status. The data is then to be pulled into another application and will use the binary column to give access to a facility.
I have not tried any code on this, still learning.
SELECT TOP 1000 [MemberNo]
,[FirstName]
,[LastName]
,[CardNo]
,[Status]
FROM [GateAccess].[dbo].[GateProxy]
Not sure why you'd ask a question when you haven't tried anything...
First, you are trying to create a table with the following columns, not a database.
Here is a link on how to create a table: http://www.mysqltutorial.org/mysql-create-table/
Second, you don't really need to convert char to numeric; I would personally use CASE.
Here is a link on CASE: https://www.techonthenet.com/mysql/functions/case.php
Regarding your 'application and will use the binary column to give access to a facility': it depends on what application is pulling the data.
This may help ...
DECLARE @sampletext VARCHAR(100) = '123456';
SELECT TRY_CONVERT(INT, @sampletext); -- 123456
SELECT TRY_CAST(@sampletext AS INT); -- 123456
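Note that TRY_CONVERT and TRY_CAST are available from SQL Server 2012 onwards, and both return NULL rather than raising an error when the conversion fails:
SELECT TRY_CONVERT(INT, 'abc'); -- NULL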
SELECT TOP 1000 [MemberNo]
,[FirstName]
,[LastName]
,[CardNo]
,[Status]
-- Try like this for new derived column
,CASE WHEN [Status] = 'yourValue' THEN 1
WHEN [Status] = 'yourOtherValue' THEN 0
END AS [AccessFlag] -- [AccessFlag] is an example name; alias the derived column as you like
FROM [GateAccess].[dbo].[GateProxy]

SQL - Insert into table if not exists - with existing NULL values

I need to import data into SQL from Excel via a .NET user application. I need to avoid duplication. However, some records might have NULLs in certain columns.
I'm using stored procedures to implement the imports, but I can't seem to produce a "universal" solution that checks for matching data if it exists, or NULLs if the data doesn't exist.
Note that my Part table uses an Identity PK, but the import records won't include it.
Below is an example (I did not include all the columns for brevity):
CREATE PROCEDURE [dbo].[spInsertPart]
(@PartNo NCHAR(50),
@PartName NCHAR(50) = NULL,
@PartVariance NCHAR(30) = NULL)
AS
BEGIN
SET NOCOUNT OFF;
IF NOT EXISTS (SELECT PartNo, PartVariance
FROM Part
WHERE PartNo = @PartNo AND PartVariance = @PartVariance)
BEGIN
INSERT INTO Part (PartNo, PartName, PartVariance)
VALUES (@PartNo, @PartName, @PartVariance)
END
END
The import data may or may not include a PartVariance, and the existing records may (or may not) also have NULL as the PartVariance.
If both are NULL, then I get a duplicate record - which I don't want.
How can I re-write the procedure to not duplicate, but to treat the NULL value like any other value? (That is, add a record if either contains NULL, but not both).
I think you need to provide clear information on the following before this question can be correctly answered:
Which columns are used to 'match' an incoming record against the rows of the Part table? In other words: matching values on which columns should cause the remaining columns of the Part table to be updated with the incoming values, versus a new record being inserted into the Part table?
Assuming that only the PartNo and PartVariance columns are used for matching, as seen in the query, and that only the PartVariance column can be NULL, here is a solution:
CREATE PROCEDURE [dbo].[spInsertPart]
(@PartNo NCHAR(50),
@PartName NCHAR(50) = NULL,
@PartVariance NCHAR(30) = NULL)
AS
BEGIN
SET NOCOUNT OFF;
IF NOT EXISTS (
SELECT 1
FROM Part
WHERE PartNo = @PartNo
AND COALESCE(PartVariance, '') = COALESCE(@PartVariance, '')
)
BEGIN
INSERT INTO Part (PartNo, PartName, PartVariance)
VALUES (@PartNo, @PartName, @PartVariance)
END
END
Note: you have mentioned that only PartVariance can be NULL. If the same is true of PartNo, then COALESCE can be used for matching the PartNo column as well.
Well, NULL is a problem when it comes to SQL Server. You can't use equality checks on it (=, <>), since both will return UNKNOWN, which is treated as false.
However, you can use a combination of IS NULL, OR, and AND to get the desired results.
With SQL Server 2012 or higher (in older versions, change IIF to a CASE expression), you can do this:
IF NOT EXISTS (SELECT PartNo, PartVariance
FROM Part
WHERE IIF((PartNo IS NULL AND @PartNo IS NULL) OR (PartNo = @PartNo), 0, 1) = 0
AND IIF((PartVariance IS NULL AND @PartVariance IS NULL) OR (PartVariance = @PartVariance), 0, 1) = 0)
If both PartNo and @PartNo are NULL, or they contain the same value (remember, NULL = any other value evaluates as false), the IIF will return 0; otherwise (meaning the column and the variable contain different values, even if one of them is NULL), it will return 1.
Of course, the second IIF does the same thing for the other column/variable combination.
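As an aside (not from either answer above): a null-safe comparison can also be written with INTERSECT, since set operators treat two NULLs as equal. A minimal sketch of the existence check using that idiom:
IF NOT EXISTS (SELECT 1
FROM Part
WHERE EXISTS (SELECT PartNo, PartVariance
INTERSECT
SELECT @PartNo, @PartVariance))
BEGIN
INSERT INTO Part (PartNo, PartName, PartVariance)
VALUES (@PartNo, @PartName, @PartVariance)
END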

MSSQL Procedure - Performance considerations when updating a massive table

I have two tables
existing_bacteria (may contain millions of rows)
new_bacteria (may contain millions of rows)
sample tables:
CREATE TABLE [dbo].[existing_bacteria](
[bacteria_name] [nchar](10) NULL,
[bacteria_type] [nchar](10) NULL,
[bacteria_sub_type] [nchar](10) NULL,
[bacteria_size] [nchar](10) NULL,
[bacteria_family] [nchar](10) NULL,
[bacteria_discovery_year] [date] NOT NULL
)
CREATE TABLE [dbo].[new_bacteria](
[existing_bacteria_name] [nchar](10) NULL,
[bacteria_type] [nchar](10) NULL,
[bacteria_sub_type] [nchar](10) NULL,
[bacteria_size] [nchar](10) NULL,
[bacteria_family] [nchar](10) NULL,
[bacteria_discovery_year] [date] NOT NULL
)
I need to create a stored proc to update the new_bacteria table with a possible match from existing_bacteria (update the field new_bacteria.existing_bacteria_name)
by finding a match on the other fields from [existing_bacteria] (assuming only a single matching record in existing_bacteria).
Since the tables are massive (millions of records each), I would like your opinion on how to go about this. Here is what I have so far:
Solution 1:
The obvious solution is to fetch everything into a cursor, iterate over the results, and update new_bacteria.
But since there are millions of records, it's not an optimal solution.
-- pseudo code
DECLARE db_cursor CURSOR FOR select * from new_bacteria
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @row
WHILE @@FETCH_STATUS = 0
BEGIN
IF EXISTS (
SELECT
@bacteria_name = [bacteria_name]
,@bacteria_type = [bacteria_type]
,@bacteria_size = [bacteria_size]
FROM [dbo].[existing_bacteria]
where [bacteria_type] = @row.[bacteria_type] and @row.[bacteria_size] = [bacteria_size]
)
BEGIN
PRINT 'update new_bacteria.existing_bacteria_name with the [bacteria_name] we found.';
END
-- go to next record
FETCH NEXT FROM db_cursor INTO @name
END
Solution 2:
Solution 2 is to join both tables in the procedure
and iterate over the results, but this is also problematic:
-- pseudo code
select * from [new_bacteria]
inner join [existing_bacteria]
on [new_bacteria].bacteria_size = [existing_bacteria].bacteria_size
and [new_bacteria].bacteria_family = [existing_bacteria].bacteria_family
for each result update [existing_bacteria]
I am sure this is not optimal because of the table size and the iteration.
Solution 3:
Solution 3 is to let the database handle the data and update the table directly using an inner join:
-- pseudo code
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
inner join [existing_bacteria] P
on R.bacteria_size = P.bacteria_size
and R.bacteria_family = P.bacteria_family
I am not sure about this solution.
Based on your pseudo code, I'd go with solution 3 because it is a set based operation and should be much quicker than using a cursor or other loop.
If you are having issues with performance with solution 3...
and you don't have indexes on those tables, particularly those columns you are using to join the two tables, creating those would help.
create unique index uix_new_bacteria_bacteria_size_bacteria_family
on [new_bacteria] (bacteria_size,bacteria_family);
create unique index uix_existing_bacteria_bacteria_size_bacteria_family
on [existing_bacteria] (bacteria_size,bacteria_family) include (bacteria_name);
and then try:
update r
set r.existing_bacteria_name = p.[bacteria_name]
from [new_bacteria] AS R
inner join [existing_bacteria] P on R.bacteria_size = P.bacteria_size
and R.bacteria_family = P.bacteria_family;
Updating a few million rows should not be a problem with the right indexes.
This section is no longer relevant after an update to the question
Another issue possibly exists in that if bacteria_size and bacteria_family are not unique sets, you could have multiple matches.
(since they are nullable I would imagine they aren't unique unless you're using a filtered index)
In that case, before moving forward, I'd create a table to investigate multiple matches like this:
create table [dbo].[new_and_existing_bacteria_matches](
[existing_bacteria_name] [nchar](10) not null,
rn int not null,
[bacteria_type] [nchar](10) null,
[bacteria_sub_type] [nchar](10) null,
[bacteria_size] [nchar](10) null,
[bacteria_family] [nchar](10) null,
[bacteria_discovery_year] [date] not null,
constraint pk_new_and_existing primary key clustered ([existing_bacteria_name], rn)
);
insert into [new_and_existing_bacteria_matches]
([existing_bacteria_name],rn,[bacteria_type],[bacteria_sub_type],[bacteria_size],[bacteria_family],[bacteria_discovery_year])
select
e.[bacteria_name] as [existing_bacteria_name]
, rn = row_number() over (partition by e.[bacteria_name] order by n.[bacteria_type], n.[bacteria_sub_type])
, n.[bacteria_type]
, n.[bacteria_sub_type]
, n.[bacteria_size]
, n.[bacteria_family]
, n.[bacteria_discovery_year]
from [new_bacteria] as n
inner join [existing_bacteria] e on n.bacteria_size = e.bacteria_size
and n.bacteria_family = e.bacteria_family;
-- and query multiple matches with something like this:
select *
from [new_and_existing_bacteria_matches] n
where exists (
select 1
from [new_and_existing_bacteria_matches] i
where i.[existing_bacteria_name]=n.[existing_bacteria_name]
and i.rn > 1
);
On the subject of performance I'd look at:
The "Recovery Model" of the database, if your DBA says you can have it in "simple mode" then do it, you want to have as little logging as possible.
Consider Disabling some Indexes on the TARGET table, and then rebuilding them when you've finished. On large scale operations the modifications to the index will lead to extra logging, and the manipulation of the index will take up space in your Buffer Pool.
Can you convert the NCHAR columns to CHAR? It will require less storage space, consequently reducing IO, freeing up buffer space, and reducing logging.
If your target table has no Clustered index then try activating 'TraceFlag 610' (warning this is an Instance-wide setting so talk to your DBA)
If your environment allows it, the use of the TABLOCKX hint can remove locking overhead and also help meet the criteria for reduced logging.
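For instance, the hint from the last point would be applied to the target table in the FROM clause (a sketch based on the update in solution 3):
UPDATE R
SET R.existing_bacteria_name = P.[bacteria_name]
FROM [new_bacteria] AS R WITH (TABLOCKX)
INNER JOIN [existing_bacteria] P
ON R.bacteria_size = P.bacteria_size
AND R.bacteria_family = P.bacteria_family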
For anyone who has to perform Bulk Inserts or Large scale updates, this white paper from Microsoft is a valuable read:
You can try a MERGE statement. It will perform the operation in a single pass of the data. (The problem with a MERGE is that it tries to do everything in one transaction, and you can end up with an unwanted spool in the execution plan. In that case I'd move towards a batch process looping through maybe 100,000 records at a time.)
(It will need some minor changes to suit your column matching/update requirements)
MERGE [dbo].[new_bacteria] T --TARGET TABLE
USING [dbo].[existing_bacteria] S --SOURCE TABLE
ON
S.[bacteria_name] = T.[existing_bacteria_name] --FIELDS TO MATCH ON
AND S.[bacteria_type] = T.[bacteria_type]
WHEN MATCHED
AND
ISNULL(T.[bacteria_sub_type],'') <> ISNULL(S.[bacteria_sub_type],'') --FIELDS WHERE YOU'RE LOOKING FOR A CHANGE
OR ISNULL(T.[bacteria_size],'') <> ISNULL(S.[bacteria_size],'')
THEN --UPDATE RECORDS THAT HAVE CHANGED
UPDATE
SET T.[bacteria_sub_type] = S.[bacteria_sub_type]
WHEN NOT MATCHED BY TARGET THEN --ANY NEW RECORDS IN THE SOURCE TABLE WILL BE INSERTED
INSERT(
[existing_bacteria_name],
[bacteria_type],
[bacteria_sub_type],
[bacteria_size],
[bacteria_family],
[bacteria_discovery_year]
)
VALUES(
s.[bacteria_name],
s.[bacteria_type],
s.[bacteria_sub_type],
s.[bacteria_size],
s.[bacteria_family],
s.[bacteria_discovery_year]
);
If the Single MERGE is too much for your system to handle, here's a method for embedding it in a loop that updates large batches. You can modify the batch size to match your Server's capabilities.
It works by using a couple of staging tables that ensure that if anything goes wrong (e.g. a server agent restart), the process can continue from where it left off. (If you have any questions, please ask.)
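The two staging tables used below are not defined in the answer; a minimal sketch, assuming they simply mirror the shape of existing_bacteria:
-- Create empty copies of the source table to act as the holding areas
SELECT TOP (0) * INTO [dbo].[existing_bacteria_changes] FROM [dbo].[existing_bacteria]
SELECT TOP (0) * INTO [dbo].[existing_bacteria_Batch] FROM [dbo].[existing_bacteria]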
--CAPTURE WHAT HAS CHANGED SINCE THE LAST TIME THE SP WAS RUN
--EXCEPT is a useful command because it can compare NULLs; this removes the need for ISNULL or COALESCE
INSERT INTO [dbo].[existing_bacteria_changes]
SELECT
*
FROM
[dbo].[existing_bacteria]
EXCEPT
SELECT
*
FROM
[dbo].[new_bacteria]
--RUN FROM THIS POINT IN THE EVENT OF A FAILURE
DECLARE @R INT = 1
DECLARE @Batch INT = 100000
WHILE @R > 0
BEGIN
BEGIN TRAN --CARRY OUT A TRANSACTION WITH A SUBSET OF DATA
--USE DELETE WITH OUTPUT TO MOVE A BATCH OF RECORDS INTO A HOLDING AREA.
--The holding area will provide a rollback point so if the job fails at any point it will restart from where it last was.
DELETE TOP (@Batch)
FROM [dbo].[existing_bacteria_changes]
OUTPUT DELETED.* INTO [dbo].[existing_bacteria_Batch]
--LOG THE NUMBER OF RECORDS IN THE UPDATE SET; THIS ENSURES THE LOOP EXITS ONCE NO ROWS REMAIN
SET @R = ISNULL(@@ROWCOUNT, 0)
--RUN THE MERGE STATEMENT WITH THE SUBSET OF UPDATES
MERGE [dbo].[new_bacteria] T --TARGET TABLE
USING [dbo].[existing_bacteria_Batch] S --SOURCE TABLE
ON
S.[bacteria_name] = T.[existing_bacteria_name] --FIELDS TO MATCH ON
AND S.[bacteria_type] = T.[bacteria_type]
WHEN MATCHED
AND
ISNULL(T.[bacteria_sub_type],'') <> ISNULL(S.[bacteria_sub_type],'') --FIELDS WHERE YOU'RE LOOKING FOR A CHANGE
OR ISNULL(T.[bacteria_size],'') <> ISNULL(S.[bacteria_size],'')
THEN --UPDATE RECORDS THAT HAVE CHANGED
UPDATE
SET T.[bacteria_sub_type] = S.[bacteria_sub_type]
WHEN NOT MATCHED BY TARGET THEN --ANY NEW RECORDS IN THE SOURCE TABLE WILL BE INSERTED
INSERT(
[existing_bacteria_name],
[bacteria_type],
[bacteria_sub_type],
[bacteria_size],
[bacteria_family],
[bacteria_discovery_year]
)
VALUES(
s.[bacteria_name],
s.[bacteria_type],
s.[bacteria_sub_type],
s.[bacteria_size],
s.[bacteria_family],
s.[bacteria_discovery_year]
);
COMMIT;
--No point in logging this action
TRUNCATE TABLE [dbo].[existing_bacteria_Batch]
END
Definitely option 3. Set-based always beats anything loopy.
That said, the biggest 'risk' might be that the amount of updated data 'overwhelms' your machine. More specifically, it could happen that the transaction becomes so big that the system takes forever to finish it. To avoid this you could try splitting the one big UPDATE into multiple smaller UPDATEs and still work mostly set-based. Good indexing and knowing your data is key here.
For instance, starting from
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
INNER JOIN [existing_bacteria] P
ON R.bacteria_size = P.bacteria_size
AND R.bacteria_family = P.bacteria_family
You might try to 'chunk' the (target) table into smaller parts, e.g. by looping over the bacteria_discovery_year field, assuming that said column splits the table into, say, 50 more or less equally sized parts. (BTW: I'm no biologist, so I might be totally wrong there =)
You'd then get something along the lines of:
DECLARE @c_bacteria_discovery_year date
DECLARE year_loop CURSOR LOCAL STATIC
FOR SELECT DISTINCT bacteria_discovery_year
FROM [new_bacteria]
ORDER BY bacteria_discovery_year
OPEN year_loop
FETCH NEXT FROM year_loop INTO @c_bacteria_discovery_year
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
INNER JOIN [existing_bacteria] P
ON R.bacteria_size = P.bacteria_size
AND R.bacteria_family = P.bacteria_family
WHERE R.bacteria_discovery_year = @c_bacteria_discovery_year
FETCH NEXT FROM year_loop INTO @c_bacteria_discovery_year
END
CLOSE year_loop
DEALLOCATE year_loop
Some remarks:
Like I said, I don't know the distribution of the bacteria_discovery_year values, if 3 years make up 95% of the data it might not be such a great choice.
This will only work if there is an index on the bacteria_discovery_year column, preferably with bacteria_size and bacteria_family included.
You could add a PRINT inside the loop to see the progress and rows affected... it won't speed anything up, but it feels better when you know it's doing something =)
All in all, don't overdo it, if you split it into too many small chunks you'll end up with something that takes forever too.
PS: in any case you'll also need an index on the 'source' table that covers the bacteria_size and bacteria_family columns, preferably including bacteria_name if the latter is not the (clustered) PK of the table.
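For reference, the index from the second remark and the source-table index from the PS might look something like this (a sketch; the index names are illustrative):
CREATE INDEX ix_new_bacteria_discovery_year
ON [new_bacteria] (bacteria_discovery_year)
INCLUDE (bacteria_size, bacteria_family);
CREATE INDEX ix_existing_bacteria_size_family
ON [existing_bacteria] (bacteria_size, bacteria_family)
INCLUDE (bacteria_name);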

SQL Server : Select ... From with FULL JOIN with a Default value for a column

I created a table, tblNewParts, with 3 columns:
NewCustPart
AddedDate
Handled
and I am trying to FULL JOIN it to an existing table, tblPartsWorkedOn.
tblNewParts is defined to have Handled defaulted to 'N'...
SELECT *
FROM dbo.tblPartsWorkedOn AS BASE
FULL JOIN dbo.tblNewParts AS ADDON ON BASE.[CustPN] = ADDON.[NewCustPart]
WHERE ADDON.[Handled] IS NULL
ORDER BY [CustPN] DESC
And I want the field [Handled] to come back as 'N' instead of NULL when I run the query. The problem is that when there aren't any records in the new table, I get NULLs instead of 'N's.
I saw a SELECT CASE WHEN col1 IS NULL THEN defaultval ELSE col1 END as a mostly suitable answer from here. I am wondering if this will work in this instance, and how would I write that in T-SQL for SQL Server 2012? I need all of the columns from both tables, rather than just the one.
I'm making this a question, rather than a comment on the cited link, so as to not obscure the original link's question.
Thank you for helping!
Name the column (alias.column_name) in the SELECT statement and use ISNULL(alias.column, 'N').
Thanks
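For example, applied to the query from the question, that would look something like this (a sketch that names the joined columns explicitly):
SELECT BASE.[CustPN]
,ISNULL(ADDON.[Handled], 'N') AS [Handled]
FROM dbo.tblPartsWorkedOn AS BASE
FULL JOIN dbo.tblNewParts AS ADDON ON BASE.[CustPN] = ADDON.[NewCustPart]
WHERE ADDON.[Handled] IS NULL
ORDER BY BASE.[CustPN] DESC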
After many iterations I found the answer; it's kind of bulky, but here it is anyway. Synopsis:
Yes, the CASE expression does work, but it returns the output as an unnamed column unless you alias it. Also, in this instance, to get all of the original columns AND the corrected column, I had to use SELECT *, CASE...END AS [ColumnName].
But, here is the better solution, as it will place the information into the correct column, rather than adding a column to the end of the table and calling that column 'Unnamed Column'.
Select [ID], [Seq], [Shipped], [InternalPN], [CustPN], [Line], [Status],
CASE WHEN ADDON.[NewCustPart] IS NULL THEN BASE.[CustPN] ELSE
ADDON.[NewCustPart] END as [NewCustPart],
GetDate() as [AddedDate],
CASE WHEN ADDON.[Handled] IS NULL THEN 'N' ELSE ADDON.[Handled] END as [Handled]
from dbo.tblPartsWorkedOn as BASE
full join dbo.tblNewParts as AddOn ON Base.[CustPN] = AddOn.NewCustPart
where AddOn.Handled = 'N' or AddOn.Handled is null
order by [NewCustPart] desc
This SQL code places the [CustPN] value into [NewCustPart] if the latter is null, puts an 'N' into the [Handled] field if it's null, and assigns the current date to the [AddedDate] field. It also only returns records that have not been handled, so that you get the ones that need to be looked at; and it orders the resulting output by the [NewCustPart] field value.
Resulting Output looks something like this: (I shortened the DateTime for the output here.)
[ID] [SEQ] [Shipped] [InternalPN] [CustPN] [Status] [NewCustPart] [AddedDate] [Handled]
1 12 N 10012A 10012A UP 10012A 04/02/2016 N
...
Rather than with the nulls:
[ID] [SEQ] [Shipped] [InternalPN] [CustPN] [Status] [NewCustPart] [AddedDate] [Handled]
1 12 N 10012A 10012A UP NULL NULL NULL
...
I'm leaving this up, and just answering it rather than deleting it, because I am fairly sure that someone else will eventually ask this same question. I think that having lots of examples showing how and why something is done is very helpful, as not everything can be generalized. Just some thoughts, and I hope this helps someone else!

INSERT+SELECT with a unique key

The following T-SQL statement does not work because [key] has to be unique and the MAX call in the SELECT statement only seems to be evaluated once. In other words, it is only incrementing the key value once and trying to insert that value over and over. Does anyone have a solution?
INSERT INTO [searchOC].[dbo].[searchTable]
([key],
dataVaultType,
dataVaultKey,
searchTerm)
SELECT (SELECT MAX([key]) + 1 FROM [searchOC].[dbo].[searchTable]) AS [key]
,'PERSON' as dataVaultType
,[student_id] as dataVaultKey
,[email] as searchTerm
FROM [JACOB].[myoc4Data].[dbo].[users]
WHERE [email] != '' AND [active] = '1'
AND [student_id] IN (SELECT [userID] FROM [JACOB].[myoc4Data].[dbo].[userRoles]
WHERE ([role] = 'STUDENT' OR [role] = 'FACUTLY' OR [role] = 'STAFF'))
If you can make the key column an IDENTITY column, that would probably be easiest, as it lets SQL Server generate the incremental values for you.
Alternatively, if you are set on generating the key yourself, then a blog post I wrote last month may help. Although it uses a composite key, it shows how to safely generate a new value for each row when inserting multiple rows in a single INSERT statement, and the approach is also safe across many simultaneous writers (which many examples don't deal with):
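A sketch of what that table definition might look like (the column types here are assumptions, since the question doesn't show the table's DDL):
CREATE TABLE [searchOC].[dbo].[searchTable](
[key] INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
dataVaultType VARCHAR(50) NOT NULL,
dataVaultKey VARCHAR(50) NOT NULL,
searchTerm VARCHAR(255) NULL
)
With IDENTITY in place, the INSERT simply omits [key] and SQL Server numbers the new rows automatically.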
http://colinmackay.co.uk/2012/12/29/composite-primary-keys-including-identity-like-column/
Incidentally, the reason you get the same value for MAX([key]) on each row in your SELECT is that it is evaluated at the time the table is read. So for all the rows that the SELECT statement returns, MAX([key]) will be the same. Unless you add some sort of GROUP BY clause, any MAX(columnName) in a SELECT statement will return the same value for each row returned.
Also, all aggregate functions are deterministic, so for each equivalent set of inputs they will always produce the same output. So if your set of keys is 1, 5, 9, MAX will always return 9.
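If an IDENTITY column is not an option, one common workaround (a sketch, not from the original answer) is to read MAX([key]) once and add ROW_NUMBER() so that each inserted row gets a distinct value. Note that this is still unsafe under concurrent writers without additional locking; the role filter from the question is omitted for brevity:
INSERT INTO [searchOC].[dbo].[searchTable] ([key], dataVaultType, dataVaultKey, searchTerm)
SELECT (SELECT ISNULL(MAX([key]), 0) FROM [searchOC].[dbo].[searchTable])
+ ROW_NUMBER() OVER (ORDER BY u.[student_id]) AS [key]
,'PERSON' AS dataVaultType
,u.[student_id] AS dataVaultKey
,u.[email] AS searchTerm
FROM [JACOB].[myoc4Data].[dbo].[users] AS u
WHERE u.[email] != '' AND u.[active] = '1'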
