I store positions in a SQL Server 2012 database, where each position is defined by a position number and a company number.
The position numbers are unique for each company only.
For instance, my database could have the following:
POSITION_NO COMPANY_NO
1 1
2 1
3 1
1 2
2 2
3 2
1 3
I need a function which takes a company number as a parameter, and returns the next sequential position number, which in the example table above would be 2 for COMPANY_NO = 3
What I use at the moment is:
CREATE PROCEDURE [DB].[GenerateKey]
    @p_company_no float(53),
    @return_value_argument float(53) OUTPUT
AS
BEGIN
    DECLARE @v_position_no numeric(5, 0)

    SELECT @v_position_no = max(POSITION_NO) + 1
    FROM DB.POSITION_TABLE with (nolock)
    WHERE COMPANY_NO = @p_company_no

    SET @return_value_argument = @v_position_no
    RETURN
END
I am aware of the potential pitfalls of using with (nolock), but this was added in an unsuccessful attempt to prevent data-locks on my database. In fact, besides well-written code being obviously preferable, the main reason I am asking this question is to cut down the number of places that could be causing the data-lock.
Is there any way my code could be improved?
Create an auxiliary table with sequences, with one row for every company (as you already did):
create table seq (company int, sequence int);
go
Seed the counters, one for every company (say there are two companies, 1 and 2):
insert seq values
(1, 1), (2, 1);
go
Then all you need is a way to both update and select the new value in a single statement to avoid race conditions. This is how to do it:
declare @next int;
declare @company int;
set @company = 2;

update seq
set @next = sequence = sequence + 1
where company = @company;

select @next;
It would be nice to enclose this in a scalar function, but unfortunately updates are not allowed in functions. But you already have a stored procedure in place, so just modify the code in it.
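For illustration, here is a minimal sketch of the reworked procedure, keeping the original name and OUTPUT pattern but assuming the seq table above and int keys (see the note on datatypes below):

CREATE PROCEDURE [DB].[GenerateKey]
    @p_company_no int,
    @return_value_argument int OUTPUT
AS
BEGIN
    -- Atomically increment and read the per-company counter;
    -- the single UPDATE avoids the read-then-write race.
    UPDATE seq
    SET @return_value_argument = sequence = sequence + 1
    WHERE company = @p_company_no;
END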
And please tell me that the datatypes used are not really floats? Why not ints?
WHILE (1 = 1)
BEGIN
    SELECT @v_position_no = max(POSITION_NO)
    FROM DB.POSITION_TABLE with (nolock)
    WHERE COMPANY_NO = @p_company_no

    INSERT INTO DB.POSITION_TABLE
        (COMPANY_NO, POSITION_NO)
    SELECT TOP 1 @p_company_no, @v_position_no + 1
    FROM DB.POSITION_TABLE with (nolock)
    WHERE NOT EXISTS (SELECT 1
                      FROM DB.POSITION_TABLE with (nolock)
                      WHERE COMPANY_NO = @p_company_no
                        AND POSITION_NO = @v_position_no + 1)

    IF (@@ROWCOUNT > 0)
        BREAK;
END

SET @return_value_argument = @v_position_no + 1
Note that the second statement only inserts if POSITION_NO + 1 hasn't been added in the meantime. If it has, the loop tries again.
Firstly, may I state that I'm aware of the ability to, e.g., create a new function, declare variables for rowcount1 and rowcount2, run a stored procedure that returns a subset of rows from a table, then determine the entire rowcount for that same table, assign it to the second variable, and then calculate rowcount1 / rowcount2 x 100....
However, is there a cleaner way to do this which doesn't result in numerous running of things like this stored procedure? Something like
select (count(*stored procedure name*) / select count(*) from table) x 100) as Percentage...
Sorry for the crap scenario!
EDIT: Someone has asked for more details. Ultimately, and to cut a very long story short, I wish to know what people would consider the quickest and most processor-concise method there would be to show the percentage of rows that are returned in the stored procedure, from ALL rows available in that table. Does that make more sense?
The code in the stored procedure is below:
SET @SQL = 'SELECT COUNT (DISTINCT c.ElementLabel), r.FirstName, r.LastName, c.LastReview,
    CASE
        WHEN c.LastReview < DateAdd(month, -1, GetDate()) THEN ''OUT of Date''
        WHEN c.LastReview >= DateAdd(month, -1, GetDate()) THEN ''In Date''
        WHEN c.LastReview is NULL THEN ''Not Yet Reviewed''
    END as [Update Status]
FROM [Residents-' + @home_name + '] r
LEFT JOIN [CarePlans-' + @home_name + '] c ON r.PersonID = c.PersonID
WHERE r.Location = ''' + @home_name + '''
    AND CarePlanType = 0
GROUP BY r.LastName, r.FirstName, c.LastReview
HAVING COUNT(ELEMENTLABEL) >= 14'
Thanks
Ant
I could not tell from your question if you are attempting to get the count and the result set in one query. If it is OK to execute the SP and separately calculate a table count, then you could store the results of the stored procedure in a temp table.
CREATE TABLE #Results (ID INT, Value INT)

INSERT #Results EXEC myStoreProc @Parameter1, @Parameter2

SELECT
    -- multiply by 100.0 first so that integer division doesn't truncate the result to 0
    Result = (SELECT COUNT(*) FROM #Results) * 100.0 / (SELECT COUNT(*) FROM [table])
I have an nvarchar(200) called ColumnA in Table1 that contains, for example, the value:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
I want to extract every 7 characters into Table2, ColumnB and end up with all of these values below.
ABCDEFG
BCDEFGH
CDEFGHI
DEFGHIJ
EFGHIJK
FGHIJKL
GHIJKLM
HIJKLMN
IJKLMNO
JKLMNOP
KLMNOPQ
LMNOPQR
MNOPQRS
NOPQRST
OPQRSTU
PQRSTUV
QRSTUVW
RSTUVWX
STUVWXY
TUVWXYZ
[Not the real table and column names.]
The data is being loaded to Table1 and Table2 in an SSIS package, and I'm puzzling over whether it is better to do the string handling in T-SQL in a SQL Task or to parse out the string in a VB Script Component.
[Yes, I think we're the last four on the planet using VB in Script Components. I cannot persuade the other three that this C# thing is here to stay. Although, maybe it is a perfect time to go rogue.]
You can use a recursive CTE that calculates the offsets step by step, combined with substring().
WITH cte AS
(
    SELECT 1 n
    UNION ALL
    SELECT n + 1 n
    FROM cte
    WHERE n + 1 <= len('ABCDEFGHIJKLMNOPQRSTUVWXYZ') - 7 + 1
)
SELECT substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ', n, 7)
FROM cte;
If you have a physical numbers table, this is easy. If not, you can create a tally-on-the-fly:
DECLARE @string VARCHAR(100) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

--We create the tally using ROW_NUMBER against any table with enough rows.
WITH Tally(Nmbr) AS
(SELECT TOP(LEN(@string) - 6) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT Nmbr
      ,SUBSTRING(@string, Nmbr, 7) AS FragmentOf7
FROM Tally
ORDER BY Nmbr;
The idea in short:
The tally returns a list of numbers from 1 to n (n = LEN(@string) - 6). This number is used in SUBSTRING to define the starting position.
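For illustration, a sketch of applying the same tally idea per row of the actual tables, assuming the Table1/ColumnA and Table2/ColumnB names from the question:

INSERT INTO Table2 (ColumnB)
SELECT SUBSTRING(t1.ColumnA, n.Nmbr, 7)
FROM Table1 t1
CROSS APPLY
(
    -- one number per possible starting position; zero rows for strings shorter than 7
    SELECT TOP (CASE WHEN LEN(t1.ColumnA) >= 7 THEN LEN(t1.ColumnA) - 6 ELSE 0 END)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Nmbr
    FROM master..spt_values
) n;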
You can do it with T-SQL like this:
DECLARE C CURSOR LOCAL FOR SELECT [ColumnA] FROM [Table1]
OPEN C
DECLARE @Val nvarchar(200);
FETCH NEXT FROM C INTO @Val
WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @I INTEGER;
    SELECT @I = 1;
    WHILE @I <= LEN(@Val) - 6
    BEGIN
        PRINT SUBSTRING(@Val, @I, 7)
        SELECT @I = @I + 1
    END
    FETCH NEXT FROM C INTO @Val
END
CLOSE C
DEALLOCATE C
Script Component solution
Assuming that the input Column name is Column1
Add a script component
Open the script component configuration form
Go to Inputs and Outputs Tab
Click on the Output icon and set the Synchronous Input property to None
Add an Output column (example outColumn1)
In the Script editor, use code similar to the following in the row-processing method:
' Emit one output row for every 7-character substring of Column1
Dim idx As Integer = 0
While Row.Column1.Length >= idx + 7
    Output0Buffer.AddRow()
    Output0Buffer.outColumn1 = Row.Column1.Substring(idx, 7)
    idx += 1
End While
I have spent a lot of time investigating whether this can be done outside of the database, but to be honest I don't think so, or at least not very easily. We access the data in the tables via Access 2010 using VBA, so I thought I could do it via an action in the front-end software. That would be easy to implement, but there are too many permutations I can't control.
I have a table [TableData] with multiple columns. We have some externally supplied software that populates the table about 20-30 rows at a time. One of the fields, [Fluctuation], currently allows us to transfer data up to 60 chars in length, and our intention is to send data in the format 1.1,1.2,1.3,1.4,1.5,1.6, i.e. six numbers of up to two decimal places separated by commas, no spaces. The new column names would be Fluc1, Fluc2, Fluc3, etc.
What I would like to do is create a trigger within the SQL database that fires once the row is inserted and splits the above into six new columns, but only if six values separated by five commas exist.
I then need to do some maths on the six values, but at least I will have them in separate columns to work with.
I have no knowledge of triggers so any help given would be very much appreciated.
Sample data examples are:
101.23,100.45,101.56,102.89,101,74,100.25
1.05,1.09,1.05,0.99,0.99,0.98
etc
I have VBA code to split the data and was going to do this via a SELECT query after the fact, but as I can't control the data being entered from the external software, I thought a trigger would be more useful.
VBA code.
'This function splits the comma-separated string data into an array
Public Function FluctuationSeperation(strFluctuationData As String) As Variant
Dim strTest As String
Dim strArray() As String
Dim intCount As Integer
strArray = Split(strFluctuationData, ",")
Dim arr(5) As Variant
For intCount = LBound(strArray) To UBound(strArray)
arr(intCount) = Trim(strArray(intCount))
Next
FluctuationSeperation = arr
End Function
When writing a trigger, you need to take care that it can fire for multiple inserted rows. The built-in inserted pseudo-table is available for that purpose. You need to iterate through all the inserted records and update them individually, using your primary key (I have assumed a column id) to match inserted records with the records to update.
CREATE TRIGGER TableData_ForInsert
ON [TableData]
AFTER INSERT
AS
BEGIN
    DECLARE @id int
    DECLARE @Fluctuation varchar(max)

    DECLARE i CURSOR FOR
        SELECT id, Fluctuation FROM inserted

    FETCH NEXT FROM i INTO @id, @Fluctuation
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @pos1 int = charindex(',', @Fluctuation)
        DECLARE @pos2 int = charindex(',', @Fluctuation, @pos1 + 1)
        DECLARE @pos3 int = charindex(',', @Fluctuation, @pos2 + 1)
        DECLARE @pos4 int = charindex(',', @Fluctuation, @pos3 + 1)

        UPDATE [TableData]
        SET fluc1 = ltrim(substring(@Fluctuation, 1, @pos1 - 1)),
            fluc2 = ltrim(substring(@Fluctuation, @pos1 + 1, @pos2 - @pos1 - 1)),
            fluc3 = ltrim(substring(@Fluctuation, @pos2 + 1, @pos3 - @pos2 - 1)),
            fluc4 = ltrim(substring(@Fluctuation, @pos3 + 1, @pos4 - @pos3 - 1)),
            fluc5 = ltrim(substring(@Fluctuation, @pos4 + 1, 999))
        WHERE id = @id -- need to find the TableData record to update by inserted id

        FETCH NEXT FROM i INTO @id, @Fluctuation
    END
    CLOSE i
    DEALLOCATE i
END
But because cursors are in many cases considered bad practice, it is better to write the same as a set-based command. It can be achieved with the APPLY clause like this:
CREATE TRIGGER TableData_ForInsert
ON [TableData]
AFTER INSERT
AS
BEGIN
UPDATE t SET
fluc1 = SUBSTRING(t.fluctuation, 0, i1.i),
fluc2 = SUBSTRING(t.fluctuation, i1.i+1, i2.i - i1.i -1),
fluc3 = SUBSTRING(t.fluctuation, i2.i+1, i3.i - i2.i -1),
fluc4 = SUBSTRING(t.fluctuation, i3.i+1, i4.i - i3.i -1),
fluc5 = SUBSTRING(t.fluctuation, i4.i+1, 999)
FROM [TableData] t
OUTER APPLY (select charindex(',', t.fluctuation) as i) i1
OUTER APPLY (select charindex(',', t.fluctuation, i1.i+1) as i) i2
OUTER APPLY (select charindex(',', t.fluctuation, i2.i+1) as i) i3
OUTER APPLY (select charindex(',', t.fluctuation, i3.i+1) as i) i4
JOIN INSERTED new ON new.ID = t.ID -- need to find TableData record to update by inserted id
END
This code example does not handle malformed strings; it always expects 5 numbers delimited by 4 commas.
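For illustration, a minimal way to detect the well-formed case is to count the delimiters by comparing string lengths; a guard like this could gate the UPDATE (the question's format has six values, i.e. five commas):

DECLARE @Fluctuation varchar(60) = '1.05,1.09,1.05,0.99,0.99,0.98';

-- Removing the commas and comparing lengths counts the delimiters.
IF LEN(@Fluctuation) - LEN(REPLACE(@Fluctuation, ',', '')) = 5
    PRINT 'six values: safe to split';
ELSE
    PRINT 'malformed: skip this row';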
For more tips on how to split strings in SQL Server, check this link.
Test case:
DECLARE @test TABLE
(
    id int,
    Fluctuation varchar(max),
    fluc1 numeric(9,3) NULL,
    fluc2 numeric(9,3) NULL,
    fluc3 numeric(9,3) NULL,
    fluc4 numeric(9,3) NULL,
    fluc5 numeric(9,3) NULL
)

INSERT INTO @test (id, Fluctuation) VALUES(1, '1.2,5,8.52,6,7.521')
INSERT INTO @test (id, Fluctuation) VALUES(2, '2.2,6,9.52,7,8.521')
INSERT INTO @test (id, Fluctuation) VALUES(3, '2.5,3,4.52,9,7.522')
INSERT INTO @test (id, Fluctuation) VALUES(4, '2.53,4.52,97.522') -- malformed: only three values

-- charindex returns 0 (not a negative number) when the delimiter is not found,
-- so each APPLY step propagates the 0 and the CASE guards map missing values to NULL
UPDATE t SET
    fluc1 = CASE WHEN i1.i = 0 THEN NULL ELSE SUBSTRING(t.fluctuation, 0, i1.i) END,
    fluc2 = CASE WHEN i2.i = 0 THEN NULL ELSE SUBSTRING(t.fluctuation, i1.i+1, i2.i - i1.i - 1) END,
    fluc3 = CASE WHEN i3.i = 0 THEN NULL ELSE SUBSTRING(t.fluctuation, i2.i+1, i3.i - i2.i - 1) END,
    fluc4 = CASE WHEN i4.i = 0 THEN NULL ELSE SUBSTRING(t.fluctuation, i3.i+1, i4.i - i3.i - 1) END,
    fluc5 = CASE WHEN i4.i = 0 THEN NULL ELSE SUBSTRING(t.fluctuation, i4.i+1, 999) END
FROM @test t
OUTER APPLY (select charindex(',', t.fluctuation) as i) i1
OUTER APPLY (select case when i1.i = 0 then 0 else charindex(',', t.fluctuation, i1.i+1) end as i) i2
OUTER APPLY (select case when i2.i = 0 then 0 else charindex(',', t.fluctuation, i2.i+1) end as i) i3
OUTER APPLY (select case when i3.i = 0 then 0 else charindex(',', t.fluctuation, i3.i+1) end as i) i4

SELECT * FROM @test
I currently have a stored procedure in MSSQL in which I execute a SELECT statement multiple times based on the variables I give the stored procedure. The stored procedure counts how many results are going to be returned for every filter a user can enable.
The stored procedure isn't the issue. I transformed the SELECT statement from the stored procedure into a regular SELECT statement, which looks like this:
DECLARE @contentRootId int = 900589
DECLARE @RealtorIdList varchar(2000) = ';880;884;1000;881;885;'
DECLARE @publishSoldOrRentedSinceDate int = 8
DECLARE @isForSale BIT = 1
DECLARE @isForRent BIT = 0
DECLARE @isResidential BIT = 1
--...(another 55 variables)...

--Table to be returned
DECLARE @resultTable TABLE
(
    variableName varchar(100),
    [value] varchar(200)
)

-- Create table based on input variable. Example: turns ';18;118;' into a table containing the two ints 18 AND 118
DECLARE @RealtorIdTable table(RealtorId int)
INSERT INTO @RealtorIdTable SELECT * FROM dbo.Split(@RealtorIdList, ';') option (maxrecursion 150)

INSERT INTO @resultTable ([value], variableName)
SELECT [Value], VariableName FROM
(
    SELECT count(*) as TotalCount,
        ISNULL(SUM(CASE WHEN reps.ForRecreation = 1 THEN 1 ELSE 0 END), 0) as ForRecreation,
        ISNULL(SUM(CASE WHEN reps.IsQualifiedForSeniors = 1 THEN 1 ELSE 0 END), 0) as IsQualifiedForSeniors,
        --...(A whole bunch more SUM(CASE)...)
    FROM TABLE1 reps
    LEFT JOIN temp t ON
        t.ContentRootID = @contentRootId
        AND t.RealEstatePropertyID = reps.ID
    WHERE
        (EXISTS (SELECT 1 FROM @RealtorIdTable WHERE RealtorId = reps.RealtorID))
        AND (@SelectedGroupIds IS NULL OR EXISTS (SELECT 1 FROM @SelectedGroupIdtable WHERE GroupId = t.RealEstatePropertyGroupID))
        AND (ISNULL(reps.IsForSale, 0) = ISNULL(@isForSale, 0))
        AND (ISNULL(reps.IsForRent, 0) = ISNULL(@isForRent, 0))
        AND (ISNULL(reps.IsResidential, 0) = ISNULL(@isResidential, 0))
        AND (ISNULL(reps.IsCommercial, 0) = ISNULL(@isCommercial, 0))
        AND (ISNULL(reps.IsInvestment, 0) = ISNULL(@isInvestment, 0))
        AND (ISNULL(reps.IsAgricultural, 0) = ISNULL(@isAgricultural, 0))
        --...(Around 50 more of these WHERE conditions)...
) as tbl
UNPIVOT
(
    [Value]
    FOR [VariableName] IN
    (
        [TotalCount],
        [ForRecreation],
        [IsQualifiedForSeniors],
        --...(All the other things I selected in the above query)...
    )
) as d

SELECT * FROM @resultTable
The combination of a realtor ID and content ID gives me a default set of X records. When I choose a combination which gives me ~4600 records, the execution time is around 250 ms. When I execute the statement with a combination that gives me ~600 records, the execution time is about 20 ms.
I would like to know why this is happening. I tried removing all the SUM(CASE ...) expressions from the SELECT, I tried removing almost everything from the WHERE clause, and I tried removing the JOIN. But I keep seeing the huge difference between the result set of 4600 and 600 records.
Table variables can perform worse when the number of records is large. Consider using a temporary table instead. See When should I use a table variable vs temporary table in sql server?
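For illustration, a minimal sketch of swapping one of the table variables for a temporary table, reusing the names from the question:

-- Temp tables carry column statistics, so the optimizer can estimate
-- row counts instead of assuming a fixed low cardinality.
CREATE TABLE #RealtorIdTable (RealtorId int PRIMARY KEY); -- PRIMARY KEY assumes no duplicate ids in the list

INSERT INTO #RealtorIdTable
SELECT * FROM dbo.Split(@RealtorIdList, ';') OPTION (MAXRECURSION 150);

-- ... use #RealtorIdTable in the EXISTS checks exactly as before ...

DROP TABLE #RealtorIdTable;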
Also, consider replacing the UNPIVOT by alternative SQL code. Writing your own TSQL code will give you more control and even increase performance. See for example PIVOT, UNPIVOT and performance
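For illustration, a rough sketch of one common UNPIVOT alternative, CROSS APPLY (VALUES ...), shown here with just a few of the column names from the query above:

SELECT ca.VariableName, ca.[Value]
FROM
(
    SELECT COUNT(*) AS TotalCount,
           ISNULL(SUM(CASE WHEN reps.ForRecreation = 1 THEN 1 ELSE 0 END), 0) AS ForRecreation,
           ISNULL(SUM(CASE WHEN reps.IsQualifiedForSeniors = 1 THEN 1 ELSE 0 END), 0) AS IsQualifiedForSeniors
    FROM TABLE1 reps
    -- ... same JOIN and WHERE clause as in the question ...
) AS tbl
CROSS APPLY
(
    VALUES ('TotalCount',            CAST(tbl.TotalCount AS varchar(200))),
           ('ForRecreation',         CAST(tbl.ForRecreation AS varchar(200))),
           ('IsQualifiedForSeniors', CAST(tbl.IsQualifiedForSeniors AS varchar(200)))
) AS ca (VariableName, [Value]);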
I have a periodic check of a certain query (which by the way includes multiple tables) to add informational messages to the user if something has changed since the last check (once a day).
I tried to make it work with checksum_agg(binary_checksum(*)), but it does not help, so this question doesn't help much, because I have the following case (oversimplified):
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
1 as status
union all
select 2 as id,
0 as status
) data
and
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
0 as status
union all
select 2 as id,
1 as status
) data
Both of the above cases result in the same checksum, 49, even though the data has clearly changed.
This doesn't have to be a simple function or a simple solution, but I need some way to uniquely identify differences like these in SQL Server 2000.
checksum_agg appears to simply add the results of binary_checksum together for all rows. Although each row has changed, the sum of the two checksums has not (i.e. 17+32 = 16+33). This is not really the norm for checking for updates, but the recommendations I can come up with are as follows:
Instead of using checksum_agg, concatenate the checksums into a delimited string, and compare strings, along the lines of SELECT binary_checksum(*) + ',' FROM MyTable FOR XML PATH(''). Much longer string to check and to store, but there will be much less chance of a false positive comparison.
Instead of using the built-in checksum routine, use HASHBYTES to calculate MD5 checksums in 8000 byte blocks, and xor the results together. This will give you a much more resilient checksum, although still not bullet-proof (i.e. it is still possible to get false matches, but very much less likely). I'll paste the HASHBYTES demo code that I wrote below.
The last option, and absolute last resort, is to actually store the table data in XML format and compare that. This is really the only way you can be absolutely certain of no false matches, but it is not scalable and involves storing and comparing large amounts of data.
Every approach, including the one you started with, has pros and cons, with varying degrees of data size and processing requirements against accuracy. Depending on what level of accuracy you require, use the appropriate option. The only way to get 100% accuracy is to store all of the table data.
Alternatively, you can add a date_modified field to each table, which is set to GetDate() using AFTER INSERT and UPDATE triggers. You can then do SELECT COUNT(*) FROM #test WHERE date_modified > @date_last_checked. This is a more common way of checking for updates. The downside of this one is that deletions cannot be tracked.
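For illustration, a rough sketch of the date_modified plumbing, assuming a hypothetical tracked table MyTable with a key column ID:

ALTER TABLE MyTable ADD date_modified DATETIME DEFAULT GETDATE();
GO
CREATE TRIGGER MyTable_Touch ON MyTable
AFTER INSERT, UPDATE
AS
    -- stamp every affected row with the time of the change
    UPDATE m
    SET date_modified = GETDATE()
    FROM MyTable m
    JOIN inserted i ON i.ID = m.ID;
GO
-- The periodic check then becomes:
-- SELECT COUNT(*) FROM MyTable WHERE date_modified > @date_last_checked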
Another approach is to create a modified table, with table_name (VARCHAR) and is_modified (BIT) fields, containing one row for each table you wish to track. Using INSERT, UPDATE and DELETE triggers, the flag against the relevant table is set to true. When your schedule runs, you check and reset the is_modified flag (in the same transaction), along the lines of UPDATE tblModified SET @is_modified = is_modified, is_modified = 0.
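For illustration, a minimal sketch of the flag-table variant, again with hypothetical names:

CREATE TABLE tblModified (table_name VARCHAR(128), is_modified BIT);
INSERT INTO tblModified VALUES ('MyTable', 0);
GO
CREATE TRIGGER MyTable_FlagModified ON MyTable
AFTER INSERT, UPDATE, DELETE
AS
    UPDATE tblModified SET is_modified = 1 WHERE table_name = 'MyTable';
GO
-- During the scheduled run: read and reset the flag in one statement
DECLARE @is_modified BIT;
UPDATE tblModified
SET @is_modified = is_modified, is_modified = 0
WHERE table_name = 'MyTable';
SELECT @is_modified;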
The following script generates three result sets, each corresponding to one of the options listed earlier in this answer. I have commented which output corresponds to which option just before each SELECT statement. To see how the output was derived, you can work backwards through the code.
-- Create the test table and populate it
CREATE TABLE #Test (
f1 INT,
f2 INT
)
INSERT INTO #Test VALUES(1, 1)
INSERT INTO #Test VALUES(2, 0)
INSERT INTO #Test VALUES(2, 1)
/*******************
OPTION 1
*******************/
SELECT CAST(binary_checksum(*) AS VARCHAR) + ',' FROM #test FOR XML PATH('')
-- Declaration: input and output MD5 checksums (@in and @out), input string (@input), and counter (@i)
DECLARE @in VARBINARY(16), @out VARBINARY(16), @input VARCHAR(MAX), @i INT

-- Initialize @input string as the XML dump of the table
-- Use this as your comparison string if you choose to not use the MD5 checksum
SET @input = (SELECT * FROM #Test FOR XML RAW)
/*******************
OPTION 3
*******************/
SELECT @input
-- Initialise counter and output MD5.
SET @i = 1
SET @out = 0x00000000000000000000000000000000

WHILE @i <= LEN(@input)
BEGIN
    -- calculate MD5 for this batch (the + 1 ensures the final character is included)
    SET @in = HASHBYTES('MD5', SUBSTRING(@input, @i, CASE WHEN LEN(@input) - @i + 1 > 8000 THEN 8000 ELSE LEN(@input) - @i + 1 END))
    -- xor the results with the output, 4 bytes (one INT) at a time
    SET @out = CAST(CAST(SUBSTRING(@in, 1, 4) AS INT) ^ CAST(SUBSTRING(@out, 1, 4) AS INT) AS VARBINARY(4)) +
               CAST(CAST(SUBSTRING(@in, 5, 4) AS INT) ^ CAST(SUBSTRING(@out, 5, 4) AS INT) AS VARBINARY(4)) +
               CAST(CAST(SUBSTRING(@in, 9, 4) AS INT) ^ CAST(SUBSTRING(@out, 9, 4) AS INT) AS VARBINARY(4)) +
               CAST(CAST(SUBSTRING(@in, 13, 4) AS INT) ^ CAST(SUBSTRING(@out, 13, 4) AS INT) AS VARBINARY(4))
    SET @i = @i + 8000
END
/*******************
OPTION 2
*******************/
SELECT @out