I have a periodic check of a certain query (which by the way includes multiple tables) to add informational messages to the user if something has changed since the last check (once a day).
I tried to make it work with checksum_agg(binary_checksum(*)), but it does not help (so the linked question doesn't help much either), because I have the following case (oversimplified):
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
1 as status
union all
select 2 as id,
0 as status
) data
and
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
0 as status
union all
select 2 as id,
1 as status
) data
Both of the above queries return the same checksum, 49, even though it is clear that the data has changed.
This doesn't have to be a simple function or a simple solution, but I need some way to reliably detect differences like these in SQL Server 2000.
checksum_agg appears to simply add the results of binary_checksum together for all rows. Although each row has changed, the sum of the two checksums has not (i.e. 17+32 = 16+33). This is not really the norm for checking for updates, but the recommendations I can come up with are as follows:
1. Instead of using checksum_agg, concatenate the checksums into a delimited string and compare strings, along the lines of SELECT CAST(binary_checksum(*) AS VARCHAR) + ',' FROM MyTable FOR XML PATH(''). Much longer string to check and to store, but much less chance of a false positive comparison.
2. Instead of using the built-in checksum routine, use HASHBYTES to calculate MD5 checksums in 8000 byte blocks, and XOR the results together. This will give you a much more resilient checksum, although still not bullet-proof (i.e. it is still possible to get false matches, but very much less likely). I'll paste the HASHBYTES demo code that I wrote below.
3. The last option, and an absolute last resort, is to actually store the whole table in XML format and compare that. This is really the only way you can be absolutely certain of no false matches, but it is not scalable and involves storing and comparing large amounts of data.
Every approach, including the one you started with, has pros and cons, with varying degrees of data size and processing requirements against accuracy. Depending on what level of accuracy you require, use the appropriate option. The only way to get 100% accuracy is to store all of the table data.
Alternatively, you can add a date_modified field to each table, set to GetDate() by AFTER INSERT and UPDATE triggers. You can then do SELECT COUNT(*) FROM #test WHERE date_modified > @date_last_checked. This is a more common way of checking for updates. The downside of this one is that deletions cannot be tracked.
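As a rough sketch of that approach (dbo.MyTable, its Id key column, and the trigger name are placeholders, not from your schema):
-- Add the tracking column; the DEFAULT fills existing rows
ALTER TABLE dbo.MyTable ADD date_modified DATETIME NOT NULL DEFAULT GETDATE()
GO
-- Keep date_modified current on every insert/update
CREATE TRIGGER trg_MyTable_touch ON dbo.MyTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON
    UPDATE t
    SET date_modified = GETDATE()
    FROM dbo.MyTable t
    INNER JOIN inserted i ON i.Id = t.Id
END
GO
-- Daily check
DECLARE @date_last_checked DATETIME
SET @date_last_checked = DATEADD(DAY, -1, GETDATE())
SELECT COUNT(*) FROM dbo.MyTable WHERE date_modified > @date_last_checked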
Another approach is to create a modified table, with table_name (VARCHAR) and is_modified (BIT) fields, containing one row for each table you wish to track. Using insert, update and delete triggers, the flag against the relevant table is set to 1. When you run your schedule, you read and reset the is_modified flag in a single statement, along the lines of UPDATE tblModified SET @is_modified = is_modified, is_modified = 0 WHERE table_name = 'MyTable'.
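And a minimal sketch of that tracking-table variant (again, names are placeholders):
-- One row per tracked table
CREATE TABLE tblModified (table_name VARCHAR(128) PRIMARY KEY, is_modified BIT NOT NULL DEFAULT 0)
INSERT INTO tblModified (table_name, is_modified) VALUES ('MyTable', 0)
GO
-- Any change to MyTable flips the flag
CREATE TRIGGER trg_MyTable_modified ON dbo.MyTable
AFTER INSERT, UPDATE, DELETE
AS
    UPDATE tblModified SET is_modified = 1 WHERE table_name = 'MyTable'
GO
-- Scheduled job: read and reset the flag in one atomic statement
DECLARE @is_modified BIT
UPDATE tblModified
SET @is_modified = is_modified, is_modified = 0
WHERE table_name = 'MyTable'
SELECT @is_modified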
The following script generates three result sets, each corresponding to one of the numbered options earlier in this response. I have commented which output corresponds with which option, just before the SELECT statement. To see how the output was derived, you can work backwards through the code.
-- Create the test table and populate it
CREATE TABLE #Test (
f1 INT,
f2 INT
)
INSERT INTO #Test VALUES(1, 1)
INSERT INTO #Test VALUES(2, 0)
INSERT INTO #Test VALUES(2, 1)
/*******************
OPTION 1
*******************/
SELECT CAST(binary_checksum(*) AS VARCHAR) + ',' FROM #test FOR XML PATH('')
-- Declaration: input and output MD5 checksums (@in and @out), input string (@input), and counter (@i)
DECLARE @in VARBINARY(16), @out VARBINARY(16), @input VARCHAR(MAX), @i INT
-- Initialize @input string as the XML dump of the table
-- Use this as your comparison string if you choose not to use the MD5 checksum
SET @input = (SELECT * FROM #Test FOR XML RAW)
/*******************
OPTION 3
*******************/
SELECT @input
-- Initialise counter and output MD5.
SET @i = 1
SET @out = 0x00000000000000000000000000000000
WHILE @i <= LEN(@input)
BEGIN
    -- Calculate the MD5 for this 8000-byte batch (the final batch may be shorter)
    SET @in = HASHBYTES('MD5', SUBSTRING(@input, @i, CASE WHEN LEN(@input) - @i + 1 > 8000 THEN 8000 ELSE LEN(@input) - @i + 1 END))
    -- XOR the batch result into the running output, 4 bytes at a time
    SET @out = CAST(CAST(SUBSTRING(@in, 1, 4) AS INT) ^ CAST(SUBSTRING(@out, 1, 4) AS INT) AS VARBINARY(4)) +
               CAST(CAST(SUBSTRING(@in, 5, 4) AS INT) ^ CAST(SUBSTRING(@out, 5, 4) AS INT) AS VARBINARY(4)) +
               CAST(CAST(SUBSTRING(@in, 9, 4) AS INT) ^ CAST(SUBSTRING(@out, 9, 4) AS INT) AS VARBINARY(4)) +
               CAST(CAST(SUBSTRING(@in, 13, 4) AS INT) ^ CAST(SUBSTRING(@out, 13, 4) AS INT) AS VARBINARY(4))
    SET @i = @i + 8000
END
/*******************
OPTION 2
*******************/
SELECT @out
I have a situation where I have created script to select data in our company's environment. In doing so, I decided to use functions for some pattern matching and stripping of characters in a CASE WHEN.
However, one of our clients doesn't want to let us put their data in our local environment, so I now have the requirement of massaging the script to be able to run on their environment--essentially meaning I need to remove the functions, and I am having trouble thinking about how I need to move stuff around to do so.
An example of the function call would be:
SELECT ....
CASE WHEN Prp = 'Key Cabinet'
AND SerialNumber IS NOT NULL
AND dbo.fnRemoveNonNumericCharacters(SerialNumber) <> ''
THEN dbo.fnRemoveNonNumericCharacters(SerialNumber)
....
INTO #EmpProperty
FROM ....
Where Prp is a column that contains the property type and SerialNumber is a column that contains a serial number, but also some other random garbage because data entry was sloppy.
The function definition is:
WHILE PATINDEX('%[^0-9]%', @strText) > 0
BEGIN
    SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
END
RETURN @strText
where @strText is the SerialNumber I am passing in.
I may be stuck in analysis paralysis because I just can't figure out a good way to do this. I don't need a full-on solution per se; perhaps just point me in a direction you know will work. Let me know if you would like some sample DDL/DML to mess around with.
Example 'SerialNumber' values: CA100 (Trash bins), T110, 101B.
There are also a bunch of other types of values such as all text or all numbers, but we are filtering those out. The current patterning matching is good enough.
So I think you mean you can't use a function... perhaps something like this:
declare @table table (SomeCol varchar(4000))
insert into @table values
('1 ab2cdefghijk3lmnopqr4stuvwxyz5 6 !7##$8%^&9*()-10_=11+[]{}12\|;:13></14? 15'),
('CA100 (Trash bins), T110, 101B')
;with cte as (
select top (100)
N=row_number() over (order by @@spid) from sys.all_columns),
Final as (
select SomeCol, Col
from @table
cross apply (
select (select X + ''
from (select N, substring(SomeCol, N, 1) X
from cte
where N<=datalength(SomeCol)) [1]
where X between '0' and '9'
order by N
for xml path(''))
) Z (Col)
where Z.Col is not NULL
)
select
SomeCol
,cast(Col as varchar) CleanCol --change this to BIGINT if it isn't too large
from Final
I am being passed the following parameter to my stored procedure -
@AddOns = 23:2,33:1,13:5
I need to split the string by the commas using this -
SET @Addons = @Addons + ','
SET @pos = 0
SET @len = 0
WHILE CHARINDEX(',', @Addons, @pos+1) > 0
BEGIN
    SET @len = CHARINDEX(',', @Addons, @pos+1) - @pos
    SET @value = SUBSTRING(@Addons, @pos, @len)
So now @value = 23:2, and I need to get 23 (my ID) and 2 (my quantity). Here is the rest of my code -
    INSERT INTO MyTable (ID, Qty)  -- MyTable is a placeholder name
    VALUES (@ID, @QTY)
    SET @pos = CHARINDEX(',', @Addons, @pos+@len) + 1
END
So what is the best way to get the values of 23 and 2 into separate fields to use in the INSERT statement?
First you would split the sets of key-value pairs into rows (and it looks like you already got that far), and then you get the position of the colon and use that to do two SUBSTRING operations to split the key and value apart.
Also, this can be done much more efficiently than storing each row's key and value into separate variables just to get inserted into a table. If you INSERT from the SELECT that breaks this data apart, it will be a set-based operation instead of row-by-row.
For example:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
;WITH pairs AS
(
SELECT [SplitVal] AS [Value], CHARINDEX(N':', [SplitVal]) AS [ColonIndex]
FROM SQL#.String_Split(@AddOns, N',', 1) -- https://SQLsharp.com/
)
SELECT *,
SUBSTRING(pairs.[Value], 1, pairs.[ColonIndex] - 1) AS [ID],
SUBSTRING(pairs.[Value], pairs.[ColonIndex] + 1, 1000) AS [QTY]
FROM pairs;
/*
Value ColonIndex ID QTY
23:2 3 23 2
33:1 3 33 1
13:5 3 13 5
999:45 4 999 45
*/
GO
For that example I am using a SQLCLR string splitter found in the SQL# library (that I am the author of), which is available in the Free version. You can use whatever splitter you like, including the built-in STRING_SPLIT that was introduced in SQL Server 2016.
It would be used as follows:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
;WITH pairs AS
(
SELECT [value] AS [Value], CHARINDEX(N':', [value]) AS [ColonIndex]
FROM STRING_SPLIT(@AddOns, N',') -- built-in function starting in SQL Server 2016
)
INSERT INTO dbo.TableName (ID, QTY)
SELECT SUBSTRING(pairs.[Value], 1, pairs.[ColonIndex] - 1) AS [ID],
SUBSTRING(pairs.[Value], pairs.[ColonIndex] + 1, 1000) AS [QTY]
FROM pairs;
Of course, the Full (i.e. paid) version of SQL# includes an additional splitter designed to handle key-value pairs. It's called String_SplitKeyValuePairs and works as follows:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
SELECT *
FROM SQL#.String_SplitKeyValuePairs(@AddOns, N',', N':', 1, NULL, NULL, NULL);
/*
KeyID Key Value
1 23 2
2 33 1
3 13 5
4 999 45
*/
GO
So, it would be used as follows:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
INSERT INTO dbo.[TableName] ([Key], [Value])
SELECT kvp.[Key], kvp.[Value]
FROM SQL#.String_SplitKeyValuePairs(@AddOns, N',', N':', 1, NULL, NULL, NULL) kvp;
Check out this blog post...
http://www.sqlservercentral.com/blogs/querying-microsoft-sql-server/2013/09/19/how-to-split-a-string-by-delimited-char-in-sql-server/
Noel
I am going to make another attempt at this, inspired by the answer given by @gofr1 on this question...
How to insert bulk of column data to temp table?
That answer showed how to use an XML variable and the nodes method to split comma separated data and insert it into individual columns in a table. It seemed to me to be very similar to what you were trying to do here.
Check out this SQL. It certainly isn't as concise as just having a "split" function, but it seems better than chopping up the string based on the position of the colon.
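A rough sketch of that XML/nodes approach (not the original author's exact code; it assumes the same @AddOns format as the other answers and a dbo.TableName target) might look like this:
DECLARE @AddOns VARCHAR(1000) = '23:2,33:1,13:5,999:45';
DECLARE @xml XML;
-- Turn '23:2,33:1,...' into <r><k>23</k><v>2</v></r><r><k>33</k><v>1</v></r>...
SET @xml = CAST('<r><k>' + REPLACE(REPLACE(@AddOns, ':', '</k><v>'), ',', '</v></r><r><k>') + '</v></r>' AS XML);
INSERT INTO dbo.TableName (ID, Qty)
SELECT r.value('(k/text())[1]', 'INT') AS ID,
       r.value('(v/text())[1]', 'INT') AS Qty
FROM @xml.nodes('/r') AS t(r);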
Noel
I want to make unique random alphanumeric sequence to be the primary key for a database table.
Each char in the sequence is either a letter (a-z) or number (0-9)
Examples for what I want :
kl7jd6fgw
zjba3s0tr
a9dkfdue3
I want to make a function that could handle that task!
You can use a uniqueidentifier. This can be generated with the NEWID() function:
SELECT NEWID()
will return something like:
BE228C22-C18A-4B4A-9AD5-1232462F7BA9
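For example, you could use it as the default for a key column (table and column names here are just for illustration):
CREATE TABLE dbo.Demo
(
    Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Payload VARCHAR(100) NOT NULL
);
INSERT INTO dbo.Demo (Payload) VALUES ('first row');
-- Id is generated automatically, e.g. BE228C22-C18A-4B4A-9AD5-1232462F7BA9
SELECT Id, Payload FROM dbo.Demo;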
It is a very bad idea to use random strings as a primary key.
It will affect performance as well as storage size, and you will be much better off using an int or a bigint with an identity property.
However, generating a random string in SQL maybe useful for other things, and this is why I offer this solution:
Create a table to hold permitted char values.
In my example the permitted chars are 0-9 and A-Z.
CREATE TABLE Chars (C char(1))
DECLARE @i as int = 0
WHILE @i < 10
BEGIN
    INSERT INTO Chars (C) VALUES (CAST(@i as Char(1)))
    SET @i = @i + 1
END
SET @i = 65
WHILE @i < 91
BEGIN
    INSERT INTO Chars (C) VALUES (CHAR(@i))
    SET @i = @i + 1
END
Then use this simple select statement to generate a random string from this table:
SELECT TOP 10 C AS [text()]
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
The advantages:
You can easily control the allowed characters.
The generation of a new string is a simple select statement and not manipulation on strings.
The disadvantages:
This select returns its result in a column with an ugly name (i.e. XML_F52E2B61-18A1-11d1-B105-00805F49916B). This is easily solved by assigning the result to a local variable.
Each character will appear at most once in every string. This can easily be solved by adding a UNION:
example:
SELECT TOP 10 C AS [text()]
FROM (
SELECT * FROM Chars
UNION ALL SELECT * FROM Chars
) InnerSelect
ORDER BY NEWID()
FOR XML PATH('')
Another option is to use STUFF function instead of As [Text()] to eliminate those pesky XML tags:
SELECT STUFF((
SELECT TOP 100 ''+ C
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
), 1, 1, '') As RandomString;
This option doesn't have the disadvantage of the ugly column name, and can be given an alias directly. The execution plan is a little different, but it should not suffer much performance loss.
Play with it yourself in this Sql Fiddle
If there are any more advantages / disadvantages you think of please leave a comment. Thanks.
The NEWID() function generates unique values, so I have incremented through them with a loop and picked out a combination of alphanumeric characters using the CHARINDEX and LEFT functions:
;with list as
(
select 1 as id, cast(newid() as varchar(36)) as val -- cast so CHARINDEX/LEFT can work on it
union all
select id + 1, cast(NEWID() as varchar(36))
from list
where id + 1 < 100
)
select id, left(val, charindex('-', val) - 2) from list
option (maxrecursion 0)
The drawback of NEWID() for this request is that it limits the character pool to 0-9 and A-F. To define your own character pool, you have to roll a custom solution.
This solution is adapted from Generating random strings with T-SQL:
--Define list of characters to use in random string
DECLARE @CharPool VARCHAR(255)
SET @CharPool = '0123456789abcdefghijkmnopqrstuvwxyz'
--Store length of CharPool for use later
DECLARE @PoolLength TINYINT
SET @PoolLength = LEN(@CharPool) --35 (the pool above skips 'l')
--Define random string length
DECLARE @StringLength TINYINT
SET @StringLength = 9
--Declare target variable for random string
DECLARE @RandomString VARCHAR(255)
SET @RandomString = ''
--Loop control variable
DECLARE @LoopCount TINYINT
SET @LoopCount = 0
--For each char in string, choose random char from char pool
WHILE (@LoopCount < @StringLength)
BEGIN
    -- +1 keeps the position in the range 1..@PoolLength (SUBSTRING positions are 1-based)
    SELECT @RandomString += SUBSTRING(@CharPool, CONVERT(int, RAND() * @PoolLength) + 1, 1)
    SELECT @LoopCount += 1
END
SELECT @RandomString
http://sqlfiddle.com/#!6/9eecb/4354
I must reiterate, however, that I agree with the others: this is a horrible idea.
I store positions in a SQL Server 2012 database, where each position is defined by a position number and a company number.
The position numbers are unique for each company only.
For instance, my database could have the following
POSITION_NO COMPANY_NO
1 1
2 1
3 1
1 2
2 2
3 2
1 3
I need a function which takes a company number as a parameter, and returns the next sequential position number, which in the example table above would be 2 for COMPANY_NO = 3
What I use at the moment is:
CREATE PROCEDURE [DB].[GenerateKey]
    @p_company_no float(53),
    @return_value_argument float(53) OUTPUT
AS
BEGIN
    DECLARE
        @v_position_no numeric(5, 0)
    SELECT @v_position_no = max(POSITION_NO) + 1
    FROM DB.POSITION_TABLE with (nolock)
    WHERE COMPANY_NO = @p_company_no
    SET @return_value_argument = @v_position_no
    RETURN
END
I am aware of the potential pitfalls of using with (nolock), but this was added in an unsuccessful attempt to prevent data-locks on my database. In fact, besides the fact that well-written code is obviously preferable, the main reason I am asking this question is to try to cut down the number of places that could be causing the data-locks.
Is there any way my code could be improved?
Create an auxiliary table with sequences, with one row for every company (as you already did):
create table seq (company int, sequence int);
go
Seed the counters, one for every company (say there are two companies, 1 and 2):
insert seq values
(1, 1), (2, 1);
go
Then all you need is a way to both update and select the new value in a single statement to avoid race conditions. This is how to do it:
declare @next int;
declare @company int;
set @company = 2;
update seq
set @next = sequence = sequence + 1
where company = @company;
select @next
It would be nice to wrap this in a scalar function, but unfortunately updates are not allowed in functions. You already have a stored procedure in place, though, so just modify the code in it, along the lines of the sketch below.
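A rough sketch of that modified procedure (keeping your parameter names, using the seq table above, and switching the parameters to int as suggested below; not tested against your schema):
ALTER PROCEDURE [DB].[GenerateKey]
    @p_company_no int,
    @return_value_argument int OUTPUT
AS
BEGIN
    -- Increment and read the counter in one atomic statement to avoid race conditions
    UPDATE seq
    SET @return_value_argument = sequence = sequence + 1
    WHERE company = @p_company_no;
END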
And please tell me that the datatypes used are not really floats? Why not ints?
WHILE (1=1)
BEGIN
    SELECT @v_position_no = max(POSITION_NO)
    FROM DB.POSITION_TABLE with (nolock)
    WHERE COMPANY_NO = @p_company_no
    INSERT INTO DB.POSITION_TABLE
        (COMPANY_NO, POSITION_NO)
    SELECT TOP 1 @p_company_no, @v_position_no + 1
    FROM DB.POSITION_TABLE with (nolock)
    WHERE NOT EXISTS (SELECT 1
                      FROM DB.POSITION_TABLE with (nolock)
                      WHERE COMPANY_NO = @p_company_no
                      AND POSITION_NO = @v_position_no + 1)
    IF (@@ROWCOUNT > 0)
        BREAK;
END
SET @return_value_argument = @v_position_no + 1
Note that the second statement only inserts if POSITION_NO + 1 hasn't been added in the meantime; if it has, the loop tries again.
I am stuck on converting a varchar column UserID to INT. I know, please don't ask why this UserID column was not created as INT initially, long story.
So I tried this, but it doesn't work. and give me an error:
select CAST(userID AS int) from audit
Error:
Conversion failed when converting the varchar value
'1581............................................................................................................................' to data type int.
I did select len(userID) from audit and it returns 128 characters, which are not spaces.
I tried to detect the ASCII characters trailing after the ID number, and their ASCII value is 0.
I have also tried LTRIM, RTRIM, and replacing char(0) with '', but it does not work.
The only way it works is when I specify a fixed number of characters, like below, but UserID is not always 4 characters.
select CAST(LEFT(userID, 4) AS int) from audit
You could try updating the table to get rid of these characters:
UPDATE dbo.[audit]
SET UserID = REPLACE(UserID, CHAR(0), '')
WHERE CHARINDEX(CHAR(0), UserID) > 0;
But then you'll also need to fix whatever is putting this bad data into the table in the first place. In the meantime perhaps try:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), ''))
FROM dbo.[audit];
But that is not a long term solution. Fix the data (and the data type while you're at it). If you can't fix the data type immediately, then you can quickly find the culprit by adding a check constraint:
ALTER TABLE dbo.[audit]
ADD CONSTRAINT do_not_allow_stupid_data
CHECK (CHARINDEX(CHAR(0), UserID) = 0);
EDIT
Ok, so that is definitely a 4-digit integer followed by six instances of CHAR(0). And the workaround I posted definitely works for me:
DECLARE @foo TABLE(UserID VARCHAR(32));
INSERT @foo SELECT 0x31353831000000000000;
-- this succeeds:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), '')) FROM @foo;
-- this fails:
SELECT CONVERT(INT, UserID) FROM @foo;
Please confirm that this code on its own (well, the first SELECT, anyway) works for you. If it does then the error you are getting is from a different non-numeric character in a different row (and if it doesn't then perhaps you have a build where a particular bug hasn't been fixed). To try and narrow it down you can take random values from the following query and then loop through the characters:
SELECT UserID, CONVERT(VARBINARY(32), UserID)
FROM dbo.[audit]
WHERE UserID LIKE '%[^0-9]%';
So take a random row, and then paste the output into a query like this:
DECLARE @x VARCHAR(32), @i INT;
SET @x = CONVERT(VARCHAR(32), 0x...); -- paste the value here
SET @i = 1;
WHILE @i <= LEN(@x)
BEGIN
    PRINT RTRIM(@i) + ' = ' + RTRIM(ASCII(SUBSTRING(@x, @i, 1)))
    SET @i = @i + 1;
END
This may take some trial and error before you encounter a row that fails for some other reason than CHAR(0) - since you can't really filter out the rows that contain CHAR(0) because they could contain CHAR(0) and CHAR(something else). For all we know you have values in the table like:
SELECT '15' + CHAR(9) + '23' + CHAR(0);
...which also can't be converted to an integer, whether you've replaced CHAR(0) or not.
I know you don't want to hear it, but I am really glad this is painful for people, because now they have more war stories to push back when people make very poor decisions about data types.
This question has got 91,000 views so perhaps many people are looking for a more generic solution to the issue in the title "error converting varchar to INT"
If you are on SQL Server 2012+ one way of handling this invalid data is to use TRY_CAST
SELECT TRY_CAST (userID AS INT)
FROM audit
On previous versions you could use
SELECT CASE
WHEN ISNUMERIC(RTRIM(userID) + '.0e0') = 1
AND LEN(userID) <= 11
THEN CAST(userID AS INT)
END
FROM audit
Both return NULL if the value cannot be cast.
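For instance, a quick sanity check with made-up values (not from the question's data):
-- SQL Server 2012+: TRY_CAST simply yields NULL for unconvertible values
SELECT TRY_CAST('1581xyz' AS INT);  -- NULL
SELECT TRY_CAST('1581' AS INT);     -- 1581
-- Pre-2012 equivalent using the ISNUMERIC trick
SELECT CASE
           WHEN ISNUMERIC(RTRIM('1581xyz') + '.0e0') = 1
           AND LEN('1581xyz') <= 11
           THEN CAST('1581xyz' AS INT)
       END;                         -- NULL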
In the specific case that you have in your question with known bad values I would use the following however.
CAST(REPLACE(userID COLLATE Latin1_General_Bin, CHAR(0),'') AS INT)
Trying to replace the null character is often problematic except if using a binary collation.
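As a small self-contained illustration of that point (exact behaviour of the first SELECT depends on your database's default collation):
DECLARE @s VARCHAR(32) = '1581' + CHAR(0) + CHAR(0);
-- Under many non-binary collations REPLACE may not match CHAR(0), so the CAST can still fail
SELECT CAST(REPLACE(@s, CHAR(0), '') AS INT);
-- Forcing a binary collation makes CHAR(0) compare by byte value, so it is removed reliably
SELECT CAST(REPLACE(@s COLLATE Latin1_General_Bin, CHAR(0), '') AS INT); -- 1581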
This is more for someone searching for a result than for the original poster. This worked for me...
declare @value varchar(max) = 'sad';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 0
declare @value varchar(max) = '3';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 3
I would try trimming the number to see what you get:
select len(rtrim(ltrim(userid))) from audit
if that return the correct value then just do:
select convert(int, rtrim(ltrim(userid))) from audit
if that doesn't return the correct value then I would do a replace to remove the empty space:
select convert(int, replace(userid, char(0), '')) from audit
This is how I solved the problem in my case:
First of all I made sure the column I need to convert to integer doesn't contain any spaces:
update data set col1 = TRIM(col1)
I also checked whether the column only contains numeric digits.
You can check it by:
select * from data where col1 like '%[^0-9]%' order by col1
If any nonnumeric values are present, you can save them to another table and remove them from the table you are working on.
select * into nonnumeric_data from data where col1 like '%[^0-9]%'
delete from data where col1 like '%[^0-9]%'
The problems with my data were the cases above. After fixing them, I added a bigint column and copied the varchar column's values into it:
alter table data add int_col1 bigint
update data set int_col1 = CAST(col1 AS BIGINT)
This worked for me, hope you find it useful as well.