Update inserts different hash values than a select does

Update inserts different hash values than a select does - sql-server

I am trying to update a new column in my SQL Server user table with the hash of another column.
when I test the conversion with a select statement it will return the correct MD5 hash that I get from online hash generators.
select CONVERT(VARCHAR(32), HashBytes('MD5', 'valuetohash'), 2)
when I use this same conversion in an update statement as shown below I get a different value, inserted then the select statement with the same value hashed.
UPDATE users SET [newcolumn1] = CONVERT(VARCHAR(32), HashBytes('MD5', column1), 2)
what am I doing wrong?

The value you have in users.column1 does not exactly match the value you are manually passing through HashBytes as a test. To confirm that this works when the values are the same, try:
DECLARE #users TABLE (
column1 VARCHAR(100),
newcolumn1 VARCHAR(32)
)
INSERT INTO #users
SELECT 'some text', NULL
SELECT CONVERT(VARCHAR(32), HashBytes('MD5', 'some text'), 2)
UPDATE #users SET newcolumn1 = CONVERT(VARCHAR(32), HashBytes('MD5', column1), 2)
SELECT newcolumn1 FROM #users
You'll see that the results you get from each SELECT are the same, because the values of 'some text' and #users.column1 are identical.
Try comparing your values first:
SELECT CASE WHEN column1 = 'expectedValue'
THEN 'MATCH'
ELSE 'DIFFERENCE'
END AS MatchCheck
FROM users
or
SELECT column1
FROM users
WHERE column1 = 'expectedValue'
If you get results from the first query where MatchCheck = 'MATCH' or results from the second query at all, then you should also get results form your UPDATE which give the hash you expect, as the values are the same.
As mentioned by ughai in the comments, it's most likely you have some spaces or non-printable characters in the values in your database which you are not including when you dry-run the hashing, hence the different results.

Related

'String or binary data would be truncated' without any data exceeding the length

Yesterday suddenly a report occurred that someone was not able to get some data anymore because the issue Msg 2628, Level 16, State 1, Line 57 String or binary data would be truncated in table 'tempdb.dbo.#BC6D141E', column 'string_2'. Truncated value: '!012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678'. appeared.
I was unable to create a repro without our tables. This is the closest as I can get to:
-- Create temporary table for results
DECLARE #results TABLE (
string_1 nvarchar(100) NOT NULL,
string_2 nvarchar(100) NOT NULL
);
CREATE TABLE #table (
T_ID BIGINT NULL,
T_STRING NVARCHAR(1000) NOT NULL
);
INSERT INTO #table VALUES
(NULL, '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'),
(NULL, '!0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789!');
WITH abc AS
(
SELECT
'' AS STRING_1,
t.T_STRING AS STRING_2
FROM
UT
INNER JOIN UTT ON UTT.UT_ID = UT.UT_ID
INNER JOIN MV ON MV.UTT_ID = UTT.UTT_ID
INNER JOIN OT ON OT.OT_ID = MV.OT_ID
INNER JOIN #table AS T ON T.T_ID = OT.T_ID -- this will never get hit because T_ID of #table is NULL
)
INSERT INTO #results
SELECT STRING_1, STRING_2 FROM abc
ORDER BY LEN(STRING_2) DESC
DROP TABLE #table;
As you can see the join of #table cannot yield any results because all T_ID are NULL nevertheless I am getting the error mentioned above. The result set is empty.
That would be okay if a text with more than 100 characters would be in the result set but that is not the case because it is empty. If I remove the INSERT INTO #results and display the results it does not contain any text with more than 100 characters. The ORDER BY was only used to determine the faulty text value (with the original data).
When I use SELECT STRING_1, LEFT(STRING_2, 100) FROM abc it does work but it does not contain the text either that is meant to be truncated.
Therefore: What am I missing? Is it a bug of SQL Server?

-- this will never get hit is a bad assumption. It is well known and documented that SQL Server may try to evaluate parts of your query before it's obvious that the result is impossible.
A much simpler repro (from this post and this db<>fiddle):
CREATE TABLE dbo.t1(id int NOT NULL, s varchar(5) NOT NULL);
CREATE TABLE dbo.t2(id int NOT NULL);
INSERT dbo.t1 (id, s) VALUES (1, 'l=3'), (2, 'len=5'), (3, 'l=3');
INSERT dbo.t2 (id) VALUES (1), (3), (4), (5);
GO
DECLARE #t table(dest varchar(3) NOT NULL);
INSERT #t(dest) SELECT t1.s
FROM dbo.t1
INNER JOIN dbo.t2 ON t1.id = t2.id;
Result:
Msg 2628, Level 16, State 1
String or binary data would be truncated in table 'tempdb.dbo.#AC65D70E', column 'dest'. Truncated value: 'len'.
While we should have only retrieved rows with values that fit in the destination column (id is 1 or 3, since those are the only two rows that match the join criteria), the error message indicates that the row where id is 2 was also returned, even though we know it couldn't possibly have been.
Here's the estimated plan:
This shows that SQL Server expected to convert all of the values in t1 before the filter eliminated the longer ones. And it's very difficult to predict or control when SQL Server will process your query in an order you don't expect - you can try with query hints that attempt to either force order or to stay away from hash joins but those can cause other, more severe problems later.
The best fix is to size the temp table to match the source (in other words, make it large enough to fit any value from the source). The blog post and db<>fiddle explain some other ways to work around the issue, but declaring columns to be wide enough is the simplest and least intrusive.

Sybase - how do I return the first value that exists from a condition in SQL?

Say I'm trying to return some results where a column in a table matches a condition I set. But I only want to return the first result from a list of possible values in the condition. Is there a quick and easy way to do that? I'm thinking that I can use coalesce somehow, but not sure how I can structure it.
Something like:
select identifier,purpose from table
where identifier = 'letters'
and purpose = coalesce('A','B','C')
group by purpose
So in the table, if A purpose isn't there, then I only want the B purpose to show up. if it isn't there, then I want the C to show up, if none of them are there, then I would ideally like a null or no results to be returned. I'd rather not make several case statements where if A is null then look to B, then if B is null to look to C. Is there a quick way syntactically to do so?
Edit: I also want this to work if I have multiple identifiers I list, such as:
select identifier,purpose from table
where identifier in ('letters1', 'letters2')
and purpose = coalesce('A','B','C')
group by purpose
where I return two results if they exist - one purpose for each identifier, with the purpose in the order of importance for A first, then B, then C, or null if none exist.
Unforunately my reasoning for caolesce doesn't work above, as none of the variables are null so my query will just try to return all purposes of 'A' without the fallback that I intend my query to do. I want to try and avoid using temp tables if possible.

Sybase ASE does not have support for the row_number() function (else this would be fairly simple), so one option would be to use a #temp table to simulate (to some extent) row_number() functionality.
Some sample data:
create table mytab
(identifier varchar(30)
,purpose varchar(30)
)
go
insert mytab values ('letters1','A')
insert mytab values ('letters1','B')
insert mytab values ('letters1','C')
insert mytab values ('letters2','A')
insert mytab values ('letters2','B')
insert mytab values ('letters2','C')
go
The #temp table is created with an identity column plus a 2nd column to hold the items you wish to prioritize; priority is determined by the order in which the rows are inserted into the #temp table.
create table #priority
(id smallint identity
,purpose varchar(30))
go
insert #priority (purpose)
select 'A' -- first priority
union all
select 'B' -- second priority
union all
select 'C' -- last priority
go
select * from #priority order by id
go
id purpose
------ -------
1 A
2 B
3 C
We'll use a derived table to find the highest priority purpose (ie, minimal id value). We then join this minimal id back to #priority to generate the final result set:
select dt.identifier,
p.purpose
from (-- join mytab with #priority, keeping only the minimal priority id of the rows that exist:
select m.identifier,
min(p.id) as min_id
from mytab m
join #priority p
on p.purpose = m.purpose
group by m.identifier) dt
-- join back to #priority to convert min(id) into the actual purpose:
join #priority p
on p.id = dt.min_id
order by 1
go
Some test runs with different set of mytab data:
/* contents of mytab:
insert mytab values ('letters1','A')
insert mytab values ('letters1','B')
insert mytab values ('letters1','C')
insert mytab values ('letters2','A')
insert mytab values ('letters2','B')
insert mytab values ('letters2','C')
*/
identifier purpose
---------- -------
letters1 A
letters2 A
/* contents of mytab:
--insert mytab values ('letters1','A')
--insert mytab values ('letters1','B')
insert mytab values ('letters1','C')
--insert mytab values ('letters2','A')
insert mytab values ('letters2','B')
insert mytab values ('letters2','C')
*/
identifier purpose
---------- -------
letters1 C
letters2 B
Returning NULL if a row does not exist is not going to be easy since generating a NULL requires existence of a row ... somewhere ... with which to associate the NULL.
One idea would be to expand on the #temp table idea by creating another #temp table (eg, #identifiers) with the list of desired identifier values you wish to search on. You could then make use of a left (outer) join from #identifiers to mytab to ensure you always generate a result record for each identifier.

Find and change specific number in first part out of string and replace it

I start to work on old project and there is sql server database column which stores articles numbers for example as follows:
11.1006.45
11.1006.46
11.1006.47
01.10012.11
01.10012.12
2.234.1
2.234.2
2.234.3
657.104324.32
Every number contains 3 parts. First part describe what producent it is and that's something i have to change when user choose diffrent number for specific producent. For example producent number 2 will be now 13 so according to our examples:
2.234.1
2.234.2
2.234.3
has to be done this way right now:
13.234.1
13.234.2
13.234.3
I am looking for sql query which would find all records where producent number is e.g 2.xxxxx and then replace to 13.xxxxx. I would like this query to be secure to avoid any issues with numbers replacments.Hope you understand what i ment.

You could use this for update. '2. and 13.' could be any other string
DECLARE #SampleTable AS TABLE
(
Version varchar(100)
)
INSERT INTO #SampleTable
VALUES
('11.1006.45'),
('11.1006.46'),
('11.1006.47'),
('01.10012.11'),
('01.10012.12'),
('2.234.1'),
('2.234.2'),
('2.234.3'),
('657.104324.32')
UPDATE #SampleTable
SET
Version = '13.' + substring(Version, charindex('.', Version) + 1, len(Version) - charindex('.', Version))
WHERE Version LIKE '2.%'
SELECT * FROM #SampleTable st
Demo link: Rextester

update t
set t.col= replace(yourcol,substring(yourcol,1,charindex('.',yourcol,1),2)
from table t
this finds first character before first dot
substring(yourcol,1,charindex('.',yourcol,1)
then you use replace ,to replace it with whatever you need

You can use this query for multiple updation,
DECLARE #Temp AS TABLE
(
ArtNo VARCHAR(100)
)
INSERT INTO #Temp
VALUES
('11.1006.45'),
('11.1006.46'),
('11.1006.47'),
('01.10012.11'),
('01.10012.12'),
('2.234.1'),
('2.234.2'),
('2.234.3'),
('657.104324.32')
UPDATE #Temp
SET ArtNo = CASE WHEN SUBSTRING(ArtNo,1,CHARINDEX('.',ArtNo)-1) = '2' THEN STUFF(ArtNo,1,CHARINDEX('.',ArtNo)-1,'13')
WHEN SUBSTRING(ArtNo,1,CHARINDEX('.',ArtNo)-1) = '11' THEN STUFF(ArtNo,1,CHARINDEX('.',ArtNo)-1,'15')
ELSE ArtNo
END
SELECT * FROM #Temp

Most effective way to check sub-string exists in comma-separated string in SQL Server

I have a comma-separated list column available which has values like
Product1, Product2, Product3
I need to search whether the given product name exists in this column.
I used this SQL and it is working fine.
Select *
from ProductsList
where productname like '%Product1%'
This query is working very slowly. Is there a more efficient way I can search for a product name in the comma-separated list to improve the performance of the query?
Please note I have to search comma separated list before performing any other select statements.

user defined functions for comma separation of the string
Create FUNCTION [dbo].[BreakStringIntoRows] (#CommadelimitedString varchar(max))
RETURNS #Result TABLE (Column1 VARCHAR(max))
AS
BEGIN
DECLARE #IntLocation INT
WHILE (CHARINDEX(',', #CommadelimitedString, 0) > 0)
BEGIN
SET #IntLocation = CHARINDEX(',', #CommadelimitedString, 0)
INSERT INTO #Result (Column1)
--LTRIM and RTRIM to ensure blank spaces are removed
SELECT RTRIM(LTRIM(SUBSTRING(#CommadelimitedString, 0, #IntLocation)))
SET #CommadelimitedString = STUFF(#CommadelimitedString, 1, #IntLocation, '')
END
INSERT INTO #Result (Column1)
SELECT RTRIM(LTRIM(#CommadelimitedString))--LTRIM and RTRIM to ensure blank spaces are removed
RETURN
END
Declare #productname Nvarchar(max)
set #productname='Product1,Product2,Product3'
select * from product where [productname] in(select * from [dbo].[![enter image description here][1]][1][BreakStringIntoRows](#productname))

Felix is right and the 'right answer' is to normalize your table. Although, maybe you have 500k lines of code that expect this column to exist as it is. So your next best (non-destructive) answer is:
Create a table to hold normalize data:
CREATE TABLE ProductsList2 (ProductId INT, ProductName VARCHAR)
Create a TRIGGER that on UPDATE/INSERT/DELETE maintains ProductList2 by splitting the string 'Product1,Product2,Product3' into three records.
Index your new table.
Query against your new table:
SELECT *
FROM ProductsList
WHERE ProductId IN (SELECT x.ProductId
FROM ProductsList2 x
WHERE x.ProductName = 'Product1')

SQL Server : converting varchar to INT

I am stuck on converting a varchar column UserID to INT. I know, please don't ask why this UserID column was not created as INT initially, long story.
So I tried this, but it doesn't work. and give me an error:
select CAST(userID AS int) from audit
Error:
Conversion failed when converting the varchar value
'1581............................................................................................................................' to data type int.
I did select len(userID) from audit and it returns 128 characters, which are not spaces.
I tried to detect ASCII characters for those trailing after the ID number and ASCII value = 0.
I have also tried LTRIM, RTRIM, and replace char(0) with '', but does not work.
The only way it works when I tell the fixed number of character like this below, but UserID is not always 4 characters.
select CAST(LEFT(userID, 4) AS int) from audit

You could try updating the table to get rid of these characters:
UPDATE dbo.[audit]
SET UserID = REPLACE(UserID, CHAR(0), '')
WHERE CHARINDEX(CHAR(0), UserID) > 0;
But then you'll also need to fix whatever is putting this bad data into the table in the first place. In the meantime perhaps try:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), ''))
FROM dbo.[audit];
But that is not a long term solution. Fix the data (and the data type while you're at it). If you can't fix the data type immediately, then you can quickly find the culprit by adding a check constraint:
ALTER TABLE dbo.[audit]
ADD CONSTRAINT do_not_allow_stupid_data
CHECK (CHARINDEX(CHAR(0), UserID) = 0);
EDIT
Ok, so that is definitely a 4-digit integer followed by six instances of CHAR(0). And the workaround I posted definitely works for me:
DECLARE #foo TABLE(UserID VARCHAR(32));
INSERT #foo SELECT 0x31353831000000000000;
-- this succeeds:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), '')) FROM #foo;
-- this fails:
SELECT CONVERT(INT, UserID) FROM #foo;
Please confirm that this code on its own (well, the first SELECT, anyway) works for you. If it does then the error you are getting is from a different non-numeric character in a different row (and if it doesn't then perhaps you have a build where a particular bug hasn't been fixed). To try and narrow it down you can take random values from the following query and then loop through the characters:
SELECT UserID, CONVERT(VARBINARY(32), UserID)
FROM dbo.[audit]
WHERE UserID LIKE '%[^0-9]%';
So take a random row, and then paste the output into a query like this:
DECLARE #x VARCHAR(32), #i INT;
SET #x = CONVERT(VARCHAR(32), 0x...); -- paste the value here
SET #i = 1;
WHILE #i <= LEN(#x)
BEGIN
PRINT RTRIM(#i) + ' = ' + RTRIM(ASCII(SUBSTRING(#x, #i, 1)))
SET #i = #i + 1;
END
This may take some trial and error before you encounter a row that fails for some other reason than CHAR(0) - since you can't really filter out the rows that contain CHAR(0) because they could contain CHAR(0) and CHAR(something else). For all we know you have values in the table like:
SELECT '15' + CHAR(9) + '23' + CHAR(0);
...which also can't be converted to an integer, whether you've replaced CHAR(0) or not.
I know you don't want to hear it, but I am really glad this is painful for people, because now they have more war stories to push back when people make very poor decisions about data types.

This question has got 91,000 views so perhaps many people are looking for a more generic solution to the issue in the title "error converting varchar to INT"
If you are on SQL Server 2012+ one way of handling this invalid data is to use TRY_CAST
SELECT TRY_CAST (userID AS INT)
FROM audit
On previous versions you could use
SELECT CASE
WHEN ISNUMERIC(RTRIM(userID) + '.0e0') = 1
AND LEN(userID) <= 11
THEN CAST(userID AS INT)
END
FROM audit
Both return NULL if the value cannot be cast.
In the specific case that you have in your question with known bad values I would use the following however.
CAST(REPLACE(userID COLLATE Latin1_General_Bin, CHAR(0),'') AS INT)
Trying to replace the null character is often problematic except if using a binary collation.

This is more for someone Searching for a result, than the original post-er. This worked for me...
declare #value varchar(max) = 'sad';
select sum(cast(iif(isnumeric(#value) = 1, #value, 0) as bigint));
returns 0
declare #value varchar(max) = '3';
select sum(cast(iif(isnumeric(#value) = 1, #value, 0) as bigint));
returns 3

I would try triming the number to see what you get:
select len(rtrim(ltrim(userid))) from audit
if that return the correct value then just do:
select convert(int, rtrim(ltrim(userid))) from audit
if that doesn't return the correct value then I would do a replace to remove the empty space:
select convert(int, replace(userid, char(0), '')) from audit

This is how I solved the problem in my case:
First of all I made sure the column I need to convert to integer doesn't contain any spaces:
update data set col1 = TRIM(col1)
I also checked whether the column only contains numeric digits.
You can check it by:
select * from data where col1 like '%[^0-9]%' order by col1
If any nonnumeric values are present, you can save them to another table and remove them from the table you are working on.
select * into nonnumeric_data from data where col1 like '%[^0-9]%'
delete from data where col1 like '%[^0-9]%'
Problems with my data were the cases above. So after fixing them, I created a bigint variable and set the values of the varchar column to the integer column I created.
alter table data add int_col1 bigint
update data set int_col1 = CAST(col1 AS VARCHAR)
This worked for me, hope you find it useful as well.