MSSQL aggregate ignoring where clause - sql-server

I have a strange problem: when performing an aggregate function on a type-cast nvarchar column I receive "Msg 8114, Level 16, State 5, Line 1. Error converting data type nvarchar to bigint." The query's WHERE clause should filter out the non-numeric values.
Table structure is similar to this:
IF EXISTS (SELECT * FROM sys.all_objects ao WHERE ao.name = 'Identifier' AND ao.type = 'U') BEGIN DROP TABLE Identifier END
IF EXISTS (SELECT * FROM sys.all_objects ao WHERE ao.name = 'IdentifierType' AND ao.type = 'U') BEGIN DROP TABLE IdentifierType END
CREATE TABLE IdentifierType
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Style] [int] NULL,
CONSTRAINT [PK_IdentifierType_ID] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
CREATE TABLE Identifier
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[IdentifierTypeID] [int] NOT NULL,
[Value] [nvarchar](4000) NOT NULL,
CONSTRAINT [PK_Identifier_ID] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
ALTER TABLE Identifier WITH CHECK ADD CONSTRAINT [FK_Identifier_IdentifierTypeID] FOREIGN KEY([IdentifierTypeID]) REFERENCES IdentifierType ([ID])
GO
Identifier.Value is an NVARCHAR column; it can and does contain non-numeric data. Filtering the query to IdentifierType.Style = 0 should mean that 'Value' only returns string representations of integers. The query below fails with "Msg 8114, Level 16, State 5, Line 1. Error converting data type nvarchar to bigint."
SELECT
MAX(CAST(Value AS BIGINT))
FROM
Identifier i,
IdentifierType it
WHERE
i.IdentifierTypeID = it.ID AND
it.Style = 0
If I extend the WHERE clause to include 'AND ISNUMERIC(i.Value) = 1', it returns the maximum integer value. To me that implies there is a non-numeric string in my result set. Yet I get no rows returned from this:
SELECT
*
FROM
Identifier i,
IdentifierType it
WHERE
i.IdentifierTypeID = it.ID AND
it.Style = 0 AND
ISNUMERIC(i.Value) <> 1
I've been unable to identify the row(s) that are tripping the type cast. The above query should have exposed the exceptional rows. In addition, there are no empty or extremely long strings either (the longest string is 6 characters long).
Is it possible that MSSQL is attempting to do the CAST on all rows rather than filtering via the WHERE clause first?
Or has anyone else seen anything similar?
There is a second workaround: materializing that part of the query into a separate table, and then selecting the MAX value from that.
SELECT
Value
INTO
IdentifierClone
FROM
Identifier i,
IdentifierType it
WHERE
i.IdentifierTypeID = it.ID AND
it.Style = 0
SELECT MAX(CAST(Value as BIGINT)) FROM IdentifierClone
A subquery doesn't work, however.
Any help or thoughts would be appreciated.

Try using a LIKE pattern to find the problem record. Here's an example where ISNUMERIC does not detect the problem but the pattern does:
CREATE TABLE tst (value nvarchar(4000))
INSERT INTO tst select '£'
-- Record found ...
SELECT * FROM tst WHERE value NOT LIKE '%[0-9]%'
-- No record found ...
SELECT * from tst where isnumeric(value) <> 1
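On the question of whether the CAST can run before the WHERE filter: SQL Server does not promise to evaluate the filter before the conversion, so one way to sidestep the error is to make the conversion itself safe. The following is a minimal sketch, not a definitive fix. It assumes SQL Server 2012 or later for TRY_CAST and reuses the Identifier and IdentifierType tables from the question; treating "digits only" as the definition of a valid value is an illustrative assumption.

```sql
-- Find values containing any character other than 0-9.
-- This catches values such as '£', which ISNUMERIC accepts.
SELECT i.ID, i.Value
FROM Identifier AS i
INNER JOIN IdentifierType AS it ON i.IdentifierTypeID = it.ID
WHERE it.Style = 0
  AND i.Value LIKE '%[^0-9]%';

-- TRY_CAST returns NULL instead of raising Msg 8114, and MAX ignores NULLs,
-- so it no longer matters whether the filter or the conversion runs first.
SELECT MAX(TRY_CAST(i.Value AS BIGINT)) AS MaxValue
FROM Identifier AS i
INNER JOIN IdentifierType AS it ON i.IdentifierTypeID = it.ID
WHERE it.Style = 0;

-- Alternative for versions without TRY_CAST: guard the CAST inside a CASE,
-- so only digit-only values are ever converted.
SELECT MAX(CASE WHEN i.Value NOT LIKE '%[^0-9]%'
                THEN CAST(i.Value AS BIGINT)
           END) AS MaxValue
FROM Identifier AS i
INNER JOIN IdentifierType AS it ON i.IdentifierTypeID = it.ID
WHERE it.Style = 0;
```

The first query is a diagnostic; either of the other two could replace the failing MAX query.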


Conversion failed when converting the varchar value '______' to data type int

USE MASTER
GO
CREATE DATABASE db_movies;
GO
USE db_movies;
GO
CREATE TABLE tbl_movies
(
movie_id INT PRIMARY KEY NOT NULL IDENTITY (1,1),
movie_name VARCHAR(50) NOT NULL
);
INSERT INTO tbl_movies (movie_name)
VALUES ('Jurassic Park'), ('Star Wars'), ('Blade Runner');
CREATE TABLE tbl_genre
(
genre_id INT PRIMARY KEY NOT NULL IDENTITY (100,1),
genre_name VARCHAR(50) NOT NULL
);
INSERT INTO tbl_genre (genre_name)
VALUES ('Sci-Fi'), ('Thriller'), ('Horror');
CREATE TABLE tbl_movielist
(
MovieID INT PRIMARY KEY NOT NULL IDENTITY (1000,1),
MovieName VARCHAR (50) NOT NULL,
Movie_identification INT NOT NULL
CONSTRAINT fk_movie_id
FOREIGN KEY REFERENCES tbl_movies(movie_id)
ON UPDATE CASCADE ON DELETE CASCADE,
Genre_identification INT NOT NULL
CONSTRAINT fk_genre_id
FOREIGN KEY REFERENCES tbl_genre(genre_id)
ON UPDATE CASCADE ON DELETE CASCADE,
rating FLOAT(3) NOT NULL,
);
INSERT INTO tbl_movielist (MovieName, Movie_identification, Genre_identification, rating)
VALUES ('Sandlot', 10, 109, 7.80),
('Knives Out', 11, 110, 7.90),
('The Notebook', 12, 111, 7.80);
INSERT INTO tbl_genre (genre_name)
VALUES ('Comedy'), ('Mystery'), ('Drama');
INSERT INTO tbl_movies(movie_name)
VALUES ('Sandlot'), ('Knives Out'), ('The Notebook');
SELECT * FROM tbl_genre;
SELECT * FROM tbl_movies;
SELECT * FROM tbl_movielist;
SELECT *
FROM tbl_movies
INNER JOIN tbl_genre ON CONVERT(int, tbl_genre.genre_id) = tbl_movies.movie_name;
I have created a database, and for some reason the final statement, where I use INNER JOIN, fails with "Conversion failed when converting the varchar value 'Jurassic Park' to data type int." I have tried both the CAST and CONVERT functions, as well as changing the tables and attributes I join on. Either I get the error message above, or the query runs without error but returns no rows. I cannot figure out why; I am pretty new to SQL.

How to use MERGE-statement with VARBINARY data

I'm stuck trying to figure out how to get one of the MERGE statements to work. See below code snippet:
DECLARE @PipelineRunID VARCHAR(100) = 'testestestestest'
MERGE [TGT].[AW_Production_Culture] as [Target]
USING [SRC].[AW_Production_Culture] as [Source]
ON [Target].[MD5Key] = [Source].[MD5Key]
WHEN MATCHED AND [Target].[MD5Others] != [Source].[MD5Others]
THEN UPDATE SET
[Target].[CultureID] = [Source].[CultureID]
,[Target].[ModifiedDate] = [Source].[ModifiedDate]
,[Target].[Name] = [Source].[Name]
,[Target].[MD5Others] = [Source].[MD5Others]
,[Target].[PipelineRunID] = @PipelineRunID
WHEN NOT MATCHED BY TARGET THEN
INSERT VALUES (
[Source].[AW_Production_CultureKey]
,[Source].[CultureID]
,[Source].[ModifiedDate]
,[Source].[Name]
,@PipelineRunID
,[Source].[MD5Key]
,[Source].[MD5Others]);
When I try and run this query I receive the following error:
Msg 257, Level 16, State 3, Line 16
Implicit conversion from data type varchar to varbinary is not allowed. Use the CONVERT function to run this query.
The only VARBINARY column types are MD5Key and MD5Others. As they are both linked to their corresponding columns I don't understand why my error message indicates there is a VARCHAR problem involved. Does anybody understand how and why I should use a CONVERT() function here?
Thanks!
--EDIT: Schema definitions
CREATE VIEW [SRC].[AW_Production_Culture]
WITH SCHEMABINDING
as
SELECT
CAST(CONCAT('',[CultureID]) as VARCHAR(100)) as [AW_Production_CultureKey]
,CAST(HASHBYTES('MD5',CONCAT('',[CultureID])) as VARBINARY(16)) as [MD5Key]
,CAST(HASHBYTES('MD5',CONCAT([ModifiedDate],'|',[Name])) as VARBINARY(16)) as [MD5Others]
,[CultureID],[ModifiedDate],[Name]
FROM
[SRC].[tbl_AW_Production_Culture]
CREATE TABLE [TGT].[AW_Production_Culture](
[AW_Production_CultureKey] [varchar](100) NOT NULL,
[CultureID] [nchar](6) NULL,
[ModifiedDate] [datetime] NULL,
[Name] [nvarchar](50) NULL,
[MD5Key] [varbinary](16) NOT NULL,
[MD5Others] [varbinary](16) NOT NULL,
[RecordValidFrom] [datetime2](7) GENERATED ALWAYS AS ROW START NOT NULL,
[RecordValidUntil] [datetime2](7) GENERATED ALWAYS AS ROW END NOT NULL,
[PipelineRunID] [varchar](36) NOT NULL,
PRIMARY KEY CLUSTERED
(
[MD5Key] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY],
PERIOD FOR SYSTEM_TIME ([RecordValidFrom], [RecordValidUntil])
) ON [PRIMARY]
WITH
(
SYSTEM_VERSIONING = ON ( HISTORY_TABLE = [TGT].[AW_Production_Culture_History] )
)
Reposting my comment as an answer for the sweet, sweet, internet points:
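You're getting that error because a varchar value (the PipelineRunID variable) is being inserted into a varbinary column. Without a column list, INSERT maps values to the table's columns by position, so the fifth value lands in MD5Key. Since your columns already have the correct types, your INSERT clause must have mismatched columns.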
As it is, your MERGE statement is not explicitly listing the destination columns - you should always explicitly list columns in production code so that your DML queries won't break if columns are added or reordered or marked HIDDEN.
So to fix this, change your INSERT clause to explicitly list destination column names.
Also, when using MERGE you should use HOLDLOCK (Or a more suitable lock, if applicable) - otherwise you’ll run into concurrency issues. MERGE is not concurrency-safe by default!
Minor nit-picks that are largely subjective:
I personally prefer avoiding [escapedName] wherever possible and prefer using short table aliases.
e.g. use s and t instead of [Source] and [Target].
"Id" (for "identity" or "identifier") is an abbreviation, not an acronym - so it should be cased as Id and not ID.
Consider using an OUTPUT clause to help diagnose/debug issues too.
So I'd write it like so:
DECLARE @PipelineRunId VARCHAR(100) = 'testestestestest'
MERGE INTO
tgt.AW_Production_Culture WITH (HOLDLOCK) AS t
USING
src.AW_Production_Culture AS s ON t.MD5Key = s.MD5Key
WHEN MATCHED AND t.MD5Others != s.MD5Others THEN UPDATE SET
t.CultureId = s.CultureId,
t.ModifiedDate = s.ModifiedDate,
t.Name = s.Name,
t.MD5Others = s.MD5Others,
t.PipelineRunID = @PipelineRunId
WHEN NOT MATCHED BY TARGET THEN INSERT
(
AW_Production_CultureKey,
CultureId,
ModifiedDate,
[Name],
PipelineRunId,
MD5Key,
MD5Others
)
VALUES
(
s.AW_Production_CultureKey,
s.CultureId,
s.ModifiedDate,
s.[Name],
@PipelineRunId,
s.MD5Key,
s.MD5Others
)
OUTPUT
$action AS [Action],
inserted.*,
deleted.*;

Temp tables, Column name or number of supplied values does not match table definition

Even though this may look like a duplicate, I had to post it because I can't spot the error.
I can't see why there is a mismatch in the number of supplied values.
Here they are:
CREATE TABLE #TIPSTOPE_TS
(
TIP INT NULL,
SIFVAL VARCHAR(5),
GRUPA INT NULL,
DATUMOD VARCHAR(15),
PASIVNA DECIMAL(15,4) NULL DEFAULT(0),
REDOVNA DECIMAL(15,4) NULL DEFAULT(0),
ZATEZNA DECIMAL(15,4) NULL DEFAULT(0),
STOPA DECIMAL(15,4) NULL DEFAULT(0),
DATUMDO VARCHAR(15),
KONTO VARCHAR(15),
)
INSERT INTO #TIPSTOPE_TS
SELECT TS.TIP,
TS.SIFVAL,
TS.GRUPA,
CASE WHEN ISDATE(MAX(TS.DATUMOD)) = 0 THEN '2017.12.31' ELSE MAX(TS.DATUMOD) END AS DATUMOD,
CAST (2 AS DECIMAL(10,4)) AS PASIVNA,
CAST (1 AS DECIMAL(10,4)) AS REDOVNA,
CAST (3 AS DECIMAL(10,4)) AS ZATEZNA,
TS.REDOVNA,
TS.DATUMDO,
TP.M1 AS KONTO
FROM TIPSTOPE TS WITH(NOLOCK)
JOIN TIPPART TP WITH(NOLOCK) ON TP.TIP = TS.TIP
WHERE TS.DATUMOD <= '2017.12.31'
GROUP BY TS.TIP,TS.SIFVAL,TS.GRUPA,TP.M1,TS.DATUMDO,TS.REDOVNA
CREATE NONCLUSTERED INDEX IX_TIPSTOPE_TS ON #TIPSTOPE_TS (TIP, GRUPA, SIFVAL)
INCLUDE (DATUMOD)
And the second one...
CREATE TABLE #UNPVT_TIPSTOPE_TS
(
TIP INT NULL,
SIFVAL VARCHAR(5) NULL,
GRUPA INT NULL,
DATUMOD VARCHAR(10) NULL,
TIP_KS VARCHAR(15) NULL,
KAMATNA_STOPA DECIMAL(15,4) NULL DEFAULT(0),
DATUMDO VARCHAR(10) NULL,
)
INSERT INTO #UNPVT_TIPSTOPE_TS
SELECT TIP, SIFVAL, GRUPA, DATUMOD, TIP_KS, KAMATNA_STOPA,DATUMDO
FROM
(
SELECT TIP, SIFVAL, GRUPA, DATUMOD, ISNULL(REDOVNA,0) AS REDOVNA, ISNULL(PASIVNA,0) AS PASIVNA, ISNULL(ZATEZNA,0) AS ZATEZNA,STOPA,DATUMDO
FROM #TIPSTOPE_TS
) P
UNPIVOT (KAMATNA_STOPA FOR TIP_KS IN (REDOVNA, PASIVNA, ZATEZNA)) AS UNPVT
The second temp table takes its data from the first one.
When I try to populate the second one, an error is thrown:
Insert error: Column name or number of supplied values does not match table definition
You are specifying the exact number of values that are needed. If you copy the whole code into a new query window and execute it, it will work. Alternatively, in your current window, drop the tables:
DROP TABLE #TIPSTOPE_TS;
DROP TABLE #UNPVT_TIPSTOPE_TS;
That is, execute only the statements above, and then execute the rest of the code. It should work again.
Sometimes when debugging we forget that the temporary table's metadata is cached. For example, you can have the following code:
DROP TABLE IF EXISTS #TEST;
CREATE TABLE #TEST
(
[A] INT
);
INSERT INTO #TEST ([A])
SELECT 1;
And it's valid. If we change it to this:
DROP TABLE IF EXISTS #TEST;
CREATE TABLE #TEST
(
[A] INT
,[B] INT
);
INSERT INTO #TEST ([A], [B])
SELECT 1, 2;
We will get:
Msg 207, Level 16, State 1, Line 9 Invalid column name 'B'.
This happens because the #TEST table already exists in the current session, so the engine checks the statement against the old definition and finds that the B column does not exist. So we need to drop the table manually after the columns are changed, or drop the tables at the end of our script.
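The drop-first approach can be sketched as two batches. This assumes SQL Server 2016 or later for DROP TABLE IF EXISTS, and a client such as SSMS or sqlcmd that treats GO as a batch separator:

```sql
-- Batch 1: remove any copy of the table left over from an earlier run.
DROP TABLE IF EXISTS #TEST;
GO

-- Batch 2: compiled after the drop, so the INSERT is checked
-- against the new two-column definition rather than the old one.
CREATE TABLE #TEST
(
    [A] INT
   ,[B] INT
);

INSERT INTO #TEST ([A], [B])
SELECT 1, 2;
```

Keeping the drop in its own batch means the second batch is never validated against a stale table from the same session.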

Sql Server string interning

We have a table where we store all the exceptions (message, stackTrace, etc..), the table is getting big and we would like to reduce it.
There are plenty of repeated StackTraces, Messages, etc., but enabling compression produces only a modest size reduction (10%). I think much bigger savings would be possible if SQL Server could somehow intern the strings in a per-column hash table.
I could get some of the benefits if I normalize the table and extract StackTraces to another one, but exception messages, exception types, etc.. are also repeated.
Is there a way to enable string interning for some column in Sql Server?
There is no built-in way to do this. You could easily do something like:
SELECT MessageID = IDENTITY(INT, 1, 1), Message
INTO dbo.Messages
FROM dbo.HugeTable GROUP BY Message;
ALTER TABLE dbo.HugeTable ADD MessageID INT;
UPDATE h
SET h.MessageID = m.MessageID
FROM dbo.HugeTable AS h
INNER JOIN dbo.Messages AS m
ON h.Message = m.Message;
ALTER TABLE dbo.HugeTable DROP COLUMN Message;
Now you'll need to do a few things:
Change your logging procedure to perform an upsert to the Messages table
Add proper indexes to the messages table (wasn't sure of Message data type) and PK
Add FK to MessageID column
Rebuild indexes on HugeTable to reclaim space
Do this in a test environment first!
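The first item in the list above, the upsert in the logging procedure, might look roughly like the sketch below. The procedure name and the nvarchar(4000) parameter type are assumptions (the Message data type is not given), and the final INSERT lists only MessageID because the other HugeTable columns are unknown.

```sql
-- Hypothetical logging procedure; the name and parameter type are illustrative.
CREATE PROCEDURE dbo.LogException_Sketch
    @Message nvarchar(4000)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @MessageID int;

    BEGIN TRANSACTION;

    -- UPDLOCK + HOLDLOCK keep a concurrent session from inserting
    -- the same message between the lookup and the insert.
    SELECT @MessageID = MessageID
    FROM dbo.Messages WITH (UPDLOCK, HOLDLOCK)
    WHERE Message = @Message;

    IF @MessageID IS NULL
    BEGIN
        INSERT INTO dbo.Messages (Message) VALUES (@Message);
        SET @MessageID = SCOPE_IDENTITY();
    END

    -- Assumption: a real table would have more columns to populate here.
    INSERT INTO dbo.HugeTable (MessageID) VALUES (@MessageID);

    COMMIT TRANSACTION;
END
```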
Aaron's post answers the question of adding interning to a table, but afterwards you would need to modify your application code and stored procedures to work with the new schema.
...or so you might think. You can actually create a VIEW that returns data matching the old schema, and you can also support INSERT operations on the view too, which are translated into child operations on the Messages and HugeTable tables. For readability I'll use the names InternedStrings and ExceptionLogs for the tables.
So if the old table was this:
CREATE TABLE ExceptionLogs (
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message nvarchar(1024) NOT NULL,
ExceptionType nvarchar(512) NOT NULL,
StackTrace nvarchar(4096) NOT NULL
)
And the new tables are:
CREATE TABLE InternedStrings (
StringId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Value nvarchar(max) NOT NULL
)
CREATE TABLE ExceptionLogs2 ( -- note the new name
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message int NOT NULL,
ExceptionType int NOT NULL,
StackTrace int NOT NULL
)
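Add an index to InternedStrings to make the value lookups faster. (Note that SQL Server does not accept an nvarchar(max) column as an index key column, so for the statement below to succeed, Value must be declared with a bounded length that fits within the index key size limit, or the index must be built on a hash of Value instead.)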
CREATE UNIQUE NONCLUSTERED INDEX IX_U_InternedStrings_Value ON InternedStrings ( Value ASC )
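If Value stays nvarchar(max), it cannot be used as an index key column. One possible workaround is to index a persisted hash of the value instead. This is a sketch only: the ValueHash column and index names are hypothetical, and it assumes SQL Server 2016 or later, where HASHBYTES accepts inputs longer than 8000 bytes.

```sql
-- Hypothetical persisted hash column; HASHBYTES is deterministic, so it can be persisted.
ALTER TABLE InternedStrings
    ADD ValueHash AS CAST(HASHBYTES('SHA2_256', Value) AS binary(32)) PERSISTED;

-- Non-unique on purpose: lookups compare the full Value as well, so a hash collision is harmless.
CREATE NONCLUSTERED INDEX IX_InternedStrings_ValueHash
    ON InternedStrings ( ValueHash );

-- Lookup: seek on the hash, then confirm the match on the full string.
DECLARE @value nvarchar(max) = N'System.NullReferenceException';

SELECT StringId
FROM InternedStrings
WHERE ValueHash = CAST(HASHBYTES('SHA2_256', @value) AS binary(32))
  AND Value = @value;
```

The trade-off is that uniqueness of Value is no longer enforced by the index, so the insert path has to check for an existing row itself.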
Then you would also have a VIEW:
CREATE VIEW ExceptionLogs AS
SELECT
LogId,
MessageStrings .Value AS Message,
ExceptionTypeStrings.Value AS ExceptionType,
StackTraceStrings .Value AS StackTrace
FROM
ExceptionLogs2
INNER JOIN InternedStrings AS MessageStrings ON
MessageStrings.StringId = ExceptionLogs2.Message
INNER JOIN InternedStrings AS ExceptionTypeStrings ON
ExceptionTypeStrings.StringId = ExceptionLogs2.ExceptionType
INNER JOIN InternedStrings AS StackTraceStrings ON
StackTraceStrings.StringId = ExceptionLogs2.StackTrace
And to handle INSERT operations from unmodified clients:
CREATE TRIGGER ExceptionLogsInsertHandler
ON ExceptionLogs INSTEAD OF INSERT AS
BEGIN
SET NOCOUNT ON
-- Single-row inserts only: copy the inserted values into variables first.
DECLARE @message nvarchar(max), @exceptionType nvarchar(max), @stackTrace nvarchar(max)
SELECT @message = Message, @exceptionType = ExceptionType, @stackTrace = StackTrace FROM inserted
-- Look up each string and intern it if it is not there yet.
DECLARE @messageId int = ( SELECT StringId FROM InternedStrings WHERE Value = @message )
IF @messageId IS NULL
BEGIN
INSERT INTO InternedStrings ( Value ) VALUES ( @message )
SET @messageId = SCOPE_IDENTITY()
END
DECLARE @exceptionTypeId int = ( SELECT StringId FROM InternedStrings WHERE Value = @exceptionType )
IF @exceptionTypeId IS NULL
BEGIN
INSERT INTO InternedStrings ( Value ) VALUES ( @exceptionType )
SET @exceptionTypeId = SCOPE_IDENTITY()
END
DECLARE @stackTraceId int = ( SELECT StringId FROM InternedStrings WHERE Value = @stackTrace )
IF @stackTraceId IS NULL
BEGIN
INSERT INTO InternedStrings ( Value ) VALUES ( @stackTrace )
SET @stackTraceId = SCOPE_IDENTITY()
END
-- Store only the interned ids in the narrow table.
INSERT INTO ExceptionLogs2 ( Message, ExceptionType, StackTrace )
VALUES ( @messageId, @exceptionTypeId, @stackTraceId )
END
Note that this TRIGGER can be improved: it only supports single-row insertions and is not entirely concurrency-safe. Because existing data is never mutated, the only risk is that two sessions try to add the same string to InternedStrings at the same time, in which case the UNIQUE index causes one of the inserts to fail. There are different ways to handle this, such as wrapping the lookups in a TRANSACTION and adding HOLDLOCK and UPDLOCK hints to the queries.
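One way to address both limitations is a set-based trigger. The sketch below is an illustration, not the original author's code: the trigger name is hypothetical, and it is meant as a replacement for the single-row trigger, since a view can have only one INSTEAD OF INSERT trigger.

```sql
CREATE TRIGGER ExceptionLogsInsertHandler_SetBased  -- hypothetical name
ON ExceptionLogs INSTEAD OF INSERT AS
BEGIN
    SET NOCOUNT ON;

    -- Intern every distinct new string from all inserted rows in one statement.
    -- UPDLOCK + HOLDLOCK guard against two sessions adding the same string at once.
    INSERT INTO InternedStrings ( Value )
    SELECT v.Value
    FROM (
        SELECT Message AS Value FROM inserted
        UNION
        SELECT ExceptionType FROM inserted
        UNION
        SELECT StackTrace FROM inserted
    ) AS v
    WHERE NOT EXISTS (
        SELECT 1
        FROM InternedStrings AS s WITH (UPDLOCK, HOLDLOCK)
        WHERE s.Value = v.Value
    );

    -- Every string now has an id, so map each inserted row to its three ids.
    INSERT INTO ExceptionLogs2 ( Message, ExceptionType, StackTrace )
    SELECT m.StringId, t.StringId, s.StringId
    FROM inserted AS i
    INNER JOIN InternedStrings AS m ON m.Value = i.Message
    INNER JOIN InternedStrings AS t ON t.Value = i.ExceptionType
    INNER JOIN InternedStrings AS s ON s.Value = i.StackTrace;
END
```

UNION (rather than UNION ALL) removes duplicates within the batch, so a string repeated across rows or columns is interned only once.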

Is it possible to add a db constraint for this rule?

I wish to make sure that my data has the following check (constraint?) in place:
This table can only have one BorderColour per hub/category. (eg. #FFAABB)
But it can have multiple nulls. (all the other rows are nulls, for this field)
Table Schema
ArticleId INT PRIMARY KEY NOT NULL IDENTITY
HubId TINYINT NOT NULL
CategoryId INT NOT NULL
Title NVARCHAR(100) NOT NULL
Content NVARCHAR(MAX) NOT NULL
BorderColour VARCHAR(7) -- Can be nullable.
I'm guessing I would have to make a check constraint? But I'm not sure how, etc.
sample data.
1, 1, 1, 'test', 'blah...', '#FFAACC'
1, 1, 1, 'test2', 'sfsd', NULL
1, 1, 2, 'Test3', 'sdfsd dsf s', NULL
1, 1, 2, 'Test4', 'sfsdsss', '#AABBCC'
Now, if I add the following line, I should get some SQL error:
INSERT INTO tblArticle VALUES (1, 2, 'aaa', 'bbb', '#ABABAB')
any ideas?
CHECK constraints are ordinarily applied to a single row; however, you can cheat using a UDF:
CREATE FUNCTION dbo.CheckSingleBorderColorPerHubCategory
(
@HubID tinyint,
@CategoryID int
)
RETURNS BIT
AS BEGIN
-- Returns 1 when the hub/category has more than one non-NULL BorderColour.
RETURN CASE
WHEN EXISTS
(
SELECT HubID, CategoryID, COUNT(*) AS BorderColorCount
FROM tblArticle
WHERE HubID = @HubID
AND CategoryID = @CategoryID
AND BorderColour IS NOT NULL
GROUP BY HubID, CategoryID
HAVING COUNT(*) > 1
) THEN 1
ELSE 0
END
END
Then create the constraint and reference the UDF. The function returns 1 on a violation, so the constraint requires 0:
ALTER TABLE tblArticle
ADD CONSTRAINT CK_Articles_SingleBorderColorPerHubCategory
CHECK (dbo.CheckSingleBorderColorPerHubCategory(HubID, CategoryID) = 0)
Another option is available if you are running SQL Server 2008 or later, which has a feature called filtered indexes.
Using this feature you can create a unique index that includes all rows except those where BorderColour is null.
CREATE TABLE [dbo].[UniqueExceptNulls](
[HubId] [tinyint] NOT NULL,
[CategoryId] [int] NOT NULL,
[BorderColour] [varchar](7) NULL,
)
GO
CREATE UNIQUE NONCLUSTERED INDEX UI_UniqueExceptNulls
ON [UniqueExceptNulls] (HubID,CategoryID)
WHERE BorderColour IS NOT NULL
This approach is cleaner than the approach in my other answer because it doesn't require creating extra computed columns. It also doesn't require you to have a unique column in the table, although you should have that anyway.
Finally, it will also be much faster than the UDF/Check Constraint solutions.
You can also do a trigger with something like this (this is actually overkill - you can make it cleaner by assuming the database is already in a valid state - i.e. UNION instead of UNION ALL etc):
-- Assumes an INSTEAD OF INSERT trigger, where the INSERTED rows are not yet in tblArticle;
-- in an AFTER trigger they would already be in the table and be counted twice.
IF EXISTS (
SELECT COUNT(BorderColour)
FROM (
SELECT HubId, CategoryId, BorderColour
FROM INSERTED
UNION ALL
SELECT HubId, CategoryId, BorderColour
FROM tblArticle
WHERE EXISTS (
SELECT *
FROM INSERTED
WHERE tblArticle.HubId = INSERTED.HubId
AND tblArticle.CategoryId = INSERTED.CategoryId
)
) AS X
GROUP BY HubId, CategoryId
HAVING COUNT(BorderColour) > 1
)
RAISERROR('Only one BorderColour is allowed per hub/category.', 16, 1)
If you have a unique column in your table, then you can accomplish this by creating a unique constraint on a computed column.
The following sample creates a table that behaves as you described in your requirements and should perform better than a UDF-based check constraint. You might also be able to improve the performance further by making the computed column persisted.
CREATE TABLE [dbo].[UQTest](
[Id] INT IDENTITY(1,1) NOT NULL,
[HubId] TINYINT NOT NULL,
[CategoryId] INT NOT NULL,
[BorderColour] varchar(7) NULL,
[BorderColourUNQ] AS (CASE WHEN [BorderColour] IS NULL
THEN cast([ID] as varchar(50))
ELSE cast([HuBID] as varchar(3)) + '_' +
cast([CategoryID] as varchar(20)) END
),
CONSTRAINT [UQTest_Unique]
UNIQUE ([BorderColourUNQ])
)
The one possibly undesirable facet of the above implementation is that it allows a category/hub to have both a Null AND a color defined. If this is a problem, let me know and I'll tweak my answer to address that.
PS: Sorry about my previous (incorrect) answer. I didn't read the question closely enough.
