Workaround for reproducible SQL Server bug when updating a CTE - sql-server

I've encountered an issue in SQL Server with an updatable CTE that combines a view containing a derived column with a table using system versioning.
It causes a stack dump and disconnects the session with the error:
Msg 596 Level 21 State 1 Line 0
Cannot continue the execution because the session is in the kill state.
Msg 0 Level 20 State 0 Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
I've spent some time getting to the bottom of the cause and am able to reproduce the error on any version of SQL Server.
My query is quite complex, but I've boiled it down to the following few requirements:
Create two tables, one will be the target of an update, the other a source of data.
Create a view on the table containing source data.
The view must include a derived column, e.g. SELECT 0 AS columnName
The table to update must have system versioning on
Define a CTE to select columns from the view and join to the target table
Update the CTE to set a column in the target table to the value of the derived column in the view
BOOM
If the derived column in the view is replaced with a physical column, or system versioning is disabled, the update works.
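The steps above can be sketched as follows. This is a hedged reconstruction, not the original fiddle: the table, view, and column names (Target, Source, vwSource, DerivedAmount) are all illustrative.

```sql
-- System-versioned target table.
CREATE TABLE dbo.Target
(
    Id int NOT NULL PRIMARY KEY,
    Amount int NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.TargetHistory));

CREATE TABLE dbo.Source (Id int NOT NULL PRIMARY KEY);
GO

-- View on the source table with a derived column.
CREATE VIEW dbo.vwSource
AS
SELECT Id,
       0 AS DerivedAmount   -- the derived column that triggers the problem
FROM dbo.Source;
GO

-- Updatable CTE joining the view to the system-versioned table;
-- assigning the derived column produced the stack dump described above.
WITH cte AS
(
    SELECT t.Amount, v.DerivedAmount
    FROM dbo.Target t
    JOIN dbo.vwSource v ON v.Id = t.Id
)
UPDATE cte SET Amount = DerivedAmount;
```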
It's reproducible and I can demonstrate it with this simple DB<>Fiddle
I'm looking to try and find a workaround. My actual situation is using the updatable CTE to select top N rows from the view of a staging table in order to batch-update a target table (avoiding lock escalation) with the staging table containing 500k - 1m+ rows.
Has anyone encountered this or can maybe think of a clever workaround / hack?

Thanks to some help from the comments, #lptr's suggestion to apply some sort of function to the offending columns turned out to be a valid workaround.
In the CTE that was selecting columns from the view which contained some derived column values, I implemented a 1 * columnName AS columnName and this made SQL Server happy.
The issue was just having these columns in the view, regardless of whether they were used in the update or not.
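The workaround can be sketched like this (again with illustrative names; the point is only the 1 * wrapper on the derived column inside the CTE):

```sql
-- Workaround sketch: wrap the derived column in an arithmetic expression
-- so the optimizer no longer handles it the way that crashes the session.
WITH cte AS
(
    SELECT t.Amount,
           1 * v.DerivedAmount AS DerivedAmount   -- was: v.DerivedAmount
    FROM dbo.Target t
    JOIN dbo.vwSource v ON v.Id = t.Id
)
UPDATE cte SET Amount = DerivedAmount;
```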

Related

Why does the polybase pushdown filter for join not work?

Recently put polybase into production and have noticed that pushdown isn't working for joins. Take the following query:
CREATE TABLE #VendIDs
(
    VendorAN int PRIMARY KEY
);

INSERT INTO #VendIDs (VendorAN)
VALUES (1),(89980),(89993),(90002),(90003),(90008),(90004),(90015),(90018),(97140),(97139),(97138),
       (97137),(97136),(97135),(97134),(97059),(97058),(97057),(97056),(97055),(97054),(97053),(97052);

SELECT VW.VendorAN, [Type], Code, [Name], Address1, Address2, City, State, Zip, Country, Phone,
       ShipAddress1, ShipAddress2, ShipCity, ShipState, ShipZip, Latitude, Longitude
FROM vwVendor VW
JOIN #VendIDs FV ON VW.VendorAN = FV.VendorAN;
The execution plan shows 22k rows from the 'remote query', which just happens to match the number of rows in the external table. After enabling trace flag 6408, it shows 22k records on the external side.
If I do a simple WHERE VendorAN = XXXXXX, I can clearly see 1 row being returned via the remote query and the filtering being done on the remote side.
Anyone have a thought on why I'm not seeing pushdown filtering on the join as shown above? Makes no sense to me based upon what I've read to date.
Referencing this article for how to know if pushdown filtering is occurring: https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-how-to-tell-pushdown-computation?view=sql-server-ver15#--use-tf6408
Is your external data source created with PUSHDOWN = ON, or does your query use OPTION (ENABLE EXTERNALPUSHDOWN)?
Have you created statistics on the external table?
Is the remote table partitioned?
How is vwVendor created? Is it a view on the external table joined to other tables?
You also need to take a look at sys.dm_exec_distributed_request_steps and sys.dm_exec_distributed_sql_requests to see what is occurring under the hood.
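A sketch of that DMV inspection follows; the execution id 'QID1234' is a hypothetical placeholder you would take from the first query, and the column lists are abbreviated (check the DMV documentation for the full schemas).

```sql
-- Find the recent distributed (PolyBase) requests.
SELECT TOP (10) execution_id, status, total_elapsed_time
FROM sys.dm_exec_distributed_requests
ORDER BY end_time DESC;

-- Break one request down into its steps; location_type shows whether a
-- step ran head-side or was pushed to the external (remote) side.
SELECT execution_id, step_index, operation_type, location_type, status, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID1234'   -- hypothetical id from the query above
ORDER BY step_index;

-- The SQL actually issued per compute node for that request.
SELECT execution_id, step_index, compute_node_id, status
FROM sys.dm_exec_distributed_sql_requests
WHERE execution_id = 'QID1234';
```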

Transaction replication with horizontal filtering

To make a long story short: is it possible to replicate rows of a table in SQL Server with a horizontal filtering function that is evaluated continuously?
For instance, I need to replicate to the subscriber only rows that were created or updated at least two days ago. Any row created in the source table should be replicated once its creation date is more than two days in the past, and this should happen continuously for newly created/updated rows; rows newer than two days should not be replicated.
I have tried transactional replication with a filtering function on SQL Server 2017, but the filtering function is only evaluated when the replication is created; after that, new rows that later satisfied the filter were not propagated to the subscriber.
Add a column to your table: Alter Table yourTable Add Old_Enough Bit Not Null Default 0
Create a job that runs regularly (e.g. hourly) and runs Update yourTable Set Old_Enough = 1 Where Old_Enough = 0 And DateAdd(Day, 2, yourCreationDateColumn) < GetDate()
Create an indexed view that takes Select ... From yourTable Where Old_Enough = 1
Replicate your indexed view
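The steps above can be sketched as follows; yourTable, yourCreationDateColumn, and the Id key are placeholders, and the UPDATE would live in the body of the scheduled SQL Agent job:

```sql
-- Step 1: flag column with a default of 0 (not yet old enough).
ALTER TABLE dbo.yourTable ADD Old_Enough bit NOT NULL DEFAULT 0;
GO

-- Step 2: hourly job body - promote rows once they are two days old.
UPDATE dbo.yourTable
SET Old_Enough = 1
WHERE Old_Enough = 0
  AND DATEADD(DAY, 2, yourCreationDateColumn) < GETDATE();
GO

-- Step 3: indexed view over the "old enough" rows (schema binding and a
-- unique clustered index are required for an indexed view).
CREATE VIEW dbo.vwOldEnough
WITH SCHEMABINDING
AS
SELECT Id, yourCreationDateColumn
FROM dbo.yourTable
WHERE Old_Enough = 1;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vwOldEnough ON dbo.vwOldEnough (Id);

-- Step 4: add dbo.vwOldEnough as the article in the publication.
```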

Use of inserted and deleted tables for logging - is my concept sound?

I have a table with a simple identity column primary key. I have written a 'For Update' trigger that, among other things, is supposed to log the changes of certain columns to a log table. Needless to say, this is the first time I've tried this.
Essentially as follows:
DECLARE Cursor1 CURSOR FOR
SELECT a.*, b.*
FROM inserted a
INNER JOIN deleted b ON a.OrderItemId = b.OrderItemId
(where OrderItemId is the actual name of the primary identity key).
I then do the usual open the cursor and go into a fetch next loop. With the columns I want to test, I do:
if Update(Field1)
begin
..... do some logging
end
The columns include varchars, bits, and datetimes. It works, sometimes. The problem is that the log function writes the a (after) and b (before) values of the field to a log, and in some cases the before and after values are identical.
I have a few questions:
Am I using the Update function correctly?
Am I accessing the before and after values correctly?
Is there a better way?
If you are using SQL Server 2016 or higher, I would recommend skipping this trigger entirely and instead using system-versioned temporal tables.
Not only will it eliminate the need for (and performance issues around) the trigger, it'll be easier to query the historical data.
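A sketch of that temporal-table approach, assuming an existing dbo.OrderItems table (the period column names and history table name are illustrative):

```sql
-- Add the period columns; the defaults backfill existing rows.
ALTER TABLE dbo.OrderItems ADD
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START
        CONSTRAINT DF_OrderItems_ValidFrom DEFAULT SYSUTCDATETIME() NOT NULL,
    ValidTo datetime2 GENERATED ALWAYS AS ROW END
        CONSTRAINT DF_OrderItems_ValidTo
        DEFAULT CONVERT(datetime2, '9999-12-31 23:59:59.9999999') NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

-- Turn on system versioning; SQL Server now logs every change for you.
ALTER TABLE dbo.OrderItems
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrderItemsHistory));

-- Querying the history is then straightforward, e.g. all versions of one row:
SELECT *
FROM dbo.OrderItems FOR SYSTEM_TIME ALL
WHERE OrderItemId = 42
ORDER BY ValidFrom;
```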

How to insert data into a table such that possible extra columns in data get added to the parent table?

I'm trying to insert daily imported data into a SQL Server (2017) table. While most of the time the imported data has a fixed number of columns, sometimes the client wants to add a new column to the data to be imported.
I'm seeking a solution such that when the data is imported (whether from another table, from R, or from .csv files, don't mind this), SQL would automatically add the missing (extra) column to the parent table, providing the column name and assigning NULL to all previous entries.
I've tried with both UNION ALL and BULK INSERT, but both of these require the same # of columns. I'm working with SSMS2017, R3.4.1.
Next, I tried with a staging table and modifying the UNION clause as:
SELECT * FROM Table_new
UNION ALL
SELECT Tp.*, '' FROM Table_parent Tp;
But more often than not the extra column doesn't occur, so the column-count mismatch happens again.
I also thought about running the queries from R with DBI and odbc dbWriteTable(), handling the invalid-column error with tryCatch(), parsing the column name from the error message and so on, but this would be the shakiest craft I've ever done and I would prefer not to.
Ultimately I thought of adding an if clause in R and, depending on the number of added new columns, looping to add the ", ''" part to the SQL query to create the extra columns. I'm convinced that this is too complex a solution to this problem.
# Pseudo-R
# calculate the difference in number of columns
diff <- length(colnames_new) - length(colnames_parent)
if (diff == 0) {
  dbQuery("INSERT INTO old SELECT * FROM new;")
} else if (diff > 0) {
  dbQuery(paste0("SELECT * FROM new
                  UNION ALL
                  SELECT T1.*", strrep(", ''", diff), " FROM parent T1;"))
} else {
  dbQuery(paste0("SELECT * FROM parent
                  UNION ALL
                  SELECT T2.*", strrep(", ''", -diff), " FROM new T2;"))
}
To summarize: when inserting data to SQL table, how to (automatically) append the columns in the parent table, when necessary? Thanks!
The things in your database such as tables, columns, primary keys, foreign keys, and check constraints are all part of the database schema. People design the schema before adding data to the database.
If you want to add new columns then you have to redesign your schema. When you do this you will also have to rewrite some of the CRUD procedures.
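That said, the schema change itself can be scripted. A hedged sketch, with StagingTable and ParentTable as placeholder names and deliberately naive typing (every new column is added as nullable nvarchar, so existing rows get NULL):

```sql
-- Build ALTER TABLE statements for columns in staging but not in the parent.
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'ALTER TABLE dbo.ParentTable ADD ' + QUOTENAME(c.COLUMN_NAME)
             + N' nvarchar(100) NULL;' + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_SCHEMA = 'dbo' AND c.TABLE_NAME = 'StagingTable'
  AND NOT EXISTS (SELECT 1
                  FROM INFORMATION_SCHEMA.COLUMNS AS p
                  WHERE p.TABLE_SCHEMA = 'dbo'
                    AND p.TABLE_NAME = 'ParentTable'
                    AND p.COLUMN_NAME = c.COLUMN_NAME);

EXEC sys.sp_executesql @sql;

-- Once the columns exist, an explicit column list keeps the insert stable.
INSERT INTO dbo.ParentTable (Col1, Col2, NewColumn)
SELECT Col1, Col2, NewColumn
FROM dbo.StagingTable;
```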

How to use a recursive CTE in a check constraint?

I'm trying to create a check constraint on a table so that ParentID is never a descendant of current record
For instance, I have a table Categories, with the following fields ID, Name, ParentID
I have the following CTE
WITH Children AS
(
    SELECT ID AS AncestorID, ID, ParentID AS NextAncestorID
    FROM Categories
    UNION ALL
    SELECT Categories.ID, Children.ID, Categories.ParentID
    FROM Categories
    JOIN Children ON Categories.ID = Children.NextAncestorID
)
SELECT ID
FROM Children
WHERE AncestorID = 99
The results here are correct, but when I try to add it as a constraint to the table like this:
ALTER TABLE dbo.Categories ADD CONSTRAINT CK_Categories
CHECK (ParentID NOT IN
(
    WITH Children AS
    (
        SELECT ID AS AncestorID, ID, ParentID AS NextAncestorID
        FROM Categories
        UNION ALL
        SELECT Categories.ID, Children.ID, Categories.ParentID
        FROM Categories
        JOIN Children ON Categories.ID = Children.NextAncestorID
    )
    SELECT ID FROM Children WHERE AncestorID = ID
))
I get the following error:
Incorrect syntax near the keyword 'with'. If this statement is a
common table expression, an xmlnamespaces clause or a change tracking
context clause, the previous statement must be terminated with a
semicolon.
Adding a semicolon before the WITH didn't help.
What would be the correct way to do this?
Thanks!
Per the SQL Server documentation on column constraints:
CHECK
Is a constraint that enforces domain integrity by limiting the possible values that can be entered into a column or columns.
logical_expression
Is a logical expression used in a CHECK constraint and returns TRUE or FALSE. logical_expression used with CHECK constraints cannot reference another table but can reference other columns in the same table for the same row. The expression cannot reference an alias data type.
(The above was quoted from the SQL Server 2017 version of the documentation, but the general principle applies to all previous versions as well, and you didn't state what version you are working with.)
The important part here is the "cannot reference another table, but can reference other columns in the same table for the same row" (emphasis added). A CTE would count as another table.
As such, you can't have a complex query like a CTE used for a CHECK constraint.
Like Saman suggested, if you want to check against existing data in other rows, and it must be in the DB layer, you could do it as a trigger.
However, triggers have their own drawbacks (e.g. issues with discoverability, behavior that is unexpected by those who are unaware of the trigger's presence).
As Sami suggested in their comment, another option is a UDF, but that's not without its own issues, potentially with both performance and stability according to the answers on this question about that approach in SQL Server 2008. It's probably still applicable to later versions as well.
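One possible shape of the trigger approach looks like this. It's a sketch, not a hardened implementation: it assumes the existing data is already cycle-free (otherwise the recursive CTE would hit the default MAXRECURSION limit), and the trigger and constraint names are hypothetical.

```sql
CREATE TRIGGER dbo.TR_Categories_NoCycles
ON dbo.Categories
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @hasCycle bit = 0;

    -- Walk up from each inserted/updated row's ParentID; if we ever reach
    -- the row's own ID, the change would create a cycle.
    WITH Ancestors AS
    (
        SELECT i.ID, i.ParentID AS AncestorID
        FROM inserted i
        WHERE i.ParentID IS NOT NULL
        UNION ALL
        SELECT a.ID, c.ParentID
        FROM Ancestors a
        JOIN dbo.Categories c ON c.ID = a.AncestorID
        WHERE c.ParentID IS NOT NULL
          AND a.AncestorID <> a.ID   -- stop once a cycle is detected
    )
    SELECT @hasCycle = 1
    FROM Ancestors
    WHERE AncestorID = ID;

    IF @hasCycle = 1
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, N'ParentID would create a cycle in Categories.', 1;
    END;
END;
```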
If possible, I would say it's usually best to move that logic into the application layer. I see from your comment that you already have "client-side" validation. If there is an app server between that client and the database server (such as in a web app), I would suggest putting that additional validation there (in the app server) instead of in the database.
