Transaction replication with horizontal filtering - sql-server

To make a long story short: is it possible to replicate rows of a table in SQL Server with a horizontal filter function that is evaluated continuously?
For instance, I need to replicate to the subscriber only those rows that were created or updated at least two days ago. Any row created in the source table should be propagated to the subscriber once its creation date is more than two days old, and this should happen continuously for newly created or updated rows. In other words, I don't want to replicate records that are newer than two days.
I have tried transactional replication with a filter function on SQL Server 2017, but the filter function is only evaluated when the replication is created; after that, new rows that later satisfied the filter were never propagated to the subscriber.

Add a column to your table: Alter Table yourTable Add Old_Enough Bit Not Null Default 0
Create a job that runs regularly (e.g. hourly) and runs Update yourTable Set Old_Enough = 1 Where Old_Enough = 0 And DateAdd(Day, 2, yourCreationDateColumn) < GetDate()
Create an indexed view defined as Select ... From yourTable Where Old_Enough = 1
Replicate your indexed view
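Putting those steps together, a minimal sketch (yourKeyColumn stands in for the table's unique key, the dbo schema is assumed, and the indexed view needs SCHEMABINDING plus a unique clustered index before it can be published):

Alter Table dbo.yourTable Add Old_Enough Bit Not Null Default 0
GO
-- Run this from a SQL Agent job, e.g. hourly:
Update dbo.yourTable
Set Old_Enough = 1
Where Old_Enough = 0
  And DateAdd(Day, 2, yourCreationDateColumn) < GetDate()
GO
-- Indexed view over the rows that are old enough:
Create View dbo.yourTable_Old_Enough
With Schemabinding
As
Select yourKeyColumn, yourCreationDateColumn -- list the columns you want replicated
From dbo.yourTable
Where Old_Enough = 1
GO
Create Unique Clustered Index IX_yourTable_Old_Enough
On dbo.yourTable_Old_Enough (yourKeyColumn)
GO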

Related

Workaround for reproducible SQL Server bug when updating a CTE

I've encountered an issue with SQL Server when using an updatable CTE that combines a view containing a derived column with a table using system versioning.
It causes a stack dump and disconnects the session with the error:
Msg 596 Level 21 State 1 Line 0
Cannot continue the execution because the session is in the kill state.
Msg 0 Level 20 State 0 Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
I've spent some time getting to the bottom of the cause and am able to reproduce the error on any version of SQL Server.
My query is quite complex; however, I've boiled it down to the following few requirements:
Create two tables, one will be the target of an update, the other a source of data.
Create a view on the table containing source data.
The view must include a derived column, e.g. select 0 as columnName
The table to update must have system versioning enabled
Define a CTE that selects columns from the view and joins to the target table
Update the CTE to set a column in the target table to the value of the derived column in the view
BOOM
If the derived column in the view is replaced with a physical column, or system versioning is disabled, the update works.
It's reproducible and I can demonstrate it with this simple DB<>Fiddle
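For reference, a minimal repro sketch along the lines of the steps above (the table, view and column names are illustrative, not the originals):

CREATE TABLE dbo.TargetTable
(
    Id INT NOT NULL PRIMARY KEY,
    Val INT NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON);

CREATE TABLE dbo.SourceTable (Id INT NOT NULL PRIMARY KEY);
GO
CREATE VIEW dbo.SourceView AS
SELECT Id, 0 AS DerivedCol   -- the derived column that provokes the error
FROM dbo.SourceTable;
GO
-- Updatable CTE joining the view to the system-versioned table:
WITH cte AS
(
    SELECT t.Val, v.DerivedCol
    FROM dbo.TargetTable t
    JOIN dbo.SourceView v ON v.Id = t.Id
)
UPDATE cte SET Val = DerivedCol;   -- this is the statement that stack-dumps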
I'm looking to find a workaround. My actual situation uses the updatable CTE to select the top N rows from the view of a staging table in order to batch-update a target table (avoiding lock escalation), with the staging table containing 500k - 1m+ rows.
Has anyone encountered this, or can anyone think of a clever workaround / hack?
Thanks to some help from the comments, @lptr's suggestion to apply some sort of function to the offending columns turned out to be a valid workaround.
In the CTE that was selecting columns from the view (which contained some derived column values), I implemented a 1 * columnname as columnname and this made SQL Server happy.
The issue was just having these columns in the view, regardless of whether they were used in the update or not.
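In terms of the repro sketch above, the workaround looks something like this (names again illustrative):

WITH cte AS
(
    -- wrapping the derived column in an expression avoids the crash
    SELECT t.Val, 1 * v.DerivedCol AS DerivedCol
    FROM dbo.TargetTable t
    JOIN dbo.SourceView v ON v.Id = t.Id
)
UPDATE cte SET Val = DerivedCol;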

How to insert data into a table such that possible extra columns in data get added to the parent table?

I'm trying to insert daily imported data into a SQL Server (2017) table. While most of the time the imported data has a fixed number of columns, sometimes the client wants to add a new column to the data to be imported.
I'm seeking a solution whereby, when the data gets imported (whether from another table, from R or from .csv files, it doesn't matter), SQL would automatically add the missing (extra) column to the parent table, using the provided column name and assigning NULL to all previous entries.
I've tried both UNION ALL and BULK INSERT, but both of these require the same number of columns. I'm working with SSMS 2017 and R 3.4.1.
Next, I tried a staging table, modifying the UNION clause as:
SELECT * FROM Table_new
UNION ALL
SELECT Tp.*, '' FROM Table_parent Tp;
But more often than not the extra column doesn't occur, so the column-count mismatch problem comes up again.
I also thought about running the queries from R with DBI and odbc's dbWriteTable(), handling the invalid-column error with tryCatch(), parsing the column name from the error message and so on, but this would be the shakiest hack I've ever built and I would prefer not to.
Finally, I thought about adding an if clause in R that, depending on the number of new columns, loops and appends the ", ''" parts to the SQL query to create the extra columns. I'm convinced that this is too complex a solution to the problem.
# Pseudo-R
# calculate the difference in the number of columns
diff <- length(colnames_new) - length(colnames_parent)
if (diff == 0) {
  dbQuery("INSERT INTO old SELECT * FROM new;")
} else if (diff > 0) {
  # pad the parent side with one '' per extra column in the new data
  dbQuery(paste0("SELECT * FROM new
                  UNION ALL
                  SELECT T1.*", loop_paste(", ''", diff), " FROM parent T1;"))
} else {
  # diff < 0: pad the new side instead
  dbQuery(paste0("SELECT * FROM parent
                  UNION ALL
                  SELECT T2.*", loop_paste(", ''", -diff), " FROM new T2;"))
}
To summarize: when inserting data into a SQL table, how can I (automatically) append missing columns to the parent table when necessary? Thanks!
The things in your database such as tables, columns, primary keys, foreign keys and check constraints are all part of the database schema. People design the schema before adding data to the database.
If you want to add new columns then you have to redesign your schema. When you do this you will also have to rewrite some of the CRUD procedures.
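That said, if you do decide to add columns on the fly, the schema change itself is a plain ALTER TABLE, and the column comparison can be driven from sys.columns. A hedged sketch, assuming staging and parent tables named Table_new and Table_parent, and that any new column can arrive as a NULLable NVARCHAR:

DECLARE @sql NVARCHAR(MAX) = N'';

-- Build one ALTER TABLE per column that exists in the staging table
-- but not in the parent; existing rows get NULL automatically.
SELECT @sql = @sql
    + N'ALTER TABLE dbo.Table_parent ADD '
    + QUOTENAME(s.name) + N' NVARCHAR(255) NULL; '
FROM sys.columns s
WHERE s.object_id = OBJECT_ID(N'dbo.Table_new')
  AND NOT EXISTS (SELECT 1 FROM sys.columns p
                  WHERE p.object_id = OBJECT_ID(N'dbo.Table_parent')
                    AND p.name = s.name);

EXEC sys.sp_executesql @sql;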

SQL Server trigger failing for row inserts in quick succession

I have looked around on SO and found many similar questions:
SQL Server A trigger to work on multiple row inserts
SQL trigger multiple insert update
Trigger to handle multiple row inserts and updates
update multiple rows with trigger after insert (sql server)
Trigger not working when inserting multiple records
But I am still having issues with my trigger updating multiple rows when inserting multiple rows into a table.
Outline of code
I have a Reservation table which has ReservationID and TourComponentID columns. When I insert into the Reservation table, I have the following trigger to update the TourComponent table with the ReservationID from the row just inserted into the Reservation table with a matching TourComponentID:
CREATE TRIGGER [TR_Reservation_CurrentReservation] ON [Reservation] AFTER INSERT AS
BEGIN
    UPDATE tc
    SET tc.[CurrentReservationId] = I.ReservationID
    FROM [tour].[TourComponent] tc
    JOIN INSERTED I ON I.TourComponentID = tc.TourComponentID
END
This trigger works perfectly when updating one tour component to have a new reservation (inserting one row into the Reservation table). However, if I try to update multiple tour components (inserting multiple rows into the Reservation table to update multiple rows in the TourComponent table), only the first tour component gets updated.
Other answers and research have shown me that
Triggers are NOT executed once per row but rather as a set-based operation, so executed only ONCE for the entire DML operation. So you need to treat it like any other update with a join statement.
So I would have expected my joining on the INSERTED table to have handled multiple rows or have I misunderstood this?
Interestingly, if I log the trigger variables for TourComponentID, ReservationID and the INSERTED rowcount to a temp table Foo, I can see two records are inserted into my temp table, each with a rowcount of 1.
Using SQL Profiler to catch the actual SQL executed at runtime and running this manually against the database, I get two rows updated as desired. It is only when using Entity Framework to update the database, i.e. running the application, that I find only one row is updated.
I have tried logging the values to a table FOO in the trigger
INSERT INTO FOO (TourComponentID, ReservationID, Rowcounts )
SELECT I.TourComponentID, I.ReservationID, 1 --@ReservationId
FROM
INSERTED I
This logs two rows with a rowcount of 1 each time, with the correct TourComponentID and ReservationID, but the TourComponent table still only has one row updated.
Any suggestions greatly appreciated.
UPDATE
Tour component IDs are passed as strings in an Ajax post to the MVC action, where tour component models are populated and then passed to be updated one at a time in the code:
public void UpdateTourComponents(IEnumerable<TourComponent> tourComponents)
{
foreach (var tourComponent in tourComponents)
{
UpdateTourComponent(tourComponent);
}
}
Here is the call to UpdateTourComponent:
public int UpdateTourComponent(TourComponent tourComponent)
{
return TourComponentRepository.Update(tourComponent);
}
and the final call to Update
public virtual int Update(TObject TObject)
{
Dictionary<string, List<string>> newChildKeys;
return Update(TObject, null, out newChildKeys);
}
So the inserts are happening one at a time, hence my trigger is being called once per TourComponent. This is why, when I count the @@ROWCOUNT in INSERTED and log to Foo, I get a value of 1. When I run the inserts manually I get the correct expected results, so I would agree with @Slava Murygin's tests that the issue is probably not with the trigger itself. I thought it might be a speed issue if we were firing the requests one after the other, so I put a wait in the trigger and in the code, but this did not fix it.
Update 2
I have used SQL Profiler to capture the SQL that is run when only the first insert's trigger takes effect.
Interestingly, when the EXACT same SQL is then run in SQL Server Management Studio, the trigger works as expected and both tour components are updated with the reservation ID.
Worth mentioning also that all constraints have been removed from all tables.
Any other suggestions as to what might be causing this issue?
You have a different problem than that particular trigger. Take a look at the name of the table you are updating: [tour].[TourComponent] or [dbo].[TourComponent].
I've tried your trigger and it works perfectly:
use TestDB
GO
IF object_id('Reservation') is not null DROP TABLE Reservation;
GO
IF object_id('TourComponent') is not null DROP TABLE TourComponent;
GO
CREATE TABLE Reservation (
ReservationID INT IDENTITY(1,1),
TourComponentID INT
);
GO
CREATE TABLE TourComponent (
CurrentReservationId INT,
TourComponentID INT
);
GO
CREATE TRIGGER [TR_Reservation_CurrentReservation] ON [Reservation] AFTER INSERT AS
UPDATE tc
SET tc.[CurrentReservationId] = I.ReservationID
FROM [TourComponent] tc
JOIN INSERTED I on I.TourComponentID = tc.TourComponentID
GO
INSERT INTO TourComponent(TourComponentID)
VALUES (1),(2),(3),(4),(5),(6)
GO
INSERT INTO Reservation(TourComponentID)
VALUES (1),(2),(3),(4),(5),(6)
GO
SELECT * FROM Reservation
SELECT * FROM TourComponent
So the underlying problem was down to Entity Framework.
this.Property(t => t.CurrentReservationId).HasColumnName("CurrentReservationId");
is one property mapping for the SQL data access layer. This was being cached, which caused the data read out of the DB not to be the latest; thus, if we have an insert into the Reservation table, the second insert will be overwritten by the cached values, which in my case were NULL.
Changing the line to this resolves the problem and makes the trigger work as expected.
this.Property(t => t.CurrentReservationId).HasColumnName("CurrentReservationId").HasDatabaseGeneratedOption(DatabaseGeneratedOption.Computed);
See more info on HasDatabaseGeneratedOption

SSIS data flow - copy new data or update existing

I query some data from table A (source) based on a certain condition and insert it into a temp table (destination) before upserting into CRM.
If the data already exists in CRM, I don't want to query it from table A and insert it into the temp table (I want that table to remain empty) unless the data has been updated or new data was created. So basically I want to query only new data, or modified data from table A that already exists in CRM. At the moment my data flow is like this:
Clear the temp table - a DELETE SQL statement
Query from source table A and insert into the temp table
From the temp table, insert into CRM using a script component
In source table A I have audit columns: createdOn and modifiedOn.
I found one way to do this (SSIS DataFlow - copy only changed and new records) but it's not really clear how to do so.
What is the best and simplest way to achieve this?
The link you posted is basically saying to stage everything and use a MERGE to update your table (essentially an UPDATE/INSERT).
The only way I can really think of to make your process significantly quicker by partially selecting from table A would be to add a "last updated" timestamp to table A and enforce that it is always up to date.
One way to do this is with a trigger; see here for an example.
You could then select based on that timestamp, perhaps keeping a record of the last timestamp used each time you run the SSIS package, and adding a margin of safety to that.
Edit: I just saw that you already have a modifiedOn column, so you could use that as described above.
Examples:
There are a few different ways you could do it:
ONE
Include the modifiedOn column in your final destination table.
You can then build a dynamic query for your data flow source in an SSIS string variable, something like:
"SELECT * FROM [table A] WHERE modifiedOn >= DATEADD(DAY, -1, '" + @[User::MaxModifiedOnDate] + "')"
@[User::MaxModifiedOnDate] (a string variable) would come from an Execute SQL Task, where you would write the result of the following query to it:
SELECT FORMAT(CAST(MAX(modifiedOn) AS date), 'yyyy-MM-dd') MaxModifiedOnDate FROM DestinationTable
The DATEADD part, as well as the CAST to a certain degree, represent your margin of safety.
TWO
If this isn't an option, you could keep a data load history table that would tell you when you need to load from, e.g.:
CREATE TABLE DataLoadHistory
(
DataLoadID int PRIMARY KEY IDENTITY
, DataLoadStart datetime NOT NULL
, DataLoadEnd datetime
, Success bit NOT NULL
)
You would begin each data load with this (Execute SQL Task):
CREATE PROCEDURE BeginDataLoad
@DataLoadID int OUTPUT
AS
INSERT INTO DataLoadHistory
(
DataLoadStart
, Success
)
VALUES
(
GETDATE()
, 0
)
SELECT @DataLoadID = SCOPE_IDENTITY()
You would store the returned DataLoadID in an SSIS integer variable, and use it when the data load is complete, as follows:
CREATE PROCEDURE DataLoadComplete
@DataLoadID int
AS
UPDATE DataLoadHistory
SET
DataLoadEnd = GETDATE()
, Success = 1
WHERE DataLoadID = @DataLoadID
When it comes to building your query for table A, you would do it the same way as before (with the dynamically generated SQL query), except MaxModifiedOnDate would come from the following query:
SELECT FORMAT(CAST(MAX(DataLoadStart) AS date), 'yyyy-MM-dd') MaxModifiedOnDate FROM DataLoadHistory WHERE Success = 1
So the DataLoadHistory table, rather than your destination table.
Note that this would fail on the first run, as there would be no successful entries in the history table, so you'd need to insert a dummy record, or find some other way around it.
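For instance, a one-off seed row (the dates are illustrative) could look like:

INSERT INTO DataLoadHistory (DataLoadStart, DataLoadEnd, Success)
VALUES ('2000-01-01', '2000-01-01', 1)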
THREE
I've seen it done a lot where, say your data load runs every day, you would just stage the last 7 days, or something like that: some margin of safety that you're pretty sure will never be exceeded (because the process is being monitored for failures).
It's not my preferred option, but it is simple, and it can work if you're confident in how well the process is being monitored.
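A sketch of that fixed-window source query (the 7-day window is illustrative):

SELECT *
FROM [table A]
WHERE modifiedOn >= DATEADD(DAY, -7, CAST(GETDATE() AS date))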

How to retrieve the updated column values in sql server

I have "Order" table with more than 5,000 records. When I ran the update query unfortunately I forgot to give the ‘where’ condition.
Now the Date column in all the records has been updated. Is it possible to retrieve my existing column values.
Example:
Update [Order]
set ordered = getdate()
where Cusid = 50
(here I forgot to include the WHERE condition)
I am afraid not, because the update has already been committed. Unless you have a backup, it can no longer be undone.
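If the database is in FULL recovery and backups do exist, one option is to restore a copy of the database to a point in time just before the accidental update and copy the old values back. A hedged sketch, where the file names, logical names, STOPAT time and the OrderID key column are all illustrative:

RESTORE DATABASE OrderDB_Copy
FROM DISK = N'C:\Backups\OrderDB_Full.bak'
WITH MOVE N'OrderDB' TO N'C:\Data\OrderDB_Copy.mdf',
     MOVE N'OrderDB_log' TO N'C:\Data\OrderDB_Copy.ldf',
     NORECOVERY;

RESTORE LOG OrderDB_Copy
FROM DISK = N'C:\Backups\OrderDB_Log.trn'
WITH STOPAT = '2024-01-15T10:00:00',   -- just before the bad update
     RECOVERY;

-- Copy the pre-update values back, joining on the table's actual key:
UPDATE o
SET o.ordered = c.ordered
FROM [Order] o
JOIN OrderDB_Copy.dbo.[Order] c ON c.OrderID = o.OrderID;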
