SymmetricDS Multi-Tier Data Monitoring - symmetricds

I have three node tiers (corp, store, device), each with an item table.
Say I want to record monitoring data for the item table in a monitoring table. The monitoring table currently shows the time and current_node of the data.
How can I do centralized monitoring from corp that tells me the data has arrived at the device?
Ideally this should not edit the structures that SymmetricDS generates.
My earlier idea required editing the trigger. Any ideas?
The scenario is:
corp inserts the data
corp also inserts a monitoring row that records the time and current_node (corp)
The data syncs to store
store updates the monitoring row (time and current_node set to store)
The data syncs to device
device updates the monitoring row (time and current_node set to device)

Use the custom_on_update and custom_on_insert columns of the sym_trigger row that defines the trigger on the item table. Set them to an SQL script that inserts or updates a row in the monitoring table with the changes to the item table. Of course, also declare a trigger on the monitoring table so it is synced from device to store and from store to corp.
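A minimal sketch of that configuration, assuming a trigger_id of 'item', a monitoring table named item_monitor, and the columns from the question (all hypothetical; match them to your setup). The custom SQL is embedded into the database trigger that SymmetricDS generates, so the syntax for referencing the changed row (new.id here) depends on your target database, and the exact sym_trigger column names may vary slightly by version:

    update sym_trigger
       set custom_on_insert_text =
             'insert into item_monitor (item_id, current_node, updated_at)
              values (new.id, (select node_id from sym_node_identity), current_timestamp)',
           custom_on_update_text =
             'update item_monitor
                 set current_node = (select node_id from sym_node_identity),
                     updated_at   = current_timestamp
               where item_id = new.id',
           last_update_time = current_timestamp
     where trigger_id = 'item';

    -- a plain trigger on the monitoring table itself, so its rows sync
    -- back up the tiers (device -> store -> corp) through your routers
    insert into sym_trigger (trigger_id, source_table_name, channel_id,
                             last_update_time, create_time)
    values ('item_monitor', 'item_monitor', 'default',
            current_timestamp, current_timestamp);

The item_monitor trigger still has to be linked to routers for the device-to-store and store-to-corp directions, just like the existing item trigger.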

Related

How to copy from a big table to another table in Snowflake?

I have a roughly 7 TB table in Snowflake, and I want to move about half of it to a new table, for example with a country filter. What technique would you recommend? insert into ... select * from TABLE where COUNTRY = 'A', or use Snowpipe to send Parquet files to S3 and then COPY INTO the Snowflake target table?
I tried the first option. Five hours later the process was at 35%. I read a post where someone had to scale the warehouse up to an XL instance. He read another post saying Snowpipe is the better option. My warehouse is only an XS :(
By the way, I have a cluster key, and the mission is to segment the data by country per company policy.
The original table holds events from devices that have the app installed, about 30 events per session minute, for example an Uber or Lyft app.
A materialized view (MV) will definitely be more performant than a standard view, but there is an extra cost associated with that, as Snowflake has to keep the MV in sync with the table. It sounds like the table will be changing rapidly, so this cost will be continuous.
Another option is to create a stream on the source table and use a task to merge the stream data into the target table. Tasks require a running warehouse, but I've found that an XS warehouse is very capable, so at minimum you're talking 24 credits per day. Tasks also have a minimum one-minute interval, so if you need bleeding-edge freshness, that might rule this option out.
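A minimal sketch of the stream-and-task approach, with hypothetical names (EVENTS, EVENTS_COUNTRY_A, an XS warehouse called XS_WH, made-up columns) and assuming the event table is append-only, so a filtered INSERT of new rows is enough; a table that also sees updates or deletes would need a MERGE instead. The existing rows would still have to be backfilled once, e.g. with the INSERT ... SELECT from the question.

    -- capture new changes on the source table
    create or replace stream events_stream on table events;

    -- run every minute, only when the stream actually has data,
    -- and append the country 'A' inserts to the target table
    create or replace task copy_country_a
      warehouse = xs_wh
      schedule  = '1 MINUTE'
    when system$stream_has_data('EVENTS_STREAM')
    as
      insert into events_country_a (event_id, country, event_ts, payload)
      select event_id, country, event_ts, payload
      from   events_stream
      where  country = 'A'
        and  metadata$action = 'INSERT';

    -- tasks are created suspended, so start it explicitly
    alter task copy_country_a resume;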

Temporal Tables Manually Update Data

Using SQL Server 2019, can I push data (snapshot data) from the current (temporal) table to the history table only when I want to, rather than it happening automatically after every row commit? I understand that temporal tables are designed to record all data changes to a row, which is great for auditing. But what if I don't want to save all changes? What if I only want to 'baseline' data on a set of tables every week (or whenever the user wants to), and I don't care what changes are made during the week? I know you can disable and enable temporal tables, but that is more of a high-level control, the architecture is multi-tenanted, and different tenants will snapshot at different times.
Or perhaps temporal tables are the wrong tool for me? My use case is as follows: a user creates a mathematical model, altering many parameters; they do this many times over many days, persisting to the database with every change. When they get it right, they press 'Baseline' and everything is stored. They then continue with the next changes towards the next baseline. At any point they can compare the difference between any two baselines. I only retain the data as of the date of each 'Baseline'. This would require that I move the data to the temporal history table manually... or let it go automatically and purge everything in between two baselines, which seems a waste of DB resources.
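For what it's worth, a manual baseline along the lines described could look roughly like the sketch below. All names are hypothetical (dbo.ModelParams, its history table dbo.ModelParamsHistory, period columns ValidFrom/ValidTo, a TenantId column), and note that any changes made while system versioning is switched off are not captured.

    declare @TenantId   int       = 42;                  -- the tenant taking a baseline
    declare @BaselineAt datetime2 = sysutcdatetime();

    begin transaction;

    -- temporarily detach the history table so it can be written directly
    alter table dbo.ModelParams set (system_versioning = off);

    -- copy the tenant's current rows into history as the baseline snapshot
    insert into dbo.ModelParamsHistory
           (ModelId, TenantId, ParamName, ParamValue, ValidFrom, ValidTo)
    select  ModelId, TenantId, ParamName, ParamValue, ValidFrom, @BaselineAt
    from    dbo.ModelParams
    where   TenantId = @TenantId;

    -- re-attach versioning
    alter table dbo.ModelParams
        set (system_versioning = on (history_table = dbo.ModelParamsHistory));

    commit;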

SQL Change Data Capture get all tables

I need to audit data in all the tables in a database. I use SQL Server 2016. I enabled Change Data Capture for all tables.
How to get changes from all tables chronologically?
Basically, Change Data Capture creates system tables in the [cdc] schema to capture the change events for each table, named something like cdc.[TableSchemaName]_[TableName]_CT. This table holds all the changes made to your actual table in chronological order; it is essentially data read from the database's transaction log file.
Another point: you need to query the maximum LSN for the database at any point in time, and also the minimum LSN for the capture instance you want to read change data for. The records between the min and max LSN give you the complete set of changes for a table.
Reference link below:
https://learn.microsoft.com/en-us/sql/relational-databases/system-functions/cdc-fn-cdc-get-all-changes-capture-instance-transact-sql
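A minimal sketch per table, assuming a capture instance named dbo_MyTable (the default name produced by sys.sp_cdc_enable_table for dbo.MyTable); running this for each capture instance and combining the results ordered by __$start_lsn gives a chronological view across tables.

    declare @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');
    declare @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

    select __$start_lsn,
           sys.fn_cdc_map_lsn_to_time(__$start_lsn) as change_time,
           __$operation,   -- 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
           *
    from   cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all')
    order  by __$start_lsn;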

SSIS and CDC - Incorrect state at end of "Mark Processed Range"

The Problem
I currently have CDC running on a table named subscription_events. The corresponding CT table is being populated with new inserts, updates, and deletes.
I have two SSIS flows that move data from subscription_events into another table in a different database. The first flow is the initial flow and has the following layout:
The Import Rows Into Vertica step simply has a source and a destination and copies every row into another table. As a note, the source table is currently active and has new rows flowing into it every few minutes. The Mark Initial Load Start/End steps store the current state in a variable and that is stored in a separate table meant for storing CDC names and states.
The second flow is the incremental flow and has the following layout:
The Import Rows Into Vertica step uses a CDC source and should pull the latest inserts, updates, and deletes from the CT table and these should be applied to the destination. Here is where the problem resides; I never receive anything from the CDC source, even though there are new rows being inserted into the subscription_events table and the corresponding CT table is growing in size with new change data.
To my understanding, this is how things should work:
Mark Initial Load Start
CDC State should be ILSTART
Data Flow
Mark Initial Load End
CDC State should be ILEND
Get Processing Range (First Run)
CDC State should be ILUPDATE
Data Flow
Mark Processed Range (First Run)
CDC State should be TFEND
Get Processing Range (Subsequent Runs)
CDC State should be TFSTART
Data Flow
Mark Processed Range (Subsequent Runs)
CDC State should be TFEND
Repeat the last three steps
This is not how my CDC states are being set, though... Here are my states at the same points in the process.
Mark Initial Load Start
CDC State is ILSTART
Data Flow
Mark Initial Load End
CDC State is ILEND
Get Processing Range (First Run)
CDC State is ILUPDATE
Data Flow
Mark Processed Range (First Run)
CDC State is ILEND
Get Processing Range (Subsequent Runs)
CDC State is ILUPDATE
Data Flow
Mark Processed Range (Subsequent Runs)
CDC State is ILEND
Repeat the last three steps
I am never able to get out of the ILUPDATE/ILEND loop, so I am never able to get any new data from the CT table. Why is this happening and what can I do to fix this?
Thank you so much, in advance, for your help! :)
Edit 1
Here are a couple of articles that sort of describe my situation, though not exactly. They also did not help me resolve this issue, but it might help you think of something I can try.
http://www.bradleyschacht.com/understanding-the-cdc-state-value/
http://msdn.microsoft.com/en-us/library/hh231087.aspx
The second article includes this image, which shows the ILUPDATE/ILEND loop I am trapped in.
Edit 2
Last week (May 26, 2014) I disabled then re-enabled CDC on the subscription_events table. This didn't change anything, so I then disabled CDC on the entire database, re-enabled CDC on the database, and then enabled CDC on the subscription_events table. This did make CDC work for a few days (and I thought the problem had been resolved by going through this process). However, at the end of last week (May 30, 2014) I needed to re-load the entire table via this process, and I ran into the same problem again. I'm still stuck in this loop and I'm not sure why or how to get out of it.
Edit 3
Before I was having this problem, I was having a separate issue which I posted about here:
CDC is enabled, but cdc.dbo<table-name>_CT table is not being populated
I am unsure if these are related, but figured it couldn't hurt to provide it.
I had the same problem.
I have an initial load package for kicking things off and a separate incremental load package for loading updates on a schedule.
You fix it by putting a "Mark CDC start" CDC Control Task at the end of your Initial Load package only. That will leave the state value in a TFEND state, which is what you want for when your Incremental Load starts.
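As a sanity check (the state table and state name below are hypothetical; use whatever table and CDC state name you configured in the CDC Control Tasks), the stored state string should start with TFEND once the initial load package has finished, rather than ILEND or ILUPDATE:

    select name, state
    from   dbo.cdc_states          -- the table the CDC Control Tasks write to
    where  name = 'CDC_State';     -- the state name configured in the tasks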

Data migration in Informatica

A large amount of data is coming from a source to a target. After a successful insertion into the target, we have to change the status of every row to "committed". But how will we know, without directly querying the source, whether all of the data has arrived in the target?
For example, suppose 10 records have migrated from the source to the target.
We cannot change the status of all the records to "committed" before all of them have been successfully inserted into the target.
So before changing the status of all the records, how will we know whether an 11th record is still coming?
Is there anything that will give me the total number of records in the source?
I need a real-time answer.
We had the same scenario, and this is what we did:
First of all, to check whether the data has been loaded into the target, you can join the source and target tables. The update will lock the rows, so a commit must be fired at the database level on the target table (so that the lock for the update can be taken).
After joining, update the loaded data in the source based on the join with the target column.
A few things:
You have to stop your session (we used pmcmd to stop the session in a command task).
Update the data in your source table and restart the session.
Keep the load counter at around 20k-30k rows so the update goes smoothly.
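A minimal sketch of that join-based status update, with hypothetical table and column names (src_orders, tgt_orders, order_id, status); the join key should be whatever key your mapping actually moves:

    update src_orders
    set    status = 'committed'
    where  status <> 'committed'
      and  exists (select 1
                   from   tgt_orders t
                   where  t.order_id = src_orders.order_id);

Only rows whose counterpart is already visible (i.e. committed) in the target get flagged, which is why the commit on the target side has to happen first.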
