SQL Server - Temporal Table - Storage costs

Is there any information on the net where I can verify how high the storage costs for the temporal tables feature are?
Will the server create a full hard copy of the row/tuple that was modified?
Or will the server use references/links to the original, unmodified values in the master table?
For example: I have a row with 10 columns = 100 KB of storage. I change one value of that row, two times. After those changes I have two rows in the history table. Is the full storage cost for the master and history table then ~300 KB?
Thanks for every hint!
Regards

Will the server create a full hard copy of the row/tuple that was
modified? Or will the server use references/links to the original,
unmodified values in the master table?
Here is a quote from the book Pro SQL Server Internals
by Dmitri Korotkevitch that answers your question:
In a nutshell, each temporal table consists of two tables — the
current table with the current data, and a history table that stores
old versions of the rows. Every time you modify or delete data in
the current table, SQL Server adds an original version of those rows
to the history table.
A current table should always have a primary key defined. Moreover,
both current and history tables should have two datetime2 columns,
called period columns, that indicate the lifetime of the row. SQL
Server populates these columns automatically based on transaction
start time when the new versions of the rows were created. When a row
has been modified several times in one transaction, SQL Server does
not preserve uncommitted intermediary row versions in the history
table.
SQL Server places the history tables in a default filegroup, creating
non-unique clustered indexes on the two datetime2 columns that
control row lifetime. It does not create any other indexes on the
table.
In both the Enterprise and Developer Editions, history tables use
page compression by default.
So it's not
references/links to the original values of the master table
The previous row version is simply copied as-is into the history table on every modification.
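For illustration, here is a minimal sketch (table, column and history names are just examples) showing the copy-the-whole-row behaviour: every UPDATE writes the complete previous version of the row into the history table, no matter how many columns actually changed.

    CREATE TABLE dbo.Widget
    (
        WidgetId  int           NOT NULL PRIMARY KEY,
        Name      nvarchar(200) NOT NULL,
        Price     money         NOT NULL,
        ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WidgetHistory));

    -- Changing a single column still copies the entire previous row
    -- (all columns) into dbo.WidgetHistory.
    UPDATE dbo.Widget SET Price = 9.99 WHERE WidgetId = 1;

    -- The history rows are full copies, so in your 100 KB example two
    -- modifications leave roughly three full-size rows in total.
    SELECT * FROM dbo.WidgetHistory;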

Related

Synchronize table between two different databases

Once a day I have to synchronize table between two databases.
Source: Microsoft SQL Server
Destination: PostgreSQL
Table contains up to 30 million rows.
For the first time I will copy the whole table, but after that, for efficiency, my plan is to insert/update only the changed rows.
In this way if I delete row from source database, it will not be deleted from the destination database.
The problem is that I don’t know which rows were deleted from the source database.
My dirty thoughts right now tend toward using a binary search - comparing the sums of the rows on each side and thus catching the deleted rows.
I’m at a dead end - please share your thoughts on this...
In SQL Server you can enable Change Tracking to track which rows are Inserted, Updated, or Deleted since the last time you synchronized the tables.
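A rough sketch of that approach (database, table and key names below are placeholders):

    -- Enable change tracking on the database and on the table.
    ALTER DATABASE SourceDb
        SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.BigTable ENABLE CHANGE_TRACKING;

    -- On each sync, ask for everything that changed since the version you
    -- stored at the end of the previous run.
    DECLARE @last_sync_version bigint = 0;  -- load this from your own bookkeeping

    SELECT ct.SYS_CHANGE_OPERATION,  -- 'I', 'U' or 'D'
           ct.Id                     -- primary key of the changed row
    FROM CHANGETABLE(CHANGES dbo.BigTable, @last_sync_version) AS ct;

    -- Remember the current version for the next run.
    SELECT CHANGE_TRACKING_CURRENT_VERSION();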
With the TDS FDW (Foreign Data Wrapper), map the source table to a foreign table in Postgres and use a join to find/exclude the rows that you need.
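A rough sketch of the FDW route on the Postgres side (server, credentials and column definitions are placeholders, and the exact OPTIONS depend on your tds_fdw version):

    CREATE EXTENSION IF NOT EXISTS tds_fdw;

    CREATE SERVER mssql_src
        FOREIGN DATA WRAPPER tds_fdw
        OPTIONS (servername 'mssql-host', port '1433', database 'SourceDb');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER mssql_src
        OPTIONS (username 'sync_user', password 'secret');

    CREATE FOREIGN TABLE src_bigtable (id bigint, payload text)
        SERVER mssql_src
        OPTIONS (schema_name 'dbo', table_name 'BigTable');

    -- Remove destination rows that no longer exist in the source.
    DELETE FROM bigtable d
    WHERE NOT EXISTS (SELECT 1 FROM src_bigtable s WHERE s.id = d.id);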

Is using a wide temporal table with only one regularly updated column efficient?

I have been unable to pin down how temporal table histories are stored.
If you have a table with several columns of nvarchar data and one stock quantity column that is updated regularly, does SQL Server store copies of the static columns for each change made to stock quantity, or is there an object-oriented method of storing the data?
I want to include all columns in the history because there may be rare changes to the nvarchar columns, but I am wary of the history table size if millions of quantity updates duplicate the other columns.
I suggest that you use the SQL Server temporal table only for the values that need monitoring; otherwise the fixed, unchanging attribute values get duplicated with every change. SQL Server stores a whole new row whenever a row update occurs. See the docs:
UPDATES: On an UPDATE, the system stores the previous value of the row
in the history table and sets the value for the SysEndTime column to
the begin time of the current transaction (in the UTC time zone) based
on the system clock
You need to move your fixed nvarchar attributes/fields to another table and use a 1:1 relation, or whatever is suitable.
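A rough sketch of that split (all object names here are invented): keep the static nvarchar attributes in a plain table and system-version only the volatile quantity.

    CREATE TABLE dbo.Product           -- static attributes, not versioned
    (
        ProductId   int           NOT NULL PRIMARY KEY,
        Name        nvarchar(200) NOT NULL,
        Description nvarchar(max) NULL
    );

    CREATE TABLE dbo.ProductStock      -- only the volatile column is versioned
    (
        ProductId int NOT NULL PRIMARY KEY
                  REFERENCES dbo.Product (ProductId),   -- 1:1 relation
        Quantity  int NOT NULL,
        ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductStockHistory));

This way millions of quantity updates only duplicate the narrow ProductStock row, not the wide text columns.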
Check also other relevant questions under the temporal-tables tag:
SQL Server - Temporal Table - Storage costs
SQL Server Temporal Table Creating Duplicate Records
Duplicates in temporal history table

Partition existing tables using PostgreSQL 10

I have gone through a bunch of documentation for PostgreSQL 10 partitioning, but I am still not clear on whether existing tables can be partitioned. Most of the posts mention partitioning existing tables using PostgreSQL 9.
Also, the official PostgreSQL documentation (https://www.postgresql.org/docs/current/static/ddl-partitioning.html) says: 'It is not possible to turn a regular table into a partitioned table or vice versa'.
So, my question is: can existing tables be partitioned in PostgreSQL 10?
If the answer is YES, my plan is:
Create the partitions.
Alter the existing table to include the range so new data goes into the new partitions. Once that is done, write a script that loops over the master table and moves the data into the right partitions.
Then truncate the master table and enforce that nothing can be inserted into it.
If the answer is NO, my plan is to make the existing table the first partition:
Create a new parent table and children (partitions).
Perform a short transaction that renames the existing table to a partition table name and the new parent to the actual table name.
Are there better ways to partition existing tables in PostgreSQL 10/9?
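A rough sketch of the 'NO' route in PostgreSQL 10 (table name, columns and range bounds are made up; the old table must match the new parent's column layout):

    BEGIN;

    ALTER TABLE measurements RENAME TO measurements_legacy;

    CREATE TABLE measurements (
        id        bigint      NOT NULL,
        logged_at timestamptz NOT NULL,
        payload   text
    ) PARTITION BY RANGE (logged_at);

    -- The old table becomes the partition holding all existing data.
    ALTER TABLE measurements ATTACH PARTITION measurements_legacy
        FOR VALUES FROM ('2000-01-01') TO ('2018-01-01');

    -- New data lands in freshly created partitions.
    CREATE TABLE measurements_2018 PARTITION OF measurements
        FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

    COMMIT;

Note that ATTACH PARTITION scans the old table to validate the bound unless a matching CHECK constraint already exists.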

SQL Server : split records pointing to a unique varbinary value

I have an interesting problem for the smart people out there.
I have an external application, which I cannot modify, that writes pictures into a SQL Server table. The pictures are often non-unique, but they are linked to unique rows in other tables.
The table MyPictures looks like this (simplified):
Unique (ID) FileName (Varchar) Picture (Varbinary)
----------------------------------------------------------
xxx-xx-xxx1 MyPicture 0x66666666
xxx-xx-xxx2 MyPicture 0x66666666
xxx-xx-xxx3 MyPicture 0x66666666
This causes the same data to be stored over and over again, blowing up my database (85% of my DB is just this table).
Is there something I can do at the SQL level to store the data only once if the filename & picture already exist in my table?
The only thing I can think of is to treat the current destination table as a 'staging' table: allow all the rows the upstream process wants to write to it, but then have a second process that copies only the distinct rows into the table(s) you're using on the SQL side and then deletes the rows from the staging table to reclaim your space.
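A rough sketch of that two-step process (all object names are invented; adjust to the real schema):

    -- Normalised store: each distinct (FileName, Picture) pair kept once.
    CREATE TABLE dbo.PictureData
    (
        PictureDataId int IDENTITY(1,1) PRIMARY KEY,
        FileName      varchar(260)   NOT NULL,
        Picture       varbinary(max) NOT NULL
    );

    -- Keeps the application's unique IDs pointing at the single stored copy.
    CREATE TABLE dbo.PictureLink
    (
        Id            varchar(20) NOT NULL PRIMARY KEY,
        PictureDataId int NOT NULL REFERENCES dbo.PictureData (PictureDataId)
    );

    -- Second process, run periodically against the staging table dbo.MyPictures:
    INSERT INTO dbo.PictureData (FileName, Picture)
    SELECT DISTINCT s.FileName, s.Picture
    FROM dbo.MyPictures AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.PictureData AS d
                      WHERE d.FileName = s.FileName
                        AND d.Picture  = s.Picture);

    INSERT INTO dbo.PictureLink (Id, PictureDataId)
    SELECT s.ID, d.PictureDataId
    FROM dbo.MyPictures AS s
    JOIN dbo.PictureData AS d
      ON d.FileName = s.FileName
     AND d.Picture  = s.Picture
    WHERE NOT EXISTS (SELECT 1 FROM dbo.PictureLink AS l WHERE l.Id = s.ID);

    -- Reclaim the space taken by the duplicates.
    DELETE FROM dbo.MyPictures;

For very large blobs, comparing a hash stored in an extra column (e.g. via HASHBYTES) is usually cheaper than comparing the varbinary data directly.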

CDC table not working after adding new columns to the source table

Two new columns were added to our source table while CDC was still enabled on the table. I need the new columns to appear in the CDC table, but I do not know what procedure should be followed to do this. I have already disabled CDC on the table, disabled CDC on the DB, added the new columns to the cdc.captured_columns table, and re-enabled CDC. But now I am getting no data in the CDC table!
Is there some other CDC table that must be updated after columns are added to the source table? These are all the CDC tables under the System Tables folder:
cdc.captured_columns <----- where I added the new columns
cdc.change_tables
cdc.dbo_myTable_CT <------ table where change data was being captured
cdc.ddl_history
cdc.index_columns
cdc.lsn_time_mapping
dbo.systranschemas
I recommend reading Tracking Changes in Your Enterprise Database. It is very detailed and thorough. Among other extremely useful bits of information, there is this:
DDL changes are unrestricted while change data capture is enabled.
However, they may have some effect on the change data collected if
columns are added or dropped. If a tracked column is dropped, all
further entries in the capture instance will have NULL for that
column. If a column is added, it will be ignored by the capture
instance. In other words, the shape of the capture instance is set
when it is created.
If column changes are required, it is possible to create another capture instance for a table (to a maximum of two capture instances per table) and allow consumers of the change data to migrate to the new table schema.
This is a very sensible and well-thought-out design that accounts for schema drift (not all participants can have their schema updated simultaneously in a real online deployment). A multi-staged approach (deploy the DDL, create the new CDC capture instance, upgrade subscribers, drop the old capture instance) is the only feasible one, and you should follow suit.
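As a rough sketch (schema, table and instance names are guesses based on the table list above), that multi-staged route looks like this:

    -- Create a second capture instance; it picks up the new columns.
    EXEC sys.sp_cdc_enable_table
         @source_schema    = N'dbo',
         @source_name      = N'myTable',
         @role_name        = NULL,
         @capture_instance = N'dbo_myTable_v2';

    -- Once every consumer reads from dbo_myTable_v2, drop the old instance.
    EXEC sys.sp_cdc_disable_table
         @source_schema    = N'dbo',
         @source_name      = N'myTable',
         @capture_instance = N'dbo_myTable';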
