Issue replicating updates to primary key data with SymmetricDS - symmetricds

I am replicating from MSSQL (SQL Server 13) to PostgreSQL (9.5) using SymmetricDS.
The table being replicated has a composite key of 7 different columns. Everything works perfectly, from the initial load to inserting and updating data.
However, I run into a problem whenever I run an update that modifies data in one of the 7 columns that comprise the primary key. On the MSSQL side, it updates the row, no problem. On the Postgres side, rather than updating the column, it inserts an additional row.
If I modify the sym_transform_column entry to have pk = 0 for that specific column, then it updates the data correctly, but it no longer uses that column as part of the primary key to determine which row to update.
Example Generated SQL with pk=0 for sym_transform_column:
update table set pk1 = 0, value1 = 'test', value2 = 'test' where pk2 = 0 and pk3 = 0
Example Generated SQL with pk=1 for sym_transform_column:
update table set value1='test', value2='test' where pk1 = 0 and pk2 = 0 and pk3 = 0
I realize it is generally accepted that a PK should be immutable, but to cover all contingencies: is there a way to replicate an update to primary key data from MSSQL to PostgreSQL using SymmetricDS?

Is it possible to add a column to the source table and treat it as the primary key? It could, for example, be a concatenation of the seven columns that comprise the composite key. Then declare this column as the primary key for the synchronization and add the same column to the table in the target database.
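A minimal sketch of why matching on an extra, stable column helps. It uses Python's built-in sqlite3 as a stand-in for both databases and a hand-written apply function as a stand-in for the replication step, so it is not SymmetricDS behaviour. The table `item`, the column `sync_id`, and the functions `make_target` and `apply_update` are hypothetical. It assumes the extra column keeps its value when a key column changes (a concatenation that is recomputed on update would change along with the key).

```python
import sqlite3

def make_target():
    # Hypothetical target table: composite key (pk1, pk2) plus a stable sync_id.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE item (sync_id INTEGER UNIQUE, pk1 INTEGER, pk2 INTEGER, "
        "value1 TEXT, PRIMARY KEY (pk1, pk2))"
    )
    db.execute("INSERT INTO item VALUES (1, 0, 0, 'old')")
    return db

def apply_update(target, row, match_cols):
    """Update the row matched on match_cols; fall back to insert if none matched."""
    set_cols = [c for c in row if c not in match_cols]
    sql = "UPDATE item SET {} WHERE {}".format(
        ", ".join(f"{c} = :{c}" for c in set_cols),
        " AND ".join(f"{c} = :{c}" for c in match_cols),
    )
    if target.execute(sql, row).rowcount == 0:
        cols = list(row)
        target.execute(
            "INSERT INTO item ({}) VALUES ({})".format(
                ", ".join(cols), ", ".join(f":{c}" for c in cols)
            ),
            row,
        )

def count_rows(db):
    return db.execute("SELECT COUNT(*) FROM item").fetchone()[0]

# The source changed pk1 from 0 to 5. Matching on the key columns looks for
# the NEW key, finds nothing, and inserts a second row.
by_pk = make_target()
apply_update(by_pk, {"pk1": 5, "pk2": 0, "value1": "test"}, ["pk1", "pk2"])
rows_by_pk = count_rows(by_pk)

# Matching on the stable sync_id finds the existing row and updates it in place.
by_sync = make_target()
apply_update(by_sync, {"sync_id": 1, "pk1": 5, "pk2": 0, "value1": "test"}, ["sync_id"])
rows_by_sync = count_rows(by_sync)
```

In this sketch the first target ends up with two rows, which mirrors the extra insert seen on the Postgres side, while the second keeps a single row whose pk1 is now 5.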

Related

How to do transaction.insert_or_update on secondary index and not the primary index?

I have a table in Google Cloud Spanner.
CREATE TABLE test_id (
Id STRING(MAX) NOT NULL,
KeyColumn STRING(MAX) NOT NULL,
parent_id INT64 NOT NULL,
Updated TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
) PRIMARY KEY (Id)
And, I am trying to perform transaction.insert_or_update through a python script.
For each row in a pandas dataframe, I am doing:
transaction.insert_or_update(
'test_id', columns=['Id','KeyColumn', 'parent_id', 'Updated'],
values=[(uuid.uuid4().hex, row["KeyColumn"], row["parent_id"], spanner.COMMIT_TIMESTAMP)],
)
What I want is that if the row["KeyColumn"] is already present in KeyColumn of the table, update its parent_id column, otherwise insert a new row in the Spanner table corresponding to that KeyColumn.
But since my primary key is Id, which is generated randomly by uuid.uuid4().hex, it inserts a new row every time.
If I understand you correctly, the following is the situation:
ID is the primary key of your table.
There is a unique index defined for the table on the column KeyColumn.
You want to insert_or_update a row using KeyColumn as the column that should be used to determine whether the row already exists.
That is unfortunately not possible. insert_or_update will always use the primary key of the table to determine whether the row exists. I can think of three possible solutions to this problem, but they all have their drawbacks:
You could change the table definition and make KeyColumn the primary key and set a unique index on the Id column. The problem with this is of course that any other code that depends on Id being the primary key also needs to change. It is also a rather cumbersome change, because Cloud Spanner does not allow you to change the primary key of a table, so you would have to create a copy of the test_id table and then drop the old table.
You could fetch the row from Cloud Spanner before updating it by reading it using the KeyColumn value that you have. The big problem with this is obviously performance. You will need to do a read for each row that you want to update.
You could use a DML statement (UPDATE test_id SET parent_id=@parent WHERE KeyColumn=@key) to execute the update and check whether it actually updated a row by checking the returned update count. If it did not update anything, you could then execute the insert. This will obviously also be slower than an insert_or_update mutation.
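The third option could be sketched roughly as follows. The function name `update_or_insert` and its `types` and `commit_timestamp` arguments are hypothetical; they are passed in so that the block does not depend on the client library being importable. It assumes `transaction` is a google-cloud-spanner read-write transaction, whose `execute_update` returns the number of modified rows, and that KeyColumn has a unique index as described above.

```python
import uuid

# DML update keyed on KeyColumn instead of the primary key.
UPDATE_BY_KEY = (
    "UPDATE test_id "
    "SET parent_id = @parent, Updated = PENDING_COMMIT_TIMESTAMP() "
    "WHERE KeyColumn = @key"
)

def update_or_insert(transaction, key, parent_id, types, commit_timestamp):
    """Try the update first; buffer an insert only if no row was updated."""
    updated = transaction.execute_update(
        UPDATE_BY_KEY,
        params={"parent": parent_id, "key": key},
        param_types=types,
    )
    if updated == 0:
        # No row with this KeyColumn yet: insert one with a fresh random Id.
        transaction.insert(
            "test_id",
            columns=["Id", "KeyColumn", "parent_id", "Updated"],
            values=[(uuid.uuid4().hex, key, parent_id, commit_timestamp)],
        )
    return updated
```

With the real client this would be called along the lines of database.run_in_transaction(update_or_insert, key, parent_id, {"parent": param_types.INT64, "key": param_types.STRING}, spanner.COMMIT_TIMESTAMP), with param_types imported from google.cloud.spanner_v1. Note that buffered mutations are only applied at commit, so a key that appears twice within the same transaction would not be seen by the second UPDATE.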
There is a way to query Cloud Spanner using a specific index.
You should use something like this at the end of your query: FROM test_id@{FORCE_INDEX=KeyColumnIndex}.
Even though this is the way to execute queries on secondary indexes and the answer for the question in the title, I do not know how much it can be applied in your use case.
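For illustration, a read through the secondary index might look like the sketch below. The function name `find_ids_by_key` and the `string_type` argument are hypothetical; `snapshot` is assumed to be a google-cloud-spanner snapshot (for example from database.snapshot()), and KeyColumnIndex is assumed to be an existing index on KeyColumn.

```python
def find_ids_by_key(snapshot, key, string_type):
    """Return the Id values of rows whose KeyColumn equals key, read via the index."""
    sql = (
        "SELECT Id FROM test_id@{FORCE_INDEX=KeyColumnIndex} "
        "WHERE KeyColumn = @key"
    )
    rows = snapshot.execute_sql(
        sql,
        params={"key": key},
        param_types={"key": string_type},
    )
    return [row[0] for row in rows]
```

With the real client, string_type would be param_types.STRING from google.cloud.spanner_v1. The returned Id could then be passed to insert_or_update in place of a freshly generated uuid, at the cost of one read per row.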

The row-limitation in compound primary key in SQL Server 2014

I am trying to insert 2.3 billion rows (2,300,000,000) from table_a into table_b. The schemas of table_a and table_b are identical; the only difference is that table_a doesn't have a primary key, while table_b has a 4-column compound primary key and starts with 0 rows of data. After 24 hours I get this error message:
Msg 666, Level 16, State 2, Line 1
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 422223771074560. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
This is my compound PK in table_b and the sample query code. Any help would be appreciated.
column1: varchar(10), not null
column2: nvarchar(50), not null
column3: nvarchar(100), not null
column4: int, not null
Sample code
insert into table_b
select *
from table_a
where date < '2017-01-01' -- some filters here
According to the SQL Server documentation, part of creating a primary key is creating a unique index on that same table.
When you create a PRIMARY KEY constraint, a unique index on the
column, or columns, is automatically created. By default, this index
is clustered; however, you can specify a nonclustered index when you
create the constraint.
When the clustered index on the table is not unique, rows with duplicate key values get what the docs call a "uniqueifier", which is 4 bytes in length (about 2.14 billion usable values).
If the clustered index is not created with the UNIQUE property, the
Database Engine automatically adds a 4-byte uniqueifier column to the
table. When it is required, the Database Engine automatically adds a
uniqueifier value to a row to make each key unique. This column and
its values are used internally and cannot be seen or accessed by
users.
From this information and your error message we can tell two things:
There is a clustered index on the table
There is not a primary key on the table
Given the volume of data you're dealing with, I'm betting you have a Clustered Columnstore Index on the table, which in SQL Server 2014 cannot be combined with a primary key.
One possible solution is to partition table_b on a particular column (one that has fewer than 15K unique values, based on the limitations specified in the documentation). As a side note, the same partitioning effort could significantly reduce the run time of queries against table_b, depending on which column is used in the partition function.
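To put the 4-byte uniqueifier in perspective, here is a small arithmetic sketch. It assumes the uniqueifier behaves like a signed 32-bit integer, which is consistent with the "~2.14 billion" figure above but is an assumption here, and it looks only at the worst case where every inserted row lands in the same duplicate group. The names are hypothetical.

```python
UNIQUEIFIER_BYTES = 4

# Assumption: signed 32-bit range, so the largest value is 2**31 - 1.
MAX_UNIQUEIFIER = 2 ** (UNIQUEIFIER_BYTES * 8 - 1) - 1

ROWS_TO_INSERT = 2_300_000_000

def worst_case_overflows(rows, limit=MAX_UNIQUEIFIER):
    """True if a single duplicate group holding all rows would exceed the limit."""
    return rows > limit

overflow = worst_case_overflows(ROWS_TO_INSERT)
headroom = MAX_UNIQUEIFIER - ROWS_TO_INSERT  # negative means the limit is exceeded
```

Under these assumptions 2.3 billion rows sharing one clustering key value exceed the limit by roughly 150 million, which fits the error appearing only late in the load.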
You know that:
If the clustered index is not created with the UNIQUE property, the
Database Engine automatically adds a 4-byte uniqueifier column to the
table. When it is required, the Database Engine automatically adds a
uniqueifier value to a row to make each key unique. This column and
its values are used internally and cannot be seen or accessed by
users.
While it's unlikely that you will face an issue related to uniqueifiers, we have seen rare cases where a customer reaches the uniqueifier limit of 2,147,483,648, generating error 666.
And from this topic about the issue we have:
As of February 2018, the design goal for the storage engine is to not
reset uniqueifiers during REBUILDs. As such, rebuild of the index
ideally would not reset uniquifiers and issue would continue to occur,
while inserting new data with a key value for which the uniquifiers
were exhausted. But current engine behavior is different for one
specific case, if you use the statement ALTER INDEX ALL ON <table>
REBUILD WITH (ONLINE = ON), it will reset the uniqueifiers (across all
version starting SQL Server 2005 to SQL Server 2017).
So, if this is the cause of your issue, you can add an additional integer column and build the index over it.

Incorrect Duplicate insert problem with SQL Server CE 3.5

I am not able to insert data into my table anymore!
Here's my table design.
intId is the primary key. There's no explicit unique constraint defined on it, and it has identity increment set to 1 and identity seed set to 1.
I am inserting data into this table through LINQ.
testDB.tbl_Vehicle.InsertOnSubmit(newVehicle);
testDB.SubmitChanges();
All this used to work till now, and all of a sudden it stopped working!
It now says
A duplicate value cannot be inserted into a unique index. [ Table name = tbl_Vehicle,Constraint name = PK_tbl_Vehicle ]
More info: This desktop application has 1 executable and 1 .sdf file. It was developed on Win 7 and recently was moved to Win XP system. But that shouldn't be a problem as there are other tables I am inserting into with similar logic and table design.
Do one thing: use SQL Profiler and check the query that is fired by the insert statement.
Also, check the database table again and, if possible, reset the seed for the primary key, i.e. the identity column.

Converting int primary key to bigint in Sql Server

We have a production table with 770 million rows and change. We want(/need?) to change the Primary ID column from int to bigint to allow for future growth (and to avoid the sudden stop when the 32-bit integer space is exhausted).
Experiments in DEV have shown that this is not as simple as altering the column as we would need to drop the index and then re-create it. So far in DEV (which is a bit humbler than PROD) the dropping of the index has not finished after 1 and a half hours. This table is hit 24/7 and having it offline for such a long time is not an option.
Has anyone else had to deal with a similar situation? How did you get it done?
Are there alternatives?
Edit: Additional Info:
The Primary key is clustered.
You could attempt a staged approach.
1. Create a new bigint column
2. Create an insert trigger to keep new entries in sync between the 2 columns
3. Execute an update to populate all the empty values in the bigint column with the converted value
4. Change the primary index on the table from your old id column to the new one
5. Point any FKs and queries to use the new column
6. Change the new column to become your identity column and remove the insert trigger from #2
7. Delete the old ID column
You should end up spreading the pain out over these 7 steps instead of hitting it all at once.
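The populate step (the update that fills the new bigint column) can be sketched as a loop of small updates. This uses Python's built-in sqlite3 rather than SQL Server, and the table `big_table`, the column `id_big`, and the function `backfill_in_batches` are hypothetical. Splitting the update into batches is an assumption added here to keep each transaction short; the answer above only says to execute an update.

```python
import sqlite3

def backfill_in_batches(db, batch_size):
    """Copy id into id_big for unconverted rows, one small transaction per batch."""
    batches = 0
    while True:
        cur = db.execute(
            "UPDATE big_table SET id_big = id WHERE id IN ("
            "SELECT id FROM big_table WHERE id_big IS NULL LIMIT ?)",
            (batch_size,),
        )
        db.commit()  # commit each batch so no single transaction grows large
        if cur.rowcount == 0:
            return batches
        batches += 1

# Tiny demo table: 10 rows whose new column is still empty.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, id_big INTEGER)")
db.executemany("INSERT INTO big_table (id) VALUES (?)", [(i,) for i in range(1, 11)])
db.commit()

batches = backfill_in_batches(db, batch_size=4)
remaining = db.execute(
    "SELECT COUNT(*) FROM big_table WHERE id_big IS NULL"
).fetchone()[0]
```

On SQL Server the same idea would probably be written with UPDATE TOP (n) in a loop instead of the LIMIT subquery used here; rows inserted while the loop runs are covered by the trigger from step 2.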
Create a parallel table with the longer data type for new rows and UNION the results?
What I had to do was copy the data into a new table with the desired structure (primary/clustered key only, non-clustered/FK once complete). If you don't have the room, you could bcp out the data and back in. You may need an application outage to make this happen.
What doesn't work: alter table Orderhistory alter column ID bigint because of the primary key. Don't drop the key and alter column as you will just fill your log file and take much longer than copy/bcp.
Never use the SSMS table designer to change a column property: it copies the table into a temp table and then does a rename once done. Look up the alter table alter column syntax and use it, and possibly defrag once complete if you widened a column that sits in the middle of the table.

SQL server 2005 :Updating one record from 2 identical records

I have 2 records in a table in SQL Server 2005 db which has exactly same data.
I want to update one record. Is there any way to do it? Unfortunately this table does not have an identity column, and I can't use a direct update query because both records would be updated, since the data is the same. Is there any way to do this using rowid or something similar in SQL Server 2005?
I don't much like the TOP operator, but:
UPDATE top (1) MyTable
set Data = '123'
where Data = 'def'
Really, you want to have primary keys on your tables to avoid just this kind of situation, even if they are just identity surrogate values.
I would add an identity column to the table and then update on that identity column or update based on whatever the primary key of the table is that makes the row unique.
