Ignite: Can I set an affinity key after pushing data?

I have already pushed data from the source into Ignite, but I had not configured an affinity key. If I configure an affinity key now, will the data get redistributed, or should I re-push the data into Ignite?
Thank you.

The table configuration in Ignite -- including the affinity key -- is largely immutable. In order to change the affinity key you'll need to drop and recreate the table.
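As a rough sketch of the drop-and-recreate step in Ignite SQL (the table and column names here are made up for illustration; note that the affinity key column must be part of the primary key):

-- Recreate the table with the desired affinity key, then reload it.
DROP TABLE IF EXISTS City;
CREATE TABLE City (
    ID INT,
    CountryCode CHAR(3),
    Name VARCHAR,
    PRIMARY KEY (ID, CountryCode)
) WITH "affinity_key=CountryCode";
-- Re-push the data from the source after recreating the table.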

Related

Will reclustering a table change the micro-partitions in Snowflake?

Will reclustering a table change the micro-partitions in Snowflake? I remember reading somewhere that micro-partitions cannot be changed once created and are immutable. I am obviously mixing different things. Can someone please explain?
Yes. When you enable clustering on a table, the data is redistributed across micro-partitions in the background by the clustering service. It periodically sorts the data, creates new micro-partitions, and deletes the old ones. So the two ideas are consistent: individual micro-partitions are immutable, and reclustering works by replacing them with newly written ones rather than modifying them in place.
This article may be helpful:
https://docs.snowflake.com/en/user-guide/tables-auto-reclustering.html
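For concreteness, a minimal sketch of enabling clustering on an existing table and then inspecting the result (the table and column names are hypothetical):

-- Define a clustering key; the background clustering service will
-- then rewrite micro-partitions over time to keep data sorted by it.
ALTER TABLE my_events CLUSTER BY (event_date, customer_id);

-- Inspect clustering quality. The reported figures change after
-- reclustering because old micro-partitions are replaced by new ones.
SELECT SYSTEM$CLUSTERING_INFORMATION('my_events', '(event_date, customer_id)');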

Setting Table Auto Clustering On in Snowflake is not clustering the table

I moved from manual clustering to auto clustering around two weeks back.
The steps I used are below:
Update AUTO_CLUSTERING_ON to yes for the table.
Create a middle (staging) table and insert the records into it.
Insert into the main table from the middle table, ordered by the clustering key.
After this I see the clustering is all over the place.
When I did manual reclustering, the clustering looked good; however, on the next insert into the main table, the clustering again looks troublesome.
Please suggest if I am missing anything.
Please note:
The data loaded into the middle table is inserted from another table as well, and that table is never clustered. I am not sure if that is the issue (though I feel it should not be).
You may need to raise a case with Snowflake to enable automatic clustering. Accounts that were created a while ago won't have this enabled. From the documentation:
If manual reclustering is still available in your account, Automatic Clustering may not be enabled yet for your account.
You can request Automatic Clustering to be enabled for your account; however, it will only affect clustered tables that are defined from the time after the feature is enabled.
For clustered tables that were defined before the feature is enabled, you must explicitly “resume” Automatic Clustering for each table. You can use SQL to determine whether Automatic Clustering is enabled for a given table.
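As the last sentence of that quote suggests, one quick check, assuming a table named t1 (the name is just illustrative):

SHOW TABLES LIKE 't1';
-- The automatic_clustering column in the output reports ON or OFF,
-- i.e. whether the clustering service is maintaining this table.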
Also, per the documentation here, you should try running the resume recluster command, since the table may have been created before Automatic Clustering was enabled for your account:
alter table t1 resume recluster;
Don't forget that the table gets reclustered automatically at Snowflake's discretion. Snowflake may simply not think the table requires reclustering, based on a number of factors (which I don't know :))
I think raising a case with Snowflake will probably solve this pretty quickly so that may be the best route.
Not specifically related to the question, but I have found that periodically rebuilding a table will achieve the best clustering results, especially for tables which churn frequently. To do this you can specify an ORDER BY clause which mimics your clustering keys.
CREATE OR REPLACE TABLE t1 COPY GRANTS AS
SELECT * FROM t1 ORDER BY a, b, c;

CDC table not working after adding new columns to the source table

Two new columns were added to our source table while CDC was still enabled on the table. I need the new columns to appear in the CDC table, but I don't know what procedure should be followed to do this. I have already disabled CDC on the table, disabled CDC on the DB, added the new columns to the cdc.captured_columns table, and re-enabled CDC. But now I am getting no data in the CDC table!
Is there some other CDC table that must be updated after columns are added to the source table? These are all the CDC tables under the System Tables folder:
cdc.captured_columns <----- where I added the new columns
cdc.change_tables
cdc.dbo_myTable_CT <------ table where change data was being captured
cdc.ddl_history
cdc.index_columns
cdc.lsn_time_mapping
dbo.systranschemas
I recommend reading Tracking Changes in Your Enterprise Database. It is very detailed and thorough. Among other extremely useful bits of info, it explains:
DDL changes are unrestricted while change data capture is enabled. However, they may have some effect on the change data collected if columns are added or dropped. If a tracked column is dropped, all further entries in the capture instance will have NULL for that column. If a column is added, it will be ignored by the capture instance. In other words, the shape of the capture instance is set when it is created.
If column changes are required, it is possible to create another capture instance for a table (to a maximum of two capture instances per table) and allow consumers of the change data to migrate to the new table schema.
This is a very sensible and well-thought-out design that accounts for schema drift (not all participants can have their schema updated simultaneously in a real online deployment). A multi-staged approach (deploy the DDL, capture into a new CDC instance, upgrade subscribers, drop the old CDC capture instance) is the only feasible one, and you should follow suit.
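A hedged sketch of that multi-staged approach, reusing the dbo.myTable name from the question (the v2 instance name is an assumption):

-- 1. Create a second capture instance that picks up the new columns.
EXEC sys.sp_cdc_enable_table
    @source_schema    = N'dbo',
    @source_name      = N'myTable',
    @role_name        = NULL,
    @capture_instance = N'dbo_myTable_v2';

-- 2. Point consumers at the new change table, cdc.dbo_myTable_v2_CT.

-- 3. Drop the old capture instance once nothing reads from it.
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'dbo',
    @source_name      = N'myTable',
    @capture_instance = N'dbo_myTable';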

Allowing individual columns not to be tracked in Merge Replication

Using Merge Replication, I have a table that for the most part is synchronized normally. However, the table contains one column that is used to store temporary, client-side data, which is only meaningfully edited and used on the client, and which I don't have any desire to have replicated back to the server. For example:
CREATE TABLE MyTable (
    ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    Name NVARCHAR(200),
    ClientCode NVARCHAR(100)
)
In this case, even if subscribers make changes to the ClientCode column in the table, I don't want those changes getting back to the server. Does Merge Replication offer any means to accomplish this?
An alternate approach, which I may fall back on, would be to publish an additional table, and configure it to be "Download-only to subscriber, allow subscriber changes", and then reference MyTable.ID in that table, along with the ClientCode. But I'd rather not have to publish an additional table if I don't absolutely need to.
Thanks,
-Dan
Yes, when you create the article in the publication, don't include this column. Then, create a script that adds this column back to the table, and in the publication properties, under snapshot, specify that this script executes after the snapshot is applied.
This means that the column will exist on both the publisher and subscriber, but will be entirely ignored by replication. Of course, you can only use this technique if the column(s) to ignore are nullable.
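A minimal sketch of that setup with the merge replication stored procedures (the publication name is hypothetical; @vertical_partition = 'true' excludes any column not explicitly added back):

-- Publish the article with a vertical partition so only listed
-- columns are replicated; key and rowguid columns are included
-- automatically.
EXEC sp_addmergearticle
    @publication        = N'MyPublication',
    @article            = N'MyTable',
    @source_object      = N'MyTable',
    @vertical_partition = N'true';

-- Add back each column you do want replicated; ClientCode is
-- simply never added.
EXEC sp_mergearticlecolumn
    @publication = N'MyPublication',
    @article     = N'MyTable',
    @column      = N'Name',
    @operation   = N'add';

-- Post-snapshot script, run at the subscriber after the snapshot is
-- applied, restoring the excluded (nullable) column:
ALTER TABLE MyTable ADD ClientCode NVARCHAR(100) NULL;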

Ibatis and polling database

I would like to use iBatis to poll 3 legacy databases for new rows and insert them into a new database. But our customers don't allow me to add a "status" column to the three legacy databases, which would help me avoid consuming rows twice or more. So what should I do? Thanks in advance!
Create a new table with the status column and add a foreign key pointing to the primary key of the legacy table. Create a view with both tables joined together and you will have your status column associated with the legacy table without altering it.
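A minimal sketch of that suggestion, assuming a legacy table named legacy_orders with primary key order_id (all names are hypothetical):

-- Status table you own; the legacy table itself is untouched.
CREATE TABLE order_status (
    order_id    INT PRIMARY KEY REFERENCES legacy_orders (order_id),
    consumed_at TIMESTAMP
);

-- View joining the two; unconsumed rows have consumed_at IS NULL.
CREATE VIEW orders_with_status AS
SELECT o.*, s.consumed_at
FROM legacy_orders o
LEFT JOIN order_status s ON s.order_id = o.order_id;

Polling then becomes a SELECT from the view where consumed_at IS NULL, followed by an INSERT into order_status for each row processed.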
You can use the idempotent consumer EIP to filter out duplicates:
http://camel.apache.org/idempotent-consumer.html
But as Joachim said, you need a new table to store the status.
You could also create a SQL VIEW over the original table joined to the status table, and let iBatis query that view.
