I have two DBs with identical structures and tables, and they both contain some identical and some different data.
My task is to copy all the records from DB1 to DB2, but I must not simply delete everything in DB2 beforehand. I first have to check whether DB2 contains records that are not present in DB1; any such record must be kept. The problem is that the INSERT statements extracted from DB1 can use the same IDs as rows already in DB2, so the INSERT fails with a duplicate key error.
I'm currently doing this manually, which is very time-consuming:
In Excel, I paste the result of a SELECT on a DB1 table into one sheet, and the result of the same SELECT on the DB2 table into another sheet.
Then, with VLOOKUP, I look for rows that exist only in DB2. If I find any, I group them and change their IDs (starting above the max ID of the DB1 table). Next I check the weak and relationship tables connected to them, update their foreign keys to point to the new IDs in the strong table, and change the IDs of the weak and relationship tables themselves in the same way. All of this is done with statements like "update table set id = 15000 where id = 101".
Finally, with the extra records moved out of the way, I delete the rest and execute the INSERTs taken from DB1. (This works.)
But doing this for every strong table, each of which has "n" weak and relationship tables, is a massacre. (For one or two tables it is fine, but since this happens to me often I need something a little more automated.)
Do you have any suggestions?
Thanks in advance
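The manual procedure described above could be scripted roughly as follows. This is only a minimal sketch, not a definitive implementation: it uses Python's sqlite3 module as a stand-in for the real databases, the tables `parent` and `child`, the helper names, and the rule "a DB2 row is extra when DB1 has no identical row" are all assumptions made for illustration, and a real schema would need one such step per strong table and its dependent tables.

```python
import sqlite3

# "main" plays the role of DB2; the attached schema "db1" plays DB1.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db1")
for schema in ("main", "db1"):
    conn.execute(f"CREATE TABLE {schema}.parent (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(f"CREATE TABLE {schema}.child "
                 "(id INTEGER PRIMARY KEY, parent_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO db1.parent VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
conn.executemany("INSERT INTO db1.child VALUES (?, ?, ?)", [(1, 1, "a1"), (2, 2, "b1")])
conn.executemany("INSERT INTO main.parent VALUES (?, ?)", [(1, "alpha"), (2, "gamma")])
conn.executemany("INSERT INTO main.child VALUES (?, ?, ?)", [(1, 1, "a1"), (2, 2, "g1")])


def max_id(conn, table):
    """Largest id used for this table in either database (0 if both are empty)."""
    sql = (f"SELECT MAX(id) FROM (SELECT id FROM main.{table} "
           f"UNION ALL SELECT id FROM db1.{table})")
    return conn.execute(sql).fetchone()[0] or 0


def merge_db1_into_db2(conn):
    p_off, c_off = max_id(conn, "parent"), max_id(conn, "child")
    # DB2 rows with no identical row in DB1 are the "extra" rows to keep.
    conn.execute("""CREATE TEMP TABLE extra_parent AS
        SELECT p.id FROM main.parent AS p WHERE NOT EXISTS (
            SELECT 1 FROM db1.parent AS s
            WHERE s.id = p.id AND s.name IS p.name)""")
    # A child is kept if its parent is extra or it has no identical row in DB1.
    conn.execute("""CREATE TEMP TABLE extra_child AS
        SELECT c.id FROM main.child AS c
        WHERE c.parent_id IN (SELECT id FROM extra_parent) OR NOT EXISTS (
            SELECT 1 FROM db1.child AS s
            WHERE s.id = c.id AND s.parent_id IS c.parent_id
              AND s.note IS c.note)""")
    # Delete the DB2 rows that DB1 is going to replace.
    conn.execute("DELETE FROM main.child WHERE id NOT IN (SELECT id FROM extra_child)")
    conn.execute("DELETE FROM main.parent WHERE id NOT IN (SELECT id FROM extra_parent)")
    # Only extra rows remain: fix foreign keys first, then shift the ids.
    conn.execute("UPDATE main.child SET parent_id = parent_id + ? "
                 "WHERE parent_id IN (SELECT id FROM extra_parent)", (p_off,))
    conn.execute("UPDATE main.child SET id = id + ?", (c_off,))
    conn.execute("UPDATE main.parent SET id = id + ?", (p_off,))
    # The DB1 ids are now free in DB2, so the plain inserts succeed.
    conn.execute("INSERT INTO main.parent SELECT * FROM db1.parent")
    conn.execute("INSERT INTO main.child SELECT * FROM db1.child")
    conn.commit()


merge_db1_into_db2(conn)
parents = conn.execute("SELECT id, name FROM main.parent ORDER BY id").fetchall()
children = conn.execute("SELECT id, parent_id, note FROM main.child ORDER BY id").fetchall()
print(parents)
print(children)
```

Shifting by an offset at least as large as the highest id in either database means the moved rows cannot collide with anything DB1 brings in, which is the same idea as picking 15000 by hand.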
Is there any way to convert the final a.ROWID > b.ROWID condition in the code below to Snowflake? The code below is Oracle. I need to carry the ROWID logic over to Snowflake, but Snowflake does not maintain a ROWID. Is there any way to achieve the same result and work around the missing ROWID?
DELETE FROM user_tag.user_dim_default a
WHERE EXISTS (SELECT 1
              FROM rev_tag.emp_site_weekly b
              WHERE a.number = b.ID
                AND a.accountno = b.account_no
                AND a.ROWID > b.ROWID)
This Oracle code seems very broken: ROWID is a table-specific pseudocolumn, so comparing its values between two tables is meaningless, unless there is some aligned magic happening, such as rev_tag.emp_site_weekly always being written whenever user_tag.user_dim_default is inserted into. But even then I can imagine data flows where this will not give you what you want.
As with most things in Snowflake, "there is no free lunch": the data life cycle that relies on ROWID needs to be implemented explicitly.
That implies that if you want two sequences, you should define them explicitly, one on each table. And if you want them to be related to each other, it sounds like a multi-table insert or a MERGE should be used, so you can access the first table's sequence value and relate it in the second.
ROWID is an internal hidden column used by the database for specific DB operations. Depending on the vendor, you may have additional columns such as a transaction ID or a logical delete flag. Be very careful to understand the behavior of these columns and how they work. They may not be in order, they may not be sequential, and they may change in value when a DB maintenance job runs while your code is running, or when someone else runs an update on the table. Some of these internal columns may even have the same value for more than one row.
When joining tables, the ROWID on one table has no relation to the ROWID on another table. When writing dedup logic or delete-before-insert logic, you should use the primary key, combined with an audit column that holds the date of insert or date of last update. Check the data model or ERD diagram for the PK/FK relationships between the tables and for the audit columns that are available.
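As a rough illustration of replacing the ROWID comparison with a key plus an audit column, here is a minimal sketch run against SQLite from Python. The `loaded_at` column is an assumption (nothing in the question says such a column exists), and "delete the row when it was loaded later than the matching row" is only one guess at what the original ROWID comparison was meant to express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- loaded_at is a hypothetical audit column holding the insert date.
CREATE TABLE user_dim_default (number INTEGER, accountno TEXT, loaded_at TEXT);
CREATE TABLE emp_site_weekly (id INTEGER, account_no TEXT, loaded_at TEXT);
INSERT INTO user_dim_default VALUES
    (1, 'A', '2024-01-02'), (2, 'B', '2024-01-01'), (3, 'C', '2024-01-05');
INSERT INTO emp_site_weekly VALUES
    (1, 'A', '2024-01-01'), (2, 'B', '2024-01-03');
""")

# Same shape as the Oracle statement, but the ordering test uses the
# audit column instead of comparing ROWIDs across two tables.
conn.execute("""
DELETE FROM user_dim_default
WHERE EXISTS (SELECT 1
              FROM emp_site_weekly AS b
              WHERE user_dim_default.number = b.id
                AND user_dim_default.accountno = b.account_no
                AND user_dim_default.loaded_at > b.loaded_at)
""")
conn.commit()

remaining = [row[0] for row in
             conn.execute("SELECT number FROM user_dim_default ORDER BY number")]
print(remaining)
```

Only row 1 is removed here: it matches on both key columns and was loaded after its counterpart. Row 2 matches but is older, and row 3 has no match at all.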
Once a day I have to synchronize a table between two databases.
Source: Microsoft SQL Server
Destination: PostgreSQL
Table contains up to 30 million rows.
The first time I will copy the whole table, but after that, for efficiency, my plan is to insert/update only the changed rows.
With this approach, if I delete a row from the source database, it will not be deleted from the destination database.
The problem is that I don’t know which rows were deleted from the source database.
My dirty idea right now is to use a binary search: compare the sum of the rows on each side and thereby narrow down the deleted rows.
I’m at a dead end - please share your thoughts on this...
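The binary-search idea could look roughly like the sketch below. It is an assumption-laden illustration: both sides are simulated with in-memory SQLite tables named `t`, the key is an integer `id`, row counts per key range stand in for the "sum", and the destination is assumed to already contain every source row (so any count difference in a range means deleted rows).

```python
import sqlite3


def make_db(ids):
    """Build an in-memory table t(id) holding the given ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in ids])
    return conn


def count(conn, lo, hi):
    """Number of rows with lo <= id < hi."""
    sql = "SELECT COUNT(*) FROM t WHERE id >= ? AND id < ?"
    return conn.execute(sql, (lo, hi)).fetchone()[0]


def ids_in(conn, lo, hi):
    sql = "SELECT id FROM t WHERE id >= ? AND id < ?"
    return {row[0] for row in conn.execute(sql, (lo, hi))}


def deleted_ids(src, dst, lo, hi, leaf=4):
    """Ids in [lo, hi) present in dst but missing from src."""
    if count(src, lo, hi) == count(dst, lo, hi):
        return set()  # nothing was deleted in this range
    if hi - lo <= leaf:
        # Small enough: fetch both key sets and diff them directly.
        return ids_in(dst, lo, hi) - ids_in(src, lo, hi)
    mid = (lo + hi) // 2
    return deleted_ids(src, dst, lo, mid, leaf) | deleted_ids(src, dst, mid, hi, leaf)


source = make_db(i for i in range(1, 101) if i not in (17, 58, 59))
dest = make_db(range(1, 101))
missing = deleted_ids(source, dest, 1, 101)
print(sorted(missing))
```

Each level only transfers a count per range, so most of the 30 million keys never cross the network; whether that beats shipping the full key list depends on how many ranges contain deletions.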
In SQL Server you can enable Change Tracking to track which rows are Inserted, Updated, or Deleted since the last time you synchronized the tables.
With a TDS FDW (foreign data wrapper), you can map the source table to a table in PostgreSQL and then use a join to find/exclude the rows that you need.
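Once both tables are visible from one side, the deleted rows fall out of an anti-join. The sketch below only imitates that setup: an attached SQLite database named `src` plays the mapped source table, and the table name `items` and its columns are made up for the example.

```python
import sqlite3

# "main" is the destination; the attached schema "src" stands in for
# the source table exposed through a foreign data wrapper.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS src")
conn.execute("CREATE TABLE src.items (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE main.items (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO src.items VALUES (?, ?)", [(1, "a"), (3, "c")])
conn.executemany("INSERT INTO main.items VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# Anti-join: destination rows whose key no longer exists in the source.
orphans = [row[0] for row in conn.execute("""
    SELECT d.id
    FROM main.items AS d
    LEFT JOIN src.items AS s ON s.id = d.id
    WHERE s.id IS NULL
    ORDER BY d.id""")]

# Remove them from the destination.
conn.execute("DELETE FROM main.items WHERE id NOT IN (SELECT id FROM src.items)")
conn.commit()
left = [row[0] for row in conn.execute("SELECT id FROM main.items ORDER BY id")]
print(orphans, left)
```

With 30 million rows it is probably worth pulling only the key column across for this comparison rather than whole rows.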
I have an interesting problem for the smart people out there.
I have an external application I cannot modify writing pictures into a SQL Server table. The pictures are often non-unique, but linked to unique rows in other tables.
The table MyPictures looks like this (simplified):
Unique (ID) FileName (Varchar) Picture (Varbinary)
----------------------------------------------------------
xxx-xx-xxx1 MyPicture 0x66666666
xxx-xx-xxx2 MyPicture 0x66666666
xxx-xx-xxx3 MyPicture 0x66666666
This causes the same data to be stored over and over again, blowing up my database (85% of my DB is just this table).
Is there something I can do at the SQL level to store the data only once if the filename and picture already exist in my table?
The only thing I can think of is to treat the current destination table as a 'staging' table: allow the upstream process to write all the rows it wants to it, but have a second process that copies only the distinct rows to the table(s) you're using on the SQL side, and then deletes the rows from the table with the duplicates to reclaim your space.
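A minimal sketch of such a second process, using SQLite from Python. `PictureStore`, `PictureLink`, and `drain_staging` are hypothetical names, and the link table is my own addition so that each original ID can still find its picture. It also assumes the external application never reads its rows back from MyPictures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyPictures (id TEXT PRIMARY KEY, filename TEXT, picture BLOB);
-- Hypothetical deduplicated store: one row per distinct (filename, picture).
CREATE TABLE PictureStore (
    picture_id INTEGER PRIMARY KEY,
    filename   TEXT,
    picture    BLOB,
    UNIQUE (filename, picture)
);
-- Hypothetical link table: maps each original ID to its stored picture.
CREATE TABLE PictureLink (
    id         TEXT PRIMARY KEY,
    picture_id INTEGER REFERENCES PictureStore (picture_id)
);
""")
conn.executemany("INSERT INTO MyPictures VALUES (?, ?, ?)", [
    ("xxx-xx-xxx1", "MyPicture", b"\x66\x66\x66\x66"),
    ("xxx-xx-xxx2", "MyPicture", b"\x66\x66\x66\x66"),
    ("xxx-xx-xxx3", "MyPicture", b"\x66\x66\x66\x66"),
    ("xxx-xx-xxx4", "Other", b"\x01\x02"),
])
conn.commit()


def drain_staging(conn):
    """Move staged rows into the deduplicated tables, then empty staging."""
    with conn:  # one transaction: all three steps or none
        conn.execute("""INSERT OR IGNORE INTO PictureStore (filename, picture)
                        SELECT DISTINCT filename, picture FROM MyPictures""")
        conn.execute("""INSERT OR REPLACE INTO PictureLink (id, picture_id)
                        SELECT m.id, s.picture_id
                        FROM MyPictures AS m
                        JOIN PictureStore AS s
                          ON s.filename = m.filename AND s.picture = m.picture""")
        conn.execute("DELETE FROM MyPictures")


drain_staging(conn)
stored = conn.execute("SELECT COUNT(*) FROM PictureStore").fetchone()[0]
links = conn.execute("SELECT COUNT(*) FROM PictureLink").fetchone()[0]
staged = conn.execute("SELECT COUNT(*) FROM MyPictures").fetchone()[0]
print(stored, links, staged)
```

In SQL Server itself a unique constraint cannot cover a varbinary(max) column, so the equivalent would likely compare on a hash of the picture (for example one computed with HASHBYTES) instead of the raw bytes.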
We're a manufacturing company, and we've hired a couple of data scientists to look for patterns and correlation in our manufacturing data. We want to give them a copy of our reporting database (SQL 2014), but it must be in a 'sanitized' form. This means that all table names get converted to 'Table1', 'Table2' etc., and column names in each table become 'Column1', 'Column2' etc. There will be roughly 100 tables, some having 30+ columns, and some tables have 2B+ rows.
I know there is a hard way to do this. This would be to manually create each table, with the sanitized table name and column names, and then use something like SSIS to bulk insert the rows from one table to another. This would be rather time consuming and tedious because of the manual SSIS column mapping required, and manual setup of each table.
I'm hoping someone has done something like this before and has a much faster, more efficient way.
By the way, the 'sanitized' database will have no indexes or foreign keys. Also, it may not seem to make any sense why we would want to do this, but this is what was agreed to by our Director of Manufacturing and the data scientists as the first of many rounds of analysis.
You basically want to scrub the data and objects, correct? Here is what I would do.
1. Restore a backup of the db.
2. Drop all objects not needed (indexes, constraints, stored procedures, views, functions, triggers, etc.)
3. Create a table with two columns and populate it so that each row has the original table name and the new table name.
4. Write a script that iterates through the table, row by row, and renames your tables. Better yet, put the data into Excel and create a third column that builds the T-SQL you want to run, then cut/paste and execute it in SSMS.
5. Repeat step 4, but for all columns. Best to query sys.columns to get all the objects you need, put them into Excel, and build your T-SQL.
6. Repeat again for any other objects needed.
Backup/restore will be quicker than dabbling in SSIS and data transfer.
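Instead of Excel, the rename statements could also be generated by a small script. The sketch below is in Python and only builds text; nothing is executed against a server. The function `build_rename_script`, the `dbo` schema, and the example tables are assumptions, and in practice the table/column list would come from a query against sys.columns. Names containing quotes or dots are not handled.

```python
def build_rename_script(schema_map, schema="dbo"):
    """Build sp_rename statements plus a key for mapping names back.

    schema_map: {table_name: [column_name, ...]}, in the desired order.
    Columns are renamed first (while the original table names are still
    valid), then the tables themselves.
    """
    column_stmts, table_stmts, key = [], [], []
    for t_idx, (table, columns) in enumerate(schema_map.items(), start=1):
        new_table = f"Table{t_idx}"
        for c_idx, column in enumerate(columns, start=1):
            new_column = f"Column{c_idx}"
            column_stmts.append(
                f"EXEC sp_rename '{schema}.{table}.{column}', "
                f"'{new_column}', 'COLUMN';")
            key.append((table, column, new_table, new_column))
        table_stmts.append(f"EXEC sp_rename '{schema}.{table}', '{new_table}';")
    return column_stmts + table_stmts, key


# Hypothetical input; real names would be read from the restored database.
statements, key = build_rename_script({
    "Orders": ["OrderID", "CustomerName"],
    "Machines": ["MachineID"],
})
print("\n".join(statements))
```

The returned `key` list is the private mapping from original to sanitized names, which is what you would keep in-house to translate the data scientists' findings back.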
They can see the data but they can't see the column names? What can that possibly accomplish? What are you protecting by not revealing the table or column names? How is a data scientist supposed to evaluate data without context? Without FKs, all I see is a bunch of numbers in a column named colX. What are you expecting to accomplish? Get a confidentiality agreement. Consider an FK column customerID versus a materialID: patterns in them have widely different meanings and call for different analysis. I would correlate a quality measure with materialID or shiftID, but not with customerID.
Oh look, there is a correlation between tableA.colB and tableX.colY. Well yes, that customer is a college team and they use aluminum bats.
On top of that you strip the indexes (on tables with 2B+ rows), so any analysis they run will be slow. What does that accomplish?
As for the question as stated: do a backup and restore. Using the system tables, drop all triggers, FKs, indexes, and constraints. Don't forget to drop the triggers and constraints, as they may disclose some trade secret. Then rename the columns, and then the tables.
My company has an application with a bunch of database tables that used to use a sequence table to determine the next value to use. Recently, we switched this to using an identity property. The problem is that in order to upgrade a client to the latest version of the software, we have to change about 150 tables to identity. To do this manually, you can right click on a table, choose design, change (Is Identity) to "Yes" and then save the table. From what I understand, in the background, SQL Server exports this to a temporary table, drops the table and then copies everything back into the new table. Clients may have their own unique indexes and possibly other things specific to the client, so making a generic script isn't really an option.
It would be really awesome if there was a stored procedure for scripting this task rather than doing it in the GUI (which takes FOREVER). We made a macro that can go through and do this, but even then, it takes a long time to run and is error prone. Something like: exec sp_change_to_identity 'table_name', 'column name'
Does something like this exist? If not, how would you handle this situation?
Update: This is SQL Server 2008 R2.
This is what SSMS seems to do:
1. Obtain and drop all the foreign keys pointing to the original table.
2. Obtain the indexes, triggers, foreign keys and statistics of the original table.
3. Create a temp_table with the same schema as the original table, with the Identity field.
4. Insert into temp_table all the rows from the original table (Identity_Insert On).
5. Drop the original table (this will drop its indexes, triggers, foreign keys and statistics).
6. Rename temp_table to the original table name.
7. Recreate the foreign keys obtained in (1).
8. Recreate the objects obtained in (2).
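The create/copy/drop/rename core of that sequence can be tried out on a small scale. The sketch below is not the SQL Server procedure itself: it uses SQLite from Python, where `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of the identity property, and the table `orders`, its index, and the function name are made up. Foreign keys (steps 1 and 7) and statistics are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Original table: ids were handed out by the application, not the database.
CREATE TABLE orders (order_id INTEGER NOT NULL, customer TEXT);
CREATE INDEX ix_orders_customer ON orders (customer);
INSERT INTO orders VALUES (10, 'acme'), (11, 'globex');
""")


def rebuild_with_identity(conn):
    # Step 2: capture the definitions of the objects that DROP TABLE removes.
    saved = [row[0] for row in conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE tbl_name = 'orders' AND type IN ('index', 'trigger') "
        "AND sql IS NOT NULL")]
    # Steps 3-6: new table with an auto-generated key, copy, drop, rename.
    conn.executescript("""
    BEGIN;
    CREATE TABLE tmp_orders (
        order_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer TEXT
    );
    INSERT INTO tmp_orders (order_id, customer)
        SELECT order_id, customer FROM orders;
    DROP TABLE orders;
    ALTER TABLE tmp_orders RENAME TO orders;
    COMMIT;
    """)
    # Step 8: recreate the captured objects on the renamed table.
    for statement in saved:
        conn.execute(statement)


rebuild_with_identity(conn)
new_id = conn.execute("INSERT INTO orders (customer) VALUES ('initech')").lastrowid
conn.commit()
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'orders'")]
print(new_id, index_names)
```

Because the object definitions are read from the catalog at run time rather than hard-coded, the same routine would pick up client-specific indexes, which is the part that makes a fixed generic script awkward.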