Can StreamSets Data Collector automatically create tables in the destination database? - sql-server

Is there a way for StreamSets Data Collector to automatically create tables in the destination database based on the origin database in the case of CDC?
I am reading data from a source (SQL Server) and writing to a destination (PostgreSQL). If I am interested in 50 tables in the source, I do not want to create those tables in the destination db manually.

There is a (beta) Postgres Metadata processor for StreamSets Data Collector that will create and alter tables on the fly - more information at Drift Synchronization Solution for Postgres.
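For illustration only (this is not StreamSets' documented implementation, just the net effect of drift handling), the processor ends up issuing DDL against Postgres roughly like the following; the table and column names here are hypothetical:
-- First time records arrive for a table that does not yet exist
CREATE TABLE IF NOT EXISTS public.customers (
    customer_id   integer,
    customer_name character varying(255)
);
-- When a new field later appears in the incoming records
ALTER TABLE public.customers ADD COLUMN IF NOT EXISTS email character varying(255);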

Related

How can I sync a SQL Server view to a Postgres table?

I need to sync data from several tables in a legacy SQL Server db (source) to a single table in a Postgres db (target). The schema of the source db is absurd, so the query to select the data takes a very long time to run. I'm planning to create an indexed view in the source db, and then somehow sync that indexed view to the Postgres table.
Right now, I simply have a scheduled task that drops the Postgres table (target) and then recreates it from scratch by running the complex query in the source db. This was quick to set up, and it ensures that changes in the source db always eventually make it to the target db, but recreating the table every few hours is (understandably) very slow and expensive. I need a way to replicate ongoing changes (only the new/updated data) from the source view to the target table. Is there a (relatively) simple way to do this?
I'm somewhat familiar with CDC, but I understand that CDC cannot be used on a view, so I don't believe that's an option. Adding "updated at" timestamps to the source tables is not an option, so I can't use that approach. I could add a hash column to the source tables, or maybe add a hash column to the view, so that's an option if that would work. Is there an existing tool/service that does what I need?
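For what it's worth, the hash-column idea would be a sketch along these lines (CONCAT_WS requires SQL Server 2017+, so on an older version the columns would need to be concatenated manually; the view and column names are hypothetical):
-- Expose a per-row hash over the view so changed rows can be detected downstream
SELECT
    v.pk_id,
    v.col1,
    v.col2,
    HASHBYTES('SHA2_256', CONCAT_WS('|', v.col1, v.col2)) AS row_hash
FROM dbo.MyIndexedView v;
The target side would then only insert/update rows whose row_hash differs from the value it already has stored.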
If you want to view SQL Server DB data in PostgreSQL, then you can also use tds_fdw.
https://github.com/tds-fdw/tds_fdw
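As a rough sketch of what a tds_fdw setup looks like on the PostgreSQL side (the server name, credentials, and table definition below are placeholders):
-- All names and credentials here are placeholders
CREATE EXTENSION IF NOT EXISTS tds_fdw;

CREATE SERVER mssql_src
    FOREIGN DATA WRAPPER tds_fdw
    OPTIONS (servername 'mssql-host', port '1433', database 'SourceDb');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER mssql_src
    OPTIONS (username 'reader', password 'secret');

-- Expose the SQL Server view (or table) as a foreign table in Postgres
CREATE FOREIGN TABLE source_view (
    pk_id integer,
    col1  text,
    col2  text
)
    SERVER mssql_src
    OPTIONS (schema_name 'dbo', table_name 'MyIndexedView');
You can then query source_view from Postgres as if it were a local table.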
Also, there are some third-party tools which could help you to achieve your goal, for example, SymmetricDS
http://www.symmetricds.org/about/overview

Database: back up a few tables with data+schema and other tables with only schema

In our database, we have application-related data tables and transaction-related data tables. Since there is a huge number of records in my transaction tables, I want to skip them when taking a backup. So basically, when I run a scheduler, I want schema + data for the application-related tables and only the schema for the transaction-related data tables.
It was suggested that I use Generate Scripts; however, I'm not sure it would work, because my application tables are linked to each other and my primary key columns are generally identity columns.
For such a scenario, regular SQL Server backup functionality will not work at all, because there is no way to separate data from structure and no way to exclude certain tables from the backup.
Even if you perform a filegroup backup, that does not mean you can restore only that filegroup and leave the other tables as they are; filegroup backups simply do not work that way.
Therefore, scripting can be one of the solutions:
Create a script of data and structure for smaller tables
Create a script of the structure only for the transactional tables
Another approach is to dump the necessary data and structure into a separate database and then back that database up:
Something like:
-- Create a throwaway database to hold the export (the name is illustrative)
CREATE DATABASE ExportDB
GO
SELECT * INTO ExportDB.dbo.Table1 FROM dbo.Table1
SELECT * INTO ExportDB.dbo.Table2 FROM dbo.Table2
SELECT * INTO ExportDB.dbo.Table3 FROM dbo.Table3
--- the two tables below will have no data, only the structure
SELECT * INTO ExportDB.dbo.Table4 FROM dbo.Table4 WHERE 1=0 -- a large transactional table 1
SELECT * INTO ExportDB.dbo.Table5 FROM dbo.Table5 WHERE 1=0 -- a large transactional table 2
BACKUP DATABASE ExportDB TO DISK='..'
DROP DATABASE ExportDB
However, native backups (which are not an option for your scenario) can ensure data consistency, enforced by primary and foreign keys, while the custom options mentioned above cannot really guarantee it.
References:
How can I take backup of particular tables in SQL Server 2008 using T-SQL Script

Most efficient and easiest way to back up a portion of specific tables hourly

I need to create an hourly .SQB backup file of some specific tables, each filtered with a WHERE clause, from a SQL Server database. As an example, I need this data:
SELECT * FROM table1 WHERE pk_id IN (2,5,7)
SELECT * FROM table2 WHERE pk_id IN (2,5,7)
SELECT * FROM table3 WHERE pk_id IN (2,5,7)
SELECT * FROM table4 WHERE pk_id IN (2,5,7)
The structure of the tables on the source database may change over time, e.g. columns may be added or removed, indexes added, etc.
One option is to do some kind of export, script generation, etc. into a staging database on the same instance of SQL Server. Efficiency aside, I have no problem dropping or truncating the tables on the destination database each time. In short, I'm looking to have both the schema and data of the tables duplicated to the destination database. That's completely acceptable.
Another is to just create a .SQB backup from the source database. Since the .SQB file is all that I really need (it's going to be sent over SFTP), that would be fine, too.
What's the recommended approach in this scenario?
Well, if I understand your requirement correctly, you want data from some tables in your database to be shipped somewhere else periodically.
One thing that is not possible in SQL Server is taking a backup of only a subset of tables from a database, so that is not an option.
Since you have mentioned you will be using SFTP to send the data, using the BCP command to extract the data is one option, but BCP may or may not perform very well and it definitely will not scale out very well.
Instead of BCP, I would prefer an SSIS package: you will be able to do everything (extract files, add WHERE clauses, drop files on SFTP, tune your queries, logging, monitoring, etc.) in your SSIS package.
Finally, SQL Server Replication can be used to create a subscriber and publish only the articles (tables) you are interested in; you can also add WHERE clauses to your publications.
Again, there are a few options with the replication subscriber database:
Give your data clients access to the subscriber database, so there is no need for extracts.
Use BCP on the subscriber database to extract data, without putting load on your production server.
Use an SSIS package to extract data from the subscriber database.
Finally, create a backup of this subscriber database and ship the whole backup (.bak) file to SFTP.
In short, there is more than one way to skin the cat; now you have to decide which one suits your requirements best.
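For illustration only, the BCP route could be driven from T-SQL roughly like this (the server name, database, output path, and the use of xp_cmdshell are assumptions; xp_cmdshell must be enabled, and calling bcp directly from a scheduled job works just as well):
-- Sketch: export one filtered table to a flat file with bcp; all names and paths are placeholders
EXEC master..xp_cmdshell
    'bcp "SELECT * FROM MyDb.dbo.table1 WHERE pk_id IN (2,5,7)" queryout C:\exports\table1.dat -S MyServer -T -n';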

How to delete data in DMVs (system tables) in Microsoft Parallel Data Warehouse

I need to delete some of the data in a system table. Is there anything that I can do?
You cannot delete data from DMVs, nor alter the results of DMFs, since they collect their data from internal SQL Server resource tables that are unavailable to the user.
The only way to reset the data is to restart/shut down the SQL Server service.

Load data from one Oracle db to another Oracle db: best practices/methods

I want to do a one-time load from a source Oracle db to a destination Oracle db.
It can't be done with a direct load/unload or import/export of the data, as the table structures and columns differ between source and destination, so it requires substantial transformation.
My plan is to get the data in XML format from the source DB and process the XML into the destination DB.
The data volume will be large (1 to 20+ million records or more in some tables), and the databases involved are Oracle (source) and Oracle (destination).
Please provide some best practices or the best way to do this.
I'm not sure that I understand why you can't do a direct load.
If you create a database link on the destination database that points to the source database, you can then put your ETL logic into SQL statements that SELECT from the source database and INSERT into the destination database. That avoids the need to write the data to a flat file, to read that flat file, to parse the XML, etc. which is going to be slow and require a decent amount of coding. That way, you can focus on the ETL logic and you can migrate the data as efficiently as possible.
You can write SQL (or PL/SQL) that loads directly from the old table structure on the old database to the new table structure on the new database.
INSERT INTO new_table( <<list of columns>> )
SELECT a.col1, a.col2, ... , b.colN, b.colM
FROM old_table_1@link_to_source a,
     old_table_2@link_to_source b
WHERE <<some join condition>>
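For completeness, the database link referenced above would be created on the destination database with something like this (the link name, credentials, and TNS alias are placeholders):
-- Run on the destination database; all names and credentials are placeholders
CREATE DATABASE LINK link_to_source
  CONNECT TO src_user IDENTIFIED BY src_password
  USING 'SOURCE_TNS_ALIAS';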
