Get data and control records using AWS DMS with Kafka as the target - sql-server

I am using SQL Server RDS as the source database and Apache-Kafka as the target in AWS DMS. I want to receive both the data and control records on every CDC changes made in the source database but I am only getting data records in case of CRUD commands and control record in case of the DDL commands. I went through the AWS DMS documentation but couldn't find anything relevant.
Is it possible to get both the control and data records in the Kafka topic?

It is not possible to get both the control and data records using aws dms.

Related

Sync API table to SQL Database

I'm exploring options on how to one-way sync from a table available via API to an SQL database. Does anyone have any suggestions on how to achieve this?
The data from the "Source" is often updated and should be copied to the "Destination" as the changes happen (live).
Source
Read Only table from an ERP available via an API. Webhooks on the source are not possible. Entries to this table may be created, updated or deleted. There would be approximately 150,000 entries in the table with about 1000 changes per day.
Destination
Azure MS SQL database which I have full control over.
I'm looking for best practice or any ideas on how to achieve this. There seems to be very few articles that I can find with anything helpful.
I'm open to using any tool on Azure including Logic Apps and Azure Functions but want to stay away from using 3rd party tools.
If you are trying to achieving this through logic apps, Below is the flow that you can follow.
Note: Make sure you preprocess the data before sending the data to SQL database using appropriate actions based on the type of data that you are receiving.

Copying tables from databases to a database in AWS in simplest and most reliable way

I have some tables from three databases that I want to copy their data to another database in an automated way and these data are quite large. My servers are running on AWS. What is the simplest and most reliable way to do so?
Edit
I want them to stay on-sync (automation process as DevOps engineer)
The databases are all MySQL and all moved between AWS EC2. The data is in range between 100GiB and 200GiB
Currently, Maxwell to take the data from the tables then moved to Kafka and then a script written in Java to feed the other database.
I believe you can use AWS Database Migration Service (DMS) to replicate tables from each source into a single target. You would have a single target endpoint and three source endpoints. You would have three replication tasks that would take data from each source and put it into your target. DMS can keep data in sync via ongoing replication. Be sure to read up on the documentation before proceeding as it isn't the most intuitive service to use, but it should be able to do what you are asking.
https://docs.aws.amazon.com/dms/latest/userguide/Welcome.html

Is there anyway to stream data from Snowflake to Oracle other then informatica

I am working on requirement where I need to stream data from Snowflake to Oracle for some value added process.
Few method which I got to know is unload file to S3 then load to Oracle and other one is informatica.
But above two approaches require some effort so is there any simple way of streaming data from Snowflake to Oracle.
Snowflake cannot connect directly to Oracle. You'll need some tooling or code "in between" the two.
I came across this data migration tool the other day, and it appears to support both Snowflake and Oracle: https://github.com/markddrake/YADAMU---Yet-Another-DAta-Migration-Utility/releases/tag/v1.0
-Paul-

Allow Data Push into an Azure SQL Database?

I'm relatively new to Azure and am having trouble finding what options are out there for connecting to an existing SQL database to push data into it.
The situation is that we have an external client who needs to connect to our Azure SQL database to push data into it, on an on-going basis. We can't give them permission to get into our database, so we're looking at what we can do allow data in. At this point the best option seems to be to create a web service deployed in Azure that will validate the data and then push it into our database.
The question I have is, are there other options to do this in an easier way? Are there Azure services or processes that can be set up to automatically process a file and pull the data into a database? Any other go-between options when each side has their own database and for security reasons can't just open up access to it?
Azure Data Factory works great for basic ETL. If neither party can grant direct access, you can use an intermediate repository like Blob Storage to drop csv/xml/json files for ingestion. If they'll grant you access to pull, you can setup a linked service that more or less functions the same as a linked server in MSSQL. As of the last release ADF now supports Azure hosted SSIS packages too.
I would do this via SSIS using SQL Studio Managemenet Studio (if it's a one time operation). If you plan to do this repeatedly, you could schedule the SSIS job to execute on schedule. SSIS will do bulk inserts using small batches so you shouldn't have transaction log issues and it should be efficient (because of bulk inserting). Before you do this insert though, you will probably want to consider your performance tier so you don't get major throttling by Azure and possible timeouts.

PostgreSQL -> Oracle replication

I'm looking for a tool to export data from a PostgreSQL DB to an Oracle data warehouse. I'm really looking for a heterogenous DB replication tool, rather than an export->convert->import solution.
Continuent Tungsten Replicator looks like it would do the job, but PostgreSQL support won't be ready for another couple months.
Are there any open-source tools out there that will do this? Or am I stuck with some kind of scheduled pg_dump/SQL*Loader solution?
You can create a database link from Oracle to Postgres (this is called heterogeneous connectivity). This makes it possible to select data from Postgres with a select statement in Oracle. You can use materialized views to schedule and store the results of those selects.
It sounds like SymmetricDS would work for your scenario. SymmetricDS is web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time.
Sounds like you want an ETL (extract transform load) tool. There are allot of open source options Enhydra Octopus, and Talend Open Studio are a couple I've come across.
In general ETL tools offer you better flexibility than the straight across replication option.
Some offer scheduling, data quality, and data lineage.
Consider using the Confluent Kafka Connect JDBC sink and source connectors if you'd like to replicate data changes across heterogeneous databases in real time.
The source connector can select the entire database , particular tables, or rows returned by a provided query, and send the data as a Kafka message to your Kafka broker. The source connector can calculate the diffs based on an incrementing id column, a timestamp column, or be run in bulk mode where the entire contents are recopied periodically. The sink can read these messages, optionally check them against an avro or json schema, and populate the source database with the results. It's all free, and several sink and source connectors exist for many relational and non-relational databases.
*One major caveat - Some JDBC Kafka connectors can not capture hard deletes
To get around that limitation, you can use a propietary connector such as Debezium (http://www.debezium.io), see also
Delete events from JDBC Kafka Connect Source.

Resources