How to get the table name and database name in the CDC event received from Debezium Kafka Connect - SQL Server

Setup: I have CDC enabled on MS SQL Server, and the CDC events are fed to Kafka using Debezium Kafka Connect (source). Also, CDC events from more than one table are routed to a single topic in Kafka.
Question: Since I have data from more than one table in the Kafka topic, I would like to have the table name and the database name in the CDC data.
I am getting the table name and database name in the MySQL CDC events but not in the MS SQL CDC events.
Below is the Debezium source connector configuration for SQL Server:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "cdc-user_profile-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "database.hostname": "<<hostname>>",
    "database.port": "<<port>>",
    "database.user": "<<username>>",
    "database.password": "<<password>>",
    "database.server.name": "test",
    "database.dbname": "testDb",
    "table.whitelist": "schema01.table1,schema01.table2",
    "database.history.kafka.bootstrap.servers": "broker:9092",
    "database.history.kafka.topic": "digital.user_profile.schema.audit",
    "database.history.store.only.monitored.tables.ddl": true,
    "include.schema.changes": false,
    "event.deserialization.failure.handling.mode": "fail",
    "snapshot.mode": "initial_schema_only",
    "snapshot.locking.mode": "none",
    "transforms": "addStaticField,topicRoute",
    "transforms.addStaticField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addStaticField.static.field": "source_system",
    "transforms.addStaticField.static.value": "source_system_1",
    "transforms.topicRoute.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.topicRoute.regex": "(.*)",
    "transforms.topicRoute.replacement": "digital.user_profile",
    "errors.tolerance": "none",
    "errors.log.enable": true,
    "errors.log.include.messages": true,
    "errors.retry.delay.max.ms": 60000,
    "errors.retry.timeout": 300000
  }
}'
I am getting the below output (Demo data)
{
  "before": {
    "profile_id": 147,
    "email_address": "test@gmail.com"
  },
  "after": {
    "profile_id": 147,
    "email_address": "test_modified@gmail.com"
  },
  "source": {
    "version": "0.9.4.Final",
    "connector": "sqlserver",
    "name": "test",
    "ts_ms": 1556723528917,
    "change_lsn": "0007cbe5:0000b98c:0002",
    "commit_lsn": "0007cbe5:0000b98c:0003",
    "snapshot": false
  },
  "op": "u",
  "ts_ms": 1556748731417,
  "source_system": "source_system_1"
}
My requirement is to get the event as below:
{
  "before": {
    "profile_id": 147,
    "email_address": "test@gmail.com"
  },
  "after": {
    "profile_id": 147,
    "email_address": "test_modified@gmail.com"
  },
  "source": {
    "version": "0.9.4.Final",
    "connector": "sqlserver",
    "name": "test",
    "ts_ms": 1556723528917,
    "change_lsn": "0007cbe5:0000b98c:0002",
    "commit_lsn": "0007cbe5:0000b98c:0003",
    "snapshot": false,
    "db": "testDb",
    "table": "table1/table2"
  },
  "op": "u",
  "ts_ms": 1556748731417,
  "source_system": "source_system_1"
}

This is planned as part of issue DBZ-875 (https://issues.jboss.org/browse/DBZ-875).

Debezium Kafka Connect generally puts the data from each table in a separate topic, and the topic name is of the format serverName.schemaName.tableName for SQL Server (serverName.databaseName.tableName for MySQL). We generally use the topic name to distinguish the source table and database.
If you are routing the data from all the tables into one topic, then you might have to add the table and database name yourself as well, for example with an SMT as sketched below.
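On Debezium versions where DBZ-875 has landed, the SQL Server source block carries db, schema and table fields, and one way to keep that information when all tables are routed into a single topic is the ExtractNewRecordState SMT with its add.fields option (the same SMT that appears in the JDBC question further down this page). A minimal sketch of the extra connector properties, assuming such a newer Debezium version; the unwrap alias and the chosen field list are illustrative, not part of the original setup:

"transforms": "unwrap,addStaticField,topicRoute",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "db,table,op",

Note that this flattens the envelope (no separate before/after sections any more) and copies the requested source metadata into each record as __db, __table and __op fields, so a consumer can still tell which database and table a record came from.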

Related

Using Volttron aggregation agent

I'm trying to get the aggregate agent to work with TimescaleDB.
https://volttron.readthedocs.io/en/main/developing-volttron/developing-agents/specifications/aggregate.html
I have a file aggregation.config
{
  "connection": {
    "type": "postgresql",
    "params": {
      "dbname": "volttrondb",
      "host": "127.0.0.1",
      "port": 5432,
      "user": "user",
      "password": "password",
      "timescale_dialect": true
    }
  },
  "aggregations": [
    # list of aggregation groups each with unique aggregation_period and
    # list of points that needs to be collected
    {
      "aggregation_period": "1h",
      "use_calendar_time_periods": true,
      "utc_collection_start_time": "2016-03-01T01:15:01.000000",
      "points": [
        {
          "topic_names": ["campus/building/fake/EKG_Cos", "campus/building/fake/EKG_Sin"],
          "aggregation_topic_name": "campus/building/fake/avg_of_EKG_Cos_EKG_Sin",
          "aggregation_type": "avg",
          "min_count": 2
        }
      ]
    }
  ]
}
And run the following command
vctl install services/core/SQLAggregateHistorian/ --tag aggregate-historian -c config/aggregation.config --start
It starts correctly - the vctl status shows it running and there are no errors in the log.
I do not see the point campus/building/fake/avg_of_EKG_Cos_EKG_Sin in the topics table.
Any suggestions?

Solr 9 – Error "No suggester named suggest was configured"

I am new to Solr and am trying to add the Suggest Search Component via the Config API.
My Solr 9.0.0 setup is as follows:
bin/solr start -e cloud with _default config
Index a csv with bin/post -c collection_name file_path/file.csv
Add the search component with a POST request to http://localhost:8983/api/collections/collection_name/config:
{
  "add-searchcomponent": {
    "name": "suggest",
    "class": "solr.SuggestComponent",
    "lookupImpl": "FuzzyLookupFactory",
    "dictionaryImpl": "DocumentDictionaryFactory",
    "field": "name",
    "suggestAnalyzerFieldType": "string",
    "buildOnStartup": false
  }
}
Add the request handler with a POST request to http://localhost:8983/api/collections/collection_name/config:
{
  "add-requesthandler": {
    "name": "/suggest",
    "startup": "lazy",
    "class": "solr.SearchHandler",
    "defaults": {
      "suggest": true,
      "suggest.count": 10
    },
    "components": ["suggest"]
  }
}
This works fine and my configoverlay.json looks as follows:
{
  "props": {"updateHandler": {"autoSoftCommit": {"maxTime": 3000}}},
  "searchComponent": {
    "suggest": {
      "name": "suggest",
      "class": "solr.SuggestComponent",
      "lookupImpl": "FuzzyLookupFactory",
      "dictionaryImpl": "DocumentDictionaryFactory",
      "field": "name",
      "suggestAnalyzerFieldType": "string",
      "buildOnStartup": false
    }
  },
  "requestHandler": {
    "/suggest": {
      "name": "/suggest",
      "startup": "lazy",
      "class": "solr.SearchHandler",
      "defaults": {
        "suggest": true,
        "suggest.count": 10
      },
      "components": ["suggest"]
    }
  }
}
But if I send the request http://localhost:8983/solr/collection_name/suggest?suggest.dictionary=suggest&q=J&suggest.build=true I receive the following error:
{
  "responseHeader": {
    "zkConnected": true,
    "status": 400,
    "QTime": 3
  },
  "error": {
    "metadata": [
      "error-class",
      "org.apache.solr.common.SolrException",
      "root-error-class",
      "org.apache.solr.common.SolrException"
    ],
    "msg": "No suggester named suggest was configured",
    "code": 400
  }
}
I could imagine it has something to do with the naming of the search component, as there are two distinct names defined in the docs: there is a "searchComponent name" and a "name" parameter, but I don't know how to manipulate the "searchComponent name".
I have found this and this post on a similar error, but both posts are about the suggest.dictionary not being configured, which is not the case in my situation.
Any help on what I am doing wrong is greatly appreciated.
One SuggestComponent can use multiple dictionaries, so you have to define them in a list and specify a name explicitly for each one.
Note also that buildOnStartup expects a string.
{
  "add-searchcomponent": {
    "name": "suggest",
    "class": "solr.SuggestComponent",
    "suggester": [{
      "name": "fuzzy",
      "lookupImpl": "FuzzyLookupFactory",
      "dictionaryImpl": "DocumentDictionaryFactory",
      "field": "name",
      "suggestAnalyzerFieldType": "string",
      "buildOnStartup": "false"
    }]
  }
}
Which you can query with .../suggest?suggest.dictionary=fuzzy&q=J&suggest.build=true
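Since the flat suggest component from the question is already present in configoverlay.json, it has to be replaced rather than added a second time. A sketch using the Config API's update-searchcomponent command against the same /config endpoint as above (alternatively delete-searchcomponent followed by add-searchcomponent); this assumes the component keeps the name suggest:

{
  "update-searchcomponent": {
    "name": "suggest",
    "class": "solr.SuggestComponent",
    "suggester": [{
      "name": "fuzzy",
      "lookupImpl": "FuzzyLookupFactory",
      "dictionaryImpl": "DocumentDictionaryFactory",
      "field": "name",
      "suggestAnalyzerFieldType": "string",
      "buildOnStartup": "false"
    }]
  }
}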

Reading message from Service Bus

I have a Logic App that is started when there is a message in a Service Bus queue. The message is being published to the Service Bus from the DevOps pipeline using "PublishToAzureServiceBus" as a JSON message, or from the pipeline webhook.
But I am getting an issue while converting the message from Service Bus back to the original JSON format; I am not able to get a valid JSON object. Some serialization header is getting attached to it.
I have tried base64 decoding and JSON conversion, but have not been able to get it to work.
Below is what the content of the message looks like.
Any pointer on how I can solve this?
Sample message sent
{
  "id": "76a187f3-c154-4e60-b8bc-c0b754e54191",
  "eventType": "build.complete",
  "publisherId": "tfs",
  "message": {
    "text": "Build 20220605.8 succeeded"
  },
  "detailedMessage": {
    "text": "Build 20220605.8 succeeded"
  },
  "resource": {
    "uri": "vstfs:///Build/Build/288",
    "id": 288,
    "buildNumber": "20220605.8",
    "url": "https://dev.azure.com/*******/_apis/build/Builds/288",
    "startTime": "2022-06-05T14:47:01.1846966Z",
    "finishTime": "2022-06-05T14:47:16.7602096Z",
    "reason": "manual",
    "status": "succeeded",
    "drop": {},
    "log": {},
    "sourceGetVersion": "LG:refs/heads/main:********",
    "lastChangedBy": {
      "displayName": "Microsoft.VisualStudio.Services.TFS",
      "id": "00000000-0000-0000-0000-000000000000",
      "uniqueName": "***************"
    },
    "retainIndefinitely": false,
    "definition": {
      "definitionType": "xaml",
      "id": 20,
      "name": "getReleaseFile",
      "url": "https://dev.azure.com/************/_apis/build/Definitions/20"
    },
    "requests": [
      {
        "id": 288,
        "url": "https://dev.azure.com/B*****/**********/_apis/build/Requests/288",
        "requestedFor": {
          "displayName": "B*****.sag",
          "id": "*******",
          "uniqueName": "B**********"
        }
      }
    ]
  },
  "resourceVersion": "1.0",
  "resourceContainers": {
    "collection": {
      "id": "*******",
      "baseUrl": "https://dev.azure.com/B*****/"
    },
    "account": {
      "id": "******",
      "baseUrl": "https://dev.azure.com/B*****/"
    },
    "project": {
      "id": "**********",
      "baseUrl": "https://dev.azure.com/B*****/"
    }
  },
  "createdDate": "2022-06-05T14:47:28.6089499Z"
}
Message received
#string3http://schemas.microsoft.com/2003/10/Serialization/�q{"id":"****","eventType":"build.complete","publisherId":"tfs","message":{"text":"Build 20220605.8 succeeded"},"detailedMessage":{"text":"Build 20220605.8 succeeded"},"resource":{"uri":"vstfs:///Build/Build/288","id":288,"buildNumber":"20220605.8","url":"https://dev.azure.com/*****/********/_apis/build/Builds/288","startTime":"2022-06-05T14:47:01.1846966Z","finishTime":"2022-06-05T14:47:16.7602096Z","reason":"manual","status":"succeeded","drop":{},"log":{},"sourceGetVersion":"LG:refs/heads/main:f0b1a1d2bd047454066cf21dc4d4c710bca4e1d7","lastChangedBy":{"displayName":"Microsoft.VisualStudio.Services.TFS","id":"00000000-0000-0000-0000-000000000000","uniqueName":"******"},"retainIndefinitely":false,"definition":{"definitionType":"xaml","id":20,"name":"getReleaseFile","url":"https://dev.azure.com/******/_apis/build/Definitions/20"},"requests":[{"id":288,"url":"https://dev.azure.com/*****/******/_apis/build/Requests/288","requestedFor":{"displayName":"baharul.sag","id":"******","uniqueName":"baharul.*****"}}]},"resourceVersion":"1.0","resourceContainers":{"collection":{"id":"3*****","baseUrl":"https://dev.azure.com/*****/"},"account":{"id":"******","baseUrl":"https://dev.azure.com/*****/"},"project":{"id":"*******","baseUrl":"https://dev.azure.com/*****/"}},"createdDate":"2022-06-05T14:47:28.6089499Z"}
When reading the message from Service Bus in peek mode, I can see, as below, that <#string3http://schemas.microsoft.com/2003/10/Serialization/��> is prepended to the JSON string.
Publish using PublishToAzureServiceBus from Azure pipeline.
Publish from Azure DevOps project webhook
I believe what is happening is that you have two different types of serialisation in the body of the brokered message created by the PublishToAzureServiceBus task. This is because the brokered message only supports binary content.
So the JSON is initially serialised as a binary string using the data contract serialiser.
How to solve this? Do the following before passing the body to your JSON deserialiser - unfortunately the Logic App isn't doing this:
byte[] messageContent = brokeredMessage.GetBody<byte[]>();
string messageContentStr = Encoding.UTF8.GetString(messageContent);
I probably wouldn't use a Logic App to do the reading of the message, because to insert C# like I suggest you're going to need to call an Azure Function or similar. I'd create an Azure Function to read your messages as above.
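A rough sketch of such an Azure Function, assuming the in-process model (Microsoft.Azure.WebJobs) with a Service Bus trigger; the queue name, connection setting name and function name are placeholders, not taken from the original post:

using System.Text;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class DecodeDevOpsMessage
{
    [FunctionName("DecodeDevOpsMessage")]
    public static void Run(
        [ServiceBusTrigger("your-queue-name", Connection = "ServiceBusConnection")] byte[] body,
        ILogger log)
    {
        // Decode the raw bytes; the DataContract framing shows up as a short
        // binary prefix before the JSON payload.
        string raw = Encoding.UTF8.GetString(body);

        // Keep only the JSON part by trimming everything outside the outermost braces.
        int start = raw.IndexOf('{');
        int end = raw.LastIndexOf('}');
        string json = (start >= 0 && end > start) ? raw.Substring(start, end - start + 1) : raw;

        log.LogInformation(json);
    }
}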

Copying into 80 Postgres databases in Azure Data Factory

I'm using Azure Data Factory.
I'm using SQL Server as source and Postgres as target. The goal is to copy 30 tables from SQL Server, with transformation, to 30 tables in Postgres. The hard part is that I have 80 source and 80 target databases, all with the exact same layout but different data. It's one database per customer, so 80 customers each with their own database.
Linked Services doesn't allow parameters for Postgres.
I have one dataset per source and target using parameters for schema and table names.
I have one pipeline per table with SQL Server source and Postgres target.
I can parameterize the SQL Server source in the linked service but not Postgres.
The problem is: how can I copy 80 source databases to 80 target databases without adding 80 target linked services and 80 target datasets? Plus, I'd have to repeat all 30 pipelines per target database.
BTW I'm only familiar with the UI; however, anything else that does the job is acceptable.
Any help would be appreciated.
There is a simple way to implement this. Essentially you need a single Linked Service, which reads the connection string out of Key Vault. You can then parameterize source and target as Key Vault secret names, and easily switch between data sources by just changing the secret name. This relies on all connection-related information being enclosed within a single connection string.
I will provide a simple overview for PostgreSQL, but the same logic applies to MSSQL servers as source.
Implement a Linked Service for Azure Key Vault.
Add a Linked Service for Azure PostgreSQL that uses Key Vault to store the access URL in the format: Server=your_server_name.postgres.database.azure.com;Database=your_database_name;Port=5432;UID=your_user_name;Password=your_password;SSL Mode=Require;Keepalive=600; (I advise using the server name as the secret name).
Pass this parameter, which is essentially the correct secret name, to the pipeline (you can also implement a loop that accepts an array of x elements and passes n elements at a time into a separate pipeline; see the sketch after the pipeline definition below).
Linked Service Definition for KeyVault:
{
  "name": "your_keyvault_name",
  "properties": {
    "description": "KeyVault",
    "annotations": [],
    "type": "AzureKeyVault",
    "typeProperties": {
      "baseUrl": "https://your_keyvault_name.vault.azure.net/"
    }
  }
}
Linked Service Definition for Postgresql:
{
  "name": "generic_postgres_service",
  "properties": {
    "type": "AzurePostgreSql",
    "parameters": {
      "secret_name_for_server": {
        "type": "string",
        "defaultValue": "your_server_name"
      }
    },
    "annotations": [],
    "typeProperties": {
      "connectionString": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "your_keyvault_name",
          "type": "LinkedServiceReference"
        },
        "secretName": "@linkedService().secret_name_for_server"
      }
    },
    "connectVia": {
      "referenceName": "AutoResolveIntegrationRuntime",
      "type": "IntegrationRuntimeReference"
    }
  }
}
Dataset Definition for Postgresql:
{
  "name": "your_postgresql_dataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "generic_postgres_service",
      "type": "LinkedServiceReference",
      "parameters": {
        "secret_name_for_server": {
          "value": "@dataset().secret_name_for_server",
          "type": "Expression"
        }
      }
    },
    "parameters": {
      "secret_name_for_server": {
        "type": "string"
      },
      "schema_name": {
        "type": "string"
      },
      "table_name": {
        "type": "string"
      }
    },
    "annotations": [],
    "type": "AzurePostgreSqlTable",
    "schema": [],
    "typeProperties": {
      "schema": {
        "value": "@dataset().schema_name",
        "type": "Expression"
      },
      "table": {
        "value": "@dataset().table_name",
        "type": "Expression"
      }
    }
  }
}
Pipeline Definition for Postgresql:
{
  "name": "your_postgres_pipeline",
  "properties": {
    "activities": [
      {
        "name": "Copy_Activity_1",
        "type": "Copy",
        "dependsOn": [],
        "policy": {
          "timeout": "7.00:00:00",
          "retry": 0,
          "retryIntervalInSeconds": 30,
          "secureOutput": false,
          "secureInput": false
        },
        ...
        ... i skipped definition
        ...
        "inputs": [
          {
            "referenceName": "your_postgresql_dataset",
            "type": "DatasetReference",
            "parameters": {
              "secret_name_for_server": "secret_name"
            }
          }
        ]
      }
    ],
    "annotations": []
  }
}
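To fan this out over the 80 customer databases, the loop mentioned in step 3 could be a ForEach activity in a wrapper pipeline that calls the copy pipeline once per secret name. A rough sketch; the secret_names array parameter and the activity names are assumptions, not part of the original answer, and your_postgres_pipeline would need its own secret_name_for_server parameter passed through to the dataset instead of the hard-coded "secret_name" value:

{
  "name": "ForEachCustomerDb",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@pipeline().parameters.secret_names",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "Run_Copy_Pipeline",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": {
            "referenceName": "your_postgres_pipeline",
            "type": "PipelineReference"
          },
          "waitOnCompletion": true,
          "parameters": {
            "secret_name_for_server": {
              "value": "@item()",
              "type": "Expression"
            }
          }
        }
      }
    ]
  }
}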

Transform / specify table name in JDBC sink connector

I need to migrate a SQL Server database from an on-prem location to Google Cloud, using Confluent/Kafka to do it.
I have a source Debezium connector:
{
  "name": "mssql_src",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    ...
    ...
    ...
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.add.fields": "op,table,source.ts_ms",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "transforms": "Reroute",
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.Reroute.topic.regex": "source_dbname.dbo(.*)",
    "transforms.Reroute.topic.replacement": "target_dbname$1"
  }
}
The Reroute transformation does not work in conjunction with unwrap;
I am still getting source_dbname.dbo.* topics instead of target_dbname.*.
I need to insert data into the target_dbname database.
The JDBC sink connector has the following configuration:
{
  "name": "mssql_trg",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics.regex": "source_dbname.dbo.*",
    "table.name.format": "${topic}",
    "connection.url": "jdbc:sqlserver://xxx.xxx.xxx.xxx:1433;DatabaseName=rocketlawyer3",
    "connection.user": "sqlserver",
    "connection.password": "sqlserver",
    "dialect.name": "SqlServerDatabaseDialect",
    "insert.mode": "upsert",
    "auto.create": true,
    "auto.evolve": true,
    "pk.mode": "record_value"
  }
}
Obviously it fails because all SQL operations refer to tables as source_database_name.dbo.table_name.
Here are 2 questions:
How can I change the string source_database_name.dbo.table_name to just table_name using table.name.format?
Or, as another option:
How can I make both transformations (Reroute and unwrap) work in the source connector?
You need to use chained transformations. Just combine the unwrap and Reroute SMTs together:
{
  "name": "mssql_src",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    ...
    ...
    ...
    "transforms": "unwrap, Reroute",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.add.fields": "op,table,source.ts_ms",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.Reroute.topic.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.Reroute.topic.replacement": "$3"
  }
}
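For the first question (stripping the database and schema prefix on the sink side instead), a common alternative is the stock Kafka Connect RegexRouter SMT in the sink connector, so that ${topic} in table.name.format resolves to just the table name. A sketch; the dropPrefix alias is only an example name:

"transforms": "dropPrefix",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.dropPrefix.replacement": "$3"

With this in place the sink writes to a table named after the last topic segment only.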
