I need to migrate a SQL Server database from on prem location to GoogleCloud
Using confluent/kafka to do it
I do have a source debezium connector
{
"name": "mssql_src",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
...
...
...
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "op,table,source.ts_ms",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms": "Reroute",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"transforms.Reroute.topic.regex": "source_dbname.dbo(.*)",
"transforms.Reroute.topic.replacement": "target_dbname$1"
}
}
reroute transformation does not work with conjunction of unwrap,
I still getting source_dbname.dbo.* topics instead of target_dbname.*
I need insert data into target_dbname database
jdbc sync connector has following configuration
{
"name": "mssql_trg",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics.regex": "source_dbname.dbo.*",
"table.name.format": "${topic}",
"connection.url": "jdbc:sqlserver://xxx.xxx.xxx.xxx:1433;DatabaseName=rocketlawyer3",
"connection.user": "sqlserver",
"connection.password": "sqlserver",
"dialect.name": "SqlServerDatabaseDialect",
"insert.mode": "upsert",
"auto.create": true,
"auto.evolve": true,
"pk.mode": "record_value"
}
}
Obviously it is failed because all SQL operations are referring tables as source_database_name.dbo.table_name
Here is 2 questions:
How can I change this string source_database_name.dbo.table_name just for table_name using table.name.format
or other option
How can I make both transformation (reroute and unwrap) work in source connector
You need to use chained transformations. Just combine unwrap and Rerout SMTs together:
{
"name": "mssql_src",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
...
...
...
"transforms": "unwrap, Reroute",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "op,table,source.ts_ms",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"transforms.Reroute.topic.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.Reroute.topic.replacement": "$3"
}
}
Related
I'm trying to get the aggregate agent to work with timescale db.
https://volttron.readthedocs.io/en/main/developing-volttron/developing-agents/specifications/aggregate.html
I have a file aggregation.config
{
"connection": {
"type": "postgresql",
"params": {
"dbname": "volttrondb",
"host": "127.0.0.1",
"port": 5432,
"user": "user",
"password": "password",
"timescale_dialect": true
}
},
"aggregations":[
# list of aggregation groups each with unique aggregation_period and
# list of points that needs to be collected
{
"aggregation_period": "1h",
"use_calendar_time_periods": true,
"utc_collection_start_time":"2016-03-01T01:15:01.000000",
"points": [
{
"topic_names": ["campus/building/fake/EKG_Cos", "campus/building/fake/EKG_Sin"],
"aggregation_topic_name":"campus/building/fake/avg_of_EKG_Cos_EKG_Sin",
"aggregation_type": "avg",
"min_count": 2
}
]
}
]
}
And run the following command
vctl install services/core/SQLAggregateHistorian/ --tag aggregate-historian -c config/aggregation.config --start
It starts correctly - the vctl status shows it running and there are no errors in the log.
I do not see the point campus/building/fake/avg_of_EKG_Cos_EKG_Sin in the topics table.
Any suggestions?
I'm getting error as below when creating kafka producer:
Invalid Java object for schema type STRING: class java.lang.Long for field: \"BIRTHDATE\"\n\tat.
My script as below:
{
"name": "id_srvc_producer_ODS_EB_CLIENT_202215202",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"decimal.handling.mode": "string",
"transforms": "Cast,encryptvalue,encryptkey",
"transforms.Cast.type": "com.fwd.kafka.connect.smt.Cast$Value",
"transforms.Cast.spec": "CLTDOB:string, CLTDOD:float64, USER:float64, TRDT:float64, TRTM:float64, CAPITAL:float64, SRDATE:float64, DATIME:string, AUD_TIME:string",
"transforms.encryptvalue.type": "com.xxx.kafka.connect.smt.EncryptFieldV2$Value",
"transforms.encryptvalue.encrypt.fields": "BIRTHPLACE,BIRTHDATE,IDENTITYNO,FIRSTNAME,SURNAME,EMAILID,MOBILENUMBER,PHONENUMER1,PHONENUMER2,FAXNUMBER",
"transforms.encryptvalue.encrypt.algorithm": "AES",
"transforms.encryptvalue.encrypt.key": "${securepass:/opt/secret/security.properties:aes-encryption-key.properties/aes_encryption_key}",
"transforms.encryptkey.type": "com.xxx.kafka.connect.smt.EncryptFieldVxxxkey",
"transforms.encryptkey.encrypt.fields": "pk:CLIENTCODE",
"transforms.encryptkey.encrypt.algorithm": "AES"
}
I'm using Azure Data factory,
I'm using SQLServer as source and Postgres as target. Goal is to copy 30 tables from SQLServer, with transformation, to 30 tables in Postgres. Hard part is I have 80 databases from and to, all with the exact same layout but different data. Its one database per customer so 80 customers each with their own databases.
Linked Services doesn't allow parameters for Postgres.
I have one dataset per source and target using parameters for schema and table names.
I have one pipeline per table with SQLServer source and Postgres target.
I can parameterize the SQLServer source in linked service but not Postgres
Problem is how can I copy 80 source databases to 80 target databases without adding 80 target linked services and 80 target datasets? Plus I'd have to repeat all 30 pipelines per target database.
BTW I'm only familiar with the UI, however anything else that does the job is acceptable.
Any help would be appreciated.
There is simple way to implement this. Essentially you need to have a single Linked Service, which reads the connection string out of KeyVault. You can then parameterize source and target as keyvault secret names, and easily switch between data sources by just changing the secret name. This relies on all connection related information being enclosed within a single connection string.
I will provide a simple overview for Postgresql, but the same logic applies to MSSQL servers as source.
Implement a Linked Service for Azure Key Vault.
Add a Linked Service for Azure Postgresql that uses Key Vault to store access url in format: Server=your_server_name.postgres.database.azure.com;Database=your_database_name;Port=5432;UID=your_user_name;Password=your_password;SSL Mode=Require;Keepalive=600; (advise to use server name as secret name)
Pass this parameter, which is essentially correct secret name, in the Pipeline (you can also implement a loop that would accept immediately array of x elements, and parse n elements at a time into separate pipeline)
Linked Service Definition for KeyVault:
{
"name": "your_keyvault_name",
"properties": {
"description": "KeyVault",
"annotations": [],
"type": "AzureKeyVault",
"typeProperties": {
"baseUrl": "https://your_keyvault_name.vault.azure.net/"
}
}
}
Linked Service Definition for Postgresql:
{ "name": "generic_postgres_service".
"properties": {
"type": "AzurePostgreSql",
"parameters": {
"pg_database": {
"type": "string",
"defaultValue": "your_database_name"
}
},
"annotations": [],
"typeProperties": {
"connectionString": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "KeyVaultName",
"type": "LinkedServiceReference"
},
"secretName": "#linkedService().secret_name_for_server"
}
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
}
}
Dataset Definition for Postgresql:
{
"name": "your_postgresql_dataset",
"properties": {
"linkedServiceName": {
"referenceName": "generic_postgres_service",
"type": "LinkedServiceReference",
"parameters": {
"secret_name_for_server": {
"value": "#dataset().secret_name_for_server",
"type": "Expression"
}
}
},
"parameters": {
"secret_name_for_server": {
"type": "string"
}
},
"annotations": [],
"type": "AzurePostgreSqlTable",
"schema": [],
"typeProperties": {
"schema": {
"value": "#dataset().schema_name",
"type": "Expression"
},
"table": {
"value": "#dataset().table_name",
"type": "Expression"
}
}
}
}
Pipeline Definition for Postgresql:
{
"name": "your_postgres_pipeline",
"properties": {
"activities": [
{
"name": "Copy_Activity_1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
...
... i skipped definition
...
"inputs": [
{
"referenceName": "your_postgresql_dataset",
"type": "DatasetReference",
"parameters": {
"secret_name_for_server": "secret_name"
}
}
]
}
],
"annotations": []
}
}
I have an Azure Logic App with a SQL Server connector through a On-Premise Data Gateway, the connection is made using SQL Server Authentication. It works fine from the Logic App Designer.
No details about the connection are stored in the ARM template of the SQL Server connection, so if I want to automate the deployment of the Logic App, I need to add some values to the ARM template. The documentation for this is really poor, even though I was able to write this template:
{
"type": "MICROSOFT.WEB/CONNECTIONS",
"apiVersion": "2018-07-01-preview",
"name": "[parameters('sql_2_Connection_Name')]",
"location": "[parameters('logicAppLocation')]",
"properties": {
"api": {
"id": "[concat(subscription().id, '/providers/Microsoft.Web/locations/', parameters('logicAppLocation'), '/managedApis/', 'sql')]"
},
"displayName": "[parameters('sql_2_Connection_DisplayName')]",
"parameterValues": {
"server": "[parameters('sql_2_server')]",
"database": "[parameters('sql_2_database')]",
"username": "[parameters('sql_2_username')]",
"password": "[parameters('sql_2_password')]",
"authType": "[parameters('sql_2_authtype')]",
"sqlConnectionString": "[parameters('sql_2_sqlConnectionString')]",
"gateway": {
"id": "[concat('subscriptions/', subscription().subscriptionId, '/resourceGroups/', parameters('dataGatewayResourceGroup'), '/providers/Microsoft.Web/connectionGateways/', parameters('dataGatewayName'))]"
}
}
}
}
But I can't find the correct value for the authType property corresponding to "SQL Server Authentication". The values windows and basic are accepted, but couldn't find the value for "SQL Server Authentication".
Can someone please tell me what's the value for the authType property corresponding to "SQL Server Authentication"?
Use following properties json inside your web api connection
"properties": {
"api": {
"id": "/subscriptions/<YourSubscriptionIDHere>/providers/Microsoft.Web/locations/australiaeast/managedApis/sql"
},
"parameterValueSet": {
"name": "sqlAuthentication",
"values": {
"server": {
"value": "SampleServer"
},
"database": {
"value": "WideWorldImporters"
},
"username": {
"value": "sampleuser"
},
"password": {
"value": "somepasssword"
},
"gateway": {
"value": {
"id": "/subscriptions/<subscriptionIDGoesHere>/resourceGroups/az-integration-study-rg/providers/Microsoft.Web/connectionGateways/<NameofTheGatewayHere>"
}
}
}
}
},
"location": "australiaeast"
That should do the trick
Setup: I have CDC enabled on MS SQL Server and the CDC events are fed to Kafka using debezium kafka connect(source). Also more than one table CDC events are routed to a single topic in Kafka.
Question: Since I have more than one table data in the kafka topic, I would like to have the table name and the database name in the CDC data.
I am getting the table name and database name in MySQL CDC but not in MS SQL CDC.
Below is the Debezium source connector for the SQL Server
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "cdc-user_profile-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.hostname": "<<hostname>>",
"database.port": "<<port>>",
"database.user": "<<username>>",
"database.password": "<<password>>",
"database.server.name": "test",
"database.dbname": "testDb",
"table.whitelist": "schema01.table1,schema01.table2",
"database.history.kafka.bootstrap.servers": "broker:9092",
"database.history.kafka.topic": "digital.user_profile.schema.audit",
"database.history.store.only.monitored.tables.ddl": true,
"include.schema.changes": false,
"event.deserialization.failure.handling.mode": "fail",
"snapshot.mode": "initial_schema_only",
"snapshot.locking.mode": "none",
"transforms":"addStaticField,topicRoute",
"transforms.addStaticField.type":"org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addStaticField.static.field":"source_system",
"transforms.addStaticField.static.value":"source_system_1",
"transforms.topicRoute.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.topicRoute.regex":"(.*)",
"transforms.topicRoute.replacement":"digital.user_profile",
"errors.tolerance": "none",
"errors.log.enable": true,
"errors.log.include.messages": true,
"errors.retry.delay.max.ms": 60000,
"errors.retry.timeout": 300000
}
}'
I am getting the below output (Demo data)
{
"before": {
"profile_id": 147,
"email_address": "test#gmail.com"
},
"after": {
"profile_id": 147,
"email_address": "test_modified#gmail.com"
},
"source": {
"version": "0.9.4.Final",
"connector": "sqlserver",
"name": "test",
"ts_ms": 1556723528917,
"change_lsn": "0007cbe5:0000b98c:0002",
"commit_lsn": "0007cbe5:0000b98c:0003",
"snapshot": false
},
"op": "u",
"ts_ms": 1556748731417,
"source_system": "source_system_1"
}
My requirement is to get as below
{
"before": {
"profile_id": 147,
"email_address": "test#gmail.com"
},
"after": {
"profile_id": 147,
"email_address": "test_modified#gmail.com"
},
"source": {
"version": "0.9.4.Final",
"connector": "sqlserver",
"name": "test",
"ts_ms": 1556723528917,
"change_lsn": "0007cbe5:0000b98c:0002",
"commit_lsn": "0007cbe5:0000b98c:0003",
"snapshot": false,
"db": "testDb",
"table": "table1/table2"
},
"op": "u",
"ts_ms": 1556748731417,
"source_system": "source_system_1"
}
This is planned as a part of https://issues.jboss.org/browse/DBZ-875 issue
Debezium Kafka-Connect generally puts data from each table in a separate topic and the topic name is of the format hostname.database.table. We generally use the topic name to distinguish between the source table & database name.
If you are putting the data from all the tables manually into one topic then you might have to add the table and database name manually as well.