Using the Apache Kafka JDBC sink connector to insert/update data from SQL Server to SQL Server, I get the following error:
java.sql.BatchUpdateException: Cannot insert explicit value for identity column in table 'table_name' when IDENTITY_INSERT is set to OFF.
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2075)
The Source Configuration
name=jdbc-mssql-prod-5
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:sqlserver:
connection.user=
connection.password=
topic.prefix=source_topic.
mode=timestamp
table.whitelist=A,B,C
timestamp.column.name=ModifiedDateTime
connection.backoff.ms=60000
connection.attempts=300
validate.non.null=false
# enter timestamp in milliseconds
timestamp.initial=-1
The Sink Configuration
name=mysql-sink-prod-5
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=sink_topic_a,sink_topic_b
connection.url=jdbc:sqlserver:
connection.user=
connection.password=
insert.mode=upsert
delete.enabled=true
pk.mode=record_key
errors.log.enable=true
errors.log.include.messages=true
In this table, the primary key column and the identity column are the same column.
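For context, SQL Server raises this error whenever an INSERT supplies an explicit value for an IDENTITY column, which is what the sink ends up doing in upsert mode with pk.mode=record_key, because the primary key taken from the record key is written as an ordinary column value. A minimal sketch of the behaviour, using a hypothetical table with an IDENTITY primary key (table and column names are illustrative):

-- Hypothetical table: IDENTITY column doubling as the primary key
CREATE TABLE table_name (
    id INT IDENTITY(1,1) PRIMARY KEY,
    ModifiedDateTime DATETIME2
);

-- An insert that supplies an explicit value for id is rejected,
-- which is what happens to the sink's batch.
INSERT INTO table_name (id, ModifiedDateTime) VALUES (42, SYSUTCDATETIME());

-- Allowing explicit values (per table, one table per session at a time):
SET IDENTITY_INSERT table_name ON;
INSERT INTO table_name (id, ModifiedDateTime) VALUES (42, SYSUTCDATETIME());
SET IDENTITY_INSERT table_name OFF;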
Related
I'm using MariaDB 10.6.8 and have one master DB and two slave DBs. Those DBs are set up for replication.
When I execute an INSERT or UPDATE query without selecting a database, replication doesn't seem to work. In other words, the master DB's data is changed but the slave DBs' data remains intact.
/* no database is selected */
MariaDB [(none)]> show master status \G
*************************** 1. row ***************************
File: maria-bin.000007
Position: 52259873
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.000 sec)
MariaDB [(none)]> UPDATE some_database.some_tables SET some_datetime_column = now() WHERE primary_key_column = 1;
Query OK, 1 row affected (0.002 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB [(none)]> show master status \G
*************************** 1. row ***************************
File: maria-bin.000007
Position: 52260068
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.000 sec)
/* only the master database's record changes, even though the replication position has advanced */
However, after selecting the database, replication works fine.
/* but, after selecting the database */
MariaDB [(none)]> USE some_database;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [some_database]> UPDATE some_tables SET some_datetime_column = now() WHERE primary_key_column = 1;
Query OK, 1 row affected (0.002 sec)
Rows matched: 1 Changed: 1 Warnings: 0
/* now both the master's and the slaves' records change */
Can anyone tell me what could be the cause of this situation?
Regardless of the binary log format (MIXED, STATEMENT, ROW), all DML commands are written to the binary log file as soon as the transaction is committed.
When using ROW format, a TABLE_MAP event is logged first, which contains a unique ID, the database name, and the table name. The ROW_EVENT (Delete/Insert/Update) refers to one or more table IDs to identify the tables used.
The STATEMENT format logs a query event, which contains the default database name, a timestamp, and the SQL statement. If there is no default database, the statement itself will contain the database name.
Binlog dump example for STATEMENT format (I removed the irrelevant parts, such as timestamps and user variables, from the output).
Without a default database:
#230210 4:42:41 server id 1 end_log_pos 474 CRC32 0x1fa4fa55 Query thread_id=5 exec_time=0 error_code=0 xid=0
insert into test.t1 values (1),(2)
/*!*/;
# at 474
#230210 4:42:41 server id 1 end_log_pos 505 CRC32 0xfecc5d48 Xid = 28
COMMIT/*!*/;
# at 505
With a default database:
#230210 4:44:35 server id 1 end_log_pos 639 CRC32 0xfc862172 Query thread_id=5 exec_time=0 error_code=0 xid=0
use `test`/*!*/;
insert into t1 values (1),(2)
/*!*/;
# at 639
#230210 4:44:35 server id 1 end_log_pos 670 CRC32 0xca70b57f Xid = 56
COMMIT/*!*/;
If a session doesn't use a default database on the source server, the statement may not be replicated if a binary log filter such as replicate_do_db was specified on the replica, since the replica doesn't parse the statement but only checks whether the default database name matches the filter.
To avoid inconsistent data on your replicas, I would recommend using ROW format instead.
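If you want to check which format is in use and switch to row-based logging, a minimal sketch (the SET GLOBAL change only affects new sessions; also put binlog_format=ROW in the server configuration so it survives a restart):

-- On the source: check the current binary log format
SHOW VARIABLES LIKE 'binlog_format';

-- On each replica: check which filters are configured (Replicate_Do_DB / Replicate_Ignore_DB)
SHOW REPLICA STATUS \G

-- On the source: switch new sessions to row-based logging
SET GLOBAL binlog_format = 'ROW';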
I have set up the Snowflake Kafka connector. I set up a sample table (kafka_connector_test) in Snowflake with two columns, both of VARCHAR type.
The columns are CUSTOMER_ID and PURCHASE_ID.
Here is the configuration I created for the connector:
curl -X POST \
-H "Content-Type: application/json" \
--data '{
"name":"kafka_connector_test",
"config":{
"connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
"tasks.max":"2",
"topics":"kafka-connector-test",
"snowflake.topic2table.map": "kafka-connector-test:kafka_connector_test",
"buffer.count.records":"10000",
"buffer.flush.time":"60",
"buffer.size.bytes":"5000000",
"snowflake.url.name":"XXXXXXXX.snowflakecomputing.com:443",
"snowflake.user.name":"XXXXXXXX",
"snowflake.private.key":"XXXXXXXX",
"snowflake.database.name":"XXXXXXXX",
"snowflake.schema.name":"XXXXXXXX",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter"}}'\
I send data to the topic that I have configured in the connector configuration.
{"CUSTOMER_ID" : "test_id", "PURCHASE_ID" : "purchase_id_test"}
Then, when I check the Kafka Connect server, I get the error below:
[SF KAFKA CONNECTOR] Detail: Table doesn't have a compatible schema
Is there something I need to set up in either Kafka Connect or Snowflake that says which parts of the JSON go into which columns of the table? I'm not sure how to specify how it parses the JSON.
I also set up a different topic and didn't create a table in Snowflake for it. There the connector was able to populate a table, but it creates the table with two columns, RECORD_METADATA and RECORD_CONTENT. I don't want to write a scheduled job to parse this; I want to insert directly into a queryable table.
The Snowflake Kafka connector writes data as JSON by design. The default columns RECORD_METADATA and RECORD_CONTENT are of type VARIANT. If you want to query them, you can create a view on top of the table to achieve your goal, and you don't need a scheduled job.
So the table created by the connector would contain something like:
RECORD_METADATA, RECORD_CONTENT
{metadata fields in json}, {"CUSTOMER_ID" : "test_id", "PURCHASE_ID" : "purchase_id_test"}
You can create a view to display your data:
create view v1 as
select RECORD_CONTENT:CUSTOMER_ID::text CUSTOMER_ID,
       RECORD_CONTENT:PURCHASE_ID::text PURCHASE_ID
from <connector_created_table>;
Your query would then be:
select CUSTOMER_ID, PURCHASE_ID from v1;
PS: If you want to pre-create your own tables, you need to use VARIANT as the data type instead of VARCHAR.
It also looks like this isn't supported at this time, in reference to this GitHub issue.
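If you do want to pre-create the landing table yourself so that the connector will reuse it, a minimal sketch of a schema the connector considers compatible (assuming its default layout of one metadata column and one content column, both VARIANT; the table name is illustrative):

create table kafka_connector_test_json (
    RECORD_METADATA variant,
    RECORD_CONTENT  variant
);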
I want to extract all data from my PostgreSQL database to SQL Server using Kafka Connect and the JDBC sink. I want to get rid of some queries and check whether I can stream data using insert.mode=insert only.
This is my source config:
name=debezium_pg_connectors
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
plugin.name=pgoutput
database.hostname=XXX.XXX.XXX.XX
database.port=5432
database.user=XXXXXX
database.password=XXXXXX
database.dbname=XXXXX
database.server.name=XXXXX
database.history.kafka.bootstrap.servers=localhost:9092
database.history.kafka.topic=XXXXXX
table.whitelist=XXXXXXX
time.precision.mode=connect
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
This is my sink config:
name=jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=pj_user
connection.url=<connection>
auto.create=true
auto.evolve=true
insert.mode=insert
pk.mode=record_key
table.name.format=<table>
transforms=unwrap,route
transforms.unwrap.type=io.debezium.transforms.UnwrapFromEnvelope
transforms=route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=([^.]+)\\.([^.]+)\\.([^.]+)
transforms.route.replacement=$3
fields.whitelist=...
In my SQL Server table, I have an auto-generated column called key with the uniqueidentifier data type, which is also the primary key. However, it fails every time I try to push data to the sink:
[2020-03-03 12:45:11,487] ERROR WorkerSinkTask{id=jdbc-sink-0} RetriableException from SinkTask: (org.apache.kafka.connect.runtime.WorkerSinkTask:552)
org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.BatchUpdateException: Cannot insert the value NULL into column 'key', table '<table>'; column does not allow nulls. INSERT fails.
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:93)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.sql.SQLException: java.sql.BatchUpdateException: Cannot insert the value NULL into column 'key', table '<table>'; column does not allow nulls. INSERT fails.
... 12 more
If anyone has any ideas, any help and advice is appreciated. Thanks.
Make sure that your key column has a default value:
ALTER TABLE [tableName] ADD DEFAULT NEWSEQUENTIALID() FOR [key]
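For reference, a minimal sketch of what the target table could look like with such a default in place, assuming the sink statement omits the key column (for example via fields.whitelist); the table and the user_name column are illustrative:

-- Hypothetical target table: [key] fills itself whenever the sink omits it
CREATE TABLE dbo.pj_user (
    [key]     UNIQUEIDENTIFIER NOT NULL
              CONSTRAINT DF_pj_user_key DEFAULT NEWSEQUENTIALID()
              CONSTRAINT PK_pj_user PRIMARY KEY,
    user_name NVARCHAR(200) NULL
);

-- An insert that leaves out [key] now succeeds, which is what the JDBC sink needs
INSERT INTO dbo.pj_user (user_name) VALUES (N'example');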
Using Robot Framework to connect to a Sybase DB, then DELETE and UPDATE rows in a table.
When the queries below are executed in Robot Framework, they work fine.
Sybase DB Connection - Delete and Update for a single pass
connect To Database Using Custom Params    pyodbc    "Driver={Adaptive Server Enterprise}; server=<myserver>; port=<myport>;db=<mydb>;uid=<myuser>; pwd=<mypasswd>;"
# Run Select Query
${selectQuery}    Query    select * from TABLE where FIELD1 = '1000'
Log Many    ${selectQuery}
Log    "Selected Query Executed"
# Run Delete Query
${DeleteQuery}    Execute Sql String    set chained off ; Delete from TABLE where FIELD1 = '1000' AND FIELD2 = 'VALUE2' AND FIELD3 = 'VALUE3'
Log Many    ${DeleteQuery}
Log    "Delete Query Executed"
# Run Update Query
${updateQuery}    Execute Sql String    set chained off ; UPDATE TABLE SET FIELD2 = 'VALUE2' where FIELD1 = '1001'
Log Many    ${updateQuery}
Log    "Update Query Executed"
Disconnect From Database
Whereas when the for loop is used as below:
Sybase DB Connection - Delete with for loop for multiple passes
connect To Database Using Custom Params    pyodbc    "Driver={Adaptive Server Enterprise}; server=<myserver>; port=<myport>;db=<mydb>;uid=<myuser>; pwd=<mypasswd>;"
# Run DELETE Query
:FOR    ${num}    IN RANGE    100
\    Execute Sql String    set chained off ; Delete from TABLE where FIELD1 = ${num} and FIELD2= "${VALUE2[${num}]}" and FIELD3 = "${VALUE3[${num}]}"
\    sleep    1
It fails with the error below:
[Sybase][ODBC Driver][Adaptive Server Enterprise]SET CHAINED command not allowed within multi-statement transaction.\n (226) (SQLExecDirectW);
[Sybase][ODBC Driver][Adaptive Server Enterprise]Stored procedure 'abc_sp' may be run only in unchained transaction mode. The 'SET CHAINED OFF' command will cause the current session to use unchained transaction mode.\n (7713)")
Used commit.
The for loop below worked fine:
connect To Database Using Custom Params    pyodbc    "Driver={Adaptive Server Enterprise}; server=<myserver>; port=<myport>;db=<mydb>;uid=<myuser>; pwd=<mypasswd>;"
# Run DELETE Query
:FOR    ${num}    IN RANGE    100
\    Execute Sql String    commit
\    Execute Sql String    set chained off ; Delete from TABLE where FIELD1 = ${num} and FIELD2= "${VALUE2[${num}]}" and FIELD3 = "${VALUE3[${num}]}"
\    Execute Sql String    commit
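For context, the sequence each loop iteration sends to ASE is roughly the following (values are illustrative; the point is that the leading commit closes any open chained transaction before set chained off runs, which is exactly what the error message demands):

commit                 -- close any transaction left open by a previous statement
set chained off        -- now allowed: no multi-statement transaction is active
Delete from TABLE where FIELD1 = 1 and FIELD2 = 'VALUE2' and FIELD3 = 'VALUE3'
commit                 -- make the delete permanent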
I am using iBATIS version 2.3.4.
The database is MS SQL Server.
I am trying to insert a record into table T_PROFILE, which has an identity primary key column with an auto-generated value.
My iBATIS configuration is:
<insert id="insertProfile" parameterClass="profileDO" useGeneratedKeys="true" keyProperty="profileId">
INSERT INTO T_PROFILE (E_ID,PROFILE_NAME,DEFAULT_PROFILE)
VALUES(#eId#, #profileName#, #isDefaultProfile#)
<selectKey resultClass="java.lang.Long" keyProperty="profileId" >
SELECT @@IDENTITY AS profileId
</selectKey>
</insert>
It gives the following error:
Caused by: com.ibatis.common.xml.NodeletException: Error parsing XML. Cause: org.xml.sax.SAXParseException: Attribute "useGeneratedKeys" must be declared for element type "settings".
at com.ibatis.common.xml.NodeletParser.parse(NodeletParser.java:62)
at com.ibatis.sqlmap.engine.builder.xml.SqlMapConfigParser.parse(SqlMapConfigParser.java:62)
at com.ibatis.sqlmap.engine.builder.xml.SqlMapConfigParser.parse(SqlMapConfigParser.java:55)
at org.springframework.orm.ibatis.SqlMapClientFactoryBean.buildSqlMapClient(SqlMapClientFactoryBean.java:339)
... 160 more
Caused by: org.xml.sax.SAXParseException: Attribute "useGeneratedKeys" must be declared for element type "settings".
If you have an identity column set in the database, the query below will work by itself; useGeneratedKeys is not needed.
<insert id="insertProfile" parameterClass="profileDO" >
INSERT INTO T_PROFILE (E_ID,PROFILE_NAME,DEFAULT_PROFILE)
VALUES(#eId#, #profileName#, #isDefaultProfile#)
<selectKey resultClass="long" keyProperty="profileId" >
SELECT @@IDENTITY AS profileId
</selectKey>
</insert>
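As an aside, on SQL Server @@IDENTITY returns the last identity value generated in the current session across all scopes, so an insert performed by a trigger on T_PROFILE would skew it; SCOPE_IDENTITY() returns only the value generated in the current scope and is usually the safer choice inside selectKey. A minimal sketch of the two, outside of iBATIS:

-- Last identity value generated in this session, in any scope (triggers included)
SELECT @@IDENTITY AS profileId

-- Last identity value generated in the current scope only
SELECT SCOPE_IDENTITY() AS profileId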