Kafka Connect CDC to MS SQL sourceOffset exception

We are using the Confluent MS SQL CDC connector, and the connection descriptor is:
curl -X POST -H "Content-Type: application/json" --data '{
  "name" : "yury-mssql-cdc1",
  "config" : {
    "connector.class" : "io.confluent.connect.cdc.mssql.MsSqlSourceConnector",
    "tasks.max" : "1",
    "initial.database" : "test2",
    "username" : "user",
    "password" : "pass",
    "server.name" : "some-server.eu-west-1.rds.amazonaws.com",
    "server.port" : "1433",
    "change.tracking.tables" : "dbo.foobar"
  }
}' \
http://ip-10-0-0-24.eu-west-1.compute.internal:8083/connectors
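Once the connector is created, its state and any task-level errors can be inspected via the Connect REST API, for example:
curl http://ip-10-0-0-24.eu-west-1.compute.internal:8083/connectors/yury-mssql-cdc1/status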
The whole infrastructure is deployed on AWS, and the exception is:
ERROR Exception thrown while querying for ChangeKey
{databaseName=test2, schemaName=dbo, tableName=foobar}
(io.confluent.connect.cdc.mssql.QueryService:94)
java.lang.NullPointerException: sourceOffset cannot be null.
Any help would be greatly appreciated.

I found the answer. The problem is the way change capture is configured on SQL Server: the connector relies on SQL Server change tracking (note the change.tracking.tables setting), not the older Change Data Capture feature, so it should not be set up with EXEC sys.sp_cdc_enable_db and EXEC sys.sp_cdc_enable_table.
Instead, enable change tracking with the following commands:
ALTER DATABASE [db name] SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
GO
ALTER DATABASE [db name] SET ALLOW_SNAPSHOT_ISOLATION ON
GO
ALTER TABLE [table name] ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON)
GO
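To confirm that change tracking is enabled, you can query SQL Server's catalog views (substituting the real database and table names):
SELECT DB_NAME(database_id) AS tracked_db FROM sys.change_tracking_databases;
-- and, inside the target database:
SELECT OBJECT_NAME(object_id) AS tracked_table FROM sys.change_tracking_tables;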

Related

How to deploy PostgreSQL schema to Google Cloud SQL database using Pulumi?

I'm trying to initialize a managed PostgreSQL database using Pulumi. The PostgreSQL server itself is hosted and managed by Google Cloud SQL, but I set it up using Pulumi.
I can successfully create the database, but I'm stumped as to how to actually initialize it with my schemas, users, tables, etc. Does anyone know how to achieve this?
I believe I need to use the Postgres provider, similar to what they do for MySQL in this tutorial or this example. The code below shows what I have so far:
# Imports assumed by this snippet (Cloud SQL via pulumi_gcp, plus the
# PostgreSQL provider); `project` comes from the stack's config.
import pulumi_gcp.sql as sql
import pulumi_postgresql as pg

# Create database resource on Google Cloud
instance = sql.DatabaseInstance(  # This works
    "db-instance",
    name="db-instance",
    database_version="POSTGRES_12",
    region="europe-west4",
    project=project,
    settings=sql.DatabaseInstanceSettingsArgs(
        tier="db-g1-small",  # Or: db-n1-standard-4
        activation_policy="ALWAYS",
        availability_type="REGIONAL",
        backup_configuration={
            "enabled": True,
        },
    ),
    deletion_protection=False,
)
database = sql.Database(  # This works as well
    "db",
    name="db",
    instance=instance.name,
    project=project,
    charset="UTF-8",
)
# The below should create a table such as
# CREATE TABLE users (id uuid, email varchar(255), api_key varchar(255));
# How to tell it to use this SQL script?
# How to connect it to the above created PostgreSQL resource?
postgres = pg.Database(  # This doesn't work
    "users",
    name="users",
    is_template=False,
)
Here is sample code with an explanation of how we set everything up, including table creation/deletion, with Pulumi.
The code will look like this:
# Postgres: https://www.pulumi.com/docs/reference/pkg/postgresql/
# Provider: https://www.pulumi.com/docs/reference/pkg/postgresql/provider/
# `myinstance` (the Cloud SQL instance) and `users` (a Cloud SQL user)
# are defined earlier in the full sample.
postgres_provider = postgres.Provider("postgres-provider",
    host=myinstance.public_ip_address,
    username=users.name,
    password=users.password,
    port=5432,
    superuser=True)

# Creates a database on the instance in Google Cloud with the provider we created
mydatabase = postgres.Database("pulumi-votes-database",
    encoding="UTF8",
    opts=pulumi.ResourceOptions(provider=postgres_provider)
)
# Table creation/deletion is via pg8000: https://github.com/tlocke/pg8000
# The postgres_sql_* connection values below come from the stack's config/outputs.
import pg8000.native

def tablecreation(mytable_name):
    print("tablecreation with:", mytable_name)
    create_first_part = "CREATE TABLE IF NOT EXISTS"
    create_sql_query = "(id serial PRIMARY KEY, email VARCHAR ( 255 ) UNIQUE NOT NULL, api_key VARCHAR ( 255 ) NOT NULL)"
    create_combined = f'{create_first_part} {mytable_name}{create_sql_query}'
    print("tablecreation create_combined_sql:", create_combined)
    myconnection = pg8000.native.Connection(
        host=postgres_sql_instance_public_ip_address,
        port=5432,
        user=postgres_sql_user_username,
        password=postgres_sql_user_password,
        database=postgres_sql_database_name
    )
    print("tablecreation starting")
    cursor = myconnection.run(create_combined)
    print("Table Created:", mytable_name)
    selectversion = 'SELECT version();'
    cursor2 = myconnection.run(selectversion)
    print("SELECT Version:", cursor2)

def droptable(table_to_drop):
    first_part_of_drop = "DROP TABLE IF EXISTS "
    last_part_of_drop = ' CASCADE'
    combinedstring = f'{first_part_of_drop} {table_to_drop} {last_part_of_drop}'
    conn = pg8000.native.Connection(
        host=postgres_sql_instance_public_ip_address,
        port=5432,
        user=postgres_sql_user_username,
        password=postgres_sql_user_password,
        database=postgres_sql_database_name
    )
    print("droptable delete_combined_sql ", combinedstring)
    cursor = conn.run(combinedstring)
    print("droptable completed ", cursor)
After bringing the infrastructure up for the first time via pulumi up -y, you can uncomment the following code block in __main__.py, add the configs for the PostgreSQL server via the CLI, and then run pulumi up -y again:
create_table1 = "votertable"
creating_table = tablecreation(create_table1)
print("")
create_table2 = "regionals"
creating_table = tablecreation(create_table2)
print("")
drop_table = "table2"
deleting_table = droptable(drop_table)
The settings for the table are in the Pulumi.dev.yaml file and are set via pulumi config set.
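In the program those values are read back with pulumi.Config; a minimal sketch (these key names are hypothetical, use whatever the program expects):
import pulumi

config = pulumi.Config()
# Hypothetical key names; set beforehand with:
#   pulumi config set dbUser <name>
#   pulumi config set --secret dbPassword <password>
postgres_sql_user_username = config.require("dbUser")
postgres_sql_user_password = config.require("dbPassword")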

Confluent MSSQL CDC Connector not fetching changes

Has anyone played with the Confluent MSSQL CDC Connector (https://docs.confluent.io/current/connect/kafka-connect-cdc-mssql/index.html)?
I tried setting up this connector, downloading the jar and setting up the config files as mentioned in the docs. Running it does not actually throw any error, but it is NOT able to fetch any changes from the SQL Server. Below is my config:
{
  "name" : "mssql_cdc_test",
  "connector.class" : "io.confluent.connect.cdc.mssql.MsSqlSourceConnector",
  "tasks.max" : "1",
  "initial.database" : "DBASandbox",
  "username" : "xxx",
  "password" : "xxx",
  "server.name" : "rptdevdb01111.homeaway.live",
  "server.port" : "1433",
  "change.tracking.tables" : "dbo.emp"
}
This is the message I am getting in the logs (at INFO level):
INFO Source task WorkerSourceTask{id=mssql_cdc_test-0} finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask:143)
What's strange is that even if I change server.name to some junk value, it doesn't complain and there are no errors. So probably it's NOT even trying to hit my SQL Server.
I did also enable change tracking on the database as well as the specified table:
ALTER DATABASE DBASandbox
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
ALTER DATABASE DBASandbox
SET ALLOW_SNAPSHOT_ISOLATION ON
ALTER TABLE dbo.emp
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
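One way to double-check the setup from the SQL Server side is to query the change tracking catalog views (a sketch using the same names as above):
SELECT DB_NAME(database_id) AS tracked_db FROM sys.change_tracking_databases;
USE DBASandbox;
SELECT OBJECT_NAME(object_id) AS tracked_table FROM sys.change_tracking_tables;
SELECT CHANGE_TRACKING_CURRENT_VERSION();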
Not sure what's wrong or how to debug it further. Any clue or insight would be helpful.

[Sybase][ODBC Driver][Adaptive Server Enterprise]SET CHAINED command not allowed within multi-statement transaction.\n (226) (SQLExecDirectW);

Using Robot Framework to connect to a Sybase DB, then DELETE and UPDATE rows in a table.
When the query below is executed in Robot Framework, it works fine.
Sybase DB Connection - Delete and Update for a single pass
    Connect To Database Using Custom Params    pyodbc    "Driver={Adaptive Server Enterprise}; server=<myserver>; port=<myport>; db=<mydb>; uid=<myuser>; pwd=<mypasswd>;"
    # Run Select Query
    ${selectQuery}    Query    select * from TABLE where FIELD1 = '1000'
    Log Many    ${selectQuery}
    Log    "Selected Query Executed"
    # Run Delete Query
    ${DeleteQuery}    Execute Sql String    set chained off ; Delete from TABLE where FIELD1 = '1000' AND FIELD2 = 'VALUE2' AND FIELD3 = 'VALUE3'
    Log Many    ${DeleteQuery}
    Log    "Delete Query Executed"
    # Run Update Query
    ${updateQuery}    Execute Sql String    set chained off ; UPDATE TABLE SET FIELD2 = 'VALUE2' where FIELD1 = '1001'
    Log Many    ${updateQuery}
    Log    "Update Query Executed"
    Disconnect From Database
Whereas when a for loop is used as below:
Sybase DB Connection - Delete with for loop for multiple passes
    Connect To Database Using Custom Params    pyodbc    "Driver={Adaptive Server Enterprise}; server=<myserver>; port=<myport>; db=<mydb>; uid=<myuser>; pwd=<mypasswd>;"
    # Run DELETE Query
    :FOR    ${num}    IN RANGE    100
    \    Execute Sql String    set chained off ; Delete from TABLE where FIELD1 = ${num} and FIELD2 = "${VALUE2[${num}]}" and FIELD3 = "${VALUE3[${num}]}"
    \    Sleep    1
It fails with the error below:
[Sybase][ODBC Driver][Adaptive Server Enterprise]SET CHAINED command not allowed within multi-statement transaction.\n (226) (SQLExecDirectW);
[Sybase][ODBC Driver][Adaptive Server Enterprise]Stored procedure 'abc_sp' may be run only in unchained transaction mode. The 'SET CHAINED OFF' command will cause the current session to use unchained transaction mode.\n (7713)")
Using commit fixed it. The below for loop worked fine:
    Connect To Database Using Custom Params    pyodbc    "Driver={Adaptive Server Enterprise}; server=<myserver>; port=<myport>; db=<mydb>; uid=<myuser>; pwd=<mypasswd>;"
    # Run DELETE Query
    :FOR    ${num}    IN RANGE    100
    \    Execute Sql String    commit
    \    Execute Sql String    set chained off ; Delete from TABLE where FIELD1 = ${num} and FIELD2 = "${VALUE2[${num}]}" and FIELD3 = "${VALUE3[${num}]}"
    \    Execute Sql String    commit
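For what it's worth, the root cause is that pyodbc connects with autocommit off, so the driver opens an implicit transaction before the first statement, and Sybase then rejects SET CHAINED mid-transaction. A minimal pyodbc-level sketch of the same fix (connection-string values and table/column names are placeholders):
import pyodbc

# autocommit=True keeps pyodbc from opening an implicit transaction,
# so "set chained off" no longer runs inside a multi-statement transaction.
conn = pyodbc.connect(
    "Driver={Adaptive Server Enterprise};"
    "server=<myserver>;port=<myport>;db=<mydb>;uid=<myuser>;pwd=<mypasswd>;",
    autocommit=True,
)
cur = conn.cursor()
cur.execute("set chained off")
cur.execute("DELETE FROM some_table WHERE FIELD1 = ?", 1000)
conn.close()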

Windows Powershell script to alter table

I have the below script, named alterTable.ps1. I am trying to alter a table by adding two new columns.
Here is the script:
function die {
    "Error: $($args[0])"
    exit 1
}
function verifySuccess {
    if (!$?) {
        die "$($args[0])"
    }
}
# Create variables and assign environment variables
$INSTANCE_HOST = $env:HOST
$INSTANCE_PORT = $env:PORT
$INSTANCE_NAME = $env:INSTANCE
$DATABASE_NAME = $env:DATABASE_NAME
# Execute the alter table, passing the variables
sqlcmd -U sa -P sapassword -S "$INSTANCE_HOST\$INSTANCE_NAME,$INSTANCE_PORT" -q "use $DATABASE_NAME; ALTER TABLE dbo.tabletest ADD Test1 VARCHAR(6) NULL, Test2 VARCHAR(10) NULL"
VerifySuccess "sqlcmd failed to alter table tabletest"
When I execute the script, passing values to it:
C:\> .\alterTable.ps1 "WINDOWSHOST" "1433" "MSSQLSERVER" "dbname"
I get the error below:
HResult 0x57, Level 16, State 1 SQL Server Network Interfaces:
Connection string is not valid [87]. Sqlcmd: Error: Microsoft SQL
Server Native Client 10.0 : A network-related or instance-specific
error has occurred while establishing a connection to SQL Server.
Server is not found or not accessible.
If I hardcode those values in the script, it works fine.
One more quick thing: how can I exit the script after execution and check the exit status?
You are never accessing the arguments that you pass to the script; you are instead reading values from four environment variables (that you never set?). You should declare parameters instead, with default values taken from the environment variables if necessary. Ex:
param(
    $INSTANCEHOST = $env:HOST,
    $INSTANCEPORT = $env:PORT,
    $INSTANCENAME = $env:INSTANCE,
    $DATABASENAME = $env:DATABASE_NAME
)
function die {
    "Error: $($args[0])"
    exit 1
}
function verifySuccess {
    if (!$?) {
        die "$($args[0])"
    }
}
# Execute the alter table, passing the variables
sqlcmd -U sa -P sapassword -S "$INSTANCEHOST\$INSTANCENAME,$INSTANCEPORT" -q "use $DATABASENAME; ALTER TABLE dbo.tabletest ADD Test1 VARCHAR(6) NULL, Test2 VARCHAR(10) NULL"
VerifySuccess "sqlcmd failed to alter table tabletest"
Use it like this:
C:\> .\alterTable.ps1 -INSTANCEHOST "WINDOWSHOST" -INSTANCEPORT "1433" -INSTANCENAME "MSSQLSERVER" -DATABASENAME "dbname"
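Regarding the second question (exiting and checking the exit status): sqlcmd's exit code is surfaced in $LASTEXITCODE, so a short addition after the sqlcmd call can propagate it to the caller:
if ($LASTEXITCODE -ne 0) {
    # Propagate sqlcmd's failure code to the caller
    exit $LASTEXITCODE
}
exit 0
The caller can then inspect the script's exit code via $LASTEXITCODE in PowerShell (or %ERRORLEVEL% from cmd.exe).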

Unaccent issue when restoring a Postgres database

I want to restore a particular database under another database name on another server. So far, so good.
I used this command:
pg_dump -U postgres -F c -O -b -f maindb.dump maindb
to dump the main database on the production server. Then I use this command:
pg_restore --verbose -O -l -d restoredb maindb.dump
to restore it into another database on our test server. It restores mostly OK, but there are some errors, like:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3595; 1259 213452 INDEX idx_clientnomclient maindbuser
pg_restore: [archiver (db)] could not execute query: ERROR: function unaccent(text) does not exist
LINE 1: SELECT unaccent(lower($1));
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
QUERY: SELECT unaccent(lower($1));
CONTEXT: SQL function "cyunaccent" during inlining
Command was: CREATE INDEX idx_clientnomclient ON client USING btree (public.cyunaccent((lower((nomclient)::text))::character varying));
cyunaccent is a function that is in the public schema and does get created by the restore.
After the restore, I am able to re-create those indexes perfectly with the same SQL, without any errors.
I've also tried restoring with the -1 option of pg_restore to do a single transaction, but it doesn't help.
What am I doing wrong?
I just found the problem, and I was able to narrow it down to a simple test-case.
CREATE SCHEMA intranet;
CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;
SET search_path = public, pg_catalog;
CREATE FUNCTION cyunaccent(character varying) RETURNS character varying
LANGUAGE sql IMMUTABLE
AS $_$ SELECT unaccent(lower($1)); $_$;
SET search_path = intranet, pg_catalog;
CREATE TABLE intranet.client (
codeclient character varying(10) NOT NULL,
noclient character varying(7),
nomclient character varying(200) COLLATE pg_catalog."fr_CA"
);
ALTER TABLE ONLY client ADD CONSTRAINT client_pkey PRIMARY KEY (codeclient);
CREATE INDEX idx_clientnomclient ON client USING btree (public.cyunaccent((lower((nomclient)::text))::character varying));
This test case is from a pg_dump done in plain text.
As you can see, the cyunaccent function is created in the public schema, as it's later used by tables in other schemas.
psql/pg_restore won't re-create the index, as it cannot find the function, despite the fact that the schema name is specified to reference it. The problem lies in the
SET search_path = intranet, pg_catalog;
call. Changing it to
SET search_path = intranet, public, pg_catalog;
solves the problem. I've submitted a bug report to postgres about this, not yet in the queue.
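An alternative that avoids depending on search_path at restore time is to schema-qualify the unaccent() call inside the function itself, e.g. a sketch of the same function:
CREATE FUNCTION cyunaccent(character varying) RETURNS character varying
    LANGUAGE sql IMMUTABLE
    AS $_$ SELECT public.unaccent(lower($1)); $_$;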
