MSSQL doesn't work with incrementing or timestamp source connector - sql-server

I'm trying to create a source connector for my SQL Server tables; here's an example:
{
  "name": "test",
  "config": {
    "tasks.max": "1",
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://REDACTED;databaseName=REDACTED",
    "connection.user": "REDACTED",
    "connection.password": "REDACTED",
    "topic.prefix": "test",
    "poll.interval.ms": 2000,
    "table.whitelist": "dbo.Client",
    "mode": "incrementing",
    "incrementing.column.name": "Id"
  }
}
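For reference, a config like this is registered by POSTing it to the Kafka Connect REST API; a minimal sketch, assuming Connect listens on localhost:8083 and the JSON above is saved as test-connector.json:

curl -X POST -H "Content-Type: application/json" \
  --data @test-connector.json \
  http://localhost:8083/connectors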
I'm getting the following error, which doesn't tell me much:
java.lang.IllegalArgumentException: Number of groups must be positive.
at org.apache.kafka.connect.util.ConnectorUtils.groupPartitions(ConnectorUtils.java:41)
at io.confluent.connect.jdbc.JdbcSourceConnector.taskConfigs(JdbcSourceConnector.java:148)
at org.apache.kafka.connect.runtime.Worker.connectorTaskConfigs(Worker.java:305)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.reconfigureConnector(DistributedHerder.java:997)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.reconfigureConnectorTasksWithRetry(DistributedHerder.java:950)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$900(DistributedHerder.java:110)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$17$1.call(DistributedHerder.java:963)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$17$1.call(DistributedHerder.java:960)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:270)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The same happens when I use mode timestamp with timestamp.column.name (the Modified column, of type datetimeoffset).
When I use mode bulk with a custom query, it works fine.
Any ideas?
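For comparison, a sketch of the bulk-mode fragment that does work; the query text here is illustrative, not the exact one used:

"mode": "bulk",
"query": "SELECT Id, Modified FROM dbo.Client"

(These would replace the mode, incrementing.column.name, and table.whitelist entries in the config above, since query and table.whitelist are mutually exclusive in the JDBC source connector.)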

Related

Docker, Debezium not streaming data from mssql to elasticsearch

I followed this example to stream data from MySQL to Elasticsearch:
https://github.com/debezium/debezium-examples/tree/master/unwrap-smt#elasticsearch-sink
The example itself works great on my local machine.
But in my case I want to stream data from MSSQL (which is on another server, not in Docker) to Elasticsearch.
So in the "docker-compose-es.yaml" file I removed the "mysql" service and the mysql links.
Then I created my own source and sink connectors for MSSQL and Elasticsearch:
{
  "name": "Test-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "192.168.1.234",
    "database.port": "1433",
    "database.user": "user",
    "database.password": "pass",
    "database.dbname": "Test",
    "database.server.name": "MyServer",
    "table.include.list": "dbo.TEST_A",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.testA"
  }
}
{
  "name": "elastic-sink-test",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "TEST_A",
    "connection.url": "http://localhost:9200/",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.key.field": "SQ",
    "key.ignore": "false",
    "type.name": "TEST_A",
    "behavior.on.null.values": "delete"
  }
}
After adding these, the Kafka Connect I/O is working hard, with over 40 GB of input (image not included here).
In the Kafka logs it looks like it's going through all the tables. Here is a log excerpt for one of them:
2021-06-17 10:20:10,414 - INFO [data-plane-kafka-request-handler-5:Logging#66] - [Partition MyServer.dbo.TemplateGroup-0 broker=1] Log loaded for partition MyServer.dbo.TemplateGroup-0 with initial high watermark 0
2021-06-17 10:20:10,509 - INFO [data-plane-kafka-request-handler-3:Logging#66] - Creating topic MyServer.dbo.TemplateMeter with configuration {} and initial partition assignment Map(0 -> ArrayBuffer(1))
2021-06-17 10:20:10,516 - INFO [data-plane-kafka-request-handler-3:Logging#66] - [KafkaApi-1] Auto creation of topic MyServer.dbo.TemplateMeter with 1 partitions and replication factor 1 is successful
2021-06-17 10:20:10,526 - INFO [data-plane-kafka-request-handler-7:Logging#66] - [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(MyServer.dbo.TemplateMeter-0)
2021-06-17 10:20:10,528 - INFO [data-plane-kafka-request-handler-7:Logging#66] - [Log partition=MyServer.dbo.TemplateMeter-0, dir=/kafka/data/1] Loading producer state till offset 0 with message format version 2
The database is only 2 GB, so I'm not sure why the input is so high.
No test_a index was created in Elasticsearch when running this command:
curl http://localhost:9200/_aliases?pretty=true
Does anyone know how I can troubleshoot from here, or can someone point me in the right direction?
Thanks in advance!
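One quick check, assuming Connect's REST port 8083 is published from the container:

curl http://localhost:8083/connectors/Test-connector/status

This reports the connector and task state, including the stack trace of any failed task.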
"how I troubleshoot from here"
docker compose logs?
Modify the log4j.properties of the Kafka Connect and/or Elasticsearch processes to get more logs?
Use a regular Kafka consumer to see if data is actually read into the TEST_A topic? (See the command below.)
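A sketch with the console consumer that ships with Kafka, using the broker address from the compose file; note that the logs above show Debezium naming topics MyServer.dbo.<table>, so it is worth checking that name as well as TEST_A:

kafka-console-consumer.sh --bootstrap-server kafka:9092 \
  --topic MyServer.dbo.TEST_A --from-beginning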
in the "docker-compose-es.yaml" ....
If Debezium is running in a container, then Elasticsearch is not available at localhost:9200
Change that value to http://elastic:9200, as shown in the es-sink.json (see the snippet below).
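A sketch of the corrected line, assuming the Elasticsearch service is named elastic on the compose network:

"connection.url": "http://elastic:9200"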

TimerException in Flink Process

We are running a Flink job using several operators, including map, windowing, and flatMap(), and the job fails with the following error. Just wondering what causes it:
2021-05-27 07:00:07,023 WARN org.apache.flink.runtime.taskmanager.Task [] - Collect Inventory (1/1)#0 (34d81cf2e59f350886f93a1e0f734d38) switched from RUNNING to FAILED with failure cause: org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught exception while processing timer.
at org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:1282)
at org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:1258)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1397)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$16(StreamTask.java:1386)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:344)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:330)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:661)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:776)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.lang.Thread.run(Thread.java:748)
Caused by: TimerException{java.lang.RuntimeException: Assigned key must not be null!}
... 12 more
Caused by: java.lang.RuntimeException: Assigned key must not be null!
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:109)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:93)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:44)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50)
at org.apache.flink.streaming.api.functions.windowing.PassThroughWindowFunction.apply(PassThroughWindowFunction.java:35)
at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalSingleValueWindowFunction.process(InternalSingleValueWindowFunction.java:48)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:577)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:533)
at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.onProcessingTime(InternalTimerServiceImpl.java:284)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1395)
... 11 more
Caused by: java.lang.NullPointerException: Assigned key must not be null!
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:76)
at org.apache.flink.runtime.state.KeyGroupRangeAssignment.assignKeyToParallelOperator(KeyGroupRangeAssignment.java:51)
at org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannel(KeyGroupStreamPartitioner.java:63)
at org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannel(KeyGroupStreamPartitioner.java:35)
at org.apache.flink.runtime.io.network.api.writer.ChannelSelectorRecordWriter.emit(ChannelSelectorRecordWriter.java:54)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 22 more
2021-05-27 07:00:07,023 INFO org.apache.flink.runtime.taskmanager.Task [] - Triggering cancellation of task code Collect Inventory (1/1)#0 (34d81cf2e59f350886f93a1e0f734d38).
{eventTime:2021-05-27T07:00:05.878Z, batchId:4a0c09ad-1e28-4f74-a19a-8e1422c5bc5a, serialDetail:null, inventoryCount:{"location": "tmobile01", "sku": "190198496225", "state": "LOST", "inventoryStatus": "Adjusted-Out", "inventoryType": "ACC", "sellableFlag": false, "version": 1, "updateTimestamp": 2021-05-27T07:00:07.118Z, "globalQuantity": -1, "localQuantity": -1, "eventId": null, "movementSource": "", "locationType": "Store"
It seems that you are using a keyBy operation and the extracted key is null. You can't have a null key in Flink; see the sketch below.
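A minimal sketch of a null-safe guard before keyBy; the event type and getter are hypothetical, picked to match the location field in the logged record:

// events is the DataStream produced by the upstream map/flatMap operators
// (InventoryEvent and getLocation() are assumed names, not from the job itself)
events
    .filter(e -> e.getLocation() != null)   // drop records whose key would be null;
                                            // Flink's key-group assignment throws on null keys
    .keyBy(e -> e.getLocation());

Alternatively, route null-keyed records to a side output instead of dropping them, so they can be inspected.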

AWS Kinesis Data Analytics

I am following this guide https://docs.aws.amazon.com/kinesisanalytics/latest/java/examples-python-sliding.html to create the sample application. Once I start the application, I get the error below:
{
"applicationVersionId": 2,
"message": "org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute application.\n\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$4(JarRunOverrideHandler.java:207)\n\tat java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)\n\tat java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: java.util.concurrent.CompletionException: org.apache.flink.client.program.ProgramAbortException\n\tat java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)\n\tat java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)\n\t... 6 more\nCaused by: org.apache.flink.client.program.ProgramAbortException\n\tat org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:111)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:294)\n\tat org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:201)\n\tat org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:149)\n\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:78)\n\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:67)\n\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:203)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)\n\t... 6 more\n",
"messageType": "ERROR",
"messageSchemaVersion": "1",
"errorCode": "CodeError.InvalidApplicationCode"
}
Has anyone encountered such an error? Please help. Thanks in advance.

Enabling LTR in SolrCloud (Solr version 8.2)

I am trying to work with SolrCloud, and I have to use Learning to Rank (LTR) models and features for my project. But I am facing this issue of SolrCore initialization failures:
techproducts_shard1_replica_n2: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Failed to create new ManagedResource /schema/model-store of type org.apache.solr.ltr.store.rest.ManagedModelStore due to: org.apache.solr.common.SolrException: org.apache.solr.ltr.model.ModelException: Model type does not exist org.apache.solr.ltr.model.LinearModel
techproducts_shard2_replica_n6: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Failed to create new ManagedResource /schema/model-store of type org.apache.solr.ltr.store.rest.ManagedModelStore due to: org.apache.solr.common.SolrException: org.apache.solr.ltr.model.ModelException: Model type does not exist org.apache.solr.ltr.model.LinearModel
Please check your logs for more information
These are the Solr logs:
2019-12-04 12:44:05.760 ERROR (searcherExecutor-15-thread-1-processing-n:192.168.137.1:8983_solr x:techproducts_shard1_replica_n2 c:techproducts s:shard1 r:core_node5) [c:techproducts s:shard1 r:core_node5 x:techproducts_shard1_replica_n2] o.a.s.h.RequestHandlerBase java.lang.NullPointerException
at org.apache.solr.handler.component.SearchHandler.initComponents(SearchHandler.java:183)
at org.apache.solr.handler.component.SearchHandler.getComponents(SearchHandler.java:203)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:260)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:74)
at org.apache.solr.core.SolrCore.lambda$getSearcher$18(SolrCore.java:2344)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-12-04 12:44:05.760 ERROR (searcherExecutor-14-thread-1-processing-n:192.168.137.1:8983_solr x:techproducts_shard2_replica_n6 c:techproducts s:shard2 r:core_node8) [c:techproducts s:shard2 r:core_node8 x:techproducts_shard2_replica_n6] o.a.s.h.RequestHandlerBase java.lang.NullPointerException
at org.apache.solr.handler.component.SearchHandler.initComponents(SearchHandler.java:183)
at org.apache.solr.handler.component.SearchHandler.getComponents(SearchHandler.java:203)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:260)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:74)
at org.apache.solr.core.SolrCore.lambda$getSearcher$18(SolrCore.java:2344)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Match the compatibility of your Solr and ZooKeeper versions.
For Solr 8.2.0, use ZooKeeper 3.5.6.
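As an additional, unconfirmed hunch: "Model type does not exist" is what Solr reports when the LTR contrib classes are not on the classpath, and the techproducts example only loads them when started with the LTR flag:

bin/solr start -e techproducts -Dsolr.ltr.enabled=true

If the nodes were started without that flag (or without the LTR lib directives in solrconfig.xml), the model store cannot resolve org.apache.solr.ltr.model.LinearModel.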

How to add a new query listener via the SolR Config API?

I'm trying to add a new query listener with the Solr Config API.
To do that, I POST the following JSON to the Config API endpoint of my Solr core:
{
  "add-listener": {
    "name": "listener1",
    "class": "solr.QuerySenderListener",
    "event": "newSearcher",
    "queries": [
      {
        "q": "any:whatever",
        "rows": "10",
        "start": "0"
      }
    ]
  }
}
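For context, the POST itself looks like this; a sketch assuming a core named mycore on localhost:8983, with the JSON above saved as add-listener.json:

curl -X POST -H 'Content-Type: application/json' \
  --data-binary @add-listener.json \
  http://localhost:8983/solr/mycore/config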
This action populates the "configoverlay.json" file, which overrides the standard solrconfig.xml. That file now looks like the following:
{
  "listener": {
    "listener1": {
      "name": "listener1",
      "class": "solr.QuerySenderListener",
      "event": "newSearcher",
      "queries": [
        {
          "q": "any:whatever",
          "rows": "10",
          "start": "0"
        }
      ]
    }
  }
}
For some reason, this structure doesn't seem to be OK for Solr. I get the following error in the Solr logs:
null:java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to org.apache.solr.common.util.NamedList
at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:47)
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1835)
at java.util.concurrent.FutureTask.run(Unknown Source)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
And of course, the query associated with this event is therefore not executed as intended.
Apparently, the query format doesn't fit what Solr expects... I didn't find any example of updating or creating events in the Solr Config API documentation.
The problem must be related to the syntax of the parameters of the queries objects.
If you know how to format the JSON message to produce a proper listener, let me know!
In case it matters, I'm using Solr 5.3.1.
P.S.: The point of doing this via the Config API instead of solrconfig.xml is to be able to change the warming queries dynamically (to reflect configuration changes in our product, which uses Solr for search).
Guillaume
