Elasticsearch 1.4.2 polling configuration is not working. Why? - sql-server

Hi, I am trying to poll table data every 5 seconds, but it does not work:
POST /_river/mytest_river/_meta
{
  "type": "jdbc",
  "jdbc": {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url": "jdbc:sqlserver://[my_ip];databaseName=mega",
    "user": "sa",
    "password": "******",
    "sql": "SELECT [OrderID],[CustomerName],[UserFullName],[Status] FROM [Orders_Table]",
    "poll": "5s",
    "index": "mega",
    "type": "orders_table"
  }
}
What is wrong with my configuration?

You need to use the "schedule" parameter, as described here:
Time scheduled execution of JDBC river
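For illustration, a hedged sketch of the same river definition using "schedule" with a cron-style expression in place of "poll" (the exact cron syntax, and whether "poll" must be dropped, depend on the JDBC river plugin version, so verify against its documentation):
POST /_river/mytest_river/_meta
{
  "type": "jdbc",
  "jdbc": {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url": "jdbc:sqlserver://[my_ip];databaseName=mega",
    "user": "sa",
    "password": "******",
    "sql": "SELECT [OrderID],[CustomerName],[UserFullName],[Status] FROM [Orders_Table]",
    "schedule": "0/5 * * * * ?",
    "index": "mega",
    "type": "orders_table"
  }
}
Here "0/5 * * * * ?" is meant as a Quartz-style "every 5 seconds" expression.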

Related

org.apache.flink.runtime.checkpoint.CheckpointException: Some tasks of the job have already finished

I want to stop a Flink job via the REST API, so I send the request http://192.168.215.165:8081/jobs/c952ba860604a2c32a7abb9eb5b42b0d/stop and get the response:
{
  "request-id": "29c559399243c817055ebbaf7431a8d2"
}
Then I send the request http://192.168.215.165:8081/jobs/c952ba860604a2c32a7abb9eb5b42b0d/savepoints/29c559399243c817055ebbaf7431a8d2 and get this response (part of it):
{
  "status": {
    "id": "COMPLETED"
  },
  "operation": {
    "failure-cause": {
      "class": "java.util.concurrent.CompletionException",
"stack-trace": "java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Some tasks of the job have already finished and checkpointing with finished tasks is not enabled. Failure reason: Not all required tasks are currently running.\n\tat java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)\n\tat java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)\n\tat java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:925)\n\tat java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:913)\n\tat java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)\n\tat java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)\n\tat org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:246)\n\tat java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)\n\tat java.util.concurrent.
How can I stop a Flink job via the REST API, please?
A few alternatives (a sketch of the corresponding config and REST calls follows below):
Set execution.checkpointing.checkpoints-after-tasks-finish.enabled: true in your configuration (this is a somewhat experimental feature added in Flink 1.14, but it should work).
Modify your job so that all of the tasks are still running at the time when the job is ready to be stopped.
Terminate the job without taking a savepoint.
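For illustration, a hedged sketch of the first and third alternatives (the job ID comes from the question; the savepoint directory s3://my-savepoints/ is a hypothetical placeholder, and the request bodies should be checked against the REST API docs for your Flink version):
# flink-conf.yaml: allow checkpoints/savepoints after some tasks have finished (Flink 1.14+)
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
# Stop with a savepoint via the REST API
curl -X POST http://192.168.215.165:8081/jobs/c952ba860604a2c32a7abb9eb5b42b0d/stop \
  -H 'Content-Type: application/json' \
  -d '{"targetDirectory": "s3://my-savepoints/", "drain": false}'
# Or terminate without taking a savepoint (cancel)
curl -X PATCH 'http://192.168.215.165:8081/jobs/c952ba860604a2c32a7abb9eb5b42b0d?mode=cancel'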

Docker, Debezium not streaming data from mssql to elasticsearch

I followed this example to stream data from MySQL to Elasticsearch:
https://github.com/debezium/debezium-examples/tree/master/unwrap-smt#elasticsearch-sink
The example itself works great on my local machine.
But in my case I want to stream data from MSSQL (which is on another server, not in Docker) to Elasticsearch.
So in the "docker-compose-es.yaml" file I removed the "mysql" part and removed the mysql links.
And I created my own connector and sink for MSSQL and Elasticsearch:
{
  "name": "Test-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "192.168.1.234",
    "database.port": "1433",
    "database.user": "user",
    "database.password": "pass",
    "database.dbname": "Test",
    "database.server.name": "MyServer",
    "table.include.list": "dbo.TEST_A",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.testA"
  }
}
{
  "name": "elastic-sink-test",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "TEST_A",
    "connection.url": "http://localhost:9200/",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.key.field": "SQ",
    "key.ignore": "false",
    "type.name": "TEST_A",
    "behavior.on.null.values": "delete"
  }
}
After adding these, the Kafka Connect I/O is working hard and has over 40 GB of input, see image below:
In the Kafka logs it looks like it's going through all the tables. Here is one of the table logs:
2021-06-17 10:20:10,414 - INFO [data-plane-kafka-request-handler-5:Logging#66] - [Partition MyServer.dbo.TemplateGroup-0 broker=1] Log loaded for partition MyServer.dbo.TemplateGroup-0 with initial high watermark 0
2021-06-17 10:20:10,509 - INFO [data-plane-kafka-request-handler-3:Logging#66] - Creating topic MyServer.dbo.TemplateMeter with configuration {} and initial partition assignment Map(0 -> ArrayBuffer(1))
2021-06-17 10:20:10,516 - INFO [data-plane-kafka-request-handler-3:Logging#66] - [KafkaApi-1] Auto creation of topic MyServer.dbo.TemplateMeter with 1 partitions and replication factor 1 is successful
2021-06-17 10:20:10,526 - INFO [data-plane-kafka-request-handler-7:Logging#66] - [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(MyServer.dbo.TemplateMeter-0)
2021-06-17 10:20:10,528 - INFO [data-plane-kafka-request-handler-7:Logging#66] - [Log partition=MyServer.dbo.TemplateMeter-0, dir=/kafka/data/1] Loading producer state till offset 0 with message format version 2
The database is only 2 GB. I'm not sure why the input is so high.
No test_a index was created in Elasticsearch when running this command:
curl http://localhost:9200/_aliases?pretty=true
Does anyone know how I can troubleshoot from here, or can you point me in the right direction?
Thanks in advance!
"how I troubleshoot from here"
Check the docker compose logs?
Modify the log4j.properties of the Kafka Connect and/or Elasticsearch processes to get more logs?
Use a regular Kafka consumer to see if data is actually read into the TEST_A topic?
"in the "docker-compose-es.yaml" ...."
If Debezium is running in a container, then Elasticsearch is not available at localhost:9200.
Change that value to http://elastic:9200, as shown in the es-sink.json (see the sketch below).
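As a hedged illustration (assuming the Elasticsearch container is named elastic and Kafka is reachable at kafka:9092, as in the linked example; the compose service name, script path, and topic name are assumptions to adjust):
# In the elastic-sink-test config, point at the Elasticsearch container instead of localhost:
"connection.url": "http://elastic:9200",
# List the topics Debezium actually created (it prefixes them with the server and schema names,
# e.g. MyServer.dbo.TEST_A rather than TEST_A):
docker compose -f docker-compose-es.yaml exec kafka /kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 --list
# Then read a few events from the table's topic to confirm data is flowing:
docker compose -f docker-compose-es.yaml exec kafka /kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic MyServer.dbo.TEST_A --from-beginning --max-messages 5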

How to check Google Cloud Scheduler status?

I have two jobs and I want to execute the second one only when the first one has completed. Both are scheduled with Cloud Scheduler.
I am using the get API to check the status of the first job, but there is no data under the status field. Please note that I tried to get this data while my first job was running.
{
  "name": "projects/<project name>/locations/us-central1/jobs/<job name>",
  "description": "Sample",
  "appEngineHttpTarget": {
    "httpMethod": "GET",
    "appEngineRouting": {
      "version": "test-v1",
      "host": "test-v1.test.googleplex.com"
    },
    "relativeUri": "/api/v1/test",
    "headers": {
      "User-Agent": "AppEngine-Google; (+http://code.google.com/appengine)"
    }
  },
  "userUpdateTime": "2020-07-17T11:44:16Z",
  "state": "ENABLED",
  "status": {},
  "scheduleTime": "2020-07-18T11:00:00.834928Z",
  "lastAttemptTime": "2020-07-17T11:44:30.439092Z",
  "retryConfig": {
    "maxRetryDuration": "0s",
    "minBackoffDuration": "5s",
    "maxBackoffDuration": "3600s",
    "maxDoublings": 16
  },
  "schedule": "0 04 * * *",
  "timeZone": "America/Los_Angeles",
  "attemptDeadline": "18000s"
}
Where am I going wrong?
I was checking the documentation on this, and it looks like to find out whether a job is running you need to use state, as it returns the state of the job; however, I could not find a description in the documentation of when the job is done.
It looks like once the job has finished it populates the status field, and status gives you the HTTP status of the job's last attempt (so it tells you whether it failed or succeeded, with specific information about the failure or success).
So, in this case you would need to check that state is ENABLED (meaning the job is not paused or in some other state) and that status is populated (meaning the job finished, and at the same time you will know whether it succeeded or failed). A small polling sketch follows below.
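For illustration, a hedged sketch of polling the first job from the command line with gcloud (the job names, the location, and keying off an empty/non-empty status are assumptions to verify against your actual responses):
# Show only the fields relevant to "has the last run finished?"
gcloud scheduler jobs describe <first job name> --location=us-central1 --format="yaml(state, status, scheduleTime, lastAttemptTime)"
# Wait until status is populated (last attempt finished), then trigger the second job
until gcloud scheduler jobs describe <first job name> --location=us-central1 --format="value(status)" | grep -q .; do
  sleep 30
done
gcloud scheduler jobs run <second job name> --location=us-central1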

Can't connect to Neptune anymore

I have created a Neptune instance in my AWS account and a Load Balancer to access it from my local machine to play around.
I'm basically redirecting all connections on port 80 of my LB to port 8182 on my Neptune instance.
So I can easily query it through the browser. In fact, this is the output of /status:
// 20191211170323
// http://my-lb/status
{
  "status": "healthy",
  "startTime": "Mon Dec 09 20:06:21 UTC 2019",
  "dbEngineVersion": "1.0.2.1.R2",
  "role": "writer",
  "gremlin": {
    "version": "tinkerpop-3.4.1"
  },
  "sparql": {
    "version": "sparql-1.1"
  },
  "labMode": {
    "ObjectIndex": "disabled",
    "Streams": "disabled",
    "ReadWriteConflictDetection": "enabled"
  }
}
The problem is that when I try to connect to it through the Gremlin Console or Java code, I get the following error:
gremlin> :remote connect tinkerpop.server conf/remote-neptune.yaml
ERROR org.apache.tinkerpop.gremlin.driver.Handler$GremlinResponseHandler - Could not process the response
io.netty.handler.codec.http.websocketx.WebSocketHandshakeException: Invalid handshake response getStatus: 403 Forbidden
at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker13.verify(WebSocketClientHandshaker13.java:226)
at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker.finishHandshake(WebSocketClientHandshaker.java:276)
at org.apache.tinkerpop.gremlin.driver.handler.WebSocketClientHandler.channelRead0(WebSocketClientHandler.java:69)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:617)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:534)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748)
And my remote-neptune.yaml is as simple as:
hosts: [my-lb]
port: 80
connectionPool: { enableSsl: false}
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
I have updated my AWS credentials although I don't think that's related since I'm accessing it through the LB.
And the weirdest part is that this same scenario was working like a week ago :/
Any ideas?
Thanks!
Looks like the problem has resolved itself, but just sharing a few things to watch out for in case this happens again in the future. If you see connection issues, your first step should be to check whether it's a network connectivity issue. (You mentioned that you were going to check if something changed with regards to security groups, so do update if that was indeed the case.) To check whether it really is an SG issue, log into your client instance and do a simple telnet call to the DB endpoint:
telnet <endpoint> <port>
If it responds with "Connected", then you can be sure that your SGs are correct, and you are dealing with an application-layer problem.
As called out in the comments, some of the possible culprits could be:
You previously had a setup without IAM Auth in Neptune (not on the ALB) and have now enabled IAM Auth. (Emphasis: I'm referring to IAM Auth on the database, not on some other component in between.)
Gremlin client-server version mismatches.
Some explicit settings on the ALB that could hinder the requests.
And a few others. To summarize, try to classify whether it is an L2/L3 issue or an L7 issue and start investigating based on that. A quick application-layer check is sketched below.
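For example, a hedged way to probe the application layer through the ALB without the Gremlin driver (the /gremlin HTTP endpoint and the tiny traversal are based on Neptune's REST interface; with IAM Auth enabled, a 403 here would point at missing request signing rather than at the network):
# Plain HTTP (no WebSocket) Gremlin query through the load balancer
curl -v -X POST http://my-lb/gremlin -H 'Content-Type: application/json' -d '{"gremlin": "g.V().limit(1).count()"}'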

How to query Index and selector using ektorp Java API

I am using the ektorp 1.4.1 jar to connect to a Cloudant database. I am able to write map and reduce functions using a class EventRepository extends CouchDbRepositorySupport, but my problem is: how can I query an index and a selector using the ektorp Java API? Can anyone please help me out? Thanks in advance.
This is my query index:
{
  "index": {
    "fields": [
      {"name": "userName", "type": "string"}
    ]
  },
  "type": "text"
}
And here is my selector for getting all events from Cloudant by userName, in descending order by startDate:
{
  "selector": {
    "userName": "vekusuma#in.ibm.com"
  },
  "fields": [
    "userName",
    "startDate",
    "days",
    "_id",
    "_rev"
  ],
  "sort": [
    {
      "userName": "desc"
    }
  ]
}
I am using the code below to connect to Cloudant with the Cloudant Java API:
CloudantClient client = ClientBuilder.url(new URL("https://userName:password#*****.cloudant.com"))
        .username("*******")
        .password("*******")
        .build();
List<String> dbsList = client.getAllDbs();
System.out.println("...dbsList size is :: " + dbsList.size());
I have also tried it a different way:
CloudantClient client = ClientBuilder.account("username")
        .username("username")
        .password("password")
        .build();
but I still get the same issue. I am getting the error below when running locally in Eclipse on WebSphere Application Server 7.0:
***********Error*************
[3/9/16 23:53:43:547 IST] 00000031 SystemErr R com.cloudant.client.org.lightcouch.CouchDbException: 400 Bad request: 400 Bad request
Your browser sent an invalid request.
[3/9/16 23:53:43:548 IST] 00000031 SystemErr R at com.cloudant.client.org.lightcouch.CouchDbClient.execute(CouchDbClient.java:501)
Please help me out here... Thanks in advance :)
From my (quick) reading of the ektorp API, it doesn't support Cloudant Query (the product name for this selector-based query API). However, there is an official Cloudant Java library which does support the Cloudant Query endpoints; a sketch is shown below.
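As a hedged sketch using the official cloudant-client 2.x API (the database name "events", the Event POJO, and the exact selector-string format expected by findByIndex are assumptions; check them against the client's javadoc for your version):
import com.cloudant.client.api.ClientBuilder;
import com.cloudant.client.api.CloudantClient;
import com.cloudant.client.api.Database;
import com.cloudant.client.api.model.FindByIndexOptions;
import com.cloudant.client.api.model.IndexField;
import com.cloudant.client.api.model.IndexField.SortOrder;
import java.util.List;

public class EventQuery {
    public static void main(String[] args) throws Exception {
        // Connect the same way as in the question
        CloudantClient client = ClientBuilder.account("username")
                .username("username")
                .password("password")
                .build();
        // false: do not create the database if it does not exist; "events" is a hypothetical name
        Database db = client.database("events", false);

        // Create the text index from the question
        db.createIndex("{ \"index\": { \"fields\": [ {\"name\": \"userName\", \"type\": \"string\"} ] },"
                + " \"type\": \"text\" }");

        // Run the selector: all events for one user, sorted descending on userName
        List<Event> events = db.findByIndex(
                "\"selector\": { \"userName\": \"vekusuma#in.ibm.com\" }",
                Event.class,
                new FindByIndexOptions()
                        .fields("userName").fields("startDate").fields("days")
                        .fields("_id").fields("_rev")
                        .sort(new IndexField("userName", SortOrder.desc)));
        System.out.println("Found " + events.size() + " events");
    }

    // Hypothetical POJO mapping the document fields used above
    public static class Event {
        private String userName;
        private String startDate;
        private String days;
        private String _id;
        private String _rev;
    }
}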
