Flink KafkaSink connector with exactly-once semantics produces too many logs - apache-flink

Configuring a KafkaSink from the new Kafka connector API (available since version 1.15) with DeliveryGuarantee.EXACTLY_ONCE and a transactionalId prefix produces an excessive amount of logs each time a new checkpoint is triggered.
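For context, a minimal sketch of such a sink in Java (the topic name, value schema, and prefix here are placeholder assumptions, not taken from the original job):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

// Minimal exactly-once KafkaSink; "some-topic" and the schema are placeholders.
KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("some-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .setTransactionalIdPrefix("flink")
        .build();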
The logs look like this:
2022-11-02 10:04:10,124 INFO org.apache.flink.connector.kafka.sink.FlinkKafkaInternalProducer [] - Flushing new partitions
2022-11-02 10:04:10,125 INFO org.apache.kafka.clients.producer.ProducerConfig [] - ProducerConfig values:
acks = -1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = producer-flink-1-24
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = true
interceptor.classes = []
internal.auto.downgrade.txn.commit = false
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = flink-1-24
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
2022-11-02 10:04:10,131 INFO org.apache.kafka.clients.producer.KafkaProducer [] - [Producer clientId=producer-flink-1-24, transactionalId=flink-1-24] Overriding the default enable.idempotence to true since transactional.id is specified.
2022-11-02 10:04:10,161 INFO org.apache.kafka.clients.producer.KafkaProducer [] - [Producer clientId=producer-flink-0-24, transactionalId=flink-0-24] Overriding the default enable.idempotence to true since transactional.id is specified.
2022-11-02 10:04:10,161 INFO org.apache.kafka.clients.producer.KafkaProducer [] - [Producer clientId=producer-flink-0-24, transactionalId=flink-0-24] Instantiated a transactional producer.
2022-11-02 10:04:10,162 INFO org.apache.kafka.clients.producer.KafkaProducer [] - [Producer clientId=producer-flink-0-24, transactionalId=flink-0-24] Overriding the default acks to all since idempotence is enabled.
2022-11-02 10:04:10,159 INFO org.apache.kafka.clients.producer.KafkaProducer [] - [Producer clientId=producer-flink-1-24, transactionalId=flink-1-24] Instantiated a transactional producer.
2022-11-02 10:04:10,170 INFO org.apache.kafka.clients.producer.KafkaProducer [] - [Producer clientId=producer-flink-1-24, transactionalId=flink-1-24] Overriding the default acks to all since idempotence is enabled.
2022-11-02 10:04:10,181 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka version: 2.8.1
2022-11-02 10:04:10,184 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka commitId: 839b886f9b732b15
2022-11-02 10:04:10,184 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka startTimeMs: 1667379850181
2022-11-02 10:04:10,185 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-flink-0-24, transactionalId=flink-0-24] Invoking InitProducerId for the first time in order to acquire a producer ID
2022-11-02 10:04:10,192 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka version: 2.8.1
2022-11-02 10:04:10,192 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka commitId: 839b886f9b732b15
2022-11-02 10:04:10,192 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka startTimeMs: 1667379850192
2022-11-02 10:04:10,209 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-flink-1-24, transactionalId=flink-1-24] Invoking InitProducerId for the first time in order to acquire a producer ID
2022-11-02 10:04:10,211 INFO org.apache.kafka.clients.Metadata [] - [Producer clientId=producer-flink-0-24, transactionalId=flink-0-24] Cluster ID: MCY5mzM1QWyc1YCvsO8jag
2022-11-02 10:04:10,216 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-flink-0-24, transactionalId=flink-0-24] Discovered transaction coordinator ubuntu:9092 (id: 0 rack: null)
2022-11-02 10:04:10,233 INFO org.apache.kafka.clients.Metadata [] - [Producer clientId=producer-flink-1-24, transactionalId=flink-1-24] Cluster ID: MCY5mzM1QWyc1YCvsO8jag
2022-11-02 10:04:10,241 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-flink-1-24, transactionalId=flink-1-24] Discovered transaction coordinator ubuntu:9092 (id: 0 rack: null)
2022-11-02 10:04:10,345 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-flink-0-24, transactionalId=flink-0-24] ProducerId set to 51 with epoch 0
2022-11-02 10:04:10,346 INFO org.apache.flink.connector.kafka.sink.KafkaWriter [] - Created new transactional producer flink-0-24
2022-11-02 10:04:10,353 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-flink-1-24, transactionalId=flink-1-24] ProducerId set to 52 with epoch 0
2022-11-02 10:04:10,354 INFO org.apache.flink.connector.kafka.sink.KafkaWriter [] - Created new transactional producer flink-1-24
The ProducerConfig values block is repeated for every new producer created (one per sink subtask, so it scales with the sink parallelism).
With the checkpoint interval set to 10 or 15 seconds, valuable job logs get drowned out.
Is there a way to disable these logs without setting the log level to WARN?
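One way to handle this (not from the original post; a sketch assuming the log4j2 properties-based setup that Flink ships with) is to raise the level of only the noisy loggers in conf/log4j.properties, leaving the root logger at INFO. This still uses WARN, but only for these two logger hierarchies rather than globally:

# Quiet only the Kafka producer internals and Flink's producer wrapper;
# all other job logs keep flowing at INFO.
logger.kafkaProducer.name = org.apache.kafka.clients.producer
logger.kafkaProducer.level = WARN
logger.flinkKafkaProducer.name = org.apache.flink.connector.kafka.sink.FlinkKafkaInternalProducer
logger.flinkKafkaProducer.level = WARN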

Related

Camel reactive streams not completing when subscribed more than once

@Component
class TestRoute(
    context: CamelContext,
) : EndpointRouteBuilder() {
    val streamName: String = "news-ticker-stream"
    val logger = LoggerFactory.getLogger(TestRoute::class.java)
    val camel: CamelReactiveStreamsService = CamelReactiveStreams.get(context)
    var count = 0L
    val subscriber: Subscriber<String> =
        camel.streamSubscriber(streamName, String::class.java)

    override fun configure() {
        from("timer://foo?fixedRate=true&period=30000")
            .process {
                count++
                logger.info("Start emitting data for the $count time")
                Flux.fromIterable(
                    listOf(
                        "APPLE", "MANGO", "PINEAPPLE"
                    )
                )
                    .doOnComplete {
                        logger.info("All the data are emitted from the flux for the $count time")
                    }
                    .subscribe(
                        subscriber
                    )
            }
        from(reactiveStreams(streamName))
            .to("file:outbox")
    }
}
2022-07-07 13:01:44.626 INFO 50988 --- [1 - timer://foo] c.e.reactivecameltutorial.TestRoute : Start emitting data for the 1 time
2022-07-07 13:01:44.640 INFO 50988 --- [1 - timer://foo] c.e.reactivecameltutorial.TestRoute : All the data are emitted from the flux for the 1 time
2022-07-07 13:01:44.646 INFO 50988 --- [1 - timer://foo] a.c.c.r.s.ReactiveStreamsCamelSubscriber : Reactive stream 'news-ticker-stream' completed
2022-07-07 13:02:14.616 INFO 50988 --- [1 - timer://foo] c.e.reactivecameltutorial.TestRoute : Start emitting data for the 2 time
2022-07-07 13:02:44.610 INFO 50988 --- [1 - timer://foo] c.e.reactivecameltutorial.TestRoute : Start emitting data for the 3 time
2022-07-07 13:02:44.611 WARN 50988 --- [1 - timer://foo] a.c.c.r.s.ReactiveStreamsCamelSubscriber : There is another active subscription: cancelled
The reactive stream does not complete when the route runs more than once. As you can see in the logs, the doOnComplete log message appears only the first time the timer route fires; when it fires a second time there is no completion message. I put a breakpoint in ReactiveStreamsCamelSubscriber and found that the flow enters the onNext() and onComplete() methods the first time, but not when the timer fires the second time. I am not able to understand why this scenario is playing out.
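One likely explanation, offered as an assumption rather than a confirmed diagnosis: a Reactive Streams Subscriber is single-use (the spec allows onSubscribe to be called at most once per subscriber), so once the first Flux completes, the shared subscriber field cannot accept a second Publisher, which would match the "There is another active subscription: cancelled" warning. A sketch of the same route body in Java that obtains a fresh subscriber on every tick:

// Drop-in replacement for the timer route's process step; 'camel',
// 'streamName' and 'logger' are the same fields as in the Kotlin class.
from("timer://foo?fixedRate=true&period=30000")
    .process(exchange -> {
        // fresh, single-use subscriber per emission
        Subscriber<String> fresh = camel.streamSubscriber(streamName, String.class);
        Flux.fromIterable(List.of("APPLE", "MANGO", "PINEAPPLE"))
            .doOnComplete(() -> logger.info("All the data are emitted from the flux"))
            .subscribe(fresh);
    });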

Debezium SqlServer source not reading data

I have a running Kafka Connect instance and have submitted my connector with the configuration shown at the bottom of this post.
Question
The Debezium docs seem to indicate I set database.server.name=connect_test and create topics for each table I want to ingest into Kafka. So for my table, I'd create connect_test-TEST_Test_Table_Object.
I don't get any errors, but no data is ingested into Kafka. I do see some warnings about configs, but I'm just trying to get a very basic test up.
Can anyone provide any insight?
I've also pre-created the following topics:
connect-configs (1 partition)
connect-offsets (3 partitions)
connect-status (3 partitions)
schema_changes-connect_test (3 partitions)
connect_test-TEST_Test_Table_Object (3 partitions)
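Two conventions worth double-checking against the config below (general Debezium SQL Server connector behavior, not something diagnosed from these logs): change topics are named serverName.schemaName.tableName rather than serverName-tableName, and table.include.list entries are schema-qualified. So the expected values would look more like:
"table.include.list": "dbo.TEST_Test_Table_Object"
with the corresponding change topic being connect_test.dbo.TEST_Test_Table_Object.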
{
  "name": "sql-server-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "redacted.public.redacted.database.windows.net",
    "database.port": "3342",
    "database.user": "db_user",
    "database.password": "password",
    "database.dbname": "TEST_DB",
    "database.server.name": "connect_test",
    "database.history.kafka.bootstrap.servers": "kafka-url-1:9096,kafka-url-2:9096,kafka-url-3:9096",
    "database.history.kafka.topic": "schema_changes-connect_test",
    "table.include.list": "TEST_Test_Table_Object",
    "database.history.producer.security.protocol": "SSL",
    "database.history.producer.ssl.keystore.location": "/app/.keystore.jks",
    "database.history.producer.ssl.keystore.password": "password",
    "database.history.producer.ssl.truststore.location": "/app/.truststore.jks",
    "database.history.producer.ssl.truststore.password": "password",
    "database.history.producer.ssl.key.password": "password",
    "database.history.consumer.security.protocol": "SSL",
    "database.history.consumer.ssl.keystore.location": "/app/.keystore.jks",
    "database.history.consumer.ssl.keystore.password": "password",
    "database.history.consumer.ssl.truststore.password": "/app/.truststore.jks",
    "database.history.consumer.ssl.key.password": "password"
  }
}
I keep seeing Failed to construct kafka producer ... caused by: Failed to load SSL keystore /app/.keystore.jks of type JKS ... failed to decrypt safe contents entry: javax.crypto.BadPaddingException: Given final block not properly padded
I'm using Heroku Kafka and I have three certs: client_cert.pem, client_key.pem, trusted_cert.pem
I used keytool to turn my .pem files into /app/.keystore.jks and /app/.truststore.jks.
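For reference, one common way to build those stores from the three PEM files (a sketch; alias names are assumptions and interactive password prompts are omitted). A BadPaddingException like the one above frequently means the password the client supplies (ssl.keystore.password / ssl.key.password) does not match the one the store was created with:

# Bundle the client cert and key into a PKCS12 keystore, then convert it to JKS
openssl pkcs12 -export -in client_cert.pem -inkey client_key.pem -name client -out client.p12
keytool -importkeystore -srckeystore client.p12 -srcstoretype PKCS12 -destkeystore /app/.keystore.jks -deststoretype JKS
# Import the broker/CA cert into a separate truststore
keytool -importcert -alias caroot -file trusted_cert.pem -keystore /app/.truststore.jks

Also note that in the config above, database.history.consumer.ssl.truststore.password is set to the path /app/.truststore.jks rather than a password, and the consumer's ssl.truststore.location is missing; that alone would make the history consumer fail to open the store.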
Logs (some WARN lines redacted for size):
2022-06-06T21:11:16.115619+00:00 app[web.3]: [2022-06-06 21:11:16,115] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Cluster ID: some-id (org.apache.kafka.clients.Metadata)
2022-06-06T21:11:16.116945+00:00 app[web.3]: [2022-06-06 21:11:16,116] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Subscribed to partition(s): connect-status-0, connect-status-2, connect-status-1 (org.apache.kafka.clients.consumer.KafkaConsumer)
2022-06-06T21:11:16.117022+00:00 app[web.3]: [2022-06-06 21:11:16,116] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Seeking to EARLIEST offset of partition connect-status-0 (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:16.117070+00:00 app[web.3]: [2022-06-06 21:11:16,117] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Seeking to EARLIEST offset of partition connect-status-2 (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:16.117092+00:00 app[web.3]: [2022-06-06 21:11:16,117] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Seeking to EARLIEST offset of partition connect-status-1 (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:15.127922+00:00 app[web.3]: [2022-06-06 21:11:15,124] INFO [Producer clientId=producer-1] Cluster ID: some-id (org.apache.kafka.clients.Metadata)
2022-06-06T21:11:16.441247+00:00 app[web.3]: [2022-06-06 21:11:16,439] INFO [Producer clientId=producer-3] Cluster ID: some-id (org.apache.kafka.clients.Metadata)
2022-06-06T21:11:16.579714+00:00 app[web.3]: [2022-06-06 21:11:16,577] WARN The configuration 'log4j.loggers' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)
# WARN redacted for size limits
2022-06-06T21:11:16.580291+00:00 app[web.3]: [2022-06-06 21:11:16,580] WARN The configuration 'key.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)
2022-06-06T21:11:16.580291+00:00 app[web.3]: [2022-06-06 21:11:16,580] WARN The configuration 'value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)
2022-06-06T21:11:16.580315+00:00 app[web.3]: [2022-06-06 21:11:16,580] WARN The configuration 'offset.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)
2022-06-06T21:11:16.580348+00:00 app[web.3]: [2022-06-06 21:11:16,580] WARN The configuration 'log4j.root.loglevel' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig)
2022-06-06T21:11:16.580414+00:00 app[web.3]: [2022-06-06 21:11:16,580] INFO Kafka version: 6.1.4-ccs (org.apache.kafka.common.utils.AppInfoParser)
2022-06-06T21:11:16.580457+00:00 app[web.3]: [2022-06-06 21:11:16,580] INFO Kafka commitId: c9124241a6ff43bc (org.apache.kafka.common.utils.AppInfoParser)
2022-06-06T21:11:16.580479+00:00 app[web.3]: [2022-06-06 21:11:16,580] INFO Kafka startTimeMs: 1654549876580 (org.apache.kafka.common.utils.AppInfoParser)
2022-06-06T21:11:16.607720+00:00 app[web.3]: [2022-06-06 21:11:16,607] INFO [Consumer clientId=consumer-connect-demo-group-3, groupId=connect-demo-group] Cluster ID: someId (org.apache.kafka.clients.Metadata)
2022-06-06T21:11:16.608322+00:00 app[web.3]: [2022-06-06 21:11:16,608] INFO [Consumer clientId=consumer-connect-demo-group-3, groupId=connect-demo-group] Subscribed to partition(s): connect-configs-0 (org.apache.kafka.clients.consumer.KafkaConsumer)
2022-06-06T21:11:16.608416+00:00 app[web.3]: [2022-06-06 21:11:16,608] INFO [Consumer clientId=consumer-connect-demo-group-3, groupId=connect-demo-group] Seeking to EARLIEST offset of partition connect-configs-0 (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:16.658870+00:00 app[web.3]: [2022-06-06 21:11:16,658] INFO [Consumer clientId=consumer-connect-demo-group-3, groupId=connect-demo-group] Resetting offset for partition connect-configs-0 to position FetchPosition{offset=20, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka-url-1:9096 (id: 1 rack: us-east-1a)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:16.164555+00:00 app[web.3]: [2022-06-06 21:11:16,163] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Resetting offset for partition connect-status-2 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka-url-2:9096 (id: 2 rack: us-east-1b)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:16.189017+00:00 app[web.3]: [2022-06-06 21:11:16,188] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Resetting offset for partition connect-status-1 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka-url-1:9096 (id: 1 rack: us-east-1a)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:16.238074+00:00 app[web.3]: [2022-06-06 21:11:16,237] INFO [Consumer clientId=consumer-connect-demo-group-2, groupId=connect-demo-group] Resetting offset for partition connect-status-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka-url-3:9096 (id: 0 rack: us-east-1c)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
2022-06-06T21:11:16.252558+00:00 app[web.3]: [2022-06-06 21:11:16,252] INFO ProducerConfig values:
2022-06-06T21:11:16.252560+00:00 app[web.3]: acks = -1
2022-06-06T21:11:16.252561+00:00 app[web.3]: batch.size = 16384
2022-06-06T21:11:16.252562+00:00 app[web.3]: bootstrap.servers = [kafka-url-2:9096, kafka-url-1:9096, kafka-url-3:9096]
2022-06-06T21:11:16.252562+00:00 app[web.3]: buffer.memory = 33554432
2022-06-06T21:11:16.252563+00:00 app[web.3]: client.dns.lookup = use_all_dns_ips
2022-06-06T21:11:16.252563+00:00 app[web.3]: client.id = producer-3
2022-06-06T21:11:16.252564+00:00 app[web.3]: compression.type = none
2022-06-06T21:11:16.252564+00:00 app[web.3]: connections.max.idle.ms = 540000
2022-06-06T21:11:16.252564+00:00 app[web.3]: delivery.timeout.ms = 2147483647
2022-06-06T21:11:16.252564+00:00 app[web.3]: enable.idempotence = false
2022-06-06T21:11:16.252565+00:00 app[web.3]: interceptor.classes = []
2022-06-06T21:11:16.252565+00:00 app[web.3]: internal.auto.downgrade.txn.commit = false
2022-06-06T21:11:16.252566+00:00 app[web.3]: key.serializer = class org.apache.kafka.common.serialization.StringSerializer
2022-06-06T21:11:16.252566+00:00 app[web.3]: linger.ms = 0
2022-06-06T21:11:16.252566+00:00 app[web.3]: max.block.ms = 60000
2022-06-06T21:11:16.252567+00:00 app[web.3]: max.in.flight.requests.per.connection = 1
2022-06-06T21:11:16.252567+00:00 app[web.3]: max.request.size = 1048576
2022-06-06T21:11:16.252567+00:00 app[web.3]: metadata.max.age.ms = 300000
2022-06-06T21:11:16.252567+00:00 app[web.3]: metadata.max.idle.ms = 300000
2022-06-06T21:11:16.252567+00:00 app[web.3]: metric.reporters = []
2022-06-06T21:11:16.252568+00:00 app[web.3]: metrics.num.samples = 2
2022-06-06T21:11:16.252568+00:00 app[web.3]: metrics.recording.level = INFO
2022-06-06T21:11:16.252568+00:00 app[web.3]: metrics.sample.window.ms = 30000
2022-06-06T21:11:16.252569+00:00 app[web.3]: partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
2022-06-06T21:11:16.252569+00:00 app[web.3]: receive.buffer.bytes = 32768
2022-06-06T21:11:16.252569+00:00 app[web.3]: reconnect.backoff.max.ms = 1000
2022-06-06T21:11:16.252570+00:00 app[web.3]: reconnect.backoff.ms = 50
2022-06-06T21:11:16.252570+00:00 app[web.3]: request.timeout.ms = 30000
2022-06-06T21:11:16.252570+00:00 app[web.3]: retries = 2147483647
2022-06-06T21:11:16.252570+00:00 app[web.3]: retry.backoff.ms = 100
2022-06-06T21:11:16.252571+00:00 app[web.3]: sasl.client.callback.handler.class = null
2022-06-06T21:11:16.252571+00:00 app[web.3]: sasl.jaas.config = null
2022-06-06T21:11:16.252571+00:00 app[web.3]: sasl.kerberos.kinit.cmd = /usr/bin/kinit
2022-06-06T21:11:16.252572+00:00 app[web.3]: sasl.kerberos.min.time.before.relogin = 60000
2022-06-06T21:11:16.252572+00:00 app[web.3]: sasl.kerberos.service.name = null
2022-06-06T21:11:16.252572+00:00 app[web.3]: sasl.kerberos.ticket.renew.jitter = 0.05
2022-06-06T21:11:16.252573+00:00 app[web.3]: sasl.kerberos.ticket.renew.window.factor = 0.8
2022-06-06T21:11:16.252573+00:00 app[web.3]: sasl.login.callback.handler.class = null
2022-06-06T21:11:16.252573+00:00 app[web.3]: sasl.login.class = null
2022-06-06T21:11:16.252574+00:00 app[web.3]: sasl.login.refresh.buffer.seconds = 300
2022-06-06T21:11:16.252574+00:00 app[web.3]: sasl.login.refresh.min.period.seconds = 60
2022-06-06T21:11:16.252574+00:00 app[web.3]: sasl.login.refresh.window.factor = 0.8
2022-06-06T21:11:16.252574+00:00 app[web.3]: sasl.login.refresh.window.jitter = 0.05
2022-06-06T21:11:16.252575+00:00 app[web.3]: sasl.mechanism = GSSAPI
2022-06-06T21:11:16.252575+00:00 app[web.3]: security.protocol = SSL
2022-06-06T21:11:16.252575+00:00 app[web.3]: security.providers = null
2022-06-06T21:11:16.252575+00:00 app[web.3]: send.buffer.bytes = 131072
2022-06-06T21:11:16.252576+00:00 app[web.3]: socket.connection.setup.timeout.max.ms = 127000
2022-06-06T21:11:16.252576+00:00 app[web.3]: socket.connection.setup.timeout.ms = 10000
2022-06-06T21:11:16.252576+00:00 app[web.3]: ssl.cipher.suites = null
2022-06-06T21:11:16.252577+00:00 app[web.3]: ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
2022-06-06T21:11:16.252577+00:00 app[web.3]: ssl.endpoint.identification.algorithm =
2022-06-06T21:11:16.252577+00:00 app[web.3]: ssl.engine.factory.class = null
2022-06-06T21:11:16.252578+00:00 app[web.3]: ssl.key.password = [hidden]
2022-06-06T21:11:16.252578+00:00 app[web.3]: ssl.keymanager.algorithm = SunX509
2022-06-06T21:11:16.252578+00:00 app[web.3]: ssl.keystore.certificate.chain = null
2022-06-06T21:11:16.252578+00:00 app[web.3]: ssl.keystore.key = null
2022-06-06T21:11:16.252579+00:00 app[web.3]: ssl.keystore.location = /app/.keystore.jks
2022-06-06T21:11:16.252579+00:00 app[web.3]: ssl.keystore.password = [hidden]
2022-06-06T21:11:16.252579+00:00 app[web.3]: ssl.keystore.type = JKS
2022-06-06T21:11:16.252580+00:00 app[web.3]: ssl.protocol = SSL
2022-06-06T21:11:16.252580+00:00 app[web.3]: ssl.provider = null
2022-06-06T21:11:16.252580+00:00 app[web.3]: ssl.secure.random.implementation = null
2022-06-06T21:11:16.252580+00:00 app[web.3]: ssl.trustmanager.algorithm = PKIX
2022-06-06T21:11:16.252581+00:00 app[web.3]: ssl.truststore.certificates = null
2022-06-06T21:11:16.252581+00:00 app[web.3]: ssl.truststore.location = /app/.truststore.jks
2022-06-06T21:11:16.252581+00:00 app[web.3]: ssl.truststore.password = [hidden]
2022-06-06T21:11:16.252582+00:00 app[web.3]: ssl.truststore.type = JKS
2022-06-06T21:11:16.252582+00:00 app[web.3]: transaction.timeout.ms = 60000
2022-06-06T21:11:16.252582+00:00 app[web.3]: transactional.id = null
2022-06-06T21:11:16.252582+00:00 app[web.3]: value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
2022-06-06T21:11:16.252583+00:00 app[web.3]: (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391195+00:00 app[web.3]: [2022-06-06 21:11:16,390] WARN The configuration 'log4j.loggers' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391271+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'group.id' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391272+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'rest.advertised.port' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391272+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'plugin.path' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391273+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'status.storage.partitions' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391273+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'metrics.context.connect.kafka.cluster.id' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391304+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'offset.storage.partitions' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391812+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'topic.creation.enable' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391813+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'rest.port' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391855+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'config.storage.partitions' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391894+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'config.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391894+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'key.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391918+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391963+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'offset.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.391988+00:00 app[web.3]: [2022-06-06 21:11:16,391] WARN The configuration 'log4j.root.loglevel' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
2022-06-06T21:11:16.392047+00:00 app[web.3]: [2022-06-06 21:11:16,392] INFO Kafka version: 6.1.4-ccs (org.apache.kafka.common.utils.AppInfoParser)
2022-06-06T21:11:16.392077+00:00 app[web.3]: [2022-06-06 21:11:16,392] INFO Kafka commitId: c9124241a6ff43bc (org.apache.kafka.common.utils.AppInfoParser)
2022-06-06T21:11:16.392131+00:00 app[web.3]: [2022-06-06 21:11:16,392] INFO Kafka startTimeMs: 1654549876391 (org.apache.kafka.common.utils.AppInfoParser)
2022-06-06T21:11:16.401532+00:00 app[web.3]: [2022-06-06 21:11:16,401] INFO ConsumerConfig values:
2022-06-06T21:11:16.401533+00:00 app[web.3]: allow.auto.create.topics = true
2022-06-06T21:11:16.401534+00:00 app[web.3]: auto.commit.interval.ms = 5000
2022-06-06T21:11:16.401534+00:00 app[web.3]: auto.offset.reset = earliest
2022-06-06T21:11:16.401535+00:00 app[web.3]: bootstrap.servers = [kafka-url-2:9096, kafka-url-1:9096, kafka-url-3:9096]
2022-06-06T21:11:16.401536+00:00 app[web.3]: check.crcs = true
2022-06-06T21:11:16.401536+00:00 app[web.3]: client.dns.lookup = use_all_dns_ips
2022-06-06T21:11:16.401536+00:00 app[web.3]: client.id = consumer-connect-demo-group-3
2022-06-06T21:11:16.401537+00:00 app[web.3]: client.rack =
2022-06-06T21:11:16.401537+00:00 app[web.3]: connections.max.idle.ms = 540000
2022-06-06T21:11:16.401537+00:00 app[web.3]: default.api.timeout.ms = 60000
2022-06-06T21:11:16.401538+00:00 app[web.3]: enable.auto.commit = false
2022-06-06T21:11:16.401538+00:00 app[web.3]: exclude.internal.topics = true
2022-06-06T21:11:16.401538+00:00 app[web.3]: fetch.max.bytes = 52428800
2022-06-06T21:11:16.401538+00:00 app[web.3]: fetch.max.wait.ms = 500
2022-06-06T21:11:16.401539+00:00 app[web.3]: fetch.min.bytes = 1
2022-06-06T21:11:16.401539+00:00 app[web.3]: group.id = connect-demo-group
2022-06-06T21:11:16.401539+00:00 app[web.3]: group.instance.id = null
2022-06-06T21:11:16.401540+00:00 app[web.3]: heartbeat.interval.ms = 3000
2022-06-06T21:11:16.401540+00:00 app[web.3]: interceptor.classes = []
2022-06-06T21:11:16.401540+00:00 app[web.3]: internal.leave.group.on.close = true
2022-06-06T21:11:16.401541+00:00 app[web.3]: internal.throw.on.fetch.stable.offset.unsupported = false
2022-06-06T21:11:16.401541+00:00 app[web.3]: isolation.level = read_uncommitted
2022-06-06T21:11:16.401541+00:00 app[web.3]: key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
2022-06-06T21:11:16.401542+00:00 app[web.3]: max.partition.fetch.bytes = 1048576
2022-06-06T21:11:16.401542+00:00 app[web.3]: max.poll.interval.ms = 300000
2022-06-06T21:11:16.401542+00:00 app[web.3]: max.poll.records = 500
2022-06-06T21:11:16.401542+00:00 app[web.3]: metadata.max.age.ms = 300000
2022-06-06T21:11:16.401543+00:00 app[web.3]: metric.reporters = []
2022-06-06T21:11:16.401544+00:00 app[web.3]: metrics.num.samples = 2
2022-06-06T21:11:16.401544+00:00 app[web.3]: metrics.recording.level = INFO
2022-06-06T21:11:16.401544+00:00 app[web.3]: metrics.sample.window.ms = 30000
2022-06-06T21:11:16.401544+00:00 app[web.3]: partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
2022-06-06T21:11:16.401545+00:00 app[web.3]: receive.buffer.bytes = 65536
2022-06-06T21:11:16.401545+00:00 app[web.3]: reconnect.backoff.max.ms = 1000
2022-06-06T21:11:16.401545+00:00 app[web.3]: reconnect.backoff.ms = 50
2022-06-06T21:11:16.401546+00:00 app[web.3]: request.timeout.ms = 30000
2022-06-06T21:11:16.401546+00:00 app[web.3]: retry.backoff.ms = 100
2022-06-06T21:11:16.401546+00:00 app[web.3]: sasl.client.callback.handler.class = null
2022-06-06T21:11:16.401546+00:00 app[web.3]: sasl.jaas.config = null
2022-06-06T21:11:16.401547+00:00 app[web.3]: sasl.kerberos.kinit.cmd = /usr/bin/kinit
2022-06-06T21:11:16.401547+00:00 app[web.3]: sasl.kerberos.min.time.before.relogin = 60000
2022-06-06T21:11:16.401547+00:00 app[web.3]: sasl.kerberos.service.name = null
2022-06-06T21:11:16.401548+00:00 app[web.3]: sasl.kerberos.ticket.renew.jitter = 0.05
2022-06-06T21:11:16.401548+00:00 app[web.3]: sasl.kerberos.ticket.renew.window.factor = 0.8
2022-06-06T21:11:16.401548+00:00 app[web.3]: sasl.login.callback.handler.class = null
2022-06-06T21:11:16.401548+00:00 app[web.3]: sasl.login.class = null
2022-06-06T21:11:16.401549+00:00 app[web.3]: sasl.login.refresh.buffer.seconds = 300
2022-06-06T21:11:16.401549+00:00 app[web.3]: sasl.login.refresh.min.period.seconds = 60
2022-06-06T21:11:16.401549+00:00 app[web.3]: sasl.login.refresh.window.factor = 0.8
2022-06-06T21:11:16.401550+00:00 app[web.3]: sasl.login.refresh.window.jitter = 0.05
2022-06-06T21:11:16.401550+00:00 app[web.3]: sasl.mechanism = GSSAPI
2022-06-06T21:11:16.401550+00:00 app[web.3]: security.protocol = SSL
2022-06-06T21:11:16.401551+00:00 app[web.3]: security.providers = null
2022-06-06T21:11:16.401551+00:00 app[web.3]: send.buffer.bytes = 131072
2022-06-06T21:11:16.401551+00:00 app[web.3]: session.timeout.ms = 10000
2022-06-06T21:11:16.401551+00:00 app[web.3]: socket.connection.setup.timeout.max.ms = 127000
2022-06-06T21:11:16.401552+00:00 app[web.3]: socket.connection.setup.timeout.ms = 10000
2022-06-06T21:11:16.401558+00:00 app[web.3]: ssl.cipher.suites = null
2022-06-06T21:11:16.401558+00:00 app[web.3]: ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
2022-06-06T21:11:16.401559+00:00 app[web.3]: ssl.endpoint.identification.algorithm =
2022-06-06T21:11:16.401559+00:00 app[web.3]: ssl.engine.factory.class = null
2022-06-06T21:11:16.401559+00:00 app[web.3]: ssl.key.password = [hidden]
2022-06-06T21:11:16.401560+00:00 app[web.3]: ssl.keymanager.algorithm = SunX509
2022-06-06T21:11:16.401560+00:00 app[web.3]: ssl.keystore.certificate.chain = null
2022-06-06T21:11:16.401560+00:00 app[web.3]: ssl.keystore.key = null
2022-06-06T21:11:16.401561+00:00 app[web.3]: ssl.keystore.location = /app/.keystore.jks
2022-06-06T21:11:16.401561+00:00 app[web.3]: ssl.keystore.password = [hidden]
2022-06-06T21:11:16.401561+00:00 app[web.3]: ssl.keystore.type = JKS
2022-06-06T21:11:16.401561+00:00 app[web.3]: ssl.protocol = SSL
2022-06-06T21:11:16.401561+00:00 app[web.3]: ssl.provider = null
2022-06-06T21:11:16.401562+00:00 app[web.3]: ssl.secure.random.implementation = null
2022-06-06T21:11:16.401562+00:00 app[web.3]: ssl.trustmanager.algorithm = PKIX
2022-06-06T21:11:16.401562+00:00 app[web.3]: ssl.truststore.certificates = null
2022-06-06T21:11:16.401563+00:00 app[web.3]: ssl.truststore.location = /app/.truststore.jks
2022-06-06T21:11:16.401563+00:00 app[web.3]: ssl.truststore.password = [hidden]
2022-06-06T21:11:16.401563+00:00 app[web.3]: ssl.truststore.type = JKS
2022-06-06T21:11:16.401563+00:00 app[web.3]: value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
2022-06-06T21:11:16.401564+00:00 app[web.3]: (org.apache.kafka.clients.consumer.ConsumerConfig)
2022-06-06T21:11:16.730633+00:00 app[web.3]: [2022-06-06 21:11:16,730] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Cluster ID: someid (org.apache.kafka.clients.Metadata)
2022-06-06T21:11:16.732682+00:00 app[web.3]: [2022-06-06 21:11:16,732] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Discovered group coordinator kafka-url-1:9096 (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:16.736604+00:00 app[web.3]: [2022-06-06 21:11:16,736] INFO [Worker clientId=connect-1, groupId=connect-demo-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:16.765579+00:00 app[web.3]: [2022-06-06 21:11:16,765] INFO [Worker clientId=connect-1, groupId=connect-demo-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.337307+00:00 app[web.2]: [2022-06-06 21:11:19,337] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.337364+00:00 app[web.2]: [2022-06-06 21:11:19,337] INFO [Worker clientId=connect-1, groupId=connect-demo-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.342703+00:00 app[web.2]: [2022-06-06 21:11:19,342] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Successfully joined group with generation Generation{generationId=39, memberId='connect-1-id-1', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.347949+00:00 app[web.2]: [2022-06-06 21:11:19,347] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Successfully synced group in generation Generation{generationId=39, memberId='connect-1-id-1', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.342608+00:00 app[web.3]: [2022-06-06 21:11:19,342] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Successfully joined group with generation Generation{generationId=39, memberId='connect-1-id-2', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.347995+00:00 app[web.3]: [2022-06-06 21:11:19,347] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Successfully synced group in generation Generation{generationId=39, memberId='connect-1-id-2', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.339182+00:00 app[web.1]: [2022-06-06 21:11:19,339] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.339236+00:00 app[web.1]: [2022-06-06 21:11:19,339] INFO [Worker clientId=connect-1, groupId=connect-demo-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.341477+00:00 app[web.1]: [2022-06-06 21:11:19,341] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Successfully joined group with generation Generation{generationId=39, memberId='connect-1-id-3', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
2022-06-06T21:11:19.346767+00:00 app[web.1]: [2022-06-06 21:11:19,346] INFO [Worker clientId=connect-1, groupId=connect-demo-group] Successfully synced group in generation Generation{generationId=39, memberId='connect-1-id-3', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

Apache Flink - streaming app doesn't start from checkpoint after stop and start

I have the following Flink streaming application running locally, written with the SQL API:
object StreamingKafkaJsonsToCsvLocalFs {

  val brokers = "localhost:9092"
  val topic = "test-topic"
  val consumerGroupId = "test-consumer"
  val kafkaTableName = "KafKaTable"
  val targetTable = "TargetCsv"
  val targetPath = f"file://${new java.io.File(".").getCanonicalPath}/kafka-to-fs-csv"

  def generateKafkaTableDDL(): String = {
    s"""
       |CREATE TABLE $kafkaTableName (
       |  `kafka_offset` BIGINT METADATA FROM 'offset',
       |  `seller_id` STRING
       |) WITH (
       |  'connector' = 'kafka',
       |  'topic' = '$topic',
       |  'properties.bootstrap.servers' = 'localhost:9092',
       |  'properties.group.id' = '$consumerGroupId',
       |  'scan.startup.mode' = 'earliest-offset',
       |  'format' = 'json'
       |)
       |""".stripMargin
  }

  def generateTargetTableDDL(): String = {
    s"""
       |CREATE TABLE $targetTable (
       |  `kafka_offset` BIGINT,
       |  `seller_id` STRING
       |) WITH (
       |  'connector' = 'filesystem',
       |  'path' = '$targetPath',
       |  'format' = 'csv',
       |  'sink.rolling-policy.rollover-interval' = '10 seconds',
       |  'sink.rolling-policy.check-interval' = '1 seconds'
       |)
       |""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI()
    env.enableCheckpointing(1000)
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
    env.getCheckpointConfig.setCheckpointStorage(s"$targetPath/checkpoints")

    val settings = EnvironmentSettings.newInstance()
      .inStreamingMode()
      .build()
    val tblEnv = StreamTableEnvironment.create(env, settings)

    tblEnv.executeSql(generateKafkaTableDDL())
    tblEnv.executeSql(generateTargetTableDDL())
    tblEnv.from(kafkaTableName).executeInsert(targetTable).await()
    tblEnv.executeSql("kafka-json-to-fs")
  }
}
As you can see, checkpointing is enabled, and when I execute this application I see that the checkpoint folder is created and populated.
The problem I am facing is this: when I stop and start my application (from the IDE), I expect it to resume from the point where the previous execution stopped, but instead it consumes all the offsets from the earliest offset in the topic (I can see this from the newly generated output files, which contain offset zero even though the previous run already processed those offsets).
What am I missing about checkpointing in Flink? I would expect it to be exactly-once.
Flink only restarts from a checkpoint when recovering from a failure, or when explicitly restarted from a retained checkpoint via the command line or REST API. Otherwise, the KafkaSource starts from the offsets configured in the code, which defaults to the earliest offsets.
If you have no other state, you could instead rely on the committed offsets as the source of truth, and configure the Kafka connector to use the committed offsets as the starting position.
Flink's fault tolerance via checkpointing isn't designed to support mini-cluster deployments like the one used when running in an IDE. Normally the job manager and task managers are running in separate processes, and the job manager can detect that a task manager has failed, and can arrange for a restart.
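Two sketches of those options (paths and placeholders are assumptions, not from the original answer). To restart explicitly from a retained checkpoint, pass it with -s when resubmitting:

flink run -s file:///<project-dir>/kafka-to-fs-csv/checkpoints/<job-id>/chk-<n> ...

To rely on committed group offsets instead, change the startup mode in the Kafka table DDL:

'scan.startup.mode' = 'group-offsets',

For the first option the checkpoint must survive the shutdown, i.e. execution.checkpointing.externalized-checkpoint-retention needs to be set to RETAIN_ON_CANCELLATION.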

pyflink TableException: Failed to execute sql

I use pyflink to run a Flink streaming job. If I run Flink in standalone mode it works, but in yarn-per-job mode it fails with "pyflink.util.exceptions.TableException: Failed to execute sql".
The yarn-per-job command is: flink run -t yarn-per-job -Djobmanager.memory.process.size=1024mb -Dtaskmanager.memory.process.size=2048mb -ynm flink-cluster -Dtaskmanager.numberOfTaskSlots=2 -pyfs cluster.py ...
The standalone command is: flink run -pyfs cluster.py ...
The Python environment archive is attached in cluster.py:
env = StreamExecutionEnvironment.get_execution_environment()
env_settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build()
t_env = StreamTableEnvironment.create(env, environment_settings=env_settings)

curr_path = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
jars = f"""
file://{curr_path}/jars/flink-sql-connector-kafka_2.11-1.13.1.jar;
file://{curr_path}/jars/force-shading-1.13.1.jar"""
t_env.get_config().get_configuration().set_string("pipeline.jars", jars)
t_env.add_python_archive("%s/requirements/flink.zip" % curr_path)
t_env.get_config().set_python_executable("flink.zip/flink/bin/python")

env.set_stream_time_characteristic(TimeCharacteristic.EventTime)
env.set_parallelism(2)
env.get_config().set_auto_watermark_interval(10000)
t_env.get_config().get_configuration().set_boolean("python.fn-execution.memory.managed", True)

parse_log = udaf(LogParser(parsing_params),
                 input_types=[DataTypes.STRING(), DataTypes.STRING(), DataTypes.STRING(), DataTypes.STRING(),
                              DataTypes.STRING(), DataTypes.TIMESTAMP(3)],
                 result_type=DataTypes.STRING(), func_type="pandas")
process_ad = udf(ADProcessor(ad_params), result_type=DataTypes.STRING())
t_env.create_temporary_function('log_parsing_process', parse_log)
t_env.create_temporary_function('ad_process', process_ad)

tumble_window = Tumble.over("5.minutes").on("time_ltz").alias("w")

t_env.execute_sql(f"""
    CREATE TABLE source_table(
        ip VARCHAR,           -- ip address
        raws VARCHAR,         -- message
        host VARCHAR,         -- host
        log_type VARCHAR,     -- type
        system_name VARCHAR,  -- system
        ts BIGINT,
        time_ltz AS TO_TIMESTAMP_LTZ(ts, 3),
        WATERMARK FOR time_ltz AS time_ltz - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = '{source_topic}',
        'properties.bootstrap.servers' = '{source_servers}',
        'properties.group.id' = '{group_id}',
        'scan.startup.mode' = '{auto_offset_reset}',
        'format' = 'json'
    )
""")

sink_sql = f"""
    CREATE TABLE sink (
        alert VARCHAR,            -- alert
        start_time timestamp(3),  -- window start timestamp
        end_time timestamp(3)     -- window end timestamp
    ) WITH (
        'connector' = 'kafka',
        'topic' = '{sink_topic}',
        'properties.bootstrap.servers' = '{sink_servers}',
        'json.fail-on-missing-field' = 'false',
        'json.ignore-parse-errors' = 'true',
        'format' = 'json'
    )"""
t_env.execute_sql(sink_sql)

t_env.get_config().set_null_check(False)

source_table = t_env.from_path('source_table')
sink_table = source_table.window(tumble_window) \
    .group_by("w, log_type") \
    .select("log_parsing_process(ip, raws, host, log_type, system_name, time_ltz) AS pattern, "
            "w.start AS start_time, "
            "w.end AS end_time") \
    .select("ad_process(pattern, start_time, end_time) AS alert, start_time, end_time")
sink_table.execute_insert("sink")
The error is:
File "/tmp/pyflink/xxxx/xxxx/workerbee/log_exception_detection_run_on_diff_mode.py ,line 148, in run_flink sink_table_execute_insert("test_sink")
File "/opt/flink/flink-1.13.1_scala_2.12/opt/python/pyflink.zip/pyflink/table/table.py, line 1056 in execute_insert
File "/opt/flink/flink-1.13.1_scala_2.12/opt/python/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
File "/opt/flink/flink-1.13.1_scala_2.12/opt/python/pyflink.zip/pyflink/util/exceptions.py", line 163, in deco
pyflink.util.exceptions.TableException: Failed to execute sql
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:777)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:742)
at org.apache.flink.table.api.internal.TableImpl.executeInsert(TableImpl.java:572)
at sun.reflect.NativeMetondAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMetondAccessorImpl.invoke(NativeMethodAccessorImpl.hava:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.hava:498)
at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker(MethodInvoker.java:244)
at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
org.apache.flink.client.program.ProgramAbortException: java.lang.RuntimeException: Python process exits with code: 1
nodemanager log:
INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /opt/hadoop_data/tmp/nm-local-dir/usercache/root/appcache/application_1644370510310_0002/container_1644370510310_0002_03_000001/default_container_executor.sh]
WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1644370510310_0002_03_000001 is : 1
WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1644370510310_0002_03_000001 and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
at org.apache.hadoop.util.Shell.run(Shell.java:901)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:309)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:585)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:373)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: container id: container_1644370510310_0002_03_000001
INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container launch failed : Container exited with a non-zero exit code 1
INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1644370510310_0002_03_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
This looks like a classloader-related issue; the classloader.check-leaked-classloader configuration is described at https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/config/
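If the goal is only to silence that check while debugging (an assumption about intent; it hides leak detection rather than fixing the underlying problem), it can be disabled in flink-conf.yaml:

classloader.check-leaked-classloader: false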
In addition, you can try the add_jars API instead of setting the pipeline.jars config directly:
def add_jars(self, *jars_path: str):
    """
    Adds a list of jar files that will be uploaded to the cluster and referenced by the job.

    :param jars_path: Path of jars.
    """
    add_jars_to_context_class_loader(jars_path)
    jvm = get_gateway().jvm
    jars_key = jvm.org.apache.flink.configuration.PipelineOptions.JARS.key()
    env_config = jvm.org.apache.flink.python.util.PythonConfigUtil \
        .getEnvironmentConfig(self._j_stream_execution_environment)
    old_jar_paths = env_config.getString(jars_key, None)
    joined_jars_path = ';'.join(jars_path)
    if old_jar_paths and old_jar_paths.strip():
        joined_jars_path = ';'.join([old_jar_paths, joined_jars_path])
    env_config.setString(jars_key, joined_jars_path)
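For example, reusing the jar paths from the question (an illustration, not part of the original answer): env.add_jars(f"file://{curr_path}/jars/flink-sql-connector-kafka_2.11-1.13.1.jar", f"file://{curr_path}/jars/force-shading-1.13.1.jar")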
After debugging and checking, I finally found the issue: I was missing some Flink Hadoop jar packages:
commons-cli-1.4.jar
flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar
hadoop-yarn-api-3.3.1.jar

InfluxDB not starting: 8086 bind address already in use

I have InfluxDB version 1.8.9, but I can't start it.
In this example I'm logged in as root.
netstat -lptn
gives me a range of services, but none of them seem to listen on 8086 (other services such as Grafana and MySQL are running and work fine).
To further confirm that nothing is on 8086, I looked at the related issue "run: open server: open service: listen tcp :8086: bind: address already in use on starting influxdb"
and ran
netstat -a | grep 8086
which returns no results.
My config file at /etc/influxdb/influxdb.conf looks like this:
reporting-disabled = false
bind-address = "127.0.0.1:8086"
[meta]
#dir = "/root/.influxdb/meta"
dir = "/var/lib/influxdb/meta"
retention-autocreate = true
logging-enabled = true
[data]
dir = "/var/lib/influxdb/data"
index-version = "inmem"
wal-dir = "/var/lib/influxdb/wal"
wal-fsync-delay = "0s"
validate-keys = false
strict-error-handling = false
query-log-enabled = true
cache-max-memory-size = 1073741824
cache-snapshot-memory-size = 26214400
cache-snapshot-write-cold-duration = "10m0s"
compact-full-write-cold-duration = "4h0m0s"
compact-throughput = 50331648
compact-throughput-burst = 50331648
max-series-per-database = 1000000
max-values-per-tag = 100000
max-concurrent-compactions = 0
max-index-log-file-size = 1048576
series-id-set-cache-size = 100
series-file-max-concurrent-snapshot-compactions = 0
trace-logging-enabled = false
tsm-use-madv-willneed = false
...
[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = true
suppress-write-log = false
write-tracing = false
flux-enabled = false
flux-log-enabled = false
pprof-enabled = true
pprof-auth-enabled = false
debug-pprof-enabled = false
ping-auth-enabled = false
prom-read-auth-enabled = false
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
https-private-key = ""
max-row-limit = 0
max-connection-limit = 0
shared-secret = ""
realm = "InfluxDB"
unix-socket-enabled = false
unix-socket-permissions = "0777"
bind-socket = "/var/run/influxdb.sock"
max-body-size = 25000000
access-log-path = ""
max-concurrent-write-limit = 0
max-enqueued-write-limit = 0
enqueued-write-timeout = 30000000000
...
So I tried to start my database:
service influxdb start
which gives me:
Job for influxdb.service failed because a timeout was exceeded. See
"systemctl status influxdb.service" and "journalctl -xe" for details.
The result of systemctl status influxdb.service:
● influxdb.service - InfluxDB is an open-source, distributed, time series database
Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
Active: activating (start) since Tue 2021-09-21 18:37:12 CEST; 1min 7s ago
Docs: https://docs.influxdata.com/influxdb/
Main PID: 32016 (code=exited, status=1/FAILURE); Control PID: 5874 (influxd-systemd)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/influxdb.service
├─5874 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
└─5965 sleep 10
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515897Z lvl=info msg="Registered diagnostics client" log_id=0WjJLI7l000 service=monitor name=runtime
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515907Z lvl=info msg="Registered diagnostics client" log_id=0WjJLI7l000 service=monitor name=network
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515923Z lvl=info msg="Registered diagnostics client" log_id=0WjJLI7l000 service=monitor name=system
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515977Z lvl=info msg="Starting precreation service" log_id=0WjJLI7l000 service=shard-precreation check_interval=10m advanc
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515995Z lvl=info msg="Starting snapshot service" log_id=0WjJLI7l000 service=snapshot
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516015Z lvl=info msg="Starting continuous query service" log_id=0WjJLI7l000 service=continuous_querier
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516011Z lvl=info msg="Storing statistics" log_id=0WjJLI7l000 service=monitor db_instance=_internal db_rp=monitor interval=
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516037Z lvl=info msg="Starting HTTP service" log_id=0WjJLI7l000 service=httpd authentication=false
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516052Z lvl=info msg="opened HTTP access log" log_id=0WjJLI7l000 service=httpd path=stderr
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: run: open server: open service: listen tcp :8086: bind: address already in use
I can't really understand where I did something wrong, since I configured :8086 in the config file. Can you help me?
It appears to be a typo in the configuration file.
As stated in the documentation, the configuration file should hold http-bind-address instead of bind-address; as written, the port is also locked by the first setting.
The first few lines of the file /etc/influxdb/influxdb.conf should look like so:
reporting-disabled = false
http-bind-address = "127.0.0.1:8086"
A suggested approach would be to:
1. Rename bind-address to http-bind-address.
2. Change the port from the default 8086 to a known free port.
3. (Optionally) change back to the default port.
From your config:
reporting-disabled = false
bind-address = "127.0.0.1:8086"
...
[http]
enabled = true
bind-address = ":8086"
Both your 'native' service and the 'http' service are configured to use the same port, 8086. This cannot work, and you probably want to change the 'native' port back to its default of 8088.
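A minimal sketch of the corrected head of the file, assuming the default RPC port 8088 is free on the host:

reporting-disabled = false
# 'native' backup/restore RPC service; 8088 is its default port
bind-address = "127.0.0.1:8088"

[http]
  enabled = true
  bind-address = ":8086"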
