Tracing context lost with OpenTracing and Brave for Kafka - apache-flink

I have some Flink jobs that use Kafka as source and sink, and I want to add tracing to them so that any message consumed from or produced to Kafka is properly traced. For that I'm using Kafka interceptors to intercept messages and log the trace, span and parent trace IDs, with
opentracing-kafka-client (v0.1.11) in conjunction with brave-opentracing (v0.35.1). The reason I'm using custom interceptors is that I need to log messages in a specific format.
After configuring the interceptors they are invoked, and they use the tracing information coming from the upstream system (via headers) and log it. But when the message is produced back to Kafka, the tracing context is lost. For instance, consider the scenario below:
1) Message put on Kafka by some REST service
2) Message consumed by the Flink job; the interceptor kicks in, uses the tracing information from the headers and logs it
3) After processing, the message is produced by the Flink job to Kafka
It works well up to step #2, but when the message is produced the tracing information from the previous step is not used, because the outgoing record has no header information, and hence an entirely new trace is started.
I'm registering the tracer as below:
import brave.Tracing;
import brave.opentracing.BraveTracer;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public class MyTracer {

    private static final Tracer INSTANCE = BraveTracer.create(Tracing.newBuilder().build());

    public static void registerTracer() {
        GlobalTracer.registerIfAbsent(INSTANCE);
    }

    public static Tracer getTracer() {
        return INSTANCE;
    }
}
And I'm using TracingConsumerInterceptor and TracingProducerInterceptor from opentracing kafka.
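For reference, a minimal sketch of how such interceptors are typically wired into the Kafka client properties handed to the Flink Kafka connector; the interceptor class names come from opentracing-kafka-client, while the helper class and the bootstrap/group values are placeholders (a custom interceptor subclass would be registered the same way):
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import io.opentracing.contrib.kafka.TracingConsumerInterceptor;
import io.opentracing.contrib.kafka.TracingProducerInterceptor;

public class KafkaTracingProps {

    public static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        // interceptor that reads the tracing headers of consumed records
        props.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG,
                TracingConsumerInterceptor.class.getName());
        return props;
    }

    public static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        // interceptor that injects tracing headers into produced records
        props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
                TracingProducerInterceptor.class.getName());
        return props;
    }
}
Note that MyTracer.registerTracer() has to have run on the task managers before the interceptors are instantiated, otherwise they fall back to a no-op GlobalTracer.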

Related

Flink listener is not getting called when using statementSet.execute

I'm using Flink 1.13 with statementSet.execute, and I added a listener to the Flink stream environment. onJobSubmitted is called when the job is submitted (no compile issues with the plan). To deliberately break the pipeline I have a string field in Kafka but an int field in the table DDL, so the job fails at the task manager, but somehow onJobExecuted is not called with the throwable. Am I missing something?
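For reference, a minimal sketch of the listener registration described above, using Flink's JobListener API; the println bodies are only illustrative:
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ListenerSetup {

    public static void register(StreamExecutionEnvironment env) {
        env.registerJobListener(new JobListener() {
            @Override
            public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
                // called once the job has been submitted, or submission failed
                System.out.println("submitted, error = " + throwable);
            }

            @Override
            public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
                // called when the client-side execution call completes or fails
                System.out.println("executed, error = " + throwable);
            }
        });
    }
}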

Apache Flink - how to stop and resume stream processing on downstream failure

I have a Flink application that consumes incoming messages on a Kafka topic with multiple partitions, does some processing, then sends them to a sink that forwards them over HTTP to an external service. Sometimes the downstream service is down, and the stream processing needs to stop until it is back in action.
There are two approaches I am considering.
1. Throw an exception when the HTTP sink fails to send the output message. This will cause the task and job to restart according to the configured restart strategy. Eventually the downstream service will be back and the system will continue where it left off.
2. Have the sink sleep and retry on failure; it can do this continually until the downstream service is back.
From what I understand and from my PoC, with 1. I will lose exactly-once guarantees, since the sink itself is external state. As far as I can see, you cannot make a simple HTTP endpoint transactional, as it would need to be to implement TwoPhaseCommitSinkFunction.
With 2. this is less of an issue, since the pipeline will not proceed until the sink makes a successful write, and I can rely on back pressure throughout the system to pause the retrieval of messages from the Kafka source.
The main questions I have are:
Is it a correct assumption that you can't make a TwoPhaseCommitSinkFunction for a simple HTTP endpoint?
Which of the two strategies, or neither, makes the most sense?
Am I missing simpler obvious solutions?
I think you can try AsyncIO in Flink - https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/.
Try to make the HTTP endpoint send a response only once all the work for the request has been done, e.g. the HTTP server has finished processing the request and the result has been committed to the DB. Then use an async HTTP client inside the AsyncIO operator. The AsyncIO operator will wait until the response is received. If any error happens, the Flink streaming pipeline will fail and restart based on the configured recovery strategy.
All requests to the HTTP endpoint that have not yet received a response sit in the internal buffer of the AsyncIO operator; if the streaming pipeline fails, the requests pending in the buffer are saved in the checkpoint state. The operator also triggers back pressure when the internal buffer is full.
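A rough sketch of that pattern, assuming a Java 11 HttpClient and String records; the endpoint URL, timeout and capacity values are placeholders:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class HttpAsyncFunction extends RichAsyncFunction<String, String> {

    private transient HttpClient client;

    @Override
    public void open(Configuration parameters) {
        client = HttpClient.newHttpClient();
    }

    @Override
    public void asyncInvoke(String payload, ResultFuture<String> resultFuture) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://downstream-service/ingest")) // placeholder URL
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .whenComplete((response, error) -> {
                    if (error != null || response.statusCode() >= 400) {
                        // failing the future fails the job, which then restarts
                        // according to the configured restart strategy
                        resultFuture.completeExceptionally(error != null
                                ? error
                                : new RuntimeException("HTTP " + response.statusCode()));
                    } else {
                        resultFuture.complete(Collections.singleton(payload));
                    }
                });
    }

    // Wiring: at most 100 in-flight requests, 30s timeout per element (placeholders).
    public static DataStream<String> apply(DataStream<String> input) {
        return AsyncDataStream.orderedWait(
                input, new HttpAsyncFunction(), 30, TimeUnit.SECONDS, 100);
    }
}
Unanswered requests then occupy slots in the operator's buffer, which is what produces the back pressure described above.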

Full duplex TCP connections using Netty4 and Apache Camel

For an IoT project I am working on, I am researching the next, enhanced version of our "Socket Handler", which is over 5 years old and has evolved into a big beast that, apart from handling socket connections with IoT devices, also does in-thread processing that has become a real pain to manage.
For my total rewrite I am looking into Apache Camel as a routing and transformation toolkit and understand how this can help us split processing steps into micro-services, loosely coupled through message queues.
One thing that I have trouble understanding however is how I can implement the following logic “the Apache Camel way”:
An IoT device sends an initial message which contains its id, some extra headers and a message payload.
Apart from extracting the message payload and routing it to a channel, I also need to use the device Id to check a message queue, named after the device id, for any commands that have to go to the device over the same socket connection that received the initial message.
Although it seems that Netty4, which is included in Camel, can deal with synchronous duplex comms, I cannot see how the above logic can be implemented in the Camel Netty4 component. Camel Routing seems to be one way only.
Is there a correct way to do this or should I forget about using camel for this and just use Netty4 bare?
It is possible to reproduce full duplex communication by using two Camel routes, thanks to the reuseChannel property.
As this full duplex communication is realised with two different Camel routes, the sync property must be set to false.
Here is the first route:
from("netty4:tcp://{{tcpAddress}}:{{tcpPort}}?decoders=#length-decoder,#string-decoder&encoders=#length-encoder,#bytearray-encoder&sync=false&reuseChannel=true") .bean("myMessageService", "receiveFromTCP").to("jms:queue:<name>")
This first route creates a TCP/IP consumer backed by a server socket (it is also possible to use a client socket via the clientMode property).
As we want to reuse the just-created connection, it is important to configure not only the decoders but also the encoders that will be used later by a bean (see further). This bean will be responsible for sending data by using the Channel created in this first route (the Netty Channel contains a pipeline for decoding/encoding messages before receiving from / sending to TCP/IP).
Now we want to send some data back to the partner connected to the consumer (from) endpoint of the first route. Since we cannot do this with a classical producer endpoint (to), we use a bean object:
from("jsm:queue:<name>").bean("myMessageService", "sendToTCP");
Here is the bean code:
import io.netty.channel.Channel;
import org.apache.camel.Exchange;
import org.apache.camel.component.netty4.NettyConstants;

public class MessageService {

    private Channel openedChannel;

    public void sendToTCP(final Exchange exchange) {
        // the opened channel will use the encoders before writing on the socket
        // already created in the first route
        openedChannel.writeAndFlush(exchange.getIn().getBody());
    }

    public void receiveFromTCP(final Exchange exchange) {
        // record the channel created in the first route
        this.openedChannel = exchange.getProperty(NettyConstants.NETTY_CHANNEL, Channel.class);
    }
}
Of course, the same bean instance must be used in the two routes; you need to use a registry to do so:
SimpleRegistry simpleRegistry = new SimpleRegistry();
simpleRegistry.put("myMessageService", new MessageService());
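For completeness, a sketch of how the registry and the two routes could be wired together; this assumes the Camel 2.x API that the netty4 component belongs to, a configured "jms" component, and placeholder host, port and queue names (the decoders/encoders shown in the first route are omitted for brevity):
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.impl.SimpleRegistry;

public class FullDuplexSetup {

    public static void main(String[] args) throws Exception {
        SimpleRegistry simpleRegistry = new SimpleRegistry();
        simpleRegistry.put("myMessageService", new MessageService());

        CamelContext context = new DefaultCamelContext(simpleRegistry);
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // inbound route: receive from TCP, remember the channel, forward to JMS
                from("netty4:tcp://0.0.0.0:9000?sync=false&reuseChannel=true")
                        .bean("myMessageService", "receiveFromTCP")
                        .to("jms:queue:fromDevice");
                // outbound route: read commands from JMS and push them down the same channel
                from("jms:queue:toDevice")
                        .bean("myMessageService", "sendToTCP");
            }
        });
        context.start();
    }
}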
Since the bean is used in two different asynchronous routes, you will have to deal with some unexpected situations, for example by protecting access to the openedChannel member, or by handling unexpected disconnections.
This post helped me a lot to find this solution:
how-to-send-a-response-back-over-a-established-tcp-connection-in-async-mode-usin
reuseChannel property documentation
After the end of the Camel route, the exchange's body and headers will go back to the requester as the response.

Alpakka KinesisSink : Can not push messages to Stream

I am trying to use the Alpakka Kinesis connector to send messages to a Kinesis stream, but I have no success with it. I tried the code below but nothing appears in my stream.
implicit val sys = ActorSystem()
implicit val mat = ActorMaterializer()
implicit val kinesisAsync: AmazonKinesisAsync = AmazonKinesisAsyncClientBuilder.defaultClient()

val debug = Flow[PutRecordsRequestEntry].map { reqEntry =>
  println(reqEntry)
  reqEntry
}

val entry = new PutRecordsRequestEntry()
  .withData(ByteBuffer.wrap("Hello World".getBytes))
  .withPartitionKey(Random.nextInt.toString)

Source.tick(1.second, 1.second, entry)
  .to(KinesisSink("myStreamName", KinesisFlowSettings.defaultInstance))
  .run()

// 2) Source.tick(1.second, 1.second, entry).via(debug).to(KinesisSink("myStreamName", KinesisFlowSettings.defaultInstance)).run()
Using a Sink.foreach(println) instead of KinesisSink prints out the PutRecordsRequestEntry every 1 second => EXPECTED
Using KinesisSink, the entry is generated only once.
What am I doing wrong?
I am checking my stream with a KinesisSource and reading works (tested with another stream).
Also, the AWS Kinesis monitoring dashboard doesn't show any PUT requests.
Note 1: I tried to enable the debug logging of Alpakka, but with no effect:
<logger name="akka.stream.alpakka.kinesis" level="DEBUG"/>
in my logback.xml + debug on root level
Some troubleshooting steps to consider below - I hope they help.
I suspect you're likely missing credentials and/or region configuration for your Kinesis client.
Kinesis Firehose
The Kinesis Producer Library (what Alpakka seems to be using) does not work with Kinesis Firehose. If you're trying to write to Firehose this isn't going to work.
Application Logging
You'll probably want to enable logging for the Kinesis Producer Library, not just in Alpakka itself. Relevant documentation is available here:
Configuring the Kinesis Producer Library
Configuration Defaults for Kinesis Producer Library
AWS Side Logging
AWS CloudTrail is automatically enabled out of the box for Kinesis streams, and by default AWS will keep 90 days of CloudTrail logs for you.
https://docs.aws.amazon.com/streams/latest/dev/logging-using-cloudtrail.html
You can use the CloudTrail logs to see the API calls your application is making to Kinesis on your behalf. There's usually a modest delay in requests showing up - but this will let you know if the request is failing due to insufficient IAM permissions or some other issue with your AWS resource configuration.
Check SDK Authentication
The Kinesis client will be using the DefaultAWSCredentialsProviderChain credentials provider to make requests to AWS.
You'll need to make sure you are providing valid AWS credentials with IAM rights to make those requests to Kinesis. If your code is running on AWS, the preferred way of giving your application credentials is using IAM Roles (specified at instance launch time).
You'll also need to specify the AWS region when building the client in your code. Use your application.properties to configure this, or, if your application is part of a CloudFormation stack that lives in a single region, use the instance metadata service to retrieve the current region when your code is running on AWS.
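As an illustration, the client can be built with an explicit region and credentials provider instead of defaultClient(); the snippet below is in Java, but the same builder calls apply to the Scala code in the question (the region value is a placeholder):
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.kinesis.AmazonKinesisAsync;
import com.amazonaws.services.kinesis.AmazonKinesisAsyncClientBuilder;

public class KinesisClientFactory {

    public static AmazonKinesisAsync create() {
        // explicit region and credentials provider, instead of relying on defaults
        return AmazonKinesisAsyncClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1) // placeholder region
                .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
                .build();
    }
}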
The problem was an access denied error: a missing permission on the action on the stream.
I had to add the akka actor config for logging
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"
  stdout-loglevel = "DEBUG"
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
  logger-startup-timeout = "30s"
}
to see the debug lines, and I actually ran in debug mode and stepped through each stage.
It required the "PutRecords" permission in the IAM role.

Set-up queue policy not to send expired message to Dead-Letter Queue on Camel endpoint

I have a small Camel route which just forwards messages to another queue with an expiration time, like this:
@Override
public void configure() throws Exception {
    defaultOnException();
    // Route all messages generated by system A (in OUTBOUND_A) to system B (INBOUND_B)
    // @formatter:off
    from("activemq:queue:OUTBOUND_A")
        // ASpecificProcessor transforms the incoming message into another one.
        .process(new ASpecificProcessor())
        .to("activemq:INBOUND_B?explicitQosEnabled=true&timeToLive={{b.inbound.message.ttl}}");
    // @formatter:on
}
I need the messages posted to INBOUND_B to be persistent, and by default expired messages go to the ActiveMQ.DLQ queue after they expire.
I know I can modify the ActiveMQ configuration in the conf/activemq.xml with
<policyEntry queue="INBOUND_B">
  <!--
    Tell the dead letter strategy not to process expired messages
    so that they will just be discarded instead of being sent to
    the DLQ
  -->
  <deadLetterStrategy>
    <sharedDeadLetterStrategy processExpired="false" />
  </deadLetterStrategy>
</policyEntry>
But I would prefer not to change the ActiveMQ configuration (because it requires a restart), and I am wondering if it is possible to send such a policy through the Camel endpoint configuration?
No, ActiveMQ broker-side configuration cannot be updated via the client; that would lead to all sorts of security problems. You would need to update the broker configuration, and you may not even need a restart if you use the runtime configuration plugin on the broker.