MQTTIO Connection with Apache Beam behaves differently for different topics

When I install a Mosquitto broker, publish messages to a topic, subscribe to the messages using an Apache Beam MQTTIO pipeline, and print the messages in the console, I am able to get all the messages.
Even if I publish a message after a gap of 5 minutes, I am able to see it in the console of the Beam application.
Then I changed nothing but the ServerUri in the ConnectionConfiguration of MQTTIO.Read() and gave a corresponding topic that is available on that broker (refer to the MQTTIO documentation).
This change made the application behave differently: only topics with a high frequency of messages are printed in the console. When the message frequency is around 1 message per minute, only the first message gets printed in the console.
I even tried withMaxNumRecords and withMaxReadTime, but it still prints only the first message.
Is there some timeout happening in the broker configuration because of the idleness between messages?
The same behaviour was observed with both the DirectRunner and the FlinkRunner.
Passing --streaming=true in the pipeline options did not help either.
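For reference, here is a minimal sketch of the kind of pipeline described above, using Beam's Java MqttIO connector (beam-sdks-java-io-mqtt); the broker URI and topic are placeholders:

import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.mqtt.MqttIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class MqttConsolePipeline {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadFromMqtt", MqttIO.read()
                .withConnectionConfiguration(
                    // Placeholder broker URI and topic; changing only these two
                    // values is what triggered the behaviour described above.
                    MqttIO.ConnectionConfiguration.create("tcp://localhost:1883", "my/topic"))
                // .withMaxNumRecords(100)  // bounds the source, as tried in the question
                // .withMaxReadTime(Duration.standardMinutes(10))
                )
            .apply("PrintToConsole", ParDo.of(new DoFn<byte[], Void>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    System.out.println(new String(c.element(), StandardCharsets.UTF_8));
                }
            }));

        p.run().waitUntilFinish();
    }
}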

Related

Replay failed messages whenever I want, at any time

#camel Hi Devs, I am currently working with Camel to transform messages from source to target systems, and I am stuck with an issue: I want to redeliver a message when an exception occurs or an endpoint fails. I checked the Camel docs and found information about redelivery policies, which work according to the configured delay time. But my problem is that I want to replay messages whenever I want. For example, last year some messages failed and their payloads are stored in my system; I want to replay those messages this year, like a replay. Can any devs help me with this? Thanks.
You can simply use the DeadLetterChannel EIP (https://camel.apache.org/components/3.16.x/eips/dead-letter-channel.html).
This will put your failed messages in a special channel (typically a persistent JMS queue). To replay a message, you only have to move it from the DLC ("myqueue.dead") to the original queue ("myqueue").
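As a minimal sketch of that setup in the Java DSL (the queue names and the processing step are placeholders):

import org.apache.camel.builder.RouteBuilder;

public class DeadLetterRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Once redeliveries are exhausted, the failed exchange is parked on the
        // dead-letter queue instead of being lost.
        errorHandler(deadLetterChannel("jms:queue:myqueue.dead")
            .maximumRedeliveries(3)
            .redeliveryDelay(1000));

        from("jms:queue:myqueue")
            .to("bean:transformToTarget"); // hypothetical transformation step
    }
}

Replaying then becomes an operational action: whenever you choose (even a year later), move the messages from myqueue.dead back to myqueue using your broker's tooling or a small utility route.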

How can I send a notification to an endpoint when a Throttler starts/stops throttling?

As the title says, I would like to send a notification to an endpoint if messages to it start getting throttled, and another message when the throttling stops.
I currently have the following (very basic) route configuration:
from("test-jms:queue:test.queue")
.throttle(2)
.to("file://test");
This configuration throttles messages just fine, but I need a way to let the consumer know that the messages are being throttled.
When the Throttler starts throttling, I would like to send a notification to the 'to' endpoint so those reading the messages know that they are being throttled. I would also like to be able to send another message when the Throttler is no longer throttling, so the consumer knows the messages are up to date.
This doesn't appear to be something the Throttler does. The only way I see of getting a notification when it starts throttling is setting rejectExecution to true, at which point it will throw an exception. The problem is that execution stops at that point, and no more messages are passed through (since an exception was thrown).
My current thoughts are that I will need to create a custom bean/processor/something that performs essentially the same function as the Throttler, but also injects a message when the throttling starts or stops. I don't want to do that unless I really need to, though. Any help is appreciated. Thanks!
No, the Throttler EIP does not expose such information (however, you may be able to grab some statistics via JMX). A different thought would be to reverse the direction, so the consumers signal upstream when they want new messages (this is what reactive systems do).
I assume the file endpoint above is just an example; what consumers are you using in real life, and do they really need to know that some messages are backed up within a short time period of one second because they are throttled? Also, since your source is JMS, you can look at a route throttling policy, which can suspend/resume the JMS consumer instead of using the Throttler EIP; a sketch follows below.
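A minimal sketch of that route-policy approach, assuming Camel 3's ThrottlingInflightRoutePolicy (the thresholds are placeholders; in Camel 2 the class lives in a different package):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.throttling.ThrottlingInflightRoutePolicy;

public class SuspendingRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Suspends the JMS consumer when too many exchanges are in flight,
        // and resumes it once the backlog drains below the resume threshold.
        ThrottlingInflightRoutePolicy policy = new ThrottlingInflightRoutePolicy();
        policy.setMaxInflightExchanges(10); // placeholder threshold
        policy.setResumePercentOfMax(70);   // resume once back at 70% of the max

        from("test-jms:queue:test.queue")
            .routePolicy(policy)
            .to("file://test");
    }
}

Since the policy suspends and resumes the consumer, one could in principle subclass it to emit a notification around those transitions, though nothing like that is built in.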

Google PubSub Simultaneous Publish Requests

In Google PubSub, the publish call from the client can be called asynchronously. Because of this, I would think that it would be possible to have multiple publish requests triggered and sent to the server, all at the same time, especially if the batch thresholds are too low.
If this is true, how does the pubsub client control the number of simultaneous publish requests that can be created? Is there a hard limit, or an error that can occur if too many requests are created? Is this the intended use of having an asynchronous publisher, or is it simply to allow other non-publishing activity to occur?
Though this question applies to any of the clients, we are specifically having an issue with the C# client, and are intermittently receiving the following error:
Grpc.Core.RpcException: Status(StatusCode=DeadlineExceeded, Detail="Deadline Exceeded")
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Google.Api.Gax.Grpc.ApiCallRetryExtensions.<>c__DisplayClass0_0`2.<<WithRetry>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
My thought is that we are sending too many publish requests..., but I am not sure.
I would advise against using the raw gRPC code; use the client library instead, which is a very thin wrapper.
Looking at the client source code always helps me; you can find the C# code in PublisherClient.cs (the thin wrapper).
If you are using PublishAsync, it queues/batches the messages anyway; the behaviour is controlled by the settings you give to the client (see PublisherServiceApiClient for how to tune it). You can also control the number of client connections used to send the queued batches. I suggest playing with the batch size first, then the number of connections, until you find the sweet spot for your throughput.
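The question concerns the C# client, but to illustrate the kind of batching knobs the answer refers to, here is a sketch using the Java client's BatchingSettings (the thresholds are placeholders, and builder names may differ across client versions):

import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

public class TunedPublisher {
    public static void main(String[] args) throws Exception {
        Publisher publisher = Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
            .setBatchingSettings(BatchingSettings.newBuilder()
                .setElementCountThreshold(500L)            // max messages per batch
                .setRequestByteThreshold(512 * 1024L)      // max bytes per batch
                .setDelayThreshold(Duration.ofMillis(100)) // max wait before a batch is sent
                .build())
            .build();

        // Messages are queued and batched according to the settings above.
        publisher.publish(PubsubMessage.newBuilder()
            .setData(ByteString.copyFromUtf8("hello"))
            .build());

        publisher.shutdown();
    }
}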

App Engine silently fails on some requests

Some requests silently fail in my python app, intermittently and unpredictably. The hallmarks of the failure are:
Request returns a 200, so the client doesn't know there's a problem.
Request does NOT successfully execute on the server.
No logging statements are recorded for the request.
Below is an example from my logs of a bunch of requests which are each supposed to write an entity to the datastore. You can see for the lower, successful request, a blue 'i' is present, indicating that info level logs were recorded. When I examine the datastore, an entity was successfully written for this request.
However, for the failed request, you can see there is just a white box, and there are no logging statements present at all. While the server returned a 200, no entity was written to the datastore for this request.
Has anyone encountered something like this before on App Engine? Any ideas on how to debug it? I've seen it in multiple different apps myself, but I've never been able to figure it out.
EDIT
To clarify, the main problem here is that code doesn't execute, as measured by the failure to write an entity. The spurious 200 and lack of logging is an associated symptom.
From a comment originally, but seems to be the resolution path for this issue:
Given that there are no log statements at all in the request's log line, and you appear to unpack the arguments and log them as soon as you enter the handler, this starts to look like an infrastructure/platform issue.
In such a case, it's best to open a public issue tracker issue, with "Type-Production" as a tag, including your app's app id and a timeframe, and as much information about your app and request handler involved as possible, and platform support will pick up the issue in the course of triage.
That said, it's worth examining the handler to make absolutely sure there's no way you could exit from the handler and send a 200 without logging anything or seeing an exception. It all depends on what the code handling the request is capable of, what stack of libraries it's built upon, etc.

receive an alert when job is *not* running

I know how to set up a job to alert when it's running.
But I'm writing a job which is meant to run many times a day, and I don't want to be bombarded by emails; rather, I'd like a solution where I get an alert when the job hasn't been executed for X minutes.
This can be achieved by setting the job to alert on execution, and then setting up some process which checks for these alerts and warns when no such alert has been seen for X minutes.
I'm wondering if anyone's already implemented such a thing (or equivalent).
Supporting multiple jobs with different X values would be great.
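For illustration, here is a minimal sketch of such a watchdog (all names are hypothetical): each job checks in on success, and a background sweep alerts when a job has been silent longer than its own X.

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class JobWatchdog {
    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();
    private final Map<String, Duration> thresholds = new ConcurrentHashMap<>();

    // Register a job with its own "alert after X" threshold.
    public void register(String job, Duration maxSilence) {
        thresholds.put(job, maxSilence);
        lastSeen.put(job, Instant.now());
    }

    // Called by the job (or by the process that sees its success alert).
    public void checkIn(String job) {
        lastSeen.put(job, Instant.now());
    }

    // Sweep every minute and warn about jobs that have gone quiet.
    public void start() {
        ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor();
        sweeper.scheduleAtFixedRate(() -> thresholds.forEach((job, max) -> {
            if (Duration.between(lastSeen.get(job), Instant.now()).compareTo(max) > 0) {
                alert(job); // replace with your email/pager integration
            }
        }), 1, 1, TimeUnit.MINUTES);
    }

    private void alert(String job) {
        System.err.println("ALERT: no successful run of '" + job + "' within its window");
    }
}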
The danger of this approach is this: suppose you set this up. One day you receive no emails. What does this mean?
It could mean
the supposed-to-be-running job is running successfully (and silently), and so the absence-of-running monitor job has nothing to say
or alternatively
the supposed-to-be-running job is NOT running successfully, but the absence-of-running monitor job has ALSO failed
or even
your server has caught fire, and can't send any emails even if it wants to
Don't seek to avoid receiving success messages - instead, devise a strategy for coping with them, because the only way to know that a job is running successfully is to get a message which says precisely this.
