PubSub: StatusRuntimeException: UNAVAILABLE from StreamingSubscriberConnection - google-cloud-pubsub

We are running our Spring Boot app in GCP on Kubernetes. We are using spring-cloud-gcp-starter-pubsub 1.1.0.RC1, which pulls in google-cloud-pubsub 1.54.0. Lately we have started to get exceptions:
logger_name: com.google.cloud.pubsub.v1.StreamingSubscriberConnection
method: onFailure
severity: WARN
stack_trace: com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE
with exception messages like:
- Network closed for unknown reason io exception
- HTTP/2 error code: NO_ERROR, Received Goaway max_age
- Authentication backend unavailable.
- The service was unable to fulfill your request. Please try again.
- [code=8a75] 502: Bad Gateway
The exceptions most often come in clusters:
Feb 04 11:29:00.615
Feb 04 11:29:00.479
Feb 04 10:35:48.256
Feb 04 10:35:32.024
Feb 04 10:35:03.760
Feb 04 10:34:52.094
Feb 04 07:36:31.430
Feb 04 07:06:17.025
Feb 04 06:42:13.529
Feb 04 04:32:50.265
Feb 04 04:32:49.845
Feb 04 04:32:49.746
Feb 04 02:57:36.678
Feb 04 02:57:35.700
We get about 10 of these exceptions each day and cannot find any correlation with anything happening in the system, such as deploys or heavy load.
My questions are:
Will the messages still be handled by the subscriber even though the ack doesn't succeed? It looks like it will try to ack them again... but I want to be sure.
How can I continue investigating what is going on?

If you did not yet ack the messages, Cloud Pub/Sub will continue trying to deliver them until you do ack them or they expire based on your settings.
Without knowing more about your code, it's difficult to give you any advice on how to proceed. Long-standing gRPC channels (such as the streaming pull channel used for subscribers) can be broken by transient network errors. I'd suggest you proceed by filing a bug against the client libraries you're using to see if they can resolve the issue, or push it upstream to the Cloud Pub/Sub client libraries which they are (presumably) using.
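To make the ack/redelivery behaviour concrete, below is a minimal sketch against the plain google-cloud-pubsub Java client that the Spring starter wraps. The project name, subscription name, and the handle method are placeholders for illustration, not anything taken from the question.

import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class SubscriberSketch {

    public static void main(String[] args) {
        // Placeholder names for illustration only.
        ProjectSubscriptionName subscription =
                ProjectSubscriptionName.of("my-project", "my-subscription");

        MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
            try {
                handle(message);   // your business logic
                consumer.ack();    // the message is only settled once this ack reaches the service
            } catch (Exception e) {
                consumer.nack();   // ask for prompt redelivery instead of waiting for the ack deadline
            }
        };

        // The StreamingSubscriberConnection warnings in the question come from the
        // streaming pull machinery behind this Subscriber. It reconnects on UNAVAILABLE,
        // and any message whose ack was lost on a broken stream is redelivered.
        Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
        subscriber.startAsync().awaitRunning();
        subscriber.awaitTerminated(); // block the main thread while messages are pulled
    }

    private static void handle(PubsubMessage message) {
        System.out.println("Received: " + message.getData().toStringUtf8());
    }
}

If the handler (or the Spring starter's listener) only acks after successful processing, a lost ack just means the same message comes back later, so these UNAVAILABLE warnings cost you redeliveries, not data.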

Related

Informix server is not able to start from time to time

I am trying to find the root cause of an issue with an Informix installation where the server is restarted every night (some legacy setup). Most of the restarts are OK, but from time to time the database does not start.
The startup just ends with the following lines in the log:
...
Thu Dec 15 00:36:17 2022
00:36:17 Successfully added a bufferpool of page size 2K.
00:36:17 Event alarms enabled. ALARMPROG = '/opt/IBM/informix/12.10/etc/alarmprogram.sh'
00:36:17 Booting Language <c> from module <>
00:36:17 Loading Module <CNULL>
00:36:17 Booting Language <builtin> from module <>
00:36:17 Loading Module <BUILTINNULL>
00:36:22 Entries in the surrogates file /etc/informix/allowed.surrogates are loaded into surrogate cache.
00:36:22 Trusted host cache successfully built:/etc/hosts.equiv.
00:36:22 DR: DRAUTO is 0 (Off)
00:36:22 DR: ENCRYPT_HDR is 0 (HDR encryption Disabled)
00:36:22 Event notification facility epoll enabled.
00:36:23 IBM Informix Dynamic Server Version 12.10.FC2WE Software Serial Number AAA#B000000
00:36:23 (5) connection rejected - no calls allowed for sqlexec
00:36:23 (6) connection rejected - no calls allowed for sqlexec
When I then SSH to the server, I am able to start it normally. Do you have any idea what could be causing this issue?
Thank you.
At the link below there is a log from a successful start and from a failed one.
https://pastebin.com/hQPBccX6

ESP32 AWS IoT transportStatus=-1

I'm following the Espressif docs for connecting an ESP32 to an AWS IoT shadow. I'm using the example at github.com/espressif/esp-aws-iot for shadow MQTT synchronization. I set everything in the config, but when I run it on the ESP32, I get the following error:
--- until here everything runs fine ---
--- mqtt connects to aws and it's success ---
--- and then ---
I coreMQTT: SUBSCRIBE topic $aws/things/MY_DEVICE_NAME/shadow/name/MY_SHADOW_NAME/delete/accepted to broker.
E coreMQTT: A single byte was not read from the transport: transportStatus=-1.
E coreMQTT: Receiving incoming packet length failed. Status=MQTTRecvFailed
E coreMQTT: Exiting process loop due to failure: ErrorStatus=MQTTRecvFailed
E coreMQTT: MQTT_ProcessLoop returned with status = 4.
I tried increasing the network buffer for MQTT packets to 4096 via the config, but that didn't help. Does anyone know what the problem might be?
I found the error. Something was messed up with the policy I created for that device. I'm still not 100% sure what exactly the problem was, but when I removed the code that tries to delete the shadow before the actual shadow operations begin, the example worked fine. I can now pub/sub to my shadow.
For anyone having the same problem: comment out the code from lines 691-746 (the part that handles deletion).

PJSIP connection errors on iOS 11 when coming from background during a push notification

How do we recover a lost PJSIP UDP socket when coming back from the background during a CallKit push notification? We get the following errors when trying to register with the Asterisk server.
ioq_select Error replacing socket [120009]: Bad file descriptor
We've tried closing and recreating the PJSIP transport when we encounter these errors, but that is effective only ~50% of the time. By the time the transport is successfully recreated, the call is lost. Is there a more robust way to handle the UDP socket loss?
Any attempt to proactively close down the socket/transport when entering the background (in application:applicationDidEnterBackground) results in an unregister packet being sent to the Asterisk server, and any background calls go straight to voicemail.
We are testing on iOS 11 and 12 using PJSIP 2.8.

Does a 'Broken pipe' exception cancel my job?

I am currently running a Flink program on a remote cluster of 4 machines with 144 task slots. After running for around 30 minutes I received the following error:
INFO org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet - Info server for jobmanager: Failed to write json updates for job b2eaff8539c8c9b696826e69fb40ca14, because org.eclipse.jetty.io.RuntimeIOException: org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.io.UncheckedPrintWriter.setError(UncheckedPrintWriter.java:107)
    at org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:280)
    at org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:295)
    at org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet.writeJsonUpdatesForJob(JobManagerInfoServlet.java:588)
    at org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet.doGet(JobManagerInfoServlet.java:209)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
    at org.eclipse.jetty.server.Server.handle(Server.java:352)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
    at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:905)
    at org.eclipse.jetty.http.AbstractGenerator.flush(AbstractGenerator.java:427)
    at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:78)
    at org.eclipse.jetty.server.HttpConnection$Output.flush(HttpConnection.java:1139)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:159)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:86)
    at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:154)
    at org.eclipse.jetty.server.HttpWriter.write(HttpWriter.java:258)
    at org.eclipse.jetty.server.HttpWriter.write(HttpWriter.java:107)
    at org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:271)
    ... 24 more
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)
    at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:185)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:256)
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:849)
    ... 33 more
I know that java.io.IOException: Broken pipe means the JobManager lost some kind of connection, so I guess the whole job failed and I have to restart it. Although I think the process is not running anymore, the web interface still lists it as running. Additionally, the JobManager is still present when I use jps to identify my running processes on the cluster. So my question is whether my job is lost, and whether this error happens randomly from time to time or whether my program caused it.
EDIT: My TaskManagers still send Heartbeats every few seconds and seem to be running.
It's actually a problem in the JobManagerInfoServlet, Flink's web server, which cannot send the latest JSON updates of the requested job to your browser because of the java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method). Thus, only the GET request to the server failed.
Such a failure should not affect the execution of the currently running Flink job. Simply refreshing your browser (with Flink's web UI) should send another GET request which then hopefully completes successfully.
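If you also want protection against genuine task failures (as opposed to this web-server hiccup), Flink lets you enable automatic execution retries. The following is only a rough sketch with a placeholder pipeline and job name, not the asker's program:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetryConfigSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // If a task genuinely fails (unlike the web UI's broken GET above),
        // re-run the job up to three times before giving up.
        env.setNumberOfExecutionRetries(3);

        // Placeholder pipeline so the sketch runs; replace with your real topology.
        env.fromElements(1, 2, 3).print();

        env.execute("retry-config-sketch"); // hypothetical job name
    }
}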

XBee End device stops responding after network join

I have two XBee S2 modules.
The first module has freshly uploaded ZigBee Coordinator API firmware, version 21A7. PAN ID = 1000; AP = 2; the rest has default values.
The second has freshly loaded ZigBee End Device API firmware, version 29A7. PAN ID = 1000; AP = 2; the rest has default values.
While the end device is not joined to the coordinator, it responds to all AT commands. For example, the AT NI command (7E 00 04 08 01 4E 49 5F) returns a correct AT Command Response.
After turning on the coordinator, the end device correctly joins the coordinator's network, but it stops responding to local AT commands (and stops transmitting remote ones).
Despite this, the end device still responds correctly to remote AT commands from the coordinator.
Do you have any ideas, please?
It sounds like the end device might be sleeping once it's joined to the coordinator. You can't send serial data to it while it's sleeping, and you may need to monitor the CTS signal coming from the XBee, or make use of the "sleep request" pin on the end device so the host can signal the XBee module to wake up.
If you don't have low-power requirements on your project, I'd recommend using a "router" device configuration instead of a sleepy end device. Routers on the network form a mesh for transferring information, and you don't have to worry about the multiple issues related to sleeping (the host can't send serial data to a sleeping end device, remote devices can only have one outstanding frame pending for a sleeping device, etc.).
