Observed non-terminating stream error 503 DNS resolution failed for pubsub.googleapis.com:443: UNAVAILABLE: OS Error - google-cloud-pubsub

I have a Feed Completion listener program for GCS that uses a Pub/Sub subscription in tandem with a post-processing job.
subscriber = pubsub_v1.SubscriberClient(credentials=cred)
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION_NAME)
flow_control = pubsub_v1.types.FlowControl(max_messages=500)
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)
with subscriber:
    try:
        streaming_pull_future.result(timeout=TIMEOUT)
    except TimeoutError:
        streaming_pull_future.cancel()
        logging.info("The filewatcher process has completed.")
        successes, failures = [], []
        for feed in created_local_done_file:
            if created_local_done_file[feed]:
                successes.append(feed)
            else:
                failures.append(feed)
        logging.info("These Juniper feeds completed and proceeded to the next step:")
        counter = 1
        for feed in successes:
            logging.info("\t%s: %s", counter, feed)
            counter += 1
        logging.warning("These Juniper feeds did not publish a message to pub/sub (possibly due to failure):")
        counter = 1
        for feed in failures:
            logging.warning("\t%s: %s", counter, feed)
            counter += 1
        if warn_of_dupes:
            logging.warning("Reminder: There are overlapping feed names in the config files. This could cause errors/unexpected behaviour in the future!")
        logging.info("Exiting...")
        if len(failures) > 0:
            sys.exit(1)
        else:
            sys.exit(0)
    except Exception as error:
        streaming_pull_future.cancel()
        logging.critical("Listening for messages on %s threw an exception: %s.", SUBSCRIPTION_NAME, error)
        sys.exit(1)
2022-06-17 09:12:25,932 PUB/SUB INFO -- List of feeds detected by the reader: {'XXX', 'YYY'}
2022-06-17 09:12:26,057 PUB/SUB INFO -- There are 2 feeds detected in total.
2022-06-17 09:12:26,057 PUB/SUB INFO -- Observed non-terminating stream error 503 DNS resolution failed for pubsub.googleapis.com:443: UNAVAILABLE: OS Error
2022-06-17 09:12:26,057 PUB/SUB INFO -- Listening for messages on projects/hsbc-11545401-datamesh-dev/subscriptions/datamesh-dev-control-api-service...
2022-06-17 09:12:26,057 PUB/SUB INFO -- Observed recoverable stream error 503 DNS resolution failed for pubsub.googleapis.com:443: UNAVAILABLE: OS Error

Related

PAHO MQTT 5 throwing exception when using same clientId in routes

When I use paho-mqtt5:test more than once with the same clientId, it throws the exception "Client is not connected". If I use a different clientId for each to and from endpoint, it works fine.
2021-10-05 19:25:28,650 ERROR [org.apa.cam.pro.err.DefaultErrorHandler] (Camel (camel-1) thread #0 - timer://test) Failed delivery for (MessageId: 871E4623819E4FB-000000000000001B on ExchangeId: 871E4623819E4FB-000000000000001B). Exhausted after delivery attempt: 1 caught: Client is not connected (32104)
Message History (complete message history is disabled)
---------------------------------------------------------------------------------------------------------------------------------------
RouteId ProcessorId Processor Elapsed (ms)
[route1 ] [route1 ] [from[timer://test?period=1000] ] [ 0]
...
[route1 ] [to1 ] [paho:test ] [ 0]
Stacktrace
---------------------------------------------------------------------------------------------------------------------------------------
: Client is not connected (32104)
at org.eclipse.paho.mqttv5.client.internal.ExceptionHelper.createMqttException(ExceptionHelper.java:32)
at org.eclipse.paho.mqttv5.client.internal.ClientComms.sendNoWait(ClientComms.java:231)
at org.eclipse.paho.mqttv5.client.MqttAsyncClient.publish(MqttAsyncClient.java:1530)
at org.eclipse.paho.mqttv5.client.MqttClient.publish(MqttClient.java:564)
at org.apache.camel.component.paho.mqtt5.PahoMqtt5Producer.process(PahoMqtt5Producer.java:55)
at org.apache.camel.support.AsyncProcessorConverterHelper$ProcessorToAsyncProcessorBridge.process(AsyncProcessorConverterHelper.java:66)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:172)
at org.apache.camel.processor.errorhandler.RedeliveryErrorHandler$SimpleTask.run(RedeliveryErrorHandler.java:463)
at org.apache.camel.impl.engine.DefaultReactiveExecutor$Worker.schedule(DefaultReactiveExecutor.java:179)
at org.apache.camel.impl.engine.DefaultReactiveExecutor.scheduleMain(DefaultReactiveExecutor.java:64)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:184)
at org.apache.camel.impl.engine.CamelInternalProcessor.process(CamelInternalProcessor.java:398)
at org.apache.camel.component.timer.TimerConsumer.sendTimerExchange(TimerConsumer.java:210)
at org.apache.camel.component.timer.TimerConsumer$1.run(TimerConsumer.java:76)
at java.base/java.util.TimerThread.mainLoop(Timer.java:556)
at java.base/java.util.TimerThread.run(Timer.java:506)
Here is my code that throws the exception:
@ApplicationScoped
class TestRouter : RouteBuilder() {
    override fun configure() {
        val mqtt5Component = PahoMqtt5Component()
        mqtt5Component.configuration = PahoMqtt5Configuration().apply {
            brokerUrl = "tcp://192.168.99.101:1883"
            clientId = "paho123"
            isCleanStart = true
        }
        context.addComponent("paho-mqtt5", mqtt5Component)
        from("timer:test?period=1000").setBody(constant("Testing timer2")).to("paho-mqtt5:test")
        from("paho-mqtt5:test").process { e ->
            val body = (e.`in`?.body as? ByteArray)?.let { String(it) }
            println("test body 1 => $body")
        }
    }
}
@William, this is expected behavior.
The message broker uses the client ID to differentiate between clients, so that it can perform housekeeping for a client connection that is no longer in use.
In addition, a client may have a "Last Will and Testament" that the broker keeps track of.
It is acceptable to append a random number to the end of your current 'clientId', since it is likely that no one but you will care about it.
If you have access to the individual's login, you could use that as well, but you would still want to make each session unique in case they run multiple sessions.
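As a minimal sketch of the "append a random suffix" idea (written in Python for brevity; the helper name `make_client_id` and the suffix length are assumptions, not part of Camel or Paho), each connection derives its own ID from a shared base name:

```python
import secrets


def make_client_id(base: str, suffix_bytes: int = 4) -> str:
    """Return `base` plus a random hex suffix so each session gets its own ID."""
    # secrets.token_hex(n) returns 2 * n hexadecimal characters.
    return f"{base}-{secrets.token_hex(suffix_bytes)}"


# Hypothetical usage: one ID for the producer connection, one for the consumer.
producer_id = make_client_id("paho123")
consumer_id = make_client_id("paho123")
```

The same idea carries over to the route above: give the producing and the consuming endpoint different client IDs instead of sharing "paho123".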
Maybe I don't understand what your problem is.
Each client must have a unique ID.
What are you observing that makes you think that it is creating multiple connections for a single client?
Is there a chance you are opening multiple windows and each is generating a different clientId?
Monitoring what the server is seeing is a good way to diagnose issues.
My paho-mqtt client (JavaScript) connects as "webclient", and I append a random number (webclient173) to identify this client.
To troubleshoot, I would suggest you close all connections on the client and monitor the log of the MQTT process.
When the monitor is in place, open a connection from a client that currently has no connections.
This is an example connection from my Mosquitto log file:
$ tail -f /var/log/mosquitto/mosquitto.log
1635169943: No will message specified.
1635169943: Sending CONNACK to webclient173 (0, 0)
1635169943: Received SUBSCRIBE from webclient173
1635169943: testtopic (QoS 0)
1635169943: Sending SUBACK to webclient173
1635170003: Received PINGREQ from webclient173
1635170003: Sending PINGRESP to webclient173
1635170003: Received PINGREQ from webclient173
1635170003: Sending PINGRESP to webclient173
What does your log show?

Getting Status Code: 500 errors when accessing localstack Kinesis stream with Camel

I'm receiving 500 status code errors when connecting with Camel to my localstack Kinesis stream. Accessing the stream and putting messages to it via awslocal works fine, so it's probably not a localstack issue.
Localstack
# Start localstack
DEBUG=1 SERVICES=kinesis AWS_CBOR_DISABLE=true CBOR_ENABLED=false localstack start
# Setup stream
awslocal kinesis create-stream --shard-count 4 --stream-name mystream
Java Config (I'm using Quarkus)
public void configure() {
    BasicAWSCredentials awsCreds = new BasicAWSCredentials(mystream.getAccessKeyId(), mystream.getSecretKey());
    AmazonKinesis client = AmazonKinesisClientBuilder.standard()
            .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
            .withEndpointConfiguration(
                    new AwsClientBuilder.EndpointConfiguration(mystream.getServiceEndpoint(), mystream.getRegion()))
            .build();
    getContext().setTracing(true);
    getContext().getRegistry().bind("mystreamClient", client);
    from("aws-kinesis://mystream?amazonKinesisClient=mystreamClient&bridgeErrorHandler=true&shard-id=1").routeId("mystream1").log("Received mystream");
}
Error msg:
ERROR [org.apa.cam.pro.err.DefaultErrorHandler] (Camel (camel-1) thread #0 - aws-kinesis://mystream) Failed delivery for (MessageId: 76F94B329B1EE4F-0000000000000002 on ExchangeId: 76F94B329B1EE4F-0000000000000002). Exhausted after delivery attempt: 1 caught: com.amazonaws.services.kinesis.model.AmazonKinesisException: null (Service: AmazonKinesis; Status Code: 500; Error Code: null; Request ID: null)

Google Cloud Run pubsub pull listener app fails to start

I'm testing a Pub/Sub "pull" subscriber on Cloud Run using just the listener part of this sample Java code (SubscribeAsyncExample, reworked slightly to fit in my Spring Boot app):
https://cloud.google.com/pubsub/docs/quickstart-client-libraries#java_1
It fails to start up during deploy, but while it's trying to start, it does pull items from the Pub/Sub queue. Originally, I had an HTTP "push" receiver (a @RestController) on a different Pub/Sub topic and that worked fine. Any suggestions? I'm new to Cloud Run. Thanks.
Deploying...
Creating Revision... Cloud Run error: Container failed to start. Failed to start and then listen on the port defined
by the PORT environment variable. Logs for this revision might contain more information....failed
Deployment failed
In logs:
2020-08-11 18:43:22.688 INFO 1 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 4606 ms
2020-08-11T18:43:25.287759Z Listening for messages on projects/ce-cxmo-dev/subscriptions/AndySubscriptionPull:
2020-08-11T18:43:25.351650801Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x31,0x3eca02dfd974,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11T18:43:25.351770555Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x12,0x3eca02dfd97c,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11 18:43:25.680 WARN 1 --- [ault-executor-0] i.g.n.s.i.n.u.internal.MacAddressUtil : Failed to find a usable hardware address from the network interfaces; using random bytes: ae:2c:fb:e7:92:9c:2b:24
2020-08-11T18:45:36.282714Z Id: 1421389098497572
2020-08-11T18:45:36.282763Z Data: We be pub-sub'n in pull mode2!!
Nothing else after this and the app stops running.
@Component
public class AndyTopicPullRecv {
    public AndyTopicPullRecv()
    {
        subscribeAsyncExample("ce-cxmo-dev", "AndySubscriptionPull");
    }
    public static void subscribeAsyncExample(String projectId, String subscriptionId) {
        ProjectSubscriptionName subscriptionName =
                ProjectSubscriptionName.of(projectId, subscriptionId);
        // Instantiate an asynchronous message receiver.
        MessageReceiver receiver =
                (PubsubMessage message, AckReplyConsumer consumer) -> {
                    // Handle incoming message, then ack the received message.
                    System.out.println("Id: " + message.getMessageId());
                    System.out.println("Data: " + message.getData().toStringUtf8());
                    consumer.ack();
                };
        Subscriber subscriber = null;
        try {
            subscriber = Subscriber.newBuilder(subscriptionName, receiver).build();
            // Start the subscriber.
            subscriber.startAsync().awaitRunning();
            System.out.printf("Listening for messages on %s:\n", subscriptionName.toString());
            // Allow the subscriber to run for 30s unless an unrecoverable error occurs.
            // subscriber.awaitTerminated(30, TimeUnit.SECONDS);
            subscriber.awaitTerminated();
            System.out.printf("Async subscribe terminated on %s:\n", subscriptionName.toString());
        // } catch (TimeoutException timeoutException) {
        } catch (Exception e) {
            // Shut down the subscriber after 30s. Stop receiving messages.
            subscriber.stopAsync();
            System.out.printf("Async subscriber exception: " + e);
        }
    }
}
Kolban's question is very important! With the shared code, I would say "No". The Cloud Run contract is clear:
Your service must answer HTTP requests. Outside of request processing, you pay nothing and no CPU is dedicated to your instance (the instance is like a daemon when no request is being processed).
Your service must be stateless (not your case here; I won't spend time on this).
If you want to pull your Pub/Sub subscription, create an endpoint in your code with a REST controller. While you are processing this request, run your pull mechanism and process messages.
This endpoint can be called by Cloud Scheduler regularly to keep the process up.
Be careful: the maximum request processing timeout is 15 minutes (today; subject to change in the near future). So you can't run your process for more than 15 minutes. Make it resilient to failure and set your scheduler to call your service every 15 minutes.
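The "pull while a request is being processed" idea can be sketched as a time-bounded loop. This is only an illustration in Python under stated assumptions: `pull_batch` is a hypothetical stand-in for a synchronous pull against the subscription, `handle` stands in for your message processing, and the HTTP endpoint that Cloud Scheduler calls would invoke `drain_for` with a budget comfortably below the request timeout.

```python
import time
from typing import Callable, Iterable


def drain_for(
    pull_batch: Callable[[], Iterable[str]],
    handle: Callable[[str], None],
    budget_seconds: float,
    idle_sleep: float = 0.01,
) -> int:
    """Pull and handle messages until the time budget is used up.

    Returns the number of messages handled.
    """
    deadline = time.monotonic() + budget_seconds
    handled = 0
    while time.monotonic() < deadline:
        batch = list(pull_batch())
        if not batch:
            # Nothing available right now; back off briefly before polling again.
            time.sleep(idle_sleep)
            continue
        for message in batch:
            handle(message)
            handled += 1
    return handled


# Demo with an in-memory queue standing in for the subscription.
queue = [["a", "b"], ["c"]]
seen = []
count = drain_for(lambda: queue.pop(0) if queue else [], seen.append, budget_seconds=0.2)
```

The deadline is only checked between batches, so a batch that is already being handled can run past it. That is why the budget should leave some margin before the platform's request timeout.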

Gatling active users drops to negative for a longer run after timeout exception

I'm running a simulation using gatling.
5 users per second for 150 mins.
After a certain exception:
15:58:19.643 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_outbound_msg' failed for user 29989
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.643 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_outbound_msg' failed for user 29989: j.n.s.SSLException: handshake timed out
15:58:19.643 [ERROR] i.g.h.c.i.DefaultHttpClient - Failed to install SslHandler
15:58:19.678 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_inbound_msg' failed for user 29984
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.678 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_inbound_msg' failed for user 29984: j.n.s.SSLException: handshake timed out
the number of active users dropped to -1. Then, every time this exception happens, the number of active users keeps dropping.
Example:
---- FacebookOutboundSimulation ------------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
---- FacebookInboundMessageSimulation ------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
================================================================================
Why does this happen, and how can I fix it?
The issue was with gatling-charts-highcharts-bundle-3.0.0-RC1. When I switched to gatling-charts-highcharts-bundle-3.0.0-RC3, which was released on 2nd October 2018, it was resolved.
Sidenote: Use AsJson instead of AsJSON.

How to catch BigQuery loading errors from an AppEngine pipeline

I have built a pipeline on AppEngine that loads data from Cloud Storage to BigQuery. This works fine, until there is an error. How can I catch loading errors raised by BigQuery from my AppEngine code?
The code in the pipeline looks like this:
#Run the job
credentials = AppAssertionCredentials(scope=SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery_service = build("bigquery", "v2", http=http)
jobCollection = bigquery_service.jobs()
result = jobCollection.insert(projectId=PROJECT_ID,
                              body=build_job_data(table_name, cloud_storage_files))
#Get the status
while (not allDone and not runtime.is_shutting_down()):
    try:
        job = jobCollection.get(projectId=PROJECT_ID,
                                jobId=insertResponse).execute()
        #Do something with job.get('status')
    except:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        logging.error(traceback.format_exception(exc_type, exc_value, exc_traceback))
    time.sleep(30)
This gives me status errors or major connectivity errors, but what I am looking for is functional errors from BigQuery, such as field format conversion errors, schema structure issues, or other issues BigQuery may have while trying to insert rows into tables.
If any "functional" error happens on BigQuery's side, this code will run successfully and complete normally, but no table will be written to BigQuery. This is not easy to debug when it happens.
You can use the HTTP error code from the exception. BigQuery is a REST API, so the response codes that are returned match the description of HTTP error codes here.
Here is some code that handles retryable errors (connection, rate limit, etc), but re-raises when it is an error type that it doesn't expect.
except HttpError as err:
    # If the error is a rate limit or connection error, wait and
    # try again.
    # 403: Forbidden: Both access denied and rate limits.
    # 408: Timeout
    # 500: Internal Service Error
    # 503: Service Unavailable
    if err.resp.status in [403, 408, 500, 503]:
        print('%s: Retryable error %s, waiting' % (
            self.thread_id, err.resp.status,))
        time.sleep(5)
    else:
        raise
If you want even better error handling, check out the BigqueryError class in the bq command-line client. This used to be available on code.google.com, but with the recent switch to gcloud it no longer is. If you have gcloud installed, the bq.py and bigquery_client.py files should be in the installation.
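For a self-contained view of the retry pattern described in this answer, here is a rough sketch that runs without any Google libraries. `ApiError` is a hypothetical stand-in for an HTTP error that carries a status code, and exponential backoff is swapped in for the fixed 5-second wait used above; neither comes from the bq client.

```python
import time

RETRYABLE_STATUSES = {403, 408, 500, 503}


class ApiError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying retryable statuses with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ApiError as err:
            # Unexpected statuses, or the final attempt, propagate to the caller.
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: fail twice with 503, then succeed. Delays are recorded instead of slept.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ApiError(503)
    return "ok"


delays = []
result = call_with_retry(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the demo instant. In real use the default `time.sleep` applies.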
The key here is this part of the pasted code:
except:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    logging.error(traceback.format_exception(exc_type, exc_value, exc_traceback))
time.sleep(30)
This "except" is catching every exception, logging it, and letting the process continue without any consideration for re-trying.
The question is, what would you like to do instead? At least the intention is there with the "#Do something" comment.
As a suggestion, consider App Engine's task queues to check the status, instead of a loop with a 30 second wait. When tasks get an exception, they are automatically retried - and you can tune that behavior.
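One possible way to fill in the "#Do something with job.get('status')" step is to inspect the status object of the polled job. As far as I know, the BigQuery job resource reports `state`, and for failed jobs `errorResult` and `errors`, under `status`. The helper name `load_job_errors` and the sample values below are illustrative assumptions, not output from a real job.

```python
def load_job_errors(job):
    """Return error messages from a finished job, or None if it is still running.

    An empty list means the job finished without reported errors.
    """
    status = job.get("status", {})
    if status.get("state") != "DONE":
        return None  # still PENDING or RUNNING; poll again later
    messages = []
    fatal = status.get("errorResult")
    if fatal:
        messages.append(fatal.get("message", "unknown error"))
    for err in status.get("errors", []):
        msg = err.get("message", "unknown error")
        if msg not in messages:
            messages.append(msg)
    return messages


# Hand-written sample in the shape of a job resource (illustrative values only).
sample = {
    "status": {
        "state": "DONE",
        "errorResult": {"reason": "invalid", "message": "Too many errors"},
        "errors": [
            {"reason": "invalid", "message": "Too many errors"},
            {"reason": "invalid", "message": "Could not parse 'abc' as int"},
        ],
    }
}
```

In the polling loop, a `None` result would mean "keep waiting", and a non-empty list could be logged or raised so that a failed load no longer passes silently.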

Resources