PubSub stops pulling messages from the topic - google-cloud-pubsub

We are facing this strange PubSub behaviour when our MDM server is idle over a period of time. We create a topic and a subscription during the server start-up, and our server is programmed to pull the subscription messages from the topic asynchronously using the Receive() function. We are creating the PubSub client using the service account credentials.
What we have observed is that, when we run our MDM server on a Mac machine, we see this RPC error intermittently in our server logs:
{
"error": "rpc error: code = Unauthenticated desc = Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
"file": "/opt/bigfix/src/bigfix/mdm-android/internal/pubsub/subscriber.go:50",
"func": "bigfix/mdm-android/internal/pubsub.GetPubSubMessages",
"level": "error",
"module": "mqpublisher",
"msg": "Error receiving Pub/Sub messages",
"time": "2022-03-03T04:50:25Z"
}
Note that this error is caught at the subscription.Receive() call in our code. Any device enrollment messages published after this RPC error are not delivered to our PubSub client. When we try to pull the pending messages from the topic in the Google console, we can see all the undelivered messages there. When we restart our server, the PubSub connection is made afresh and all these pending commands are pushed to our PubSub client.
We tried to fix the issue in 2 ways:
The version of PubSub package that we were using was 1.6.1, and we tried to upgrade it to the latest version which is 1.18.0, but the problem still persists
Tried to wrap subscription.Receive() with a retry function so that it can keep retrying upon errors, but still the problem persists.
Whether or not we receive this RPC error, our MDM server stops pulling messages from the topic after it has been idle for a period of time. This behaviour is seen only when our server runs on a Mac machine; it works fine on other operating systems.
Kindly guide us on how to fix this issue.
Thanks,
Porkodi

Related

Network request failed from fetch in reactjs app

I am using fetch in a NodeJS application. Technically, I have a ReactJS front-end calling the NodeJS backend (as a proxy), and then the proxy calls out to backend services on a different domain.
However, from logging errors from consumers (I haven't been able to reproduce this issue myself) I see that a lot of these proxy calls (using fetch) throw an error that just says Network Request Failed, which is of no help. Some context:
This only occurs on a subset of all calls (let's say 5% of traffic)
Users that encounter this error can often make the same call again some time later (next couple minutes/hours/days) and it will go through
From Application Insights, I can see no correlation between browsers, locations, etc
Calls often return fast, like < 100 ms
All calls are HTTPS, none are HTTP
We have a fetch polyfill from fetch-ponyfill that will take over if fetch is not available (Internet Explorer). I did test this package itself and the calls went through fine. I also mentioned that this error does occur on browsers that do support fetch, so I don't think this is the error.
Fetch settings for all requests
Method is set per request, but I've seen it fail on different types (GET, POST, etc)
Mode is set to 'same-origin'. I thought this was odd, since we were sending a request from one domain to another, but I tried to set it differently and it didn't affect anything. Also, why would some requests work for some, but not for others?
Body is set per request, based on the data being sent.
Headers is usually just Accept and Content-Type, both set to JSON.
I have tried researching this topic before, but most posts I found referenced React native applications running on iOS, where you have to set some security permissions in the plist file to allow HTTP requests or something to do with transport security.
I have implemented logging at specific points, with the data going to Application Insights, and I can see that fetch() was called but .then() was never reached; it went straight to .catch(). So it's not even reaching the code that parses the response, because apparently no response came back (we then parse the JSON response and call other functions, but like I said, it doesn't even reach this point).
Which is also odd, since the request never comes back, but it fails (often) within 100 ms.
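One detail that may be relevant here: a standards-compliant fetch() rejects its promise only when no HTTP response is obtained at all (for example a DNS, TLS, CORS or connection failure), while 4xx and 5xx responses still resolve and reach .then(). A minimal sketch of a wrapper that records which of the two happened and how long it took follows. The name fetchWithDiagnostics and the shape of the returned object are made up for illustration, and fetchImpl is assumed to be whatever fetch or ponyfill function the app already uses.

```javascript
// Hypothetical helper: fetchImpl is any fetch-compatible function.
async function fetchWithDiagnostics(fetchImpl, url, options) {
  const startedAt = Date.now();
  try {
    const response = await fetchImpl(url, options);
    // Reached for every HTTP response, including 4xx and 5xx.
    return {
      outcome: response.ok ? 'ok' : 'http-error',
      status: response.status,
      elapsedMs: Date.now() - startedAt,
      response,
    };
  } catch (err) {
    // Reached only when no response arrived (network, TLS, CORS, abort, ...).
    return {
      outcome: 'network-error',
      errorName: err.name,
      message: err.message,
      elapsedMs: Date.now() - startedAt,
    };
  }
}
```

Sending the returned object to the existing logging would show whether the fast failures are rejections with no response or responses with an error status.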
My suspicions:
Some consumers have some sort of add-on for their browser that is messing with the request. Although, I run with uBlock Origin and HTTPS Everywhere and I have not seen this error. I'm not sure what else could be modifying requests that would cause it to immediately fail.
The call goes through, which then reaches an Azure Application Gateway, which might fail for some reason (too many connected clients, not enough ports, etc) and returns a response that immediately fails the fetch call without running the .then() on the response.
For #2, I remember I had traced a network call that failed and returned Network Request Failed: Made it through the proxy -> made it through the Application Gateway -> hit the backend services -> backend services sent a response. I am currently requesting access to backend service logs in order to verify this on some more recent calls (last time I did this, I did it through a screenshare with a backend developer), and hopefully clear up the path back to the client (the ReactJS application). I do remember though that it made it to the backend services successfully.
So I'm honestly not sure what's going on here. Does anyone have any insight?
Based on your excellent description and detective work, it's clear that the problem is between your Node app and the other domain. The other domain is throwing an error and your proxy has no choice but to say that there's an error on the server. That's why it's always throwing a 500-series error, the Network Request Failed error that you're seeing.
It's an intermittent problem, so the error is inconsistent. It's a waste of your time to continue to look at the browser because the problem will have been created beyond that, either in your proxy translating that request or on the remote server. You have to find that error.
Here's what I'd do...
Implement brute-force logging in your Node app. You can use Bunyan or Winston, or just require('fs') and write out to some file when an error occurs. Then look at the results. Only log when the response code from the other server is in the 400 or 500 range. Log the request object and the response object.
Something like this with Bunyan:
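const req = {url: urlToRemoteServer, method: 'GET'}; // our request to the remote server
let res; // set once the remote server answers; stays undefined otherwise
fetch(req.url, req)
  .then(response => { res = response; return response.json(); })
  .then(body => whateverElseYoureDoing(body))
  .catch(err => {
    // Get the request & response to the remote server
    log.info({request: req, response: res, err: err});
  });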
where the res in this case is the response we just got from the other domain and req is our request to them.
The logs on your Azure server will then have the entire request and response. From this you can find commonalities and (🤞) the cause of the problem.
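The "only log 400 and 500 responses" step could look roughly like the following. This is a sketch rather than the answerer's code: logIfErrorStatus is a hypothetical helper, log is assumed to be any logger with an info(fields, message) method (a Bunyan logger has one), and only a few fields of the request and response are recorded so that the log entry stays serialisable.

```javascript
// Hypothetical helper: logs the exchange only when the remote status is 4xx or 5xx.
// `log` is any object with an info(fields, message) method, e.g. a Bunyan logger.
function logIfErrorStatus(log, req, res) {
  if (res.status < 400 || res.status > 599) {
    return false; // nothing to log for successful or redirect responses
  }
  log.info({
    request: { url: req.url, method: req.method },
    response: { status: res.status, statusText: res.statusText },
  }, 'remote server returned an error status');
  return true;
}
```

It would be called in the first .then(), before the body is parsed, so that error responses are recorded even when their body is not valid JSON.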

React WebSocket on server, Error during WebSocket handshake

I have made an application that uses a WebSocket. I want to run this application on a PLESK Server, but I get this error message 'WebSocket connection to 'wss://sub.domain.com/' failed: Error during WebSocket handshake: Unexpected response code: 404' when I visit the website.
After reading a few blogs, I came to the conclusion that something has to be adjusted in the NGINX settings, but this also appears to have no effect, or I am doing it wrong, which is also possible.
At the moment our API is installed on a subdomain (with the NodeJS add-on in plesk) and it also works. As soon as I start up the files locally I can connect to the WebSocket and the API so it must be up to the PLESK Server I think. At this moment only the NGINX 'Proxy mode' is off and the 'Additional nginx directives' empty.
Perhaps one of you is familiar with this problem?

How to fetch data via http request from Node.Js on AppEngine?

Everything works perfectly when I run locally. When I deploy my app on AppEngine, for some reason, even the simplest request gets timeout errors. I even implemented retry and, while I made some progress, it's still not working well.
I don't think it matters, since I don't have the problem when running locally, but here's the code I just used with the request-retry module:
request({
  url: url,
  maxAttempts: 5,
  retryDelay: 1000, // 1s delay
}, function (error, res, body) {
  if (!error && res.statusCode === 200) {
    resolve(body);
  } else {
    console.log(c.red, 'Error getting data from url:', url, c.Reset);
    reject(error);
  }
});
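For comparison, the same retry settings (5 attempts, 1 s apart) can be sketched without the module. The names retryAsync and attempt are hypothetical, and this sketch retries on any rejection, which may differ from the retry strategy the module applies by default.

```javascript
// Hypothetical helper: calls attempt() until it resolves or maxAttempts is reached.
async function retryAsync(attempt, { maxAttempts = 5, retryDelay = 1000 } = {}) {
  let lastError;
  for (let tries = 1; tries <= maxAttempts; tries += 1) {
    try {
      return await attempt(tries); // success: stop retrying
    } catch (error) {
      lastError = error;
      if (tries < maxAttempts) {
        // wait before the next attempt
        await new Promise((resolve) => setTimeout(resolve, retryDelay));
      }
    }
  }
  throw lastError; // every attempt failed: surface the last error
}
```

Here attempt would be any function that returns a promise for the response body; logging the attempt number on each failure would also show whether the timeouts line up with instance restarts.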
Any suggestions?
Also, I can see these errors in the Debug:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
────────────────────
The process handling this request unexpectedly died. This is likely to cause a new process to be used for the next request to your application. (Error code 203)
The error 203 means that Google App Engine detected that the RPC channel has closed unexpectedly and shuts down the instance. The request failure is caused by the instance shutting down.
The other message, about a request causing a new process to start in your application, is most likely caused by the instances shutting down. This message appears when a new instance starts serving a request. As your instances were dying due to error 203, new instances were taking their place, serving your new requests and sending that message.
An explanation for why it's working on Google Compute Engine (or locally) is that the App Engine component causing the error is not present in those environments.
Lastly, if you are still interested in solving the issue with App Engine and are entitled to GCP support, I suggest contacting the Technical Support team. The issue seems exclusive to App Engine, but I can't say more about the reason why, which is why I suggest contacting support. They have more tools available and will be able to help investigate the issue more thoroughly.

Ionic App aborting request with response status -1

I'm using the Ionic platform for my mobile application, with Angular's $http for sending requests to the server.
Intermittently, when the mobile app tries to access the server, $http goes to its errorCallback with response status -1 and no other data.
When I check the log on the server, I cannot see any hit.
I've changed the application's timeout to 2 minutes using interceptors.
I have used the Chrome debugger, but it shows nothing apart from the request it forms; the response and preview columns are empty.
I understand that in Ionic a pre-flight request is used to check whether the server is alive before the actual request is sent. But that is for CORS; we have enabled CORS on the server, which is why the app has been working well for the last 15 days.
I thought of using a network packet tracing tool, but if the call is not logged on the server there is no use for it, as status -1 says $http aborted the request.
My question is: why does it abort the request when I click once, yet send it when I click the same button again?
Please help me figure out the issue.
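In AngularJS, a status of -1 usually means the request was aborted or never completed, so no HTTP response exists to inspect. A small sketch of a helper that turns the object passed to the errorCallback into a readable diagnosis follows. The name describeHttpFailure is hypothetical, and response.xhrStatus is an assumption about the AngularJS version in use: it is only set by newer 1.x releases, so the sketch treats it as optional.

```javascript
// Hypothetical helper for an AngularJS $http error callback.
// response.xhrStatus ('complete', 'error', 'timeout' or 'abort') is only
// present in newer AngularJS 1.x releases, so it is treated as optional.
function describeHttpFailure(response) {
  if (response.status > 0) {
    return 'server responded with HTTP ' + response.status;
  }
  switch (response.xhrStatus) {
    case 'timeout':
      return 'request timed out before a response arrived';
    case 'abort':
      return 'request was aborted by the client';
    case 'error':
      return 'network-level error, no response received';
    default:
      return 'no response received (status ' + response.status + ')';
  }
}
```

Logging this string together with the request URL from the errorCallback would show whether the failed first click is a timeout, an abort or a network error.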
After lots of debugging and searching the internet for this issue:
My guess is that the mobile app was sending pre-flight messages and $http was therefore aborting the request, and sometimes the server was the culprit as well. Here is how:
Our server is hosted on AWS, where the load balancer was in one zone and the actual API server in a different zone. After moving them to the same zone, we asked the production team to test, and they are no longer getting this issue.
Another reason was that we were using unstable mobile networks to test.
If anyone has anything else on this, please let me know.

How to debug sporadic 403 connection issues with angular-signalr-hub and Windows authentication

I am using signalR with signalr-hub and a windows authentication ASP.net Web Api project.
I have created two hubs in my project that both subscribe to the stateChanged and errorHandler listeners. Sometimes, the page loads without issue, but other times a 403 connection issue is thrown. The error message is as follows:
Error: No transport could be initialized successfully. Try specifying a different transport or none at all for auto initialization.
This is sometimes followed by the error message:
Error: Error during start request. Stopping the connection.
responseText: "Unrecognized user identity. The user identity cannot change during an active SignalR connection."
I believe that the error is caused by a race condition with Windows authentication: when the browser first loads, the connection is anonymous by default; then SignalR starts to connect; then Windows auth returns the actual user; and SignalR notices that the identity has changed mid-request and throws an error.
I have so far tried delaying the SignalR connection with a $timeout to wait for Windows auth first, but this only seemed to create more frequent errors.
I will look to supply more information with this question soon; in the meantime, I would appreciate any general ideas on how to solve, work around or debug this issue.
The solution is to disable anonymous authentication. Otherwise if SignalR is connecting and authentication transitions from anonymous to user authentication whilst the signalr request is processing, then an error is thrown.
Anonymous authentication can be disabled in web.config as follows:
<system.web>
<authentication mode="Windows" />
<authorization>
<deny users="?" />
</authorization>
...
I found this answer here: answer
