GAE custom Go runtime - internal.flushLog error - google-app-engine

I have recently changed to use custom Go runtime on GAE, and noticed many errors like this from logs:
internal.flushLog: Flush RPC: Call error 3: invalid security ticket: 6c8027dc99b3ed3e
internal.flushLog: Flush RPC: Canceled: (timeout)
The server is still running well, but I have no idea about that error, as well as why it happens.
I'm using a custom Go runtime by using Dockerfile, and App Engine Release is 1.9.37.
Any help to clarify the error would be highly appreciated. Thanks.

This is a known issue with the Go runtime on App Engine Flexible. It tends to happen when a line is logged right before the end of a request/response.
What happens is that when the line is logged it is actually put in a list of log lines to be batched together and sent to the application server as an RPC at periodic intervals. The security ticket is canceled at the end of a request/response which sometimes can happen before the log lines have been flushed. It's harmless, except that you may lose a log line or two. :\
We're actively working on fixing it.

Related

Invalid JWT token happens randomly when using Snowflake+DBT

We've been using Snowflake+DBT with key pair authentication for a long time now, and we've never had any issues.
Recently, we started getting random connection errors on some models:
250001 (08001): Failed to connect to DB: account.region.snowflakecomputing.com:443. JWT token is invalid.
Most of the models will work, but some of them might fail. This can happen to any model, and it's never the same one -- sometimes it's the very first one, sometimes it's the very last one, and sometimes it's a bunch of them. It happens in runs with lots of models, or in runs with a single model.
It's very inconsistent, and there doesn't seems to be any kind of pattern to it. Sometimes it works, and sometimes it doesn't.
We've also tried with 1, 4, or 8 threads, and it happens regardless.
Obviously, there's nothing wrong with the credentials or configurations — otherwise, nothing would run at all. So I assume there must be something wrong with how DBT is handling the connection(s).
Interestingly, the errors only happens locally (so far). We haven't seen it in DBT Cloud runs.
DBT version is 0.20.2 in both cases. We tried with other versions (0.21.0, 0.20.0 & 0.19.1), and the issue persists. I don't know why we're just encountering this, as we've used these other versions previously without any issues.
It's similar to this question, except in our case it doesn't happen consistently at all. We tried connecting "without a region" (using Snowflake Organizations), but it doesn't make any difference:
250001 (08001): Failed to connect to DB: organization-account.snowflakecomputing.com:443. JWT token is invalid.
Is there anything we can do to resolve this?
EDIT: When the error happens, the model hangs for 60 seconds until the error appears.
EDIT 2: I think the error might have started happening when we started using the DBT-provided Docker images. Not sure exactly what might be wrong with them, but we'll try going back to our own custom images and see if that works.
This has since stopped happening. 🤷‍♂️
Possibly because of this change from Snowflake:
"An improvement has been added to Snowflake's Cloud Services layer to relax the validity restrictions."
Also, we are now using different Docker containers, so that might have something to do with it as well:
First, we switched to xemuliam/dbt, from Docker Hub.
But since that is no longer maintained, we are now using the official DBT Docker images from GitHub (which were not available at the time the question was posted):
dbt-core
dbt-postgres
dbt-snowflake

Is Asynchronous function useful in case of buffer pool destroyed?

I am doing a project in apache flink where I need to call multiple APIs so as to achieve my goal. The result of each API is required for the next API to work. Also as I am doing it on a KeyedStream, the same flow will be applicable to multiple data at once.
Below dig. can explain the scenario
/------API1---API2----
KeyedStream ----|------API1---API2----
\------API1---API2----
As I am doing all this, I am getting an exception saying "Buffer pool destroyed" after the job runs for sometime. Is it something related to API call, do I need to make use of Asynchronous function?? Please suggest. Thanks in advance.
a few things that are typically needed to help answer questions about Flink...
What version are you running?
How are you running it (from IDE, YARN cluster, stand-alone, etc)?
What's the complete stack trace for the exception?
(often) Can you share your code?
But at a high level, the "buffer pool destroyed" message you mentioned is not the root cause of failover, it's just a byproduct of Flink trying to kill off the workflow after an error has happened. So you need to dig deeper in the logs (typically Task Manager logs are where you'd look first).

App Engine silently fails on some requests

Some requests silently fail in my python app, intermittently and unpredictably. The hallmarks of the failure are:
Request returns a 200, so the client doesn't know there's a problem.
Request does NOT successfully execute on the server.
No logging statements are recorded for the request.
Below is an example from my logs of a bunch of requests which are each supposed to write an entity to the datastore. You can see for the lower, successful request, a blue 'i' is present, indicating that info level logs were recorded. When I examine the datastore, an entity was successfully written for this request.
However, for the failed request, you can see there is just a white box, and there are no logging statements present at all. While the server returned a 200, no entity was written to the datastore for this request.
Has anyone encountered something like this before on App Engine? Any ideas on how to debug it? I've seen it in multiple different apps myself, but I've never been able to figure it out.
EDIT
To clarify, the main problem here is that code doesn't execute, as measured by the failure to write an entity. The spurious 200 and lack of logging is an associated symptom.
From a comment originally, but seems to be the resolution path for this issue:
Given that there are no log statements at all in the line and you appear to unpack the arguments and log them as soon as you enter the handler, this starts to look like an infrastructure/platform issue.
In such a case, it's best to open a public issue tracker issue, with "Type-Production" as a tag, including your app's app id and a timeframe, and as much information about your app and request handler involved as possible, and platform support will pick up the issue in the course of triage.
That said, it's worth examining the handler to make absolutely sure there's no way you could be exiting from the handler and sending a 200 without logging anything or seeing an exception. It all depends on what the code handling the request is capable of, what stack of libraries it's build upon, etc.

Linq-To-Sql and MARS woes - A severe error occurred on the current command. The results, if any, should be discarded

We have built a website based on the design of the Kigg project on CodePlex:
http://kigg.codeplex.com/releases/view/28200
Basically, the code uses the repository pattern, with a repository implementation based on Linq-To-Sql. Full source code can be found at the link above.
The site has been running for some time now and just about a year ago we started to get errors like:
There is already an open DataReader associated with this Command which must be closed first.
ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.
These are the closest error examples I can find based on my memory. These errors started to occur when the site traffic started to pick up. After banging my head against the wall, I figured out assumed that the problem is inherit within Linq-To-Sql and how we are using the same connection to call multiple commands in a single web request.
Evenually, I discovered MARS (Multiple Active Result Sets) and added that to the data context's connection string and like magic, all of my errors went away.
Now, fast forward about 1 year and the site traffic has increased tremendously. Every week or so, I will get an error in SQL Server that reads:
A severe error occurred on the current command. The results, if any, should be discarded
Immediately after this error, I receive hundreds to thousands of InvalidCastException errors in the error logs. Basically, this error shows up for each and every call to the Linq-To-Sql data context. Only after I restart the web server do these errors clear up.
I read a post on the Micosoft Support site that descrived my problem (minus the InvalidCastException errors) and stating the solution is that if I'm going to use MARS that I should also use Asncronous Processing=True. I tried this, but it did not solve my problem either.
Not really sure where to go from here. Hopefully someone here has seen and solved this problem before.
I have the same issue. Once the errors start, I have to restart the IIS Application Pool to fix.
I have not been able to reproduce the bug in dev despite trying many different scenarios involving multi-threading, leaving connections open, etc etc.
One possible lead I do have is that amongst the errors in the server Event Log is an OutOfMemoryException for the Application Pool. Perhaps this is the underlying cause of the spurious SQL Datareader errors (a memory leak elsewhere). Although again I haven't been able to reproduce this in dev.
Obviously if you are using a 64 bit OS then this is probably not the cause in your case.
So after much refactoring and re-architecting, we figured out that problem all along is MARS (Multiple Active Result Sets) itself. Not sure why or what happens exactly but MARS somehow gets result sets mixed up and doesn't recover until the web app is restarted.
We removed MARS and the errors stopped.
If I remember correctly, we added MARS to solve the problem where a connection/command was already closed using LinqToSql and we tried to access an object graph that hadn't been loaded. Without MARS, we'd get an error. But when we added MARS, it seemed to not care about it. This is really a great example of us not really understanding what the heck we were doing and we learned some valuable (and expensive) lessons from this.
Hope this helps others who have experienced this.
Thanks to all how have contributed their comments and answers.
I understand you figured out the solution..
Following is not a direct solution to the problem; but it is good for others to take a look at
What does "A severe error occurred on the current command. The results, if any, should be discarded." SQL Azure error mean?
http://social.msdn.microsoft.com/Forums/en-US/bbe589f8-e0eb-402e-b374-dbc74a089afc/severe-error-in-current-command-during-datareaderread

Silverlight, WCF and NotFound, oh my

I know this is a hot topic on StackOverflow, but do bear with me.
We have a Silverlight 3 application talking to a WCF service. Every now and then, calls to the WCF service return a NotFound exception.
I've read pretty much every post on SO and Google on this subject but I can't figure out what's going wrong. Here are some of my findings:
The exception happens on random calls and at random moments. Sometimes a method will work for 50 times and suddently it bugs out. I have a feeling it's related to a timeout, since it's most reproducable if I let the application idle for a while before invoking a call, but this is not always the case - sometimes the one of the first calls in the application fails.
We use the SilverlightFaultBehavior to convert the HTTP error code to 200 and we have plenty of instances where throwing an exception on the serverside actually bubbles up to the client side, so I can confirm this should be working as expected.
Fiddler shows nothing special at the moment the exception occurs. I don't even see the call in question. This worries me, but it might mean that the exception is a result of a call that happened minutes ago and timed out?
Service Trace Viewer shows nothing.
I attach Visual Studio to to Silverlight project and to the WCF services project, set debugging to break on all exceptions (thrown or handled) and it doesn't break (except in Silverlight to tell me about the NotFound problem). This causes me to think that maybe the NotFound is not in response to an exception on the WCF service side?
I really have no idea where to go from here. Any help at all, any pointers or ideas of things to try are welcome.
Here are some thoughts for the points you mentioned:
1) The exception happens on random calls and at random moments - Make sure the data being sent as a return value of method is valid. I had a case when sending an object with some empty properties caused a failure in serialization. I found this out using IIS logs/ Service Trace Logs.
2) So, did you find anything useful?
3) I don't think fiddler can help with this kind of an error.
4) Are you sure about this? Did you set up Trace Logs correctly?
5) You won't find any exceptions that can help you here. The actual exception (when you see 'Not Found' error) is raised while wrapping the message/data from server side or unwrapping message/data on client side.
So, to summarize make sure the data is in correct format (may seem to be correct for you but not WCF, just play with it for a while with different values) and verify the Trace Logging again.
What is a binding of the service? Where is it hosted: IIS or VS Deployment server?
I have seen this problem recently, something was wrong with IIS. It couldn't even open *.svc files.
So here is a plan of activities:
Try to open svc file using http address like http://localhost/MyApp/MyService.svc
If it opens, write a console application and test the service.
If it works, write a silverlight simple application.
I hope this will help.
I fixed this by adding
minFreeMemoryPercentageToActivateService="1"
to Web.config. By default it is
minFreeMemoryPercentageToActivateService="5"
which sometime causes this error.

Resources