Realtime API Fatal Network Error on Load - google-drive-realtime-api

I've been seeing instances of 'fatal_network_error' popping up in production more frequently recently.
This particular error unfortunately isn't documented anywhere as to what the issue may be, so apart from retrying the doc in the client open I'm unsure of how to prevent something like this from consistently happening to people. In particular, one document [ID: 0B1es-bMybSeSb3hnSHdRUTNLSGc] that I have access to appears to be hitting this nearly 100% of the times I have opened it in Firefox (though it appears this is not exclusive to FF from other error reports).
After re-requesting the same document to be opened a second time (after handling the fatal error), the document load is successful. Looking at the network requests identical requests are made with the exact same params (rctype,recver,id,access_token). Additionally the response from the server is identical apart from what looks like a version hash and possibly a version number after the Document ID.
Is this a known issue? Are there workarounds?
I have shared the document publicly, please let me know if there is any other information that I can provide that may help track this down.

Related

Invalid JWT token happens randomly when using Snowflake+DBT

We've been using Snowflake+DBT with key pair authentication for a long time now, and we've never had any issues.
Recently, we started getting random connection errors on some models:
250001 (08001): Failed to connect to DB: account.region.snowflakecomputing.com:443. JWT token is invalid.
Most of the models will work, but some of them might fail. This can happen to any model, and it's never the same one -- sometimes it's the very first one, sometimes it's the very last one, and sometimes it's a bunch of them. It happens in runs with lots of models, or in runs with a single model.
It's very inconsistent, and there doesn't seems to be any kind of pattern to it. Sometimes it works, and sometimes it doesn't.
We've also tried with 1, 4, or 8 threads, and it happens regardless.
Obviously, there's nothing wrong with the credentials or configurations — otherwise, nothing would run at all. So I assume there must be something wrong with how DBT is handling the connection(s).
Interestingly, the errors only happens locally (so far). We haven't seen it in DBT Cloud runs.
DBT version is 0.20.2 in both cases. We tried with other versions (0.21.0, 0.20.0 & 0.19.1), and the issue persists. I don't know why we're just encountering this, as we've used these other versions previously without any issues.
It's similar to this question, except in our case it doesn't happen consistently at all. We tried connecting "without a region" (using Snowflake Organizations), but it doesn't make any difference:
250001 (08001): Failed to connect to DB: organization-account.snowflakecomputing.com:443. JWT token is invalid.
Is there anything we can do to resolve this?
EDIT: When the error happens, the model hangs for 60 seconds until the error appears.
EDIT 2: I think the error might have started happening when we started using the DBT-provided Docker images. Not sure exactly what might be wrong with them, but we'll try going back to our own custom images and see if that works.
This has since stopped happening. 🤷‍♂️
Possibly because of this change from Snowflake:
"An improvement has been added to Snowflake's Cloud Services layer to relax the validity restrictions."
Also, we are now using different Docker containers, so that might have something to do with it as well:
First, we switched to xemuliam/dbt, from Docker Hub.
But since that is no longer maintained, we are now using the official DBT Docker images from GitHub (which were not available at the time the question was posted):
dbt-core
dbt-postgres
dbt-snowflake

How can I dump the .parquet data that is in Azure DataLakeStorage to a Microsoft SQL Server database using Nifi?

I've been looking for information for a long time and I can't get it. I'm starting to think it can't be done if the .parquet are in Azure DataLake Storage.
I have a folder with subfolders in Azure DataLake Storage. In these subfolders there are many .parquet. I manage to get them out using ListAzureDataLakeStorage + FetchAzureDataLakeStorage combination. Then I try to pass them through a PutDatabaseRecord (which I think is the correct processor for the dump in the DB).
I think I have the PutDatabaseRecord well configured. But when executing it gives me an error: "Failed to process session due to Failed to process StandardFlowFileRecord due to java.lang.NullPointerException: Name is null".
I'm not sure I'm using the PutDatabaseRecord right. I thought that PutDatabaseRecord read the flowfiles that came to it interpreting their content as .parquet (it is supposed to use a ParquetReader as a RecordReader), being able to understand the data as records. But it surprises me that it is not necessary to indicate how to interpret the .parquet, nor how to map its columns with those of the DB table. It still doesn't work as I think and it needs the flowfile content to already arrive as records?
The truth is that I can't explain myself better either because I don't really understand what is considered a record in Nifi or how a record is related to a reading of a .parquet.
Either I am missing a processor or something I am configuring wrong. But the only thing I find is the FetchParquet, which seems to be able to read a .parquet and put it into the FlowFile as records. However, it can only be used with ListHDFS or ListFile, which do not allow me to fetch data from Azure Data Lake Storage
After several tests (using the ConvertRecord and QueryRecord processors), I have come to the conclusion that the problem is in the reading that the ParquetReader does of the content of the FlowFiles that arrive. Well, every processor that needs a ParquetReader gives the same error. Downloading the content of the FlowFile that enters the processor that the ParquetReader uses (whatever it is) and using a .parquet viewer I have verified that this content is fine.
Without knowing what to do, I have attached a screenshot of the specific error. I still don't know what "Name" the error refers to.
Error Name is null
Note: I also posted my problem on Cloudera, perhaps better explained. I leave the link in case someone wants to look at it. (https://community.cloudera.com/t5/Support-Questions/How-can-I-dump-the-parquet-data-that-is-in-Azure/td-p/316020)
In the end, the closest thing to the error I was getting was found here (https://issues.apache.org/jira/browse/NIFI-7817). It seems that it is an error related to the creation of the ParquetReader. This makes sense because it would hit any processor that used a ParquetReader. In addition, the FlowFiles did not even enter the processor that used it.
I was using Nifi version 1.12.1. I have downloaded version 1.13.2 and it no longer gives the Name error. In addition, it is seen that the Flow Files already enter the processor. On the download page of the new version (https://nifi.apache.org/download.html) you can access the Release Notes and the Migration Guidance to know what has been fixed with respect to previous versions and with which processors you have to be careful when migrating.
However, even though the data goes into the processor, it still gives me an error, but it is different and I will open it in another post.

App Engine silently fails on some requests

Some requests silently fail in my python app, intermittently and unpredictably. The hallmarks of the failure are:
Request returns a 200, so the client doesn't know there's a problem.
Request does NOT successfully execute on the server.
No logging statements are recorded for the request.
Below is an example from my logs of a bunch of requests which are each supposed to write an entity to the datastore. You can see for the lower, successful request, a blue 'i' is present, indicating that info level logs were recorded. When I examine the datastore, an entity was successfully written for this request.
However, for the failed request, you can see there is just a white box, and there are no logging statements present at all. While the server returned a 200, no entity was written to the datastore for this request.
Has anyone encountered something like this before on App Engine? Any ideas on how to debug it? I've seen it in multiple different apps myself, but I've never been able to figure it out.
EDIT
To clarify, the main problem here is that code doesn't execute, as measured by the failure to write an entity. The spurious 200 and lack of logging is an associated symptom.
From a comment originally, but seems to be the resolution path for this issue:
Given that there are no log statements at all in the line and you appear to unpack the arguments and log them as soon as you enter the handler, this starts to look like an infrastructure/platform issue.
In such a case, it's best to open a public issue tracker issue, with "Type-Production" as a tag, including your app's app id and a timeframe, and as much information about your app and request handler involved as possible, and platform support will pick up the issue in the course of triage.
That said, it's worth examining the handler to make absolutely sure there's no way you could be exiting from the handler and sending a 200 without logging anything or seeing an exception. It all depends on what the code handling the request is capable of, what stack of libraries it's build upon, etc.

OData HTTP400 Timeout Error

This is one of the most bizarre problems I've come across since I started using OData for my mobile apps. The OData server I've developed is backed by SQL Express 2008 and this combination has been installed on 50 different servers and/or PCs over the last 15 months. All 50 servers have been running stable with consistent function for large amounts of data.
A couple of days ago one of my clients contacted me indicating that my client app (running on iOS7) was having an odd error come up when POSTing data to their server. The error had an HTTP code of 400 and the error text is "The operation couldn't be completed. (Timeout error 400.)". My first question is: why is a timeout error coming back with a 400 code? Generally when I get timeouts (due to firewall, etc) they're in the 100x range. There is no indication in the event logs on the server of ANY problems occurring. My own logs (stored in the SQL database) show no error (which is odd because I'm using the generic exception catching method in my OData service to log any problems). I haven't got to the step of adding logging of all requests as yet.
The error is only being raised when posting one particular set of data. All other posts from the device are functioning perfectly. I got the client to re-install the app (deleting all data) and then to download the data set that was causing the error. The download worked fine. We began making changes to the data to replicate what the data looked like when the error occurred in incremental changes, posting the change to the server and observing the result. Most of the incremental changes work fine but certain combinations cause the error to occur. One of the increments involves a large volume of changes and that posts fine, but subsequent alteration of any of the objects (sometimes altering as little as 6 characters in a text field) cause the error to occur. And yet in some circumstances altering objects that have already been posted to the server works without a problem.
I wiped the service components from the server and undertook a fresh install. I shifted TCP ports in case 443 had another listener causing problems. I reset the server. None of these change the behaviour of the error.
My last ditch solution is to completely re-install IIS and .NET Framework but I'd obviously like to avoid this as it's not my server... The server is overseas from my current location so debugging isn't really an option. Hoping someone has an idea as to what I can do diagnostically to try and determine the source of this bizarre 'gremlin'.
Have you tried a more thorough traffic analysis using a tool like Fiddler? The "timeout" error does indeed seem odd and what stood from you post was that your server was "overseas". Could there be something with the "times" that are being used/generated, e.g. server time, local time, etc?
Just to confirm, the "same" exact set of data always fails? Can you replicate this via a remote debugger or via localhost? If so, can you turn on "verbose errors"?

Linq-To-Sql and MARS woes - A severe error occurred on the current command. The results, if any, should be discarded

We have built a website based on the design of the Kigg project on CodePlex:
http://kigg.codeplex.com/releases/view/28200
Basically, the code uses the repository pattern, with a repository implementation based on Linq-To-Sql. Full source code can be found at the link above.
The site has been running for some time now and just about a year ago we started to get errors like:
There is already an open DataReader associated with this Command which must be closed first.
ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.
These are the closest error examples I can find based on my memory. These errors started to occur when the site traffic started to pick up. After banging my head against the wall, I figured out assumed that the problem is inherit within Linq-To-Sql and how we are using the same connection to call multiple commands in a single web request.
Evenually, I discovered MARS (Multiple Active Result Sets) and added that to the data context's connection string and like magic, all of my errors went away.
Now, fast forward about 1 year and the site traffic has increased tremendously. Every week or so, I will get an error in SQL Server that reads:
A severe error occurred on the current command. The results, if any, should be discarded
Immediately after this error, I receive hundreds to thousands of InvalidCastException errors in the error logs. Basically, this error shows up for each and every call to the Linq-To-Sql data context. Only after I restart the web server do these errors clear up.
I read a post on the Micosoft Support site that descrived my problem (minus the InvalidCastException errors) and stating the solution is that if I'm going to use MARS that I should also use Asncronous Processing=True. I tried this, but it did not solve my problem either.
Not really sure where to go from here. Hopefully someone here has seen and solved this problem before.
I have the same issue. Once the errors start, I have to restart the IIS Application Pool to fix.
I have not been able to reproduce the bug in dev despite trying many different scenarios involving multi-threading, leaving connections open, etc etc.
One possible lead I do have is that amongst the errors in the server Event Log is an OutOfMemoryException for the Application Pool. Perhaps this is the underlying cause of the spurious SQL Datareader errors (a memory leak elsewhere). Although again I haven't been able to reproduce this in dev.
Obviously if you are using a 64 bit OS then this is probably not the cause in your case.
So after much refactoring and re-architecting, we figured out that problem all along is MARS (Multiple Active Result Sets) itself. Not sure why or what happens exactly but MARS somehow gets result sets mixed up and doesn't recover until the web app is restarted.
We removed MARS and the errors stopped.
If I remember correctly, we added MARS to solve the problem where a connection/command was already closed using LinqToSql and we tried to access an object graph that hadn't been loaded. Without MARS, we'd get an error. But when we added MARS, it seemed to not care about it. This is really a great example of us not really understanding what the heck we were doing and we learned some valuable (and expensive) lessons from this.
Hope this helps others who have experienced this.
Thanks to all how have contributed their comments and answers.
I understand you figured out the solution..
Following is not a direct solution to the problem; but it is good for others to take a look at
What does "A severe error occurred on the current command. The results, if any, should be discarded." SQL Azure error mean?
http://social.msdn.microsoft.com/Forums/en-US/bbe589f8-e0eb-402e-b374-dbc74a089afc/severe-error-in-current-command-during-datareaderread

Resources