BigQuery Throwing Import Error, No Information Provided - google-app-engine

I am trying to import a CSV file into my BigQuery Table. This import has worked in the past, but now I am getting the following error message:
{"message":"Too many errors encountered. Limit is: 0.","reason":"invalid"}
All other fields are empty when I run the debugger.
This is... not helpful. I am unaware of any issues with the data itself, as the export/import data has not changed. Curiously, when I try to re-run a previous job template through the web console, the console hangs and the dialog never goes away after I hit the blue "Submit" button.
Job Id: job_e0faf560d3df424ea74519e1b24a23f7
I am generating a CSV and exporting it to Google Cloud Storage. I am using App Engine and have switched to the new Google Cloud Storage Client Library. I have uploaded the file using GcsFileOptions.getDefaultInstance() as well as by constructing my own GcsFileOptions with the content type set to CSV.
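For context, a minimal sketch of the kind of upload code involved (the bucket and object names here are placeholders, not my real values):

    import java.nio.ByteBuffer;

    import com.google.appengine.tools.cloudstorage.GcsFileOptions;
    import com.google.appengine.tools.cloudstorage.GcsFilename;
    import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
    import com.google.appengine.tools.cloudstorage.GcsService;
    import com.google.appengine.tools.cloudstorage.GcsServiceFactory;

    public class CsvUploader {
        // Writes the generated CSV to Cloud Storage with an explicit CSV content type.
        public static void writeCsv(String csv) throws java.io.IOException {
            GcsService gcs = GcsServiceFactory.createGcsService();
            GcsFilename file = new GcsFilename("my-bucket", "export.csv"); // placeholders
            GcsFileOptions options = new GcsFileOptions.Builder()
                    .mimeType("text/csv")
                    .build();
            GcsOutputChannel channel = gcs.createOrReplace(file, options);
            try {
                channel.write(ByteBuffer.wrap(csv.getBytes("UTF-8")));
            } finally {
                channel.close();
            }
        }
    }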
After a failure, I downloaded the file from Google Cloud Storage, changed the encoding (I tried ASCII and UTF-8), and still got the same result.
I am using App Engine 1.8.1.1 and the BigQuery library (google-api-services-bigquery-v2-rev89-1.15.0-rc). This was working as expected previously, so I'm not sure what has happened. Any suggestions are welcome. Thank you!

There are two error fields on a BigQuery job. The first is the error result, which tells you whether (and why) the job failed. The error result in your case says that the job failed because it encountered too many input errors during the import.
The second field is the error stream, which tells you about errors encountered during the job. If you had set the maxBadRecords field, for example, you could have errors in the error stream, but the actual job might succeed.
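As an illustration with the Java client library mentioned in the question, a load configuration along these lines would let a handful of bad rows land in the error stream without failing the job (the project, dataset, table, and GCS URI are placeholders):

    import java.io.IOException;
    import java.util.Collections;

    import com.google.api.services.bigquery.Bigquery;
    import com.google.api.services.bigquery.model.Job;
    import com.google.api.services.bigquery.model.JobConfiguration;
    import com.google.api.services.bigquery.model.JobConfigurationLoad;
    import com.google.api.services.bigquery.model.TableReference;

    public class LoadWithMaxBadRecords {
        // Starts a CSV load job that tolerates up to 10 bad rows; the bad rows are
        // reported in the job's error stream instead of failing the whole job.
        public static Job startLoad(Bigquery bigquery) throws IOException {
            JobConfigurationLoad load = new JobConfigurationLoad()
                    .setSourceUris(Collections.singletonList("gs://my-bucket/export.csv")) // placeholder URI
                    .setSourceFormat("CSV")
                    .setMaxBadRecords(10)
                    .setDestinationTable(new TableReference()
                            .setProjectId("my-project")   // placeholder project
                            .setDatasetId("my_dataset")   // placeholder dataset
                            .setTableId("my_table"));     // placeholder table
            Job job = new Job().setConfiguration(new JobConfiguration().setLoad(load));
            return bigquery.jobs().insert("my-project", job).execute();
        }
    }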
I looked up your job in the BigQuery logs, and was able to find that the error stream indicates an error on line 6253: "Too few columns: expected 80 column(s) but got 1 column(s). For additional help: http://goo.gl/RWuPQ"
Can you verify that line 6253 is correct?
-- Jordan Tigani / BigQuery Engineer

Today there is a general problem with App Engine:
"We are still investigating the issue with Google App Engine, primarily (but not restricted to) Datastore latency.
We will provide another status update in the next two hours."
https://groups.google.com/forum/#!topic/google-appengine-downtime-notify/1pJZnl4EMKk

Related

I am unable to export my REDCap data and am receiving an error notification. The notification says there is too much data, although it is a tiny project. Solutions?

The problem: I am trying to export my REDCap data to a CSV file and am unable to do so. I am receiving an error notification that says there is too much data, although it is a tiny project. Help will be much appreciated.
The full error text: " We are sorry, but apparently the data export is not able to complete successfully. It may simply be that there is too much data trying to be exported at once, in which it is causing REDCap to crash. If this error occurs again, it is recommended that you attempt to export a smaller data set (fewer fields and/or perhaps fewer records) so that this error does not occur. Our apologies for this inconvenience."
What I have tried:
I have made sure I have the necessary user rights.
I have tried through a colleague's REDCap user (who has the necessary user rights).
I have tried exporting only one instrument (no success).
I have created a test project with only 2 questions; in the new test project I receive the same notification.
I could not export data in either development mode or production.
Any ideas?
Many Thanks!
The institution had blocked the file upload option for security reasons. Apparently REDCap's exporting system uses the upload mechanism, and it therefore ended up being disabled as well.
The local storage folder location was pointing to a folder that was missing on the server. I just created the folder, and the upload and the subsequent problem were fixed.

How can I dump the .parquet data that is in Azure Data Lake Storage to a Microsoft SQL Server database using NiFi?

I've been looking for information for a long time and can't find it. I'm starting to think it can't be done if the .parquet files are in Azure Data Lake Storage.
I have a folder with subfolders in Azure Data Lake Storage, and in those subfolders there are many .parquet files. I manage to fetch them using the ListAzureDataLakeStorage + FetchAzureDataLakeStorage combination. Then I try to pass them through a PutDatabaseRecord processor (which I think is the right processor for inserting into the DB).
I think I have PutDatabaseRecord configured correctly, but when it executes it gives me an error: "Failed to process session due to Failed to process StandardFlowFileRecord due to java.lang.NullPointerException: Name is null".
I'm not sure I'm using PutDatabaseRecord correctly. I thought PutDatabaseRecord would read the FlowFiles that reach it, interpreting their content as Parquet (it is supposed to use a ParquetReader as its Record Reader), and so understand the data as records. But it surprises me that there is no need to indicate how to interpret the .parquet files, nor how to map their columns to those of the DB table. Does it not work the way I think, and does it need the FlowFile content to already arrive as records?
To be honest, I can't explain myself better because I don't really understand what counts as a record in NiFi, or how a record relates to reading a .parquet file.
Either I am missing a processor or I am configuring something wrong. The only thing I can find is FetchParquet, which seems to be able to read a .parquet file and put it into the FlowFile as records. However, it can only be used with ListHDFS or ListFile, which do not let me fetch data from Azure Data Lake Storage.
After several tests (using the ConvertRecord and QueryRecord processors), I have concluded that the problem is in how the ParquetReader reads the content of the incoming FlowFiles: every processor that needs a ParquetReader gives the same error. By downloading the content of the FlowFile entering whichever processor uses the ParquetReader and opening it in a .parquet viewer, I have verified that the content itself is fine.
Not knowing what else to do, I have attached a screenshot of the specific error. I still don't know what "Name" the error refers to.
[Screenshot: "Name is null" error]
Note: I also posted my problem on the Cloudera community forum, perhaps better explained there. I'll leave the link in case someone wants to look at it. (https://community.cloudera.com/t5/Support-Questions/How-can-I-dump-the-parquet-data-that-is-in-Azure/td-p/316020)
In the end, the closest thing to the error I was getting is described here (https://issues.apache.org/jira/browse/NIFI-7817). It seems to be a bug related to the creation of the ParquetReader, which makes sense because it would hit any processor that used a ParquetReader. In addition, the FlowFiles did not even enter the processor that used it.
I was using NiFi version 1.12.1. I downloaded version 1.13.2 and it no longer gives the "Name is null" error; the FlowFiles now enter the processor. On the download page (https://nifi.apache.org/download.html) you can access the Release Notes and the Migration Guidance to see what has been fixed relative to previous versions and which processors you have to be careful with when migrating.
However, even though the data now enters the processor, it still gives me an error. It is a different one, though, so I will open another post for it.

Is my GAE Search corrupt?

I've got a single index in a GAE Search application.
When I call index.put I get the OverQuotaError: The API call search.IndexDocument() required more quota than is available.
When I go to the GAE Console and look under Search, my index articleIndex contains no documents but its Amount Used is 78.2KB. I've also tried retrieving documents, but none are returned.
I've tried using a new index, but I get the same error message in my application's logs.
I have a copy of my application and that continues to work fine - this uses the same code and data but in a separate application space.
I created a new app with the code from my "corrupted" installation and the new installation indexes fine.
Has anyone had a GAE Search index that, although empty, is listed as taking up space?
I've tried running my GAE routines right after my daily quota is renewed.
The quota representing storage usage is reconciled nightly, so if you have recently removed documents from the index that fact will not immediately be reflected. However, you say that using a new index still produces the same problem?
Note that daily quota renewal (not to be confused with the reconciliation mentioned above) does not affect the storage limit.
If you are still having troubles, you can file a report on the external issue tracker, mention the app ID and index name, and we can help investigate the current status of the index.

What can I do with generated error logs?

I'm currently working on a web application which generates daily error (and non error) logs.
The current system outputs a log per task to a text file, and outputs critical errors as well as "start" and "finish" type messages to an email account.
The current workflow is as follows: scour the email box for errors, then go and find the .txt file to look at the associated errors and find the cause.
There are around 30 txt files split across about 5 servers.
This system was set up before me, but I'm looking for any advice on how to deal with the situation.
I have control of the script that produces the error logs, so I can do pretty much anything - but I'm lost as to where to start: I'd considered some kind of web-facing dashboard tool, or maybe outputting the files to RSS?
Are there any external or internal tools I should be using?
Of course you could use SQL Server Reporting Services, or review this comparison table; there are some packages that may support SQL Server, but they may be overwhelming for your task.
It's not really clear what your problem is or what you want to do, but if I understand correctly, your biggest problem is that some messages are logged to a log file but others are sent by email. Therefore, there is no single location that has all error messages in it and that makes analysis and troubleshooting difficult.
The best solution would be to use a logging framework that supports multiple logging destinations (file, DB, email) and severities. That would allow you to specify a configuration like "all errors are logged to a text file and critical ones are also sent by email", so you can ensure that you have everything in one place for general analysis but critical errors are also handled with priority.
You didn't mention what programming language you use, but assuming it's .NET-based then log4net and Enterprise Library are two common frameworks and there are many questions about them here on SO. Googling should give you a good idea of the pros and cons for your situation. If you're using a different language then you can look for the equivalent package: log4j (Java), logging (Python) etc.
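As a rough illustration of that idea in log4j (the Java equivalent), a minimal programmatic setup could route everything at ERROR and above to a single file and only FATAL messages to email; the SMTP host and addresses below are placeholders:

    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;
    import org.apache.log4j.net.SMTPAppender;

    public class LoggingSetup {
        public static void configure() throws java.io.IOException {
            Logger root = Logger.getRootLogger();
            root.setLevel(Level.ERROR);  // keep ERROR and above

            // Every error goes to one file, so there is a single place to analyse.
            FileAppender file = new FileAppender(
                    new PatternLayout("%d %-5p %c - %m%n"), "app-errors.log", true);
            root.addAppender(file);

            // Only FATAL events are also emailed, mirroring the current
            // "critical errors to the inbox" behaviour.
            SMTPAppender email = new SMTPAppender();
            email.setSMTPHost("smtp.example.com");   // placeholder mail relay
            email.setFrom("app@example.com");        // placeholder sender
            email.setTo("ops@example.com");          // placeholder alert inbox
            email.setSubject("Critical application error");
            email.setLayout(new PatternLayout("%d %-5p %c - %m%n"));
            email.setThreshold(Level.FATAL);
            email.activateOptions();
            root.addAppender(email);
        }
    }

The same split (file appender for everything, email appender with a higher threshold) can be expressed purely in a config file with log4net or log4j, which is usually preferable to code.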

App Engine backup never finishes only clue is failure in map reduce worker_callback

Over the last few weeks we have repeatedly failed to complete a backup of the datastore using the Datastore Admin tool. We thought the issues had to do with the quota errors we were running into, so we switched our application from a free to a paid app, but we still have problems.
Each time we attempt to back up to the Blobstore, the process never finishes. We see the backup in our Pending Backups list, but it never actually completes. We only have a total of 43MB of data right now, so we don't see it as a data-transfer problem. Looking at our default task queue, we have two pending tasks: one is a call to /_ah/mapreduce/controller_callback and the other is a call to /_ah/mapreduce/worker_callback.
The worker_callback racks up its retry count, and the only clue we have is that the Previous Run tab shows the last HTTP response code was 500. There is no error message and nothing shows up in our error logs; it just keeps retrying over and over.
We've been able to narrow the backup problems down to a specific entity kind in a particular namespace, but we can't figure out why that entity kind is failing while the others are not. The major difference is that this entity kind has a large number of embedded entities, but if App Engine is able to read/put those entities, we can't understand why it has problems backing them up. The particular namespace in which the error occurs has the most data stored for that entity kind compared to the other namespaces we have set up.
We think that if we can see what error is occurring in the worker_callback, we may be able to figure out why the backup is failing, or what is wrong with our data that's preventing the backup. Is there something we need to set up or enable through settings or configuration files to get more detailed information on the backup? Or is there some other avenue we should explore to investigate and fix this problem?
I should mention we are using the Java SDK as well as Objectify V3 to work with the data store. We are also backing up data to the Blobstore.
Thank you.
Well, with the App Engine team's help we figured out what the problem was and worked around the issue. I want to give details in case anyone else runs into this problem.
In issue 8363 the App Engine team indicated that, from their logs, they could see the MapReduce failed because of the large number of properties our entity kind had. The specific entity kind causing the failure had a large number of variable properties that generated errors when MapReduce tried to write out a schema. They indicated that the fix on their end was to ignore entities like this during the backup so that the backup could complete successfully.
What we did to work around the issue and make the backup work was to change how we told Objectify to store our data. The large number of properties was being created by our use of the @Embedded annotation on a HashMap class member field. Since @Embedded breaks classes down into individual components, it was generating a large number of properties. We switched the member field to @Serialized and then ran a conversion process to populate the new serialized property. This made backup/restore work again.
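For illustration, the kind of change involved looks roughly like this (the class and field names are made up rather than our actual model; the annotations are the Objectify v3 ones):

    import java.util.HashMap;
    import java.util.Map;

    import javax.persistence.Id;
    import com.googlecode.objectify.annotation.Serialized;

    // Illustrative entity only; not our real data model.
    public class Report {
        @Id
        private Long id;

        // Before: this field carried @javax.persistence.Embedded, which flattened
        // every map entry into its own datastore property and is what tripped up
        // the backup's schema generation.

        // After: @Serialized stores the whole map as a single blob property,
        // so the entity keeps a small, fixed set of properties.
        @Serialized
        private Map<String, String> attributes = new HashMap<String, String>();
    }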
You can read more about the differences between embedded and serialized on Objectify's website.
snielson, would you mind opening an issue on our public issue tracker here? Remember to add your application ID so we can further debug this specific scenario.
Thanks!
