GAE NDB restore fails with "Unexpected error contacting datastore"

I'm used to performing regular backups and restores of my NDB datastore, often restoring to a different 'project' where I test a new version of the application. Up to now, this has worked fine.
Now, the NDB restore operation fails systematically, with this trace in the log:
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/_internal/mapreduce/handlers.py", line 526, in handle
    ctx.flush()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/_internal/mapreduce/context.py", line 455, in flush
    pool.flush()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/datastore_admin/utils.py", line 695, in flush
    datastore._GetConnection()._reserve_keys(self.keys)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 2170, in _reserve_keys
    self._async_reserve_keys(None, keys).get_result()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 921, in get_result
    results = self.__rpcs[0].get_result()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 2211, in __reserve_keys_hook
    self.check_rpc_success(rpc)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1371, in check_rpc_success
    raise _ToDatastoreError(err)
InternalError: Unexpected error contacting datastore (2016-11-19T18:01:37+00:00).
Any clue? I may have missed something, but I would swear nothing has changed since last week, when such a restore worked fine.
Just in case there was something wrong with my latest backup, I tried restoring an old backup that had already been restored successfully a couple of times. Same InternalError on restore, including in a brand-new GAE project.
Having tried many (many) times now, I found that the 'unexpected errors' always happen for the same entities. The whole backup/restore involves about 30 different kinds, of which 5 fail with this error. For some of these kinds a subset of the entities is restored; for others, none at all. It looks as though specific entities are responsible for the exception. But, once again, this also happens with old backups that had worked in the past.
There are few questions related to NDB backup and restore (and fewer answers still). Don't NDB applications use backups?

So that the final word on this is not hidden at the bottom of a long list of comments, I'll post an answer.
It seems this was a bug that appeared a few days or weeks back, which a few users like me hit when restoring to another project.
The Google project team did a great job, fixing the issue promptly. Many thanks to @Ed Davisson.

Related

Flink checkpoint stuck on bad file

I am new to Flink (1.3.2) and have a question; I want to see if anyone can help here.
We have an S3 path that Flink monitors for newly available files:
val avroInputStream_activity = env.readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 10000)
I am doing both internal and external checkpointing. Say a bad file arrives at the path; Flink will retry it several times. I want to move such bad files to an error folder and let the process continue. However, since the file path is persisted in the checkpoint, when I tried to resume from an external checkpoint (after removing the bad file), it threw the following error because the file could not be found:
java.io.IOException: Error opening the Input Split s3a://myfile [0,904]: No such file or directory: s3a://myfile
I have two questions here:
How do people handle exceptions like bad files or records?
Is there a way to skip this bad file and move on from the checkpoint?
Thanks in advance.
Best practice is to keep your job running by catching any exceptions, such as those caused by bad input data. You can then use side outputs to create an output stream containing only the bad records. For example, you might send them to a bucketing file sink for further analysis.
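A minimal sketch of that side-output pattern, written here with the modern PyFlink DataStream API (the question's own snippet is Scala on Flink 1.3.2, where the same idea applies via ProcessFunction and OutputTag; the tag name and parse_record below are hypothetical):

from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment, OutputTag
from pyflink.datastream.functions import ProcessFunction

# Side output that will carry the raw records we failed to parse.
bad_records = OutputTag("bad-records", Types.STRING())

def parse_record(line):
    # Hypothetical parser; raises on malformed input.
    if line.startswith("bad"):
        raise ValueError("malformed record")
    return line.upper()

class SafeParser(ProcessFunction):
    def process_element(self, value, ctx):
        try:
            yield parse_record(value)       # good records go to the main stream
        except Exception:
            yield bad_records, value        # bad records go to the side output

env = StreamExecutionEnvironment.get_execution_environment()
lines = env.from_collection(["good-1", "bad-1"], type_info=Types.STRING())
parsed = lines.process(SafeParser(), output_type=Types.STRING())
parsed.print()
parsed.get_side_output(bad_records).print()  # e.g. feed this into a file sink instead
env.execute("side-output-sketch")

The key point is that the job itself never throws, so a bad record can no longer poison a checkpoint; the error stream can be sunk to a bucketed error folder for later inspection.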

How to debug the cause for SQLiteError(11): database corruption

I have been tasked with finding the cause of 2 repeated lines we find in about 5% of our logs in our Beta environment (this app is not in prod yet). They are as follows:
SQLite error (11): database corruption at line 55472 of [cf538e2783]
SQLite error (11): database corruption at line 55514 of [cf538e2783]
That's it. No other info is available, and it's always those two lines. They are being logged by System.Data.SQLiteLog.LogEventHandler and the Log System is Trace.
Interestingly, it appears that the database continues to function and usage of our application continues without any apparent problems.
So my question is: Is there actually an issue, and, if so, how do you go about identifying it?
From my research on the issue, I have to believe that there is a legitimate problem here. According to www.sqlite.org, corruption in SQLite should be rather rare. A false corruption report can be caused by database shrinkage (8.1), but I don't think that's the case here. I must assume at this point that the report is correct and the database is actually corrupt.
However, how could the database continue to function normally for hours of use if it is, in fact, corrupted?
Many sites give solutions for fixing this issue (by reverting to a backup, or otherwise recreating the database), but I need to find what's causing it.
I have tried installing DB Browser for SQLite and browsing the database, but all seems well and there doesn't seem to be a way to "browse" to the lines that are reportedly corrupt. (There must be a way to do so, though, right? Why would they give line numbers if it's impossible to know what those line numbers are?)
Are there operations or areas that are likely culprits in this circumstance that I can investigate? How do I identify the lines that are being referenced in the error messages?
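For what it's worth, the line numbers in those messages refer to lines in SQLite's own C source code (sqlite3.c at check-in cf538e2783), not to anything inside your database file, which is why no browser can take you to them. The standard way to assess suspected corruption is SQLite's built-in integrity check; a minimal sketch (the file name is hypothetical):

import sqlite3

conn = sqlite3.connect("suspect.db")
# Returns [('ok',)] for a healthy database, otherwise a list of
# human-readable descriptions of the problems found.
print(conn.execute("PRAGMA integrity_check").fetchall())
conn.close()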

How does logging in managed VMs work?

I'm reading Google's docs on logging in managed VMs; they're rather thin on detail, and I have more questions than answers after reading:
It says files in /var/log/app_engine/custom_logs are picked up automatically – does this path already exist, or do you also have to mkdir -p it?
Do I have to deal with log rotation/truncation myself?
How large can the files be?
If you write a file ending in .log.json and some bit of it is corrupt, does that break the whole file or will Google pick up the bits that can be read?
Is there a performance benefit/cost to log things this way, over using APIs?
UPDATE: I managed to get logs to show up in the log viewer, but only when writing files with the .log suffix; whenever I try .log.json they are not picked up, and I can't see any errors anywhere. The JSON output seems fine and conforms to the requirement of having one object per line. Does anyone know how to debug this?
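For reference, a minimal sketch of writing a structured entry to the custom log directory, one JSON object per line as the docs require. The exact field names (message, severity, the split timestamp) are my assumption based on the structured-logging convention and should be verified against the current docs:

import json
import os
import time

LOG_DIR = "/var/log/app_engine/custom_logs"
if not os.path.isdir(LOG_DIR):
    os.makedirs(LOG_DIR)  # harmless if the runtime already created it

def write_entry(message, severity="INFO"):
    now = time.time()
    entry = {
        "message": message,
        "severity": severity,  # assumed field names; verify against the docs
        "timestamp": {"seconds": int(now), "nanos": int((now % 1) * 1e9)},
    }
    # One JSON object per line, as required for .log.json files.
    with open(os.path.join(LOG_DIR, "app.log.json"), "a") as f:
        f.write(json.dumps(entry) + "\n")

write_entry("hello from the managed VM")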

DuplicatePropertyError: Class class_name already has property property_name

I'm developing for GAE-Python 2.7 on Mac using Eclipse + PyDev and, since SDK 1.7.6 (where the new dev_appserver was introduced), I'm getting a "DuplicatePropertyError" 4 out of 5 times (on average) when executing the first request. Also, even if the first request goes fine, the error might appear in later requests.
This error only happens in development (in production everything goes fine), and prior to 1.7.6 I never saw it.
I didn't pay too much attention to this problem because, so far, I was using the old_dev_appserver in order to keep debugging my app (Google broke support for pydevd by using stdin/stdout for IPC in the new dev server). However, since the old dev server will be dropped on July 1st, I think it's time to start using the new one :-).
Is anyone else experiencing this problem? Any solution/workaround?
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/db/__init__.py", line 515, in __init__
_initialize_properties(cls, name, bases, dct)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/db/__init__.py", line 430, in _initialize_properties
attr.__property_config__(model_class, attr_name)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/db/__init__.py", line 3686, in __property_config__
self.collection_name))
DuplicatePropertyError: Class Organization already has property wagesheetrow_set
INFO 2013-06-07 15:13:54,864 server.py:585] default: "GET / HTTP/1.1" 500 -
Just figured it out! Apparently the error comes from creating a ReferenceProperty without giving it an explicit collection_name.
For example, this might trigger the error:
class WageSheetRow(db.Model):
    organization = db.ReferenceProperty(Organization)
and this is the correct way:
class WageSheetRow(db.Model):
    organization = db.ReferenceProperty(Organization, collection_name='aName')
This never happened with the old server but, apparently, the new server (1.7.6+) has changed the way model classes are initialised.
It's also worth mentioning that exactly the same code might trigger the error on one specific machine but not on another running exactly the same piece of code.

Deleting files and corresponding entries from database

I have a web site that handles file management. Users can upload a file, add a description, edit, and delete. What are the best practices for that kind of scenario?
I store files in the file system.
How should I handle deleting a file? In this case I have two entities to delete: the file and the entry in the database. The first scenario is that I delete the file and, if there was no error, delete the entry from the database. But then, if the database entry couldn't be deleted, I cannot restore my file. The second scenario is the opposite: first the entry from the database, then the file. But again, when the file cannot be deleted, I cannot restore the entry in the DB. Which approach is better? Or is there any other?
I think the problem is universal for all web programming languages and all database engines. But let's say that I have MySQL and PHP, so deleting the file from a database stored procedure is not possible.
I usually find it best to do soft deletes, especially for data in the database.
Doing it this way you could simply put the file in a new location to denote it as being deleted, and mark the entry in the database as being removed. This then allows you to still work with the file if the database delete fails for some reason.
Once you have the file in a new location, you can either backup that location to another place or set something up to periodically delete items from that location.
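A minimal sketch of that soft-delete approach (Python, though the question mentions PHP; the pattern is language-agnostic; conn is any DB-API connection, and the files table, deleted flag, and folder name are hypothetical):

import os
import shutil

DELETED_DIR = "deleted"  # hypothetical quarantine folder

def soft_delete(conn, file_id, path):
    # Mark the row first; no data is lost even if the file move fails.
    conn.execute("UPDATE files SET deleted = 1 WHERE id = ?", (file_id,))
    conn.commit()
    # Then move the file aside. If this step fails, the row already says
    # 'deleted', so a periodic sweep can retry the move later.
    if not os.path.isdir(DELETED_DIR):
        os.makedirs(DELETED_DIR)
    shutil.move(path, os.path.join(DELETED_DIR, os.path.basename(path)))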
Here is one possibility.
You could first move the file to a 'Deleted' folder, then delete the entry in the database. If that fails, restore the file from the 'Deleted' folder (see the sketch after this answer).
To get rid of the files in the 'Deleted' folder, one option is to do it right after the entry in the database is deleted. If that fails, you end up with orphan files in the 'Deleted' folder... Depending on your requirements, this might not be a problem.
Another option would be to have a call (maybe on SessionEnd, or a service) that does the clean-up.
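A minimal sketch of that move-then-delete pattern with rollback (again Python with a generic DB-API connection; the files table, paths, and folder name are hypothetical):

import os
import shutil

DELETED_DIR = "deleted"  # hypothetical quarantine folder

def delete_file(conn, file_id, path):
    if not os.path.isdir(DELETED_DIR):
        os.makedirs(DELETED_DIR)
    quarantined = os.path.join(DELETED_DIR, os.path.basename(path))
    shutil.move(path, quarantined)        # step 1: move aside, don't delete yet
    try:
        conn.execute("DELETE FROM files WHERE id = ?", (file_id,))
        conn.commit()
    except Exception:
        shutil.move(quarantined, path)    # DB delete failed: restore the file
        raise
    os.remove(quarantined)                # both steps succeeded: purge for real

Anything left behind in the quarantine folder (for example after a crash between the move and the purge) can be swept by the periodic clean-up this answer describes.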
I would personally make sure the database gets priority, as it is more fundamental to the system. So I'd make sure the DB row gets deleted, then delete the file. If the file deletion fails, I'd make it fail silently. I would then have a cronjob checking whether each file has its DB counterpart and, if not, marking it for deletion, so the system stays clean and coherent.
There is often a garbage collection phase, and in fact your database has something similar called the "transaction log" which it can use to rewind or play forward a transaction.
In the case of your file delete you will have a clean-up process that runs periodically (perhaps manually in the event of a crash, or automatically every so often) that compares what is on the disk with what is in the database and makes an appropriate correction.
In order for any operation to be "atomic" there must be a method of cleaning up in the event of a crash. The key is finding a method that cleans up consistently such that a failure at any point within the "atomic" operation doesn't leave the system unrecoverable.
Vista introduced Transactional NTFS, which allows you to wrap file-system operations together with database operations in a transaction that can be rolled back if either fails. This is not in .NET 3.5 and I do not know if it will be part of .NET 4.0, but there is evidence that it works today with a bit of legwork, i.e. by using Win32 calls (see this article).
I know nothing about transaction management in PHP.
