I'm trying to troubleshoot some issues with an export task I created. I'm attempting to export CSV data using Google Cloud Storage, and I can't seem to export all of my data. I assume it has something to do with the (FAR TOO LOW) 30 second file limit when I attempt to restart the task.
To troubleshoot, I need to find where my local/development server is writing the files out, but I can't. I see numerous entries in the GsFileInfo table, so I assume something is happening, but I can't find the actual output file.
Can someone point me to the location of the Google Cloud Storage files in the local AppEngine development environment?
Thanks!
Looking at the dev_appserver code, it looks like you can specify a path, or it will calculate a default based on the OS you are using.
blobstore_path = options.blobstore_path or os.path.join(storage_path, 'blobs')
It then passes this path to blobstore_stub (GCS storage is backed by the blobstore stub), which shards files by their blobstore key:
def _FileForBlob(self, blob_key):
  """Calculate full filename to store blob contents in.

  This method does not check to see if the file actually exists.

  Args:
    blob_key: Blob key of blob to calculate file for.

  Returns:
    Complete path for file used for storing blob.
  """
  blob_key = self._BlobKey(blob_key)
  return os.path.join(self._DirectoryForBlob(blob_key), str(blob_key)[1:])
For example, I'm using Ubuntu and started the server with dev_appserver.py --storage_path=~/tmp; I was then able to find files under ~/tmp/blobs and the datastore under ~/tmp/datastore.db. Alternatively, you can go to the local admin console; the blobstore viewer link will also display GCS files.
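If you'd rather locate the files programmatically, here is a minimal sketch, assuming you passed --storage_path=~/tmp (adjust the path to your own setup):

import os

# Whatever you passed to --storage_path (or the calculated default).
storage_path = os.path.expanduser('~/tmp')
blobs_dir = os.path.join(storage_path, 'blobs')

# Walk the sharded blob directories and print every file the dev server wrote.
for root, _, files in os.walk(blobs_dir):
    for name in files:
        print(os.path.join(root, name))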
As tkaitchuck mentions above, you can use the included LocalRawGcsService to pull the data out of the local.db. This is the only way to get the files, as they are stored in the local DB using the blobstore. Here's the original answer:
which are the files uri on GAE java emulating cloud storage with GCS client library?
I installed the Google Cloud SDK and it dumped a .boto file into the My Documents folder (e.g. C:\Users\John), which is a wildly inappropriate location. I do see many references to the boto file in the Python files, a couple dozen instances/examples:
return os.path.join(self.LegacyCredentialsDir(account), '.boto')
os.path.expanduser(os.path.join('~', '.boto')),
Where do I go to change the path to something appropriate? An appropriate path would be something such as C:\Users\John\AppData\Roaming\gcloud\.boto, for example.
At the top of the file:
This file contains credentials and other configuration information needed
by the boto library, used by gsutil. You can edit this file (e.g., to add
credentials) but be careful not to mis-edit any of the variable names (like
"gs_access_key_id") or remove important markers (like the "[Credentials]" and
"[Boto]" section delimiters).
[Credentials]
Google OAuth2 credentials are managed by the Cloud SDK and
do not need to be present in this file.
To add HMAC google credentials for "gs://" URIs, edit and uncomment the
following two lines:
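In the generated template those two lines look roughly like the following (shown here only as an illustration; uncomment them and substitute your own HMAC key pair):

#gs_access_key_id = <your access key ID>
#gs_secret_access_key = <your secret access key>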
The latest versions of Boto don't seem to be a great fit for App Engine. I ran into this issue about a year ago, and I don't remember all of the details, but I avoided Boto3 and stuck with Boto 2.47 and that worked well for me.
For my use case, I only needed help with SES. If you need many other AWS services then YMMV.
On my appspot website, I use a third-party API to query a large amount of data. The user then downloads the data as CSV. I know how to generate a CSV and download it. The problem is that because the file is huge, I get a DeadlineExceededError.
I have tried increasing the fetch deadline to 60 seconds (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 seconds, not that your UrlFetch call did.
Deploy the code that generates the CSV file into a different module that you set up with basic or manual scaling. The URL to download your CSV will then become http://module.domain.com
Requests can run indefinitely on modules with basic or manual scaling.
Alternatively, consider creating the file dynamically in Google Cloud Storage (GCS) with your CSV content. At that point the file resides in GCS, and you can generate a URL from which users can download it directly (see the sketch at the end of this answer). There are also options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
Important note: do not use the Files API (which was a common way to dynamically create files in blobstore/GCS), as it has been deprecated. Use the Google Cloud Storage Client Library referenced above instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.
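A minimal sketch of that GCS approach, assuming the bundled cloudstorage client library and the app's default bucket (the function, object name, and row source are illustrative, not your actual code); you would typically call this from a task queue task so the work stays off the user-facing request:

import csv
import cloudstorage as gcs
from google.appengine.api import app_identity

def write_csv_to_gcs(rows, object_name='exports/report.csv'):
    # Write the CSV into the app's default GCS bucket.
    bucket = app_identity.get_default_gcs_bucket_name()
    gcs_filename = '/%s/%s' % (bucket, object_name)
    with gcs.open(gcs_filename, 'w', content_type='text/csv') as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
    return gcs_filename  # hand this back so you can build a download URL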
My app stores a bunch of images as blobs. This is roughly how I store images.
from google.appengine.api import files

# ...
fname = files.blobstore.create(mime_type='image/jpeg')
with files.open(fname, 'a') as f:
    f.write(image_byte)
files.finalize(fname)
blob_key = files.blobstore.get_blob_key(fname)
To serve these images, I use images.get_serving_url(blob_key).
Here are my questions:
Will I have to copy over all blobs to Google Cloud Storage? In other words, will I be able to access my existing blobs using GCS client library and existing blob keys? Or, will I have to copy the blobs over to GCS and get new blob keys?
Assuming I do have to copy them over to GCS, what is the easiest way? Is there a migration tool or something? Failing that, is there some sample code I can copy-paste?
Thanks!
The files have all been going into GCS for a while; the blobstore is just an alternate way to access them. The blob keys and access shouldn't be affected.
You will, however, need to stop using the Files API itself and start using the GCS API to create the files.
1) No, you can still use the blobstore. You can also upload files to the blobstore when you use the BlobstoreUploadHandler.
2) Migration is easy when you use the blobstore, because you can create a blobkey for GCS objects. And when you use the default GCS bucket, you have free quota.
from google.appengine.api import app_identity, images
from google.appengine.ext import blobstore
import cloudstorage as gcs

# Write the image into the app's default GCS bucket.
default_bucket = app_identity.get_default_gcs_bucket_name()
gcs_filename = '/%s/%s' % (default_bucket, image_file_name)
with gcs.open(gcs_filename, 'w', content_type='image/jpeg') as f:
    f.write(image_byte)

# Create a blob key for the GCS object, and a serving URL from it.
blob_key = blobstore.create_gs_key('/gs' + gcs_filename)
serving_url = images.get_serving_url(blob_key)
I received an email from Google Cloud Platform on May 19, 2015, an excerpt is shown here:
The removal of the Files API will happen in the following manner.
On May 20th, 2015 no new applications will have access to the Files
API. Applications that were created prior to May 20th, 2015 will
continue to run without any issues. That said, we strongly encourage
developers to start switching over to the Cloud Storage Client Library
today.
On July 28th, 2015 starting at 12pm Pacific Time, the Files API will
be temporarily shutdown for 24 hrs.
On August 4th, 2015, we will permanently shut down the Files API at
12:00pm Pacific time.
Since I was using the exact same code to write a blobstore file, I spent a day researching the GCS system. After failing to get a "service account" to work (wading through poorly documented OAuth2 confusion), I gave up on GCS.
Now I am using ndb's BlobProperty. I keep the blobs in a separate model using both a parent key and a key name (as filename) to locate the images. Using a separate model keeps the huge blob out of my regular entities so fetches aren't slowed down by their sheer size. I wrote a separate REST API just for the images.
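A minimal sketch of that layout, assuming hypothetical model and function names (not the poster's actual code):

from google.appengine.ext import ndb

class ImageBlob(ndb.Model):
    # One entity per image; the key name is the filename and the parent key
    # points at the owning entity, so the big blob stays out of regular fetches.
    data = ndb.BlobProperty(required=True)
    content_type = ndb.StringProperty(default='image/jpeg')

def save_image(parent_key, filename, image_bytes):
    ImageBlob(parent=parent_key, id=filename, data=image_bytes).put()

def load_image(parent_key, filename):
    entity = ndb.Key(ImageBlob, filename, parent=parent_key).get()
    return entity and entity.data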
I also faced the same issue while running the GAE server locally:
com.google.appengine.tools.cloudstorage.NonRetriableException: com.google.apphosting.api.ApiProxy$FeatureNotEnabledException: The Files API is disabled. Further information: https://cloud.google.com/appengine/docs/deprecations/files_api
In my case, the fix was simple. I changed this:
compile 'com.google.appengine.tools:appengine-gcs-client:0.4.1'
To:
compile 'com.google.appengine.tools:appengine-gcs-client:0.5'
in the build.gradle file, because the Files API (Beta) was deprecated on June 12, 2013 and turned down on September 9, 2015. (Source)
According to the MVN Repository, the latest version is 'com.google.appengine.tools:appengine-gcs-client:0.5'.
I'm trying to upload to GCS using the Blobstore. I have set the GCS bucket name while generating the upload url, and the file gets uploaded successfully.
In the upload handler, blobInfo.getFilename() returns the right file name, but the file actually got saved in the GCS bucket under a different name. Each time, the file name is a random hash like this one:
L2FwcGhvc3RpbmdfcHJvZC9ibG9icy9BRW5CMlVvbi1XNFEyWEJkNGlKZHNZRlJvTC0wZGlXVS13WTF2c0g0LXdzcEVkaUNEbEEyc3daS3Vham1MVlZzNXlCSk05ZnpKc1RudDJpajF1TmxwdWhTd2VySVFLdUw3US56ZXFHTEZSLVoxT3lablBI
Is this how it will work? Is this an anomaly?
I store the file name in the datastore based on the value returned from blobInfo.getFilename(), which is the correct file name. But I'm unable to access the file using GcsFilename, since the file is stored in GCS with that random hash as its name.
Any pointers would be greatly helpful.
Thanks!
PS: The blobstore page says that BlobInfo is currently not available for GCS objects. But BlobInfo.getFilename returns the right value for me. Is something wrong on my end?
That's how it works; see https://cloud.google.com/appengine/docs/python/blobstore/fileinfoclas ...:
FileInfo metadata is not persisted to datastore [...] You must save
the gs_object_name yourself in your upload handler or this data will
be lost
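A minimal sketch of saving it in the upload handler, assuming webapp2's BlobstoreUploadHandler and a hypothetical UploadedFile model (names are illustrative):

from google.appengine.ext import ndb
from google.appengine.ext.webapp import blobstore_handlers

class UploadedFile(ndb.Model):
    filename = ndb.StringProperty()
    gs_object_name = ndb.StringProperty()

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        file_info = self.get_file_infos()[0]
        # Persist gs_object_name now; FileInfo metadata is not stored for you.
        UploadedFile(filename=file_info.filename,
                     gs_object_name=file_info.gs_object_name).put()
        self.redirect('/')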
I personally recommend that new applications use https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/ directly, rather than the blobstore emulation on top of it.
The latter is currently provided essentially only for (limited, partial) backwards compatibility: it's not really all that suitable for new applications.
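For reference, once you have the gs_object_name you can open the object with the GCS client library. Note that gs_object_name looks like '/gs/<bucket>/<object>' while cloudstorage expects '/<bucket>/<object>', so the '/gs' prefix has to be stripped (a sketch, with an assumed helper name):

import cloudstorage as gcs

def read_uploaded(gs_object_name):
    # gs_object_name comes from FileInfo, e.g. '/gs/my-bucket/some/object'.
    with gcs.open(gs_object_name[len('/gs'):]) as f:
        return f.read()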
I was trying to load one of my datastore tables into BigQuery. When I found there is an "AppEngine Datastore Backup" option in the web UI of BigQuery, I was very happy, because all my data is located in one datastore table. It should be the easiest approach (I thought) to just export the data via the "Datastore Admin" page of Google App Engine and then import it into BigQuery.
The export process went quite smoothly, and I happily watched all the mapper tasks finish successfully. After this step, what I got was 255 files in one of my Cloud Storage buckets. The problem arose when I tried to import them in the web UI of BigQuery. I entered the URL of one of the 255 files as the source of the data load, and all I got was the following error message:
Errors:
Not Found: URI gs://your_backup_hscript/datastore_backup_queue_status_backup_2013_05_23_QueueStats-1581059100105C09ECD88-output-54-retry-0
I'm sure the above URI is the right one, because I can download it with gsutil, and I can import a CSV file located in the same bucket. May I know your suggestion for the next step?
Found the reason now: I should use the file with the ".backup_info" suffix instead of an arbitrary data file.
Cheers!