I currently have a Google App Engine Python script that reads from a Firestore database. The structure is as follows:
<collection name=Location1>
  <document name=timestamp1>
    data-array
  <document name=timestamp2>
    data-array
  <document name=timestamp3>
    data-array
<collection name=Location2>
  <document name=timestamp1>
    data-array
  <document name=timestamp2>
    data-array
  <document name=timestamp3>
    data-array
This is basically a cache. This setup works best for my App Engine app because it dynamically generates a webpage, so it needs to be fast. It takes a location as input and tries to pull the past 12 hours' worth of data from the "cache". If the data does not exist, it pulls the missing data from the web and writes it into the cache. The issue is that I am not deleting old data, and there is no guarantee a location (collection) will ever get called again. I could restructure my data to have timestamp collections containing location documents, but that makes looking up data difficult.
Therefore, I would like a separate program (a Cloud Function?) to periodically scan my Firestore database and delete anything more than X hours old from each collection, or delete the entire collection if all of its documents are older/deleted. I realize I probably need to add a timestamp field to each document so I can query on it.
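Something along these lines is what I have in mind for the cache (only a sketch; the "timestamp" field name and the 12-hour window are just my use case):
from datetime import datetime, timedelta, timezone
from google.cloud import firestore

db = firestore.Client()

def cache_write(location, data_array):
    # store the data under the location collection, keyed by timestamp,
    # and include a "timestamp" field so old entries can be queried later
    now = datetime.now(timezone.utc)
    db.collection(location).document(now.isoformat()).set({
        'data': data_array,
        'timestamp': now,
    })

def cache_read(location, hours=12):
    # pull everything newer than the cutoff; whatever is missing gets
    # fetched from the web and written back with cache_write()
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    query = db.collection(location).where('timestamp', '>=', cutoff)
    return [doc.to_dict() for doc in query.stream()]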
All I have been able to find regarding how to accomplish this is https://firebase.google.com/docs/firestore/solutions/delete-collections; however, I am having trouble understanding it, and it seems to require me to specify the collection.
Firestore doesn't have any built-in scheduling capabilities. You'll have to use another product in tandem with it.
If you're already using Google App Engine, you can just set up a cron job and have it execute a program to delete documents. If you prefer Cloud Functions, you can use a Firebase scheduled function, or if you don't use the Firebase tooling, you can put together the same functionality by configuring Cloud Scheduler to invoke an HTTP endpoint of your choice.
See also: Cloud Functions for Firebase trigger on time?
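As a rough sketch of the cleanup logic itself (Python, using the google-cloud-firestore client; the "timestamp" field and the 12-hour cutoff come from your description, and the unbatched deletes are just to keep the example short), the function that Cloud Scheduler invokes could look something like this:
from datetime import datetime, timedelta, timezone
import functions_framework
from google.cloud import firestore

db = firestore.Client()

@functions_framework.http
def cleanup(request):
    # delete every document older than 12 hours, in every root collection
    cutoff = datetime.now(timezone.utc) - timedelta(hours=12)
    deleted = 0
    for collection in db.collections():      # iterates the Location* collections
        for doc in collection.where('timestamp', '<', cutoff).stream():
            doc.reference.delete()
            deleted += 1
    # a collection with no documents left simply stops showing up, so there is
    # nothing extra to do to "delete" an empty collection
    return 'deleted %d documents' % deleted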
I'm trying to use Google BigQuery to download a large dataset for the GitHub Data Challenge. I have designed my query and am able to run it in the Google BigQuery console, but I am not allowed to export the data as CSV because it is too large. The recommended help tells me to save it to a table, which, as far as I can tell, requires me to enable billing on my account and make a payment.
Is there a way to save datasets as CSV (or JSON) files for export without payment?
For clarification, I do not need this data on Google's cloud and I only need to be able to download it once. No persistent storage required.
If you can enable the BigQuery API without enabling billing on your application, you can try the getQueryResults API call. Your best bet, though, is probably to enable billing (you likely won't be charged for the limited usage you need, as you will probably stay within the free tier, and if you are charged it should only be a few cents), save your query results to a table, and export them to Google Cloud Storage. If the result is too large, I don't think you'll be able to use the web UI effectively.
See the documentation on exactly this topic:
https://developers.google.com/bigquery/exporting-data-from-bigquery
Summary: Use the extract operation. You can export CSV, JSON, or Avro. Exporting is free, but you need to have Google Cloud Storage activated to put the resulting files there.
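If you prefer to script it rather than click through the web UI, the Python client library can run the extract for you; here is a minimal sketch (the project, dataset, table, and bucket names are placeholders):
from google.cloud import bigquery

client = bigquery.Client()

# export the saved results table to Cloud Storage as CSV
# (the wildcard lets BigQuery shard the output if it exceeds 1 GB)
job = client.extract_table(
    'your-project.your_dataset.your_results_table',
    'gs://your-bucket/github-data-*.csv')
job.result()  # wait for the export to finish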
Alternatively, use the bq command-line tool:
$ bq query --format=csv 'SELECT ...'
Use the --format flag to output the results as CSV.
From the tutorial, which I confirmed by creating a simple project, the index.yaml file is auto-generated when a query is run. What I further observe is that, until then, the admin console (http://localhost:8080/_ah/admin/datastore) does not show the datastore.
My problem is this: I have a project for which data/entities are to be added manually through the datastore admin console. The website is only used to display/retrieve data, not to add data to the datastore.
How do I get my datastore to appear on the console so I can add data?
Yes, I have tried retrieving from the empty datastore through the browser, just to get index.yaml to populate, etc., but that does not work.
The easiest way is probably just to create a small Python script inside your project folder and create your entities in that script. Assign it to a URL handler that you'll use once, then disable.
You can even do it from the python shell. It's very useful for debugging, but you'll need to set it up once.
http://alex.cloudware.it/2012/02/your-app-engine-app-in-python-shell.html
In order to do the same on production, use the remote_api:
https://developers.google.com/appengine/articles/remote_api
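For the throwaway-handler approach, something along these lines is usually enough (the Catalog model and the /seed URL are just examples, not anything your app needs):
import webapp2
from google.appengine.ext import ndb

class Catalog(ndb.Model):          # example kind; replace with your own models
    name = ndb.StringProperty()

class SeedHandler(webapp2.RequestHandler):
    def get(self):
        # create a few entities so the kind shows up in the datastore viewer
        ndb.put_multi([Catalog(name='catalog-%d' % i) for i in range(3)])
        self.response.write('seeded')

app = webapp2.WSGIApplication([('/seed', SeedHandler)])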
This is a very strange question.
The automatic creation of index.yaml only happens locally, and is simply to help you create that file and upload it to AppEngine. There is no automatic creation or update of that file once it's on the server: and as the documentation explains, no queries can be run unless the relevant index already exists in index.yaml.
Since you need indexes to run queries, you must create that file locally - either manually, or by running the relevant queries against your development datastore - then upload it along with your app.
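For reference, an index.yaml entry is just the kind plus the ordered properties a composite query needs; the kind and property names here are hypothetical:
indexes:
- kind: YourEntityModel
  properties:
  - name: category
  - name: created
    direction: desc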
However, this has nothing at all to do with whether the datastore viewer appears in the admin console. Online, the viewer always appears, but it only shows entity kinds that actually have at least one instance in the store. The datastore viewer knows nothing about your models; it only knows about kinds that exist in the datastore.
On your development server, you can use the Interactive Console to create and save an entity, which should cause the entity kind to appear in the datastore interface, like so:
from google.appengine.ext import ndb

class YourEntityModel(ndb.Model):
    pass

# saving a single instance is enough to make the kind show up in the viewer
YourEntityModel().put()
Hey guys, kind of a n00b in App Engine here, and I have been struggling with this: is there a way I can bulk-add default data to the Datastore?
I would like to create catalogs or example data, as well as users or permissions. I am not using the default App Engine users; instead, I am using webapp2's session-based User auth model.
Thanks
You can use the bulkloader: https://developers.google.com/appengine/docs/python/tools/uploadingdata
Or upload data to the blobstore and move it to the datastore.
This is a large topic, but I am using Java code running in task queues to do this.
It is much easier to create random test and demo data through code.
It is also much more friendly to unit testing.
This requires no dependencies; it is just code running and accessing the datastore.
It is sometimes easier to manipulate the datastore through code instead of scripts when logic is involved in the changes.
It allows us to upload new task definitions (Java classes) embedded in a new app version. We then trigger the task executions by calling a servlet URL, and these task classes are removed in the next app version.
Using tasks also gets you around the request execution timeout: if a task is long-running, we split it into sequential tasks, and when a task completes, it queues the next one automatically.
Of course, this requires a fair amount of coding but is really simple and flexible at the same time.
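In Python, the same chained-task idea could be sketched with the deferred library (the Product kind, batch size, and entity names below are made up for the example):
from google.appengine.ext import deferred, ndb

class Product(ndb.Model):            # illustrative demo kind
    name = ndb.StringProperty()

def seed_batch(start, total, batch_size=100):
    # create one batch of demo entities inside a task
    ndb.put_multi([Product(name='demo-%d' % i)
                   for i in range(start, min(start + batch_size, total))])
    # chain the next batch so no single request runs into the deadline
    if start + batch_size < total:
        deferred.defer(seed_batch, start + batch_size, total, batch_size)

# kick off the run from a handler or the remote_api shell
# (requires the deferred builtin to be enabled in app.yaml)
deferred.defer(seed_batch, 0, 10000)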
There is search support (experimental) for Python and Java, and eventually Go may also be supported. Until then, how can I do a minimal search on my records?
Through the mailing list, I got the idea of proxying the search request to a Python backend. I am still evaluating GAE and have not used backends yet. To set up search with a Python backend, do I have to send all the requests (from Go) to the datastore through this backend? How practical is it, and what are the disadvantages? Is there any tutorial on this?
thanks.
You could make a RESTful Python app with a few handlers, and your Go app would make urlfetch calls to it. You can then run the Python app as either a backend or a frontend (with a different version than your Go app). The first handler would receive a key as input, fetch that entity from the datastore, and store the relevant info in the search index. The second handler would receive a query, run a search against the index, and return the results. You would also need a handler for removing documents from the search index, plus any other operations you want.
Instead of having the first handler receive a key and fetch from the datastore, you could also just send it the entity data directly in the fetch.
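A minimal sketch of the Python side using the Search API (the handler routes, index name, and the 'content' field are placeholders, not anything prescribed by the API):
import json
import webapp2
from google.appengine.api import search

class IndexHandler(webapp2.RequestHandler):
    def post(self):
        # body: JSON with an id and the text to index (sent by the Go app)
        data = json.loads(self.request.body)
        doc = search.Document(
            doc_id=data['id'],
            fields=[search.TextField(name='content', value=data['content'])])
        search.Index(name='records').put(doc)

class QueryHandler(webapp2.RequestHandler):
    def get(self):
        # return the ids of matching documents; the Go app looks them up itself
        results = search.Index(name='records').search(self.request.get('q'))
        self.response.write(json.dumps([d.doc_id for d in results]))

app = webapp2.WSGIApplication([('/index', IndexHandler), ('/search', QueryHandler)])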
You could also use a service like IndexDen for now (especially if you don't have many entities to index):
http://indexden.com/
When making urlfetches, keep in mind that the quotas currently apply even when requesting URLs from your own app. There are two issues in the tracker requesting that these quotas be removed or increased when communicating with your own apps, but there is no guarantee that will happen. See here:
http://code.google.com/p/googleappengine/issues/detail?id=8051
http://code.google.com/p/googleappengine/issues/detail?id=8052
There is full text search coming for the Go runtime very very very soon.
I'm developing a Spring-based web application that is deployed on Google App Engine.
I have a manager that stores data in application scope. I'm using a Spring bean (singleton) that holds a map and simply performs get and remove operations on the map.
However, GAE is a distributed environment, and this design has a problem: each application instance will have its own manager, and requests are not guaranteed to be routed to the same instance.
So I've looked around and found two possible solutions:
use the datastore
use memcache
Storing the data in the datastore would cause a lot of reads and writes, and ultimately I do not need the data to be persisted.
The second option looked promising, but Google mentions that:
In general, an application should not expect a cached value to always be available.
and I need a guarantee that when I ask my manager for a value, it will return it.
Is there any other solution?
Did I miss anything?
A common solution is storing the values in memcache, backed by the datastore.
First, fetch the application-scope value from memcache; if memcache returns no result (a cache miss), fetch the value from the datastore and put it into memcache.
That way, the next fetch from memcache will return your application-scope data, reducing the need for a (relatively) costly read from the datastore.
App Engine memcache is a least-recently-used cache, so values that are read frequently will rarely suffer a cache miss.
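The pattern is language-agnostic; here it is sketched with the Python APIs just to show its shape (the AppScopeValue kind and key naming are illustrative, and the same approach works with the Java memcache and datastore APIs):
from google.appengine.api import memcache
from google.appengine.ext import ndb

class AppScopeValue(ndb.Model):          # illustrative kind for application-scope data
    value = ndb.StringProperty()

def get_app_value(key):
    cached = memcache.get(key)
    if cached is not None:               # cache hit: fast path
        return cached
    entity = AppScopeValue.get_by_id(key)    # cache miss: fall back to the datastore
    if entity is not None:
        memcache.set(key, entity.value)      # repopulate memcache for the next reader
        return entity.value
    return None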
A more complex way to get an application-scope value is to store it in memory on a resident backend and have all your other instances request and update the value from that particular backend via a servlet handler.