Can I make a collection append-only in Cloud Firestore? - database

I want to write game events and audit logs from my app to Cloud Firestore. Once written, I don't want the user to be able to modify or delete these events/logs.
How can I do this?

Security rules in Cloud Firestore make it quite simple to turn a collection, or even the entire database, into an append-only system from the mobile & web clients.
Example
Below is a set of rules that will turn the root collection audit_logs into an append-only collection.
service cloud.firestore {
  match /databases/{database}/documents/ {
    function permission_granted() {
      return request.auth != null; // Change this to your logic.
    }
    match /audit_logs/{log} {
      allow update,delete: if false;
      allow read, create, list: if permission_granted();
    }
  }
}
Let's break down the most important pieces.
Function: permission_granted()
function permission_granted() {
  return request.auth != null; // Change this to your logic.
}
This one is just a placeholder for however you want to restrict inserting new documents or reading existing documents in the collection. In this case it lets in anyone who has signed in using Firebase Auth; you might want it to be more restrictive.
It just returns true or false, which we'll use later to actually enforce.
Match: Root collection audit_logs
match /audit_logs/{log} {
...
}
This one's simple: we're just matching any request for the root collection called audit_logs. The ID of the document in question is made available via $(log) due to the {log} piece.
Blocking any modification that is not append-only
allow update,delete: if false;
The 2 write methods that are not append-only are update and delete, so here we just universally disallow any mobile & web SDK from performing them.
Allow the rest
allow read, create, list: if permission_granted();
Lastly, using the permission_granted function we set up earlier, we allow reading, listing, and creating new documents in the collection.
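To make the decision table of these rules explicit, here is a minimal Python sketch that models them. It is not Firestore code and does not talk to Firestore; the function names and the operation strings ("get", "list", "create", "update", "delete") are assumptions chosen to mirror the rule keywords, where read covers both get and list.

```python
# Hypothetical model of the audit_logs rules above; illustration only.
DENIED_ALWAYS = {"update", "delete"}            # allow update,delete: if false;
ALLOWED_IF_PERMITTED = {"get", "list", "create"}  # allow read, create, list: if permission_granted();


def permission_granted(auth):
    """Mirrors the placeholder rule: any signed-in user passes."""
    return auth is not None


def is_allowed(operation, auth):
    """Return True if the modelled rules would allow `operation` on /audit_logs/{log}."""
    if operation in DENIED_ALWAYS:
        return False
    if operation in ALLOWED_IF_PERMITTED:
        return permission_granted(auth)
    # Anything not explicitly allowed is denied.
    return False
```

With this model, a signed-in user can create and read log entries, while update and delete are refused for everyone, which is what makes the collection append-only from the client side.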

Related

How to integrate custom authentication provider into IdentityServer4

Is it possible to somehow extend IdentityServer4 to run custom authentication logic? I have the requirement to validate credentials against a couple of existing custom identity systems and struggle to find an extension point to do so (they use custom protocols).
All of these existing systems have the concept of an API key which the client side knows. The IdentityServer job should now be to validate this API key and also extract some existing claims from the system.
I imagine to do something like this:
POST /connect/token
custom_provider_name=my_custom_provider_1&
custom_provider_api_key=secret_api_key
Then I do my logic to call my_custom_provider_1, validate the API key, get the claims and pass them back to the IdentityServer flow to do the rest.
Is this possible?
I'm assuming you have control over the clients, and the requests they make, so you can make the appropriate calls to your Identity Server.
It is possible to use custom authentication logic, after all that is what the ResourceOwnerPassword flow is all about: the client passes information to the Connect/token endpoint and you write code to decide what that information means and decide whether this is enough to authenticate that client. You'll definitely be going off the beaten track to do what you want though, because convention says that the information the client passes is a username and a password.
In your Startup.ConfigureServices you will need to add your own implementation of an IResourceOwnerPasswordValidator, kind of like this:
services.AddTransient<IResourceOwnerPasswordValidator, ResourceOwnerPasswordValidator>();
Then in the ValidateAsync method of that class you can do whatever logic you like to decide whether to set the context.Result to a successful GrantValidationResult, or a failed one. One thing that can help you in that method, is that the ResourceOwnerPasswordValidationContext has access to the raw request. So any custom fields you add into the original call to the connect/token endpoint will be available to you. This is where you could add your custom fields (provider name, api key etc).
Good luck!
EDIT: The above could work, but is really abusing a standard grant/flow. Much better is the approach found by the OP to use the IExtensionGrantValidator interface to roll your own grant type and authentication logic. For example:
Call from client to identity server:
POST /connect/token
grant_type=my_crap_grant&
scope=my_desired_scope&
rhubarb=true&
custard=true&
music=ska
Register your extension grant with DI:
services.AddTransient<IExtensionGrantValidator, MyCrapGrantValidator>();
And implement your grant validator:
public class MyCrapGrantValidator : IExtensionGrantValidator
{
    // your custom grant needs a name, used in the POST to /connect/token
    public string GrantType => "my_crap_grant";
    // Not marked async: nothing is awaited, so we return a completed Task instead
    public Task ValidateAsync(ExtensionGrantValidationContext context)
    {
        // Get the values for the data you expect to be used for your custom grant type
        var rhubarb = context.Request.Raw.Get("rhubarb");
        var custard = context.Request.Raw.Get("custard");
        var music = context.Request.Raw.Get("music");
        if (string.IsNullOrWhiteSpace(rhubarb) || string.IsNullOrWhiteSpace(custard) || string.IsNullOrWhiteSpace(music))
        {
            // this request doesn't have the data we'd expect for our grant type
            context.Result = new GrantValidationResult(TokenRequestErrors.InvalidGrant);
            return Task.CompletedTask;
        }
        // Do your logic to work out, based on the data provided, whether
        // this request is valid or not. TryParse avoids an exception when
        // a value is neither "true" nor "false".
        if (bool.TryParse(rhubarb, out var hasRhubarb) && hasRhubarb
            && bool.TryParse(custard, out var hasCustard) && hasCustard
            && music == "ska")
        {
            // This grant gives access to any client that simply makes a
            // request with rhubarb and custard both true, and has music
            // equal to ska. You should do better and involve databases and
            // other technical things
            var sub = "ThisIsNotGoodSub";
            context.Result = new GrantValidationResult(sub, "my_crap_grant");
            // Return here so the success result is not overwritten below
            return Task.CompletedTask;
        }
        // Otherwise they're unauthorised
        context.Result = new GrantValidationResult(TokenRequestErrors.UnauthorizedClient);
        return Task.CompletedTask;
    }
}
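For the client side of the custom grant, here is a minimal Python sketch that builds (but does not send) the form-encoded POST to /connect/token. The host name is a placeholder and the helper name build_token_request is made up for illustration; a real request would usually also need to carry the client's own credentials, which the example above leaves out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url, fields):
    """Build a form-encoded POST request for the /connect/token endpoint."""
    body = urlencode(fields).encode("ascii")
    return Request(
        base_url.rstrip("/") + "/connect/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical host; the field names come from the custom grant example above.
request = build_token_request(
    "https://identity.example.com",
    {
        "grant_type": "my_crap_grant",
        "scope": "my_desired_scope",
        "rhubarb": "true",
        "custard": "true",
        "music": "ska",
    },
)
```

Passing the resulting object to urllib.request.urlopen would send it; the extra fields then show up on the server in context.Request.Raw.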

Securing system-generated nodes in firebase

I've been going through the rules guide but haven't found an answer to this.
App users are able to submit "scores" of different types, which are then processed in JS and written to a "ranking" node. I have it set up so that every time a new score is submitted, the rankings are automatically recalculated and a new child is written if the user doesn't exist or updated if the user exists.
My question is how to secure this "ranking" node. Everyone should be able to read it, nobody except the system should be able to write it. This would prevent people from submitting their own rankings and aggregate scores.
EDIT
This is the operation:
Ref.child('rankings').child(uid).once('value', function (snapshot) {
  if (snapshot.exists()) {
    snapshot.ref().update(user); //user object created upstream
  } else {
    var payload = {};
    payload[uid] = user;
    snapshot.ref().parent().update(payload);
  }
});
How would I add custom authentication to this call? Also, since I'm using AngularJS, is there any way to hide this custom token or would I have to route it through a backend server?
The key part of your problem definition is:
only the system should be able to write it.
This requires that you are able to recognize "the system" in your security rules. Since Firebase security is user-based, you'll have to make your "system" into a user. You can do this by either recording the uid from a regular user account or by minting a custom token for your "system".
Once you have that, the security for your ranking node becomes:
".read": true,
".write": "auth.uid == 'thesystem'"
In the above I assume you mint a custom token and specify thesystem as the uid.

way to script an export of all AD users vcards

I'm looking for an easy way to export all Active Directory users' info into a unique vCard for each user. There is some info I'd like to leave out of the vCard, like home phone and emergency contact. I've looked around the web and have had little luck finding anything. Any help would be appreciated.
I doubt there will be a very easy way. Ultimately, you need to
enumerate all your users (or a subset thereof)
iterate over the resulting list of users
export each user's data to a VCard
For the searching & iterating part, you can use a PrincipalSearcher to do your searching:
// create your domain context
using (PrincipalContext ctx = new PrincipalContext(ContextType.Domain))
{
    // define a "query-by-example" principal - here, we search for a UserPrincipal
    // this "QBE" user would give you the ability to further limit what you get back
    // as results from the searcher
    UserPrincipal qbeUser = new UserPrincipal(ctx);
    // create your principal searcher passing in the QBE principal
    PrincipalSearcher srch = new PrincipalSearcher(qbeUser);
    // find all matches
    foreach(var found in srch.FindAll())
    {
        UserPrincipal foundUser = found as UserPrincipal;
        if(foundUser != null)
        {
            ExportToVCard(foundUser);
        }
    }
}
And now all that's left to do is create the ExportToVCard function :-) See e.g. this blog post with code samples and further links for help.
If you haven't already - absolutely read the MSDN article Managing Directory Security Principals in the .NET Framework 3.5 which shows nicely how to make the best use of the new features in System.DirectoryServices.AccountManagement. Or see the MSDN documentation on the System.DirectoryServices.AccountManagement namespace.
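The export step itself is mostly string formatting. The following is a rough sketch in Python (not C#, and not the code from the blog post mentioned above) of what writing one user as a vCard 3.0 record could look like. The attribute names in the dictionary and the FIELD_MAP table are assumptions for illustration; home phone and emergency contact are simply never mapped, so they are left out of the card.

```python
def escape_vcard(value):
    """Escape characters that are special in vCard 3.0 text values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


# Hypothetical mapping from user attribute names to vCard properties.
# Anything not listed here (home phone, emergency contact) is not exported.
FIELD_MAP = [
    ("email", "EMAIL;TYPE=INTERNET"),
    ("work_phone", "TEL;TYPE=WORK,VOICE"),
    ("title", "TITLE"),
]


def to_vcard(user):
    """Render a dict of user attributes as a vCard 3.0 string."""
    given = escape_vcard(user.get("given_name", ""))
    family = escape_vcard(user.get("surname", ""))
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "N:%s;%s;;;" % (family, given),
        "FN:%s" % escape_vcard(user.get("display_name", "")),
    ]
    for attr, prop in FIELD_MAP:
        if user.get(attr):
            lines.append("%s:%s" % (prop, escape_vcard(user[attr])))
    lines.append("END:VCARD")
    # vCard lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Each returned string can be written to its own .vcf file, one per user.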
If you just want the data itself, I would take a look at Softerra's free LDAP Browser, found here.
Setup a profile for your directory server - once it's connected in the browser, you'll see the default schema for the BaseDN you've provided during the initial setup. On the server icon, right click, and hit "Export Data".
The export wizard will walk you through most of the process, but the important part is Step 3. If you want to find all users, just set your search filter to (objectClass=user), make sure your search scope is SubTree, and then edit which attributes you want to return.
You'll have to process the results into VCards, but this is the easiest/fastest way of getting all the users and attributes that you want.

Google Custom Search and Passing along Querystring Variables

I am working on a web app project that has been in development for long time. The app has two sides, the majority of the site is publicly accessible. However, there are sections that require the user to be logged in before they can access certain content.
When the user logs in they get a sessionid (GUID), which is stored in a table in the database that tracks all sorts of data about the user and their activity.
Every page of the app was written to check whether this sessionid variable exists in the querystring. If a user tries to access one of these protected areas, the app checks to see if the sessionid variable is in the querystring. If it is not, they are redirected to the login screen.
The flow of the site has the user moving seamlessly from secured areas to non-secured areas, back and forth, etc.
So we did a test run with the Google Custom Search and it does an awesome job picking up all our dynamic content in these public areas. However, we have not been able to figure out how to pass the sessionid along with the search results IF the user is logged in already.
Is it possible to pass querystring variables that already exist in the URL along with the search results?
As far as I know, this is not possible. Google doesn't give you the possibility to modify the URLs of the search results in their Custom Search.
A possible solution would be to store your Session-Key to a Cookie, rather than passing it with every URL.
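As a rough illustration of the cookie approach, here is a Python sketch using only the standard library. It is not tied to the framework this app is written in; the cookie name sessionid and both helper names are assumptions. One helper produces the value for a Set-Cookie header at login, the other reads the session id back from the Cookie header on later requests, so it no longer has to travel in the querystring.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Return the value for a Set-Cookie header carrying the session id."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["path"] = "/"        # send it for every page of the site
    cookie["sessionid"]["httponly"] = True   # keep it away from page scripts
    return cookie["sessionid"].OutputString()


def read_session_id(cookie_header):
    """Extract the session id from an incoming Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    return morsel.value if morsel is not None else None
```

Because the browser sends the cookie with every request to the site, links coming back from the search results page still arrive with the session attached.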
Use the parseQueryFromUrl function
function parseQueryFromUrl () {
  var queryParamName = "q";
  var search = window.location.search.substr(1);
  var parts = search.split('&');
  for (var i = 0; i < parts.length; i++) {
    var keyvaluepair = parts[i].split('=');
    if (decodeURIComponent(keyvaluepair[0]) == queryParamName) {
      return decodeURIComponent(keyvaluepair[1].replace(/\+/g, ' '));
    }
  }
  return '';
}
Select RESULTS ONLY option in the Look & Feel and it will provide you with the code.
www.google.com/cse/

How to delete all datastore in Google App Engine?

Does anyone know how to delete all datastore in Google App Engine?
If you're talking about the live datastore, open the dashboard for your app (login on appengine) then datastore --> dataviewer, select all the rows for the table you want to delete and hit the delete button (you'll have to do this for all your tables).
You can do the same programmatically through the remote_api (but I never used it).
If you're talking about the development datastore, you'll just have to delete the following file: "./WEB-INF/appengine-generated/local_db.bin". The file will be generated for you again next time you run the development server and you'll have a clear db.
Make sure to clean your project afterwards.
This is one of the little gotchas that come in handy when you start playing with the Google Application Engine. You'll find yourself persisting objects into the datastore then changing the JDO object model for your persistable entities ending up with obsolete data that'll make your app crash all over the place.
The best approach is the remote API method as suggested by Nick, he's an App Engine engineer from Google, so trust him.
It's not that difficult to do, and the latest 1.2.5 SDK provides remote_shell_api.py off the shelf. So download the new SDK, then follow these steps:
Connect to the remote server from your command line: remote_shell_api.py yourapp /remote_api
The shell will ask for your login info and, if authorized, will open a Python shell for you. You need to set up a URL handler for /remote_api in your app.yaml.
Fetch the entities you'd like to delete; the code looks something like:
from models import Entry
query = Entry.all(keys_only=True)
entries = query.fetch(1000)
db.delete(entries)
# This bulk deletes up to 1000 entities at a time
Update 2013-10-28:
remote_shell_api.py has been replaced by remote_api_shell.py, and you should connect with remote_api_shell.py -s your_app_id.appspot.com, according to the documentation.
There is a new experimental feature Datastore Admin, after enabling it in app settings, you can bulk delete as well as backup your datastore through the web ui.
The fastest and most efficient way to handle bulk delete on Datastore is by using the new mapper API announced at the latest Google I/O.
If your language of choice is Python, you just have to register your mapper in a mapreduce.yaml file and define a function like this:
from mapreduce import operation as op
def process(entity):
    yield op.db.Delete(entity)
In Java, you should have a look at this article, which suggests a function like this:
@Override
public void map(Key key, Entity value, Context context) {
    log.info("Adding key to deletion pool: " + key);
    DatastoreMutationPool mutationPool = this.getAppEngineContext(context)
            .getMutationPool();
    mutationPool.delete(value.getKey());
}
EDIT:
Since SDK 1.3.8, there's a Datastore admin feature for this purpose
You can clear the development server datastore when you run the server:
/path/to/dev_appserver.py --clear_datastore=yes myapp
You can also abbreviate --clear_datastore with -c.
If you have a significant amount of data, you need to use a script to delete it. You can use remote_api to clear the datastore from the client side in a straightforward manner, though.
Here you go: Go to Datastore Admin, and then select the Entity type you want to delete and click Delete. Mapreduce will take care of deleting!
There are several ways you can use to remove entries from App Engine's Datastore:
First, think whether you really need to remove entries. This is expensive and it might be cheaper to not remove them.
You can delete all entries by hand using the Datastore Admin.
You can use the Remote API and remove entries interactively.
You can remove the entries programmatically using a couple lines of code.
You can remove them in bulk using Task Queues and Cursors.
Or you can use Mapreduce to get something more robust and fancier.
Each one of these methods is explained in the following blog post:
http://www.shiftedup.com/2015/03/28/how-to-bulk-delete-entries-in-app-engine-datastore
Hope it helps!
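The "in bulk, a batch at a time" idea from the list above can be sketched independently of any particular Datastore API. In this minimal Python sketch, fetch_keys and delete_keys are hypothetical callables you would wire to your own keys-only query and delete call; the 500 batch size and the 25 second budget are assumptions meant to stay under a request deadline, not documented limits.

```python
import time


def delete_until_deadline(fetch_keys, delete_keys, batch_size=500,
                          budget_seconds=25.0, clock=time.monotonic):
    """Delete in batches until nothing is left or the time budget runs out.

    fetch_keys(n) must return up to n keys that still exist;
    delete_keys(keys) must delete them.
    Returns (deleted_count, finished).
    """
    deadline = clock() + budget_seconds
    deleted = 0
    while clock() < deadline:
        keys = fetch_keys(batch_size)
        if not keys:
            return deleted, True   # nothing left to delete
        delete_keys(keys)
        deleted += len(keys)
    return deleted, False          # out of time; call again (e.g. from a new task)
```

When finished comes back False, the caller can enqueue another task that runs the same function, which is the essence of the task-queue approach.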
The zero-setup way to do this is to send an execute-arbitrary-code HTTP request to the admin service that your running app already, automatically, has:
import urllib
import urllib2
urllib2.urlopen('http://localhost:8080/_ah/admin/interactive/execute',
data = urllib.urlencode({'code' : 'from google.appengine.ext import db\n' +
'db.delete(db.Query())'}))
Source
I got this from http://code.google.com/appengine/articles/remote_api.html.
Create the Interactive Console
First, you need to define an interactive App Engine console. So, create a file called appengine_console.py and enter this:
#!/usr/bin/python
import code
import getpass
import sys
# These are for my OSX installation. Change it to match your google_appengine paths.
sys.path.append("/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine")
sys.path.append("/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/yaml/lib")
from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.ext import db
def auth_func():
    return raw_input('Username:'), getpass.getpass('Password:')
if len(sys.argv) < 2:
    print "Usage: %s app_id [host]" % (sys.argv[0],)
    sys.exit(1)  # stop here: without an app_id the next line would fail
app_id = sys.argv[1]
if len(sys.argv) > 2:
    host = sys.argv[2]
else:
    host = '%s.appspot.com' % app_id
remote_api_stub.ConfigureRemoteDatastore(app_id, '/remote_api', auth_func, host)
code.interact('App Engine interactive console for %s' % (app_id,), None, locals())
Create the Mapper base class
Once that's in place, create this Mapper class. I just created a new file called utils.py and threw this in:
from google.appengine.ext import db  # needed for db.put / db.delete below
class Mapper(object):
    # Subclasses should replace this with a model class (eg, model.Person).
    KIND = None
    # Subclasses can replace this with a list of (property, value) tuples to filter by.
    FILTERS = []
    def map(self, entity):
        """Updates a single entity.
        Implementers should return a tuple containing two iterables (to_update, to_delete).
        """
        return ([], [])
    def get_query(self):
        """Returns a query over the specified kind, with any appropriate filters applied."""
        q = self.KIND.all()
        for prop, value in self.FILTERS:
            q.filter("%s =" % prop, value)
        q.order("__key__")
        return q
    def run(self, batch_size=100):
        """Executes the map procedure over all matching entities."""
        q = self.get_query()
        entities = q.fetch(batch_size)
        while entities:
            to_put = []
            to_delete = []
            for entity in entities:
                map_updates, map_deletes = self.map(entity)
                to_put.extend(map_updates)
                to_delete.extend(map_deletes)
            if to_put:
                db.put(to_put)
            if to_delete:
                db.delete(to_delete)
            # Continue after the last key seen in this batch
            q = self.get_query()
            q.filter("__key__ >", entities[-1].key())
            entities = q.fetch(batch_size)
Mapper is supposed to be just an abstract class that allows you to iterate over every entity of a given kind, be it to extract their data, or to modify them and store the updated entities back to the datastore.
Run with it!
Now, start your appengine interactive console:
$ python appengine_console.py <app_id_here>
That should start the interactive console. In it, create a subclass of Mapper:
from utils import Mapper
# import your model class here
class MyModelDeleter(Mapper):
    KIND = <model_name_here>
    def map(self, entity):
        return ([], [entity])
And, finally, run it (from your interactive console):
mapper = MyModelDeleter()
mapper.run()
That's it!
You can do it using the web interface. Login into your account, navigate with links on the left hand side. In Data Store management you have options to modify and delete data. Use respective options.
I've created an add-in panel that can be used with your deployed App Engine apps. It lists the kinds that are present in the datastore in a dropdown, and you can click a button to schedule "tasks" that delete all entities of a specific kind or simply everything. You can download it here:
http://code.google.com/p/jobfeed/wiki/Nuke
For Python, 1.3.8 includes an experimental admin built-in for this. They say: "enable the following builtin in your app.yaml file:"
builtins:
- datastore_admin: on
"Datastore delete is currently available only with the Python runtime. Java applications, however, can still take advantage of this feature by creating a non-default Python application version that enables Datastore Admin in the app.yaml. Native support for Java will be included in an upcoming release."
Open "Datastore Admin" for your application and enable Admin. Then all of your entities will be listed with check boxes. You can simply select the unwanted entities and delete them.
This is what you're looking for...
db.delete(Entry.all(keys_only=True))
Running a keys-only query is much faster than a full fetch, and your quota will take a smaller hit because keys-only queries are considered small ops.
Here's a link to an answer from Nick Johnson describing it further.
Below is an end-to-end REST API solution to truncating a table...
I setup a REST API to handle database transactions where routes are directly mapped through to the proper model/action. This can be called by entering the right url (example.com/inventory/truncate) and logging in.
Here's the route:
Route('/inventory/truncate', DataHandler, defaults={'_model':'Inventory', '_action':'truncate'})
Here's the handler:
class DataHandler(webapp2.RequestHandler):
    @basic_auth
    def delete(self, **defaults):
        model = defaults.get('_model')
        action = defaults.get('_action')
        module = __import__('api.models', fromlist=[model])
        model_instance = getattr(module, model)()
        result = getattr(model_instance, action)()
It starts by loading the model dynamically (ie Inventory found under api.models), then calls the correct method (Inventory.truncate()) as specified in the action parameter.
The @basic_auth is a decorator/wrapper that provides authentication for sensitive operations (ie POST/DELETE). There's also an OAuth decorator available if you're concerned about security.
Finally, the action is called:
def truncate(self):
    db.delete(Inventory.all(keys_only=True))
It looks like magic but it's actually very straightforward. The best part is, delete() can be re-used to handle deleting one-or-many results by adding another action to the model.
You can delete the whole datastore by deleting all kinds one by one from the Google App Engine dashboard. Please follow these steps:
Log in to https://console.cloud.google.com/datastore/settings
Click Open Datastore Admin. (Enable it if it is not enabled.)
Select all entities and press Delete. (This step runs a MapReduce job that deletes all selected kinds.)
For more information, see this image: http://storage.googleapis.com/bnifsc/Screenshot%20from%202015-01-31%2023%3A58%3A41.png
If you have a lot of data, using the web interface could be time consuming. The App Engine Launcher utility lets you delete everything in one go with the 'Clear datastore on launch' checkbox. This utility is now available for both Windows and Mac (Python framework).
For the development server, instead of running the server through the google app engine launcher, you can run it from the terminal like:
dev_appserver.py --port=[portnumber] --clear_datastore=yes [nameofapplication]
ex: my application "reader" runs on port 15080. After modify the code and restart the server, I just run "dev_appserver.py --port=15080 --clear_datastore=yes reader".
It's good for me.
Adding answer about recent developments.
Google recently added datastore admin feature. You can backup, delete or copy your entities to another app using this console.
https://developers.google.com/appengine/docs/adminconsole/datastoreadmin#Deleting_Entities_in_Bulk
I often don't want to delete the whole datastore, so I pull a clean copy of /war/WEB-INF/local_db.bin out of source control. It may just be me, but it seems that even with Dev Mode stopped I have to physically remove the file before pulling it. This is on Windows using the Subversion plugin for Eclipse.
PHP variation:
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
define('DATASTORE_SERVICE', DatastoreServiceFactory::getDatastoreService());
function get_all($kind) {
    $query = new Query($kind);
    $prepared = DATASTORE_SERVICE->prepare($query);
    return $prepared->asIterable();
}
function delete_all($kind, $amount = 0) {
    if ($entities = get_all($kind)) {
        $r = $t = 0;
        $delete = array();
        foreach ($entities as $entity) {
            // collect the key first so no entity is skipped at a batch boundary
            $delete[] = $entity->getKey();
            $r++; $t++;
            if ($r >= 500) {
                // flush a full batch of 500 keys
                DATASTORE_SERVICE->delete($delete);
                $delete = array();
                $r = 0;
            }
            // stop once $amount entities have been collected (0 means no limit)
            if ($amount && $amount <= $t) break;
        }
        // delete whatever is left in the last, partial batch
        if ($delete) {
            DATASTORE_SERVICE->delete($delete);
        }
    }
}
Yes it will take time and 30 sec. is a limit. I'm thinking to put an ajax app sample to automate beyond 30 sec.
for amodel in db.Model.__subclasses__():
    dela=[]
    print amodel
    try:
        m = amodel()
        mq = m.all()
        print mq.count()
        for mw in mq:
            dela.append(mw)
        db.delete(dela)
        #~ print len(dela)
    except:
        pass
If you're using ndb, the method that worked for me for clearing the datastore:
ndb.delete_multi(ndb.Query(default_options=ndb.QueryOptions(keys_only=True)))
For any datastore that's on app engine, rather than local, you can use the new Datastore API. Here's a primer for how to get started.
I wrote a script that deletes all non-built in entities. The API is changing pretty rapidly, so for reference, I cloned it at commit 990ab5c7f2063e8147bcc56ee222836fd3d6e15b
from gcloud import datastore
from gcloud.datastore import SCOPE
from gcloud.datastore.connection import Connection
from gcloud.datastore import query
from oauth2client import client
def get_connection():
    client_email = 'XXXXXXXX@developer.gserviceaccount.com'
    private_key_string = open('/path/to/yourfile.p12', 'rb').read()
    svc_account_credentials = client.SignedJwtAssertionCredentials(
        service_account_name=client_email,
        private_key=private_key_string,
        scope=SCOPE)
    return Connection(credentials=svc_account_credentials)
def connect_to_dataset(dataset_id):
    connection = get_connection()
    datastore.set_default_connection(connection)
    datastore.set_default_dataset_id(dataset_id)
if __name__ == "__main__":
    connect_to_dataset(DATASET_NAME)
    gae_entity_query = query.Query()
    gae_entity_query.keys_only()
    for entity in gae_entity_query.fetch():
        if entity.kind[0] != '_':
            print entity.kind
            entity.key.delete()
Continuing svpino's idea, it is wise to reuse records marked as deleted (his idea was not to remove unused records, but to mark them as "deleted"). A bit of cache/memcache to hold the working copy, writing only the difference between states (before and after the desired task) to the datastore, will make it better. For big tasks it is possible to write intermediate difference chunks to the datastore to avoid data loss if memcache disappears. To make it loss-proof, it is possible to check the integrity/existence of the memcached results and restart the task (or the required part) to repeat missing computations. When the data difference is written to the datastore, the required computations are discarded from the queue.
Another idea, similar to map reduce, is to shard an entity kind into several different entity kinds, so that they are collected together and visible as a single entity kind to the final user. Entries are only marked as "deleted". When the number of "deleted" entries per shard exceeds some limit, the "alive" entries are distributed among the other shards, and this shard is closed forever and then deleted manually from the dev console (at less cost, I guess). Update: there seems to be no drop table in the console, only record-by-record deletion at the regular price.
It is possible to delete a large set of records by query, in chunks, without GAE failing (at least it works locally), with the possibility to continue in the next attempt when time runs out:
qdelete.getFetchPlan().setFetchSize(100);
while (true)
{
    long result = qdelete.deletePersistentAll(candidates);
    LOG.log(Level.INFO, String.format("deleted: %d", result));
    if (result <= 0)
        break;
}
Also, sometimes it is useful to add an additional field to the primary table instead of putting candidates (related records) into a separate table. And yes, the field may be an unindexed/serialized array with little computation cost.
For everyone who needs a quick solution for the dev server (as of the time of writing, Feb. 2016):
Stop the dev server.
Delete the target directory.
Rebuild the project.
This will wipe all data from the datastore.
I was so frustrated about existing solutions for deleting all data in the live datastore that I created a small GAE app that can delete quite some amount of data within its 30 seconds.
How to install etc: https://github.com/xamde/xydra
For java
DatastoreService db = DatastoreServiceFactory.getDatastoreService();
List<Key> keys = new ArrayList<Key>();
for(Entity e : db.prepare(new Query().setKeysOnly()).asIterable())
    keys.add(e.getKey());
db.delete(keys);
Works well in Development Server
You have 2 simple ways,
#1: To save cost, delete the entire project
#2: using ts-datastore-orm:
https://www.npmjs.com/package/ts-datastore-orm
await Entity.truncate();
The truncate can delete around 1K rows per second
Here's how I did this naively from a vanilla Google Cloud Shell (no GAE) with python3:
from google.cloud import datastore
client = datastore.Client()
query = client.query()  # kindless query: matches entities of every kind
query.keys_only()
for counter, entity in enumerate(query.fetch()):
    if entity.kind.startswith('_'):  # skip reserved kinds
        continue
    print(f"{counter}: {entity.key}")
    client.delete(entity.key)
This takes a very long time even with a relatively small amount of keys but it works.
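Much of that time goes into issuing one delete call per key. A possible speed-up, sketched here under assumptions, is to collect the keys and hand them to the client's delete_multi method in slices. The helper name delete_in_batches is made up for illustration, and the batch size of 500 is borrowed from the other answers on this page rather than checked against current limits.

```python
def delete_in_batches(client, keys, batch_size=500):
    """Delete `keys` via client.delete_multi in slices of `batch_size`.

    `client` is expected to behave like google.cloud.datastore.Client,
    i.e. to offer a delete_multi(keys) method. Returns the number of calls made.
    """
    calls = 0
    for start in range(0, len(keys), batch_size):
        client.delete_multi(keys[start:start + batch_size])
        calls += 1
    return calls
```

With the script above, this could be used as delete_in_batches(client, [e.key for e in query.fetch() if not e.kind.startswith('_')]), which trades some memory for far fewer round trips.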
More info about the Python client library: https://googleapis.dev/python/datastore/latest/client.html
As of 2022, to the best of my knowledge, there are two ways to delete a kind from a largish datastore. Google recommends using a Dataflow template. The template basically pulls each entity one by one, subject to a GQL query, and then deletes it. Interestingly, if you are deleting a large number of rows (> 10m), you will run into datastore trouble: it will fail to provide enough capacity, and your operations on the datastore will start timing out. However, only the kind you are mass deleting from will be affected.
If you have fewer than 10m rows, you can just use this Go script:
package main
import (
    "cloud.google.com/go/datastore"
    "context"
    "fmt"
    "google.golang.org/api/option"
    "log"
    "sync"
    "time"
)
const (
    batchSize       = 10000 // number of keys to get in a single batch
    deleteBatchSize = 500   // number of keys to delete in a single batch
    projectID       = "name-of-your-GCP-project"
    serviceAccount  = "path-to-sa-file"
    table           = "kind-to-delete"
)
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}
// deleteBatch fetches up to batchSize keys of the kind and deletes them
// concurrently in chunks. It returns the number of keys fetched, or -1 on error.
func deleteBatch(table string) int {
    ctx := context.Background()
    client, err := datastore.NewClient(ctx, projectID, option.WithCredentialsFile(serviceAccount))
    if err != nil {
        log.Fatalf("Failed to open client: %v", err)
    }
    defer client.Close()
    query := datastore.NewQuery(table).KeysOnly().Limit(batchSize)
    keys, err := client.GetAll(ctx, query, nil)
    if err != nil {
        fmt.Printf("%s Failed to get %d keys : %v\n", table, batchSize, err)
        return -1
    }
    var wg sync.WaitGroup
    for i := 0; i < len(keys); i += deleteBatchSize {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            batch := keys[i : i+min(len(keys)-i, deleteBatchSize)]
            if err := client.DeleteMulti(ctx, batch); err != nil {
                // not a big problem, we'll get them next time ;)
                fmt.Printf("%s Failed to delete multi: %v\n", table, err)
            }
        }(i)
    }
    wg.Wait()
    return len(keys)
}
func main() {
    var globalStartTime = time.Now()
    fmt.Printf("Deleting \033[1m%s\033[0m\n", table)
    for {
        startTime := time.Now()
        count := deleteBatch(table)
        if count >= 0 {
            rate := float64(count) / time.Since(startTime).Seconds()
            fmt.Printf("Deleted %d keys from %s in %.2fs, rate %.2f keys/s\n", count, table, time.Since(startTime).Seconds(), rate)
            if count == 0 {
                fmt.Printf("%s is now clear.\n", table)
                break
            }
        } else {
            fmt.Printf("Retrying after short cooldown\n")
            time.Sleep(10 * time.Second)
        }
    }
    fmt.Printf("Total time taken %s.\n", time.Since(globalStartTime))
}

Resources