'IntegrityError: column username is not unique' while using Django User model in test - database

While running some tests, I started to get an IntegrityError in my setUp function. Here is my code:
def setUp(self):
    self.client = Client()
    self.emplUser = User.objects.create_user('employee@email.com', 'employee@email.com', 'nothing')
    self.servUser1 = User.objects.create_user('thebestcompany@email.com', 'thebestcompany@email.com', 'nothing')
    self.servUser2 = User.objects.create_user('theothercompany@email.com', 'theothercompany@email.com', 'nothing')
    self.custUser1 = User.objects.create_user('john@email.com', 'john@email.com', 'nothing')
    self.custUser2 = User.objects.create_user('marcus@email.com', 'marcus@email.com', 'nothing')
    ... save users here ...
I'm wondering how this IntegrityError keeps getting raised. I delete all the users in the tearDown function and am using sqlite3 as my DB backend. I see no conflicting usernames, and in production I have no issues with using emails as usernames.
This started happening only half an hour ago, out of the blue. Has anyone run into a solution to this problem?

I'm sure you're not suffering from this problem anymore since you wrote 18 months ago, but I had this problem too, and finally figured out what was happening. When using PostgreSQL for test cases, DB changes are done in a transaction and simply rolled back, so it is not necessary to explicitly clear tables in tearDown(); with SQLite, however, it is.

A late but more appropriate answer, for the people who land here after a Google search:
When your tests interact with the database (typically by creating model instances), subclass your test class from django.test.TestCase, which resets the database after each test is run.
Then you don't need to write a tedious tearDown method in all your test classes.
See https://docs.djangoproject.com/en/dev/topics/testing/overview/#writing-tests
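A minimal sketch applying this to the setUp from the question above (the class name is hypothetical; the users and passwords come from the question):
from django.contrib.auth.models import User
from django.test import Client, TestCase

class EmployeeAccountTests(TestCase):  # hypothetical name for the original test class
    def setUp(self):
        self.client = Client()
        # django.test.TestCase resets the database between tests, so no tearDown is needed
        self.emplUser = User.objects.create_user('employee@email.com', 'employee@email.com', 'nothing')
        self.custUser1 = User.objects.create_user('john@email.com', 'john@email.com', 'nothing')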

Related

Create a Database Entry in Django Before Running Tests

I'm using Django 1.4 with Python 2.7 on Ubuntu 13.04. The Django project I'm working on has an entire test suite using Nose and Coverage. I need to create a user in the database prior to any tests being run. Where would I do this?
system_user = User(username=accounts_constants.SYSTEM_USER)
system_user.save()
The motivation for this stems from the existence of a "system robot" that handles a lot of the database transactions. I have created a full tracking suite within our project. If a user changes any sensitive information we track it. If the system makes those changes we want the system user to be listed in the tracking information.
There are several places in the code that needed to be modified to support this change. Obviously the test database doesn't have this user. I would like to add it prior to any tests running (but after the database is created).
I've been digging around but can't see exactly where the database is created for testing purposes. My first attempt was in the __init__ of our test suite runner.
class CustomTestSuiteRunner(NoseTestSuiteRunner):
    """
    Runs testing suite and adds code coverage report.
    """
    def __init__(self, *args, **kwargs):
        super(CustomTestSuiteRunner, self).__init__(*args, **kwargs)
        self.coverage = coverage.coverage()
        self.coverage.use_cache(0)  # don't use caching with coverage.py
        system_user = User(username=accounts_constants.SYSTEM_USER)
        system_user.save()
This failed because it was still using the development database (the system user already exists on this database).
I tried adding the user further along in the testing sequence but can't seem to find where the appropriate place would be.
Any direction would be greatly appreciated.
EDIT1:
From girasquid's suggested solution:
I don't want to modify every setUp to call super() on a CustomTestCase. Instead I am trying to do it in __init__.
class CustomTestCase(TestCase):
    def __init__(self, *args, **kwargs):
        super(CustomTestCase, self).__init__(*args, **kwargs)
        try:
            system_user = User.objects.create(
                username=accounts_constants.SYSTEM_USER)
            system_user.save()
        except psycopg2.IntegrityError:
            pass
But I am getting an IntegrityError from nose. Any ideas why this might be?
lib/python2.7/site-packages/nose/loader.py", line 518, in makeTest
return self._makeTest(obj, parent)
IntegrityError: duplicate key value violates unique constraint "auth_user_username_key"
EDIT2:
I was convinced by our team to implement a case-by-case setUp change to include the user rather than a system-wide change.
I would define setUp on my test cases for this:
class MyTestCase(unittest.TestCase):
    def setUp(self):
        self.system_user = User.objects.create(username=accounts_constants.SYSTEM_USER)
If you have a lot of test cases you need to do this for, it might be useful to put this on a base TestCase and then inherit from that for all your tests.
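A sketch of that base-class approach, assuming the accounts_constants module from the question (the class names here are hypothetical):
from django.contrib.auth.models import User
from django.test import TestCase

import accounts_constants  # module referenced in the question

class SystemUserTestCase(TestCase):  # hypothetical shared base class
    def setUp(self):
        super(SystemUserTestCase, self).setUp()
        # created fresh for every test; TestCase rolls the transaction back afterwards
        self.system_user = User.objects.create(username=accounts_constants.SYSTEM_USER)

class TrackingTests(SystemUserTestCase):  # hypothetical concrete test case
    def test_system_user_is_present(self):
        self.assertTrue(User.objects.filter(username=accounts_constants.SYSTEM_USER).exists())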

PyroCMS / Codeigniter : too many session entries in db

I'm using the PyroCMS / CodeIgniter combo for a small website.
After adding some content, I checked the DB and saw this: is this normal behaviour? Multiple session_ids for one user with the same IP?
I can't imagine that this is correct.
My session config looks like this:
$config['sess_cookie_name'] = 'pyrocms' . (ENVIRONMENT !== 'production' ? '_' . ENVIRONMENT : '');
$config['sess_expiration'] = 14400;
$config['sess_expire_on_close'] = true;
$config['sess_encrypt_cookie'] = true;
$config['sess_use_database'] = true;
// don't change anything but the 'ci_sessions' part of this. The MSM depends on the 'default_' prefix
$config['sess_table_name'] = 'default_ci_sessions';
$config['sess_match_ip'] = true;
$config['sess_match_useragent'] = true;
$config['sess_time_to_update'] = 300;
I did not change a single line of code affecting the session class or anything like that.
The red rows belong to a 15-minute cron job; this is fine, I think.
Every time I refresh the page, two or three new session entries are added...
Yes, this is normal. The CI session class automatically generates a new ID periodically. (Every 5 minutes, by default.) This is part of the security inherent in using CI sessions instead of native PHP sessions. Garbage collection will take care of this, you do not need to do anything.
You can read more about the session id behavior in the CI manual. This is an excerpt copied from that page.
The user's unique Session ID (this is a statistically random string
with very strong entropy, hashed with MD5 for portability, and
regenerated (by default) every five minutes)
This behavior is by design; there is nothing to fix. The session class has built-in garbage collection that deletes old entries as needed. I have had many projects using CodeIgniter for several years; this is what it does.
If it really bothers you, you can alter the timeout in the main CI config file. Change the line
$config['sess_time_to_update'] = 300 (the 5-minute refresh period)
to a number greater than
$config['sess_expiration'] (default 7200).
This will cause the session to time out before it is regenerated. This is inherently less secure in theory, but unless you are transacting sensitive data, it is probably irrelevant in practice.
But again, this is by design as part of the many layers of CI sessions. These and other features are what make it better than PHP native sessions. You can turn on profiling and see that the overhead for these queries is negligible, especially in light of all the other optimizations the framework provides.

PowerBuilder application login

I am using PowerBuilder PFC library to login to the database.
n_cst_appmanager/ pfc_open:
IF this.of_LogonDlg() > 0 THEN
    Open(w_myapp_frame)
END IF
n_cst_appmanager/ pfc_logon:
SQLCA.DBMS = "ODBC"
SQLCA.AutoCommit = False
SQLCA.DBParm = "ConnectString='DSN=mytestdb;UID=" + as_userid + ";PWD=" + as_password + "'"
connect using SQLCA;
Now, once the user is logged in, there are a few situations where I need to connect to another database (for example, to copy some data there). I would like to connect to the other database automatically, without displaying the login window again, so I need to save the user's username and password.
How can I save it? Do I need to save it in the registry? Can you give an example, please?
For example, I can get the user ID in the following way:
s_userid = gnv_app.of_GetUserID()
But I cannot get the password. Can someone please help me with how I can do it? Thanks a lot.
Actually, now that I'm paying attention to what you need instead of what you asked for <g>, and riffing off of Hugh's answer, why not just copy the transaction object?
// ltr_NewConnect is a second transaction object for the other database
transaction ltr_NewConnect
ltr_NewConnect = CREATE transaction
n_cst_String lnv_String
ltr_NewConnect.DBMS = SQLCA.DBMS
ltr_NewConnect.AutoCommit = SQLCA.AutoCommit
ltr_NewConnect.DBParm = lnv_String.of_GlobalReplace (SQLCA.DBParm, "mytestdb", "myotherdb")
If I were doing this, I'd code a copy of all the transaction object fields, just in case the means of defining the connection changes.
I'm assuming the other database is the same type of database in order for this to make sense (so that it uses the same type of DBParm), but either way the principle may apply.
Good luck,
Terry.
There's nothing built into PFC and there's nothing automagic in PowerBuilder that will help you with this. Just create an instance variable and a function to access it. Maybe grab the n_cst_LogonAttrib from the Message.PowerObjectParm immediately after the call to of_LogonDlg() and grab the value from there. Or, further extend your n_cst_AppManager.pfc_Logon event. Or extend of_LogonDlg(), and model the capture after the way PFC does the user id.
Note that storing the password anywhere permanent and visible to other processes like the registry would be a security violation that many companies would not allow. Not a direction you want to go.
Good luck,
Terry.
You can parse them out of SQLCA.DBParm.
string ls_userID, ls_password
n_cst_string stringSrv
ls_userID = stringSrv.of_getKeyValue(SQLCA.DBParm, "UID", ";")
ls_password = stringSrv.of_getKeyValue(SQLCA.DBParm, "PWD", ";")
However, a good case can be made for capturing them in the appmanager if you know you will need them.
Having the same login credentials for different databases is a security concern. It's the sort of thing that leads to your company being in the news for the wrong reasons.

Why can't I update these custom fields in Salesforce?

Greetings,
Well I am bewildered. I have been tasked with updating a PHP script that uses the BulkAPI to upsert some data into the Opportunity entity.
This is all going well except that the Bulk API is returning this error for some clearly defined custom fields:
InvalidBatch : Field name not found : cv__Acknowledged__c
And similar.
I thought I had finally found the problem when I discovered that the WSDL version I was using was quite old (Partner WSDL). So I promptly regenerated the WSDL. Only problem? Enterprise, Partner, etc. ... none of them include these fields. They're all coming from the Common Ground package and start with cv__.
I even tried to find them in the object explorer in Workbench as well as the schema explorer in Force.com IDE.
So, please...lend me your experience. How can I update these values?
Thanks in advance!
Clif
Screenshots to prove I have the correct access:
EDIT -- Here is my code:
require_once 'soapclient/SforcePartnerClient.php';
require_once 'BulkApiClient.php';
$mySforceConnection = new SforcePartnerClient();
$mySoapClient = $mySforceConnection->createConnection(APP.'plugins'.DS.'salesforce_bulk_api_client'.DS.'vendors'.DS.'soapclient'.DS.'partner.wsdl.xml');
$mylogin = $mySforceConnection->login('redacted@redacted.com', 'redactedSessionredactedPassword');
$myBulkApiConnection = new BulkApiClient($mylogin->serverUrl, $mylogin->sessionId);
$job = new JobInfo();
$job->setObject('Opportunity');
$job->setOpertion('upsert');
$job->setContentType('CSV');
$job->setConcurrencyMode('Parallel');
$job->setExternalIdFieldName('Id');
$job = $myBulkApiConnection->createJob($job);
$batch = $myBulkApiConnection->createBatch($job, $insert);
$myBulkApiConnection->updateJobState($job->getId(), 'Closed');
$times = 1;
while($batch->getState() == 'Queued' || $batch->getState() == 'InProgress')
{
    $batch = $myBulkApiConnection->getBatchInfo($job->getId(), $batch->getId());
    sleep(pow(1.5, $times++));
}
$batchResults = $myBulkApiConnection->getBatchResults($job->getId(), $batch->getId());
echo "Number of records processed: " . $batch->getNumberRecordsProcessed() . "\n";
echo "Number of records failed: " . $batch->getNumberRecordsFailed() . "\n";
echo "stateMessage: " . $batch->getStateMessage() . "\n";
if($batch->getNumberRecordsFailed() > 0 || $batch->getNumberRecordsFailed() == $batch->getNumberRecordsProcessed())
{
    echo "Failures detected. Batch results:\n".$batchResults."\nEnd batch.\n";
}
And lastly, an example of the CSV data being sent:
"Id","AccountId","Amount","CampaignId","CloseDate","Name","OwnerId","RecordTypeId","StageName","Type","cv__Acknowledged__c","cv__Payment_Type__c","ER_Acknowledgment_Type__c"
"#N/A","0018000000nH16fAAC","100.00","70180000000nktJ","2010-10-29","Gary Smith $100.00 Single Donation 10/29/2010","00580000001jWnq","01280000000F7c7AAC","Received","Individual Gift","Not Acknowledged","Credit Card","Email"
"#N/A","0018000000nH1JtAAK","30.00","70180000000nktJ","2010-12-20","Lisa Smith $30.00 Single Donation 12/20/2010","00580000001jWnq","01280000000F7c7AAC","Received","Individual Gift","Not Acknowledged","Credit Card","Email"
After 2 weeks, 4 cases, dozens of e-mails and phone calls, 3 bulletin board posts, and 1 Stackoverflow question, I finally got a solution.
The problem was quite simple in the end. (which makes all of that all the more frustrating)
As stated, the custom fields I was trying to update live in the Convio Common Ground package. Apparently our install has 2 licenses for this package. None of the licenses were assigned to my user account.
It isn't clear what is really gained/lost by not having the license other than API access. As the rest of this thread demonstrates, I was able to see and update the fields in every other way.
If you run into this, you can view the licenses on the Manage Packages page in Setup. Drill through to the package in question and it should list the users who are licensed to use it.
Thanks to SimonF's professional and timely assistance on the Developer Force bulletin boards:
http://boards.developerforce.com/t5/Perl-PHP-Python-Ruby-Development/Bulk-API-So-frustrated/m-p/232473/highlight/false#M4713
I really think this is a field level security issue. Is the field included in the opportunity layout for that user profile? Field level security picks the most restrictive option, so if you seem to have access from the setup screen but it's not included in the layout, I don't think the system will give you access.
If you're certain that your user's profile has FLS access to the fields and the assigned layouts include the fields, then I'd suggest looking into the definition of the package in question. I know the bulk API allows use of fields in managed packages normally (I've done this).
My best guess at this point is that your org has installed multiple versions of this package over time. Through component deprecation, it's possible the package author deprecated these custom fields. Take a look at two places once you've logged into salesforce:
1.) The package definition page. It should have details about what package version was used when the package was first installed and what package version you're at now.
2.) The page that has WSDL generation links. If you choose to generate the enterprise WSDL, you should be taken to a page that has dropdown elements that let you select which package version to use. Try fiddling with those to see if you can get the fields to show up.
These are just guesses. If you find more info, let me know, and I can try to provide additional guidance.

How to delete all datastore in Google App Engine?

Does anyone know how to delete all datastore in Google App Engine?
If you're talking about the live datastore, open the dashboard for your app (log in on App Engine), then Datastore --> Data Viewer, select all the rows for the table you want to delete, and hit the Delete button (you'll have to do this for all your tables).
You can do the same programmatically through the remote_api (but I never used it).
If you're talking about the development datastore, you'll just have to delete the following file: "./WEB-INF/appengine-generated/local_db.bin". The file will be generated for you again next time you run the development server and you'll have a clear db.
Make sure to clean your project afterwards.
This is one of the little gotchas that come in handy when you start playing with the Google Application Engine. You'll find yourself persisting objects into the datastore then changing the JDO object model for your persistable entities ending up with obsolete data that'll make your app crash all over the place.
The best approach is the remote API method suggested by Nick; he's an App Engine engineer from Google, so trust him.
It's not that difficult to do, and the latest 1.2.5 SDK provides remote_shell_api.py out of the box. So go download the new SDK, then follow these steps:
Connect to the remote server on your command line: remote_shell_api.py yourapp /remote_api
The shell will ask for your login info and, if authorized, will open a Python shell for you. You need to set up a URL handler for /remote_api in your app.yaml (see the snippet after the code below).
Fetch the entities you'd like to delete; the code looks something like this:
from google.appengine.ext import db  # needed for db.delete()
from models import Entry

query = Entry.all(keys_only=True)
entries = query.fetch(1000)
db.delete(entries)
# This can bulk-delete 1000 entities at a time
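The /remote_api handler mentioned in the steps above is declared in app.yaml; on the old Python runtimes the builtin looked roughly like this (a sketch, check the remote_api docs for your runtime):
builtins:
- remote_api: on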
Update 2013-10-28:
remote_shell_api.py has been replaced by remote_api_shell.py, and you should connect with remote_api_shell.py -s your_app_id.appspot.com, according to the documentation.
There is a new experimental feature Datastore Admin, after enabling it in app settings, you can bulk delete as well as backup your datastore through the web ui.
The fastest and most efficient way to handle bulk delete on Datastore is by using the new mapper API announced at the latest Google I/O.
If your language of choice is Python, you just have to register your mapper in a mapreduce.yaml file and define a function like this:
from mapreduce import operation as op
def process(entity):
    yield op.db.Delete(entity)
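A rough sketch of the matching mapreduce.yaml entry (module and kind names are placeholders; check the mapreduce library's documentation for the exact reader parameters):
mapreduce:
- name: Delete all entities of one kind
  mapper:
    input_reader: mapreduce.input_readers.DatastoreInputReader
    handler: main.process          # the process() function above, assumed to live in main.py
    params:
    - name: entity_kind
      default: models.Entry       # placeholder kind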
For Java, you should have a look at this article, which suggests a function like this:
@Override
public void map(Key key, Entity value, Context context) {
    log.info("Adding key to deletion pool: " + key);
    DatastoreMutationPool mutationPool = this.getAppEngineContext(context)
            .getMutationPool();
    mutationPool.delete(value.getKey());
}
EDIT:
Since SDK 1.3.8, there's a Datastore admin feature for this purpose
You can clear the development server datastore when you run the server:
/path/to/dev_appserver.py --clear_datastore=yes myapp
You can also abbreviate --clear_datastore with -c.
If you have a significant amount of data, you need to use a script to delete it. You can use remote_api to clear the datastore from the client side in a straightforward manner, though.
Here you go: Go to Datastore Admin, and then select the Entity type you want to delete and click Delete. Mapreduce will take care of deleting!
There are several ways you can use to remove entries from App Engine's Datastore:
First, think whether you really need to remove entries. This is expensive and it might be cheaper to not remove them.
You can delete all entries by hand using the Datastore Admin.
You can use the Remote API and remove entries interactively.
You can remove the entries programmatically using a couple lines of code.
You can remove them in bulk using Task Queues and Cursors (see the sketch below).
Or you can use Mapreduce to get something more robust and fancier.
Each one of these methods is explained in the following blog post:
http://www.shiftedup.com/2015/03/28/how-to-bulk-delete-entries-in-app-engine-datastore
Hope it helps!
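For the Task Queues and Cursors option above, a rough sketch using the deferred library (the function name and batch size are illustrative, not taken from the linked post):
from google.appengine.ext import db, deferred

def delete_in_batches(model_class, cursor=None, batch_size=250):
    # keys-only query over the kind, resumed from the previous batch's cursor
    q = db.Query(model_class, keys_only=True)
    if cursor:
        q.with_cursor(cursor)
    keys = q.fetch(batch_size)
    if keys:
        db.delete(keys)
        # re-queue this function as a task to continue where this batch stopped
        deferred.defer(delete_in_batches, model_class, q.cursor(), batch_size)
Kick it off once with deferred.defer(delete_in_batches, MyModel) and it keeps re-queuing itself until the kind is empty.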
The zero-setup way to do this is to send an execute-arbitrary-code HTTP request to the admin service that your running app already, automatically, has:
import urllib
import urllib2

urllib2.urlopen('http://localhost:8080/_ah/admin/interactive/execute',
                data=urllib.urlencode({'code': 'from google.appengine.ext import db\n' +
                                               'db.delete(db.Query())'}))
I got this from http://code.google.com/appengine/articles/remote_api.html.
Create the Interactive Console
First, you need to define an interactive appengine console. So, create a file called appengine_console.py and enter this:
#!/usr/bin/python
import code
import getpass
import sys

# These are for my OSX installation. Change them to match your google_appengine paths.
sys.path.append("/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine")
sys.path.append("/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/yaml/lib")

from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.ext import db

def auth_func():
    return raw_input('Username:'), getpass.getpass('Password:')

if len(sys.argv) < 2:
    print "Usage: %s app_id [host]" % (sys.argv[0],)
    sys.exit(1)  # exit if no app_id was given

app_id = sys.argv[1]
if len(sys.argv) > 2:
    host = sys.argv[2]
else:
    host = '%s.appspot.com' % app_id

remote_api_stub.ConfigureRemoteDatastore(app_id, '/remote_api', auth_func, host)

code.interact('App Engine interactive console for %s' % (app_id,), None, locals())
Create the Mapper base class
Once that's in place, create this Mapper class. I just created a new file called utils.py and threw this in:
class Mapper(object):
    # Subclasses should replace this with a model class (eg, model.Person).
    KIND = None

    # Subclasses can replace this with a list of (property, value) tuples to filter by.
    FILTERS = []

    def map(self, entity):
        """Updates a single entity.

        Implementers should return a tuple containing two iterables (to_update, to_delete).
        """
        return ([], [])

    def get_query(self):
        """Returns a query over the specified kind, with any appropriate filters applied."""
        q = self.KIND.all()
        for prop, value in self.FILTERS:
            q.filter("%s =" % prop, value)
        q.order("__key__")
        return q

    def run(self, batch_size=100):
        """Executes the map procedure over all matching entities."""
        q = self.get_query()
        entities = q.fetch(batch_size)
        while entities:
            to_put = []
            to_delete = []
            for entity in entities:
                map_updates, map_deletes = self.map(entity)
                to_put.extend(map_updates)
                to_delete.extend(map_deletes)
            if to_put:
                db.put(to_put)
            if to_delete:
                db.delete(to_delete)
            q = self.get_query()
            q.filter("__key__ >", entities[-1].key())
            entities = q.fetch(batch_size)
Mapper is supposed to be just an abstract class that allows you to iterate over every entity of a given kind, be it to extract their data, or to modify them and store the updated entities back to the datastore.
Run with it!
Now, start your appengine interactive console:
$python appengine_console.py <app_id_here>
That should start the interactive console. In it, create a subclass of Mapper:
from utils import Mapper
# import your model class here

class MyModelDeleter(Mapper):
    KIND = <model_name_here>

    def map(self, entity):
        return ([], [entity])
And, finally, run it (from your interactive console):
mapper = MyModelDeleter()
mapper.run()
That's it!
You can do it using the web interface. Log in to your account and navigate using the links on the left-hand side. Under Datastore management you have options to modify and delete data; use the respective options.
I've created an add-in panel that can be used with your deployed App Engine apps. It lists the kinds that are present in the datastore in a dropdown, and you can click a button to schedule "tasks" that delete all entities of a specific kind or simply everything. You can download it here:
http://code.google.com/p/jobfeed/wiki/Nuke
For Python, 1.3.8 includes an experimental admin built-in for this. They say: "enable the following builtin in your app.yaml file:"
builtins:
- datastore_admin: on
"Datastore delete is currently available only with the Python runtime. Java applications, however, can still take advantage of this feature by creating a non-default Python application version that enables Datastore Admin in the app.yaml. Native support for Java will be included in an upcoming release."
Open "Datastore Admin" for your application and enable Admin. Then all of your entities will be listed with check boxes. You can simply select the unwanted entites and delete them.
This is what you're looking for...
db.delete(Entry.all(keys_only=True))
Running a keys-only query is much faster than a full fetch, and your quota will take a smaller hit because keys-only queries are considered small ops.
Here's a link to an answer from Nick Johnson describing it further.
Below is an end-to-end REST API solution to truncating a table...
I set up a REST API to handle database transactions, where routes are mapped directly to the proper model/action. This can be called by entering the right URL (example.com/inventory/truncate) and logging in.
Here's the route:
Route('/inventory/truncate', DataHandler, defaults={'_model':'Inventory', '_action':'truncate'})
Here's the handler:
class DataHandler(webapp2.RequestHandler):
    @basic_auth
    def delete(self, **defaults):
        model = defaults.get('_model')
        action = defaults.get('_action')
        module = __import__('api.models', fromlist=[model])
        model_instance = getattr(module, model)()
        result = getattr(model_instance, action)()
It starts by loading the model dynamically (ie Inventory found under api.models), then calls the correct method (Inventory.truncate()) as specified in the action parameter.
The @basic_auth is a decorator/wrapper that provides authentication for sensitive operations (ie POST/DELETE). There's also an oAuth decorator available if you're concerned about security.
Finally, the action is called:
def truncate(self):
    db.delete(Inventory.all(keys_only=True))
It looks like magic but it's actually very straightforward. The best part is, delete() can be re-used to handle deleting one-or-many results by adding another action to the model.
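For example, a sketch of what such an extra model action might look like (the delete_expired method and the expired property are hypothetical):
def delete_expired(self):
    # another action reachable through the same dynamic routing in DataHandler
    db.delete(Inventory.all(keys_only=True).filter('expired =', True))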
You can delete the entire Datastore by deleting all kinds one by one from the Google App Engine dashboard. Please follow these steps:
Log in to https://console.cloud.google.com/datastore/settings
Click Open Datastore Admin. (Enable it if not enabled.)
Select all entities and press Delete. (This step runs a MapReduce job to delete all selected kinds.)
For more information see this image: http://storage.googleapis.com/bnifsc/Screenshot%20from%202015-01-31%2023%3A58%3A41.png
If you have a lot of data, using the web interface could be time consuming. The App Engine Launcher utility lets you delete everything in one go with the 'Clear datastore on launch' checkbox. This utility is now available for both Windows and Mac (Python framework).
For the development server, instead of running the server through the Google App Engine Launcher, you can run it from the terminal like this:
dev_appserver.py --port=[portnumber] --clear_datastore=yes [nameofapplication]
For example, my application "reader" runs on port 15080. After modifying the code and restarting the server, I just run "dev_appserver.py --port=15080 --clear_datastore=yes reader".
It works well for me.
Adding an answer about recent developments:
Google recently added the Datastore Admin feature. You can back up, delete, or copy your entities to another app using this console.
https://developers.google.com/appengine/docs/adminconsole/datastoreadmin#Deleting_Entities_in_Bulk
I often don't want to delete the whole datastore, so I pull a clean copy of /war/WEB-INF/local_db.bin out of source control. It may just be me, but it seems that even with Dev Mode stopped I have to physically remove the file before pulling it. This is on Windows, using the Subversion plugin for Eclipse.
PHP variation:
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
define('DATASTORE_SERVICE', DatastoreServiceFactory::getDatastoreService());
function get_all($kind) {
    $query = new Query($kind);
    $prepared = DATASTORE_SERVICE->prepare($query);
    return $prepared->asIterable();
}

function delete_all($kind, $amount = 0) {
    if ($entities = get_all($kind)) {
        $r = $t = 0;
        $delete = array();
        foreach ($entities as $entity) {
            if ($r < 500) {
                $delete[] = $entity->getKey();
            } else {
                DATASTORE_SERVICE->delete($delete);
                $delete = array();
                $r = -1;
            }
            $r++; $t++;
            if ($amount && $amount < $t) break;
        }
        if ($delete) {
            DATASTORE_SERVICE->delete($delete);
        }
    }
}
Yes, it will take time, and the 30-second request limit applies. I'm thinking of putting up an AJAX app sample to automate it beyond 30 seconds.
# Iterate over every db.Model subclass and delete its entities, kind by kind
for amodel in db.Model.__subclasses__():
    dela = []
    print amodel
    try:
        m = amodel()
        mq = m.all()
        print mq.count()
        for mw in mq:
            dela.append(mw)
        db.delete(dela)
        #~ print len(dela)
    except:
        pass
If you're using ndb, the method that worked for me for clearing the datastore:
ndb.delete_multi(ndb.Query(default_options=ndb.QueryOptions(keys_only=True)))
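If there are too many entities for one call, a batched variant of the same idea (a sketch, not part of the original answer) keeps each request small:
from google.appengine.ext import ndb

def delete_everything(batch_size=500):
    # fetch one page of keys at a time (keys-only queries count as small ops) and delete it
    while True:
        keys = ndb.Query(default_options=ndb.QueryOptions(keys_only=True)).fetch(batch_size)
        if not keys:
            break
        ndb.delete_multi(keys)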
For any datastore that's on app engine, rather than local, you can use the new Datastore API. Here's a primer for how to get started.
I wrote a script that deletes all non-built in entities. The API is changing pretty rapidly, so for reference, I cloned it at commit 990ab5c7f2063e8147bcc56ee222836fd3d6e15b
from gcloud import datastore
from gcloud.datastore import SCOPE
from gcloud.datastore.connection import Connection
from gcloud.datastore import query
from oauth2client import client

def get_connection():
    client_email = 'XXXXXXXX@developer.gserviceaccount.com'
    private_key_string = open('/path/to/yourfile.p12', 'rb').read()
    svc_account_credentials = client.SignedJwtAssertionCredentials(
        service_account_name=client_email,
        private_key=private_key_string,
        scope=SCOPE)
    return Connection(credentials=svc_account_credentials)

def connect_to_dataset(dataset_id):
    connection = get_connection()
    datastore.set_default_connection(connection)
    datastore.set_default_dataset_id(dataset_id)

if __name__ == "__main__":
    connect_to_dataset(DATASET_NAME)
    gae_entity_query = query.Query()
    gae_entity_query.keys_only()
    for entity in gae_entity_query.fetch():
        if entity.kind[0] != '_':
            print entity.kind
            entity.key.delete()
Continuing svpino's idea, it is wise to reuse records marked as deleted (his idea was not to remove unused records, but to mark them as "deleted"). A little cache/memcache to hold the working copy, writing only the difference of states (before and after the desired task) to the datastore, will make it better. For big tasks it is possible to write intermediate difference chunks to the datastore to avoid data loss if memcache disappears. To make it loss-proof, it is possible to check the integrity/existence of the memcached results and restart the task (or the required part) to repeat the missing computations. When the data difference is written to the datastore, the queued computations are discarded.
Another idea, similar to map reduce, is to shard an entity kind into several different entity kinds, so they are collected together and visible as a single entity kind to the end user. Entries are only marked as "deleted". When the number of "deleted" entries per shard exceeds some limit, the "alive" entries are redistributed among the other shards, and that shard is closed forever and then deleted manually from the dev console (presumably at lower cost). Update: it seems there is no drop-table in the console, only record-by-record deletion at the regular price.
It is possible to delete a large set of records in chunks, by query, without GAE failing (at least it works locally), with the possibility of continuing in the next attempt when time runs out:
qdelete.getFetchPlan().setFetchSize(100);
while (true)
{
    long result = qdelete.deletePersistentAll(candidates);
    LOG.log(Level.INFO, String.format("deleted: %d", result));
    if (result <= 0)
        break;
}
Also, sometimes it is useful to add an extra field to the primary table instead of putting the candidates (related records) into a separate table. And yes, the field may be an unindexed/serialized array, with little computation cost.
For anyone who needs a quick solution for the dev server (as of the time of writing, Feb. 2016):
Stop the dev server.
Delete the target directory.
Rebuild the project.
This will wipe all data from the datastore.
I was so frustrated with the existing solutions for deleting all data in the live datastore that I created a small GAE app that can delete a fair amount of data within its 30 seconds.
How to install, etc.: https://github.com/xamde/xydra
For Java:
DatastoreService db = DatastoreServiceFactory.getDatastoreService();
List<Key> keys = new ArrayList<Key>();

for (Entity e : db.prepare(new Query().setKeysOnly()).asIterable())
    keys.add(e.getKey());

db.delete(keys);
This works well in the development server.
You have two simple ways:
#1: To save cost, delete the entire project.
#2: Use ts-datastore-orm:
https://www.npmjs.com/package/ts-datastore-orm
await Entity.truncate();
The truncate can delete around 1K rows per second.
Here's how I did this naively from a vanilla Google Cloud Shell (no GAE) with python3:
from google.cloud import datastore

client = datastore.Client()
query = client.query()  # kindless query that matches every entity
query.keys_only()

for counter, entity in enumerate(query.fetch()):
    if entity.kind.startswith('_'):  # skip reserved kinds
        continue
    print(f"{counter}: {entity.key}")
    client.delete(entity.key)
This takes a very long time even with a relatively small amount of keys but it works.
More info about the Python client library: https://googleapis.dev/python/datastore/latest/client.html
As of 2022, there are, to the best of my knowledge, two ways to delete a kind from a (largeish) datastore. Google recommends using a Dataflow template. The template basically pulls each entity one by one, subject to a GQL query, and then deletes it. Interestingly, if you are deleting a large number of rows (> 10m), you will run into datastore trouble, as it will fail to provide enough capacity and your operations against the datastore will start timing out. However, only the kind you are mass-deleting from will be affected.
If you have fewer than 10m rows, you can just use this Go script:
package main

import (
    "cloud.google.com/go/datastore"
    "context"
    "fmt"
    "google.golang.org/api/option"
    "log"
    "sync"
    "time"
)

const (
    batchSize       = 10000 // number of keys to get in a single batch
    deleteBatchSize = 500   // number of keys to delete in a single batch
    projectID       = "name-of-your-GCP-project"
    serviceAccount  = "path-to-sa-file"
    table           = "kind-to-delete"
)

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

func deleteBatch(table string) int {
    ctx := context.Background()
    client, err := datastore.NewClient(ctx, projectID, option.WithCredentialsFile(serviceAccount))
    if err != nil {
        log.Fatalf("Failed to open client: %v", err)
    }
    defer client.Close()

    query := datastore.NewQuery(table).KeysOnly().Limit(batchSize)
    keys, err := client.GetAll(ctx, query, nil)
    if err != nil {
        fmt.Printf("%s Failed to get %d keys : %v\n", table, batchSize, err)
        return -1
    }

    var wg sync.WaitGroup
    for i := 0; i < len(keys); i += deleteBatchSize {
        wg.Add(1)
        go func(i int) {
            batch := keys[i : i+min(len(keys)-i, deleteBatchSize)]
            if err := client.DeleteMulti(ctx, batch); err != nil {
                // not a big problem, we'll get them next time ;)
                fmt.Printf("%s Failed to delete multi: %v", table, err)
            }
            wg.Done()
        }(i)
    }
    wg.Wait()

    return len(keys)
}

func main() {
    var globalStartTime = time.Now()
    fmt.Printf("Deleting \033[1m%s\033[0m\n", table)

    for {
        startTime := time.Now()
        count := deleteBatch(table)
        if count >= 0 {
            rate := float64(count) / time.Since(startTime).Seconds()
            fmt.Printf("Deleted %d keys from %s in %.2fs, rate %.2f keys/s\n", count, table, time.Since(startTime).Seconds(), rate)
            if count == 0 {
                fmt.Printf("%s is now clear.\n", table)
                break
            }
        } else {
            fmt.Printf("Retrying after short cooldown\n")
            time.Sleep(10 * time.Second)
        }
    }
    fmt.Printf("Total time taken %s.\n", time.Since(globalStartTime))
}

Resources