Django "bulk_save" and "bulk_update" - database

UPDATE: ADDED A BOUNTY. PLEASE PROVIDE AN EXAMPLE AND I WILL ACCEPT THE BEST ANSWER
UPDATE 2: Explicit example now included
Carrying on from the same project, where I asked about bulk_create in a separate thread.
I was wondering if there is a way to essentially "bulk_save" - insert if non-existent or simply update if it already exists.
For example:
class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    height = models.DecimalField(blank=True, null=True)
    weight = models.DecimalField(blank=True, null=True)
I have a list of dictionaries with key-value pairs for these fields. I would like to filter by name and then update the height and/or weight, as my players are still growing and conditioning. If there is no easy way to "bulk_save", a bulk update would also be helpful.
Reference: June 8, 2012 - "get_or_create()" patch at django project
Bulk_update reference
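One way to approach the "insert if missing, otherwise update" step is to split the incoming dictionaries into two groups before touching the database. The sketch below is only an illustration under assumptions: plan_bulk_save is a hypothetical helper, the (first_name, last_name) pair is assumed to identify a person, and plain dicts stand in for the rows already stored.

```python
def plan_bulk_save(rows, existing, key_fields=("first_name", "last_name")):
    """Split incoming dicts into rows to create and changes to apply.

    rows:     list of dicts holding field values for one person each
    existing: dict mapping a key tuple to the currently stored values
              (assumption: names uniquely identify a person)
    """
    to_create, to_update = [], []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        current = existing.get(key)
        if current is None:
            # No stored record with this name: it has to be inserted.
            to_create.append(row)
            continue
        # Keep only the non-key fields whose value actually differs.
        changes = {
            field: value
            for field, value in row.items()
            if field not in key_fields and current.get(field) != value
        }
        if changes:
            to_update.append((key, changes))
    return to_create, to_update


rows = [
    {"first_name": "Ann", "last_name": "Lee", "height": 180, "weight": 70},
    {"first_name": "Bob", "last_name": "Ray", "height": 175, "weight": 80},
]
existing = {
    ("Ann", "Lee"): {"first_name": "Ann", "last_name": "Lee", "height": 178, "weight": 70},
}
to_create, to_update = plan_bulk_save(rows, existing)
```

With the Django ORM, the first group could then go through Person.objects.bulk_create([Person(**r) for r in to_create]) and each entry of the second through Person.objects.filter(first_name=..., last_name=...).update(**changes). The updates still cost one query per changed person, so this reduces the work rather than turning it into a single statement.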

I just wrote a variation of the update_many function listed below, and it seems to have improved speeds tremendously already.
http://people.iola.dk/olau/python/bulkops.py
UPDATE - apparently DSE2 is also an option.
https://bitbucket.org/weholt/dse2
Will update with speed tests tomorrow.

Related

Django Queryset Construction and Update

I have a model with 2 foreign keys to the same model.
class Team(models.Model):
    name = models.CharField()

class Game(models.Model):
    team_1 = models.ForeignKey(Team)
    team_2 = models.ForeignKey(Team)
    date = models.DateTimeField()
    team_1_post_game_rating = models.DecimalField()
    team_2_post_game_rating = models.DecimalField()
If there is a result, a calculation is done updating the rating based on the result and the rankings of both teams. No problems so far, except if a result is edited after other games have been played.
What I need to be able to do (in the most efficient way possible) is find all the games played after the edited game by either of its teams, by any team those teams then played, and so on, and update the ratings for all of those teams.
I could probably do it using a subquery and a values list and iterating over the results, but of course I'd rather find a nice clean way to construct it dynamically, without my database going into meltdown.
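The "and so on" part can be handled in a single chronological pass: keep a set of teams whose rating is stale, and every later game that involves one of them becomes stale too and adds its opponent to the set. The sketch below works on plain tuples rather than the ORM; the Game namedtuple and games_to_recalculate are hypothetical names, and games on exactly the same date as the edited one are assumed to be unaffected.

```python
from collections import namedtuple
from datetime import date

# Stand-in for the Game model: only the fields the traversal needs.
Game = namedtuple("Game", "id team_1 team_2 date")


def games_to_recalculate(games, edited):
    """Return the games after `edited` whose ratings depend on it, in date order."""
    affected_teams = {edited.team_1, edited.team_2}
    stale = []
    for game in sorted(games, key=lambda g: g.date):
        if game.date <= edited.date:
            continue  # played before (or with) the edited game: unaffected
        if game.team_1 in affected_teams or game.team_2 in affected_teams:
            stale.append(game)
            # The opponent's rating now depends on the edit as well.
            affected_teams.update((game.team_1, game.team_2))
    return stale


games = [
    Game(1, "A", "B", date(2013, 1, 1)),
    Game(2, "C", "D", date(2013, 1, 2)),
    Game(3, "B", "C", date(2013, 1, 3)),
    Game(4, "C", "D", date(2013, 1, 4)),
    Game(5, "E", "F", date(2013, 1, 5)),
]
stale_ids = [g.id for g in games_to_recalculate(games, games[0])]
```

In Django the input could come from one query such as Game.objects.filter(date__gt=edited.date).order_by("date"), so the database is hit once and the propagation happens in Python. The ratings then have to be recomputed in the returned order, since each game depends on the ones before it.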

Instantiate new data on migration

Suppose I have a model Person. Now I create a new model:
class Ranking(models.Model):
    person = models.ForeignKey(Person)
    score = models.IntegerField(null=False, default=100)
    date_created = models.DateTimeField(auto_now_add=True)
The thing is, I want each Person to have at least one Ranking, so on creation of new Person objects I can just create a new Ranking for each object.
What I don't know is how to create a new default Ranking instance for each of the existing Person objects in the db?
In a django script it would look as simple as something like:
for person in people:
    Ranking(person=person).save()
Is there a way to add that code to the south forward migration file? Is there a better way of solving this problem? Any ideas?
First, auto-generate the migration: python manage.py schemamigration myapp --auto
Then find something that resembles the following in the migration file (presumably myapp/migrations/00xx_auto__add__ranking.py):
def forwards(self, orm):
    # Adding model 'Ranking'
    db.create_table(u'myapp_ranking', (
        (u'id', self.gf('django.db.models.fields.AutoField')(primary_key=True)),
        ('person', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['myapp.Person'])),
        ('score', self.gf('django.db.models.fields.IntegerField')(default=100)),
        ('date_created', self.gf('django.db.models.fields.DateTimeField')(auto_now_add=True, blank=True)),
    ))
    db.send_create_signal(u'myapp', ['Ranking'])
After this, insert something like the following:
    # Create 'blank' Ranking entries for all extant Person objects
    for person in orm.Person.objects.all():
        orm.Ranking.objects.create(person=person)
Other approaches include splitting this into three migrations (this is better if, say, you have a large dataset in a production environment):
add the model, with the person field not required
add a separate data migration (python manage.py datamigration myapp), and insert into it code to do what I suggested above
run the two migrations above (allow this to take time if necessary)
change the person field to be required once it's all populated, and run this final migration
The South docs have something along these lines. There's also a similar question here that might give insight.
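For a large table, the backfill step can also be expressed as one set-based statement instead of a Python loop. The sketch below shows that idea at the SQL level with the standard library's sqlite3 module, not with South; the person and ranking tables are simplified, hypothetical stand-ins for the real schema, and backfill_rankings is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ranking (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person (id),
        score INTEGER NOT NULL DEFAULT 100
    );
    INSERT INTO person (name) VALUES ('Ann');
    INSERT INTO person (name) VALUES ('Bob');
    INSERT INTO person (name) VALUES ('Cy');
    -- Ann already has a ranking and must not get a second default one.
    INSERT INTO ranking (person_id, score) VALUES (1, 250);
""")


def backfill_rankings(conn):
    """Give every person without a ranking one default ranking; return how many were added."""
    before = conn.execute("SELECT COUNT(*) FROM ranking").fetchone()[0]
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO ranking (person_id) "
            "SELECT id FROM person "
            "WHERE id NOT IN (SELECT person_id FROM ranking)"
        )
    after = conn.execute("SELECT COUNT(*) FROM ranking").fetchone()[0]
    return after - before


added_first_run = backfill_rankings(conn)
added_second_run = backfill_rankings(conn)
```

Because the statement only inserts rows for people that have no ranking yet, running it twice is harmless, which is convenient if a migration has to be retried.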

Django profile id may not be null using get_or_create, how does it relate to the db?

I followed some of the steps in "Django User Profiles - Simple yet powerful".
Not exactly, because I am still in the middle of developing the idea.
From that site I used this line in particular:
User.profile = property(lambda u:
    UserProfile.objects.get_or_create(user=u)[0])
I always got an error message when creating the object, typically
"XX" may not be null. I solved part of the problem by adjusting the models
and (in my present case) using sqliteman, until I got the same
message for the id: "xxx.id may not be null".
On the net I found a description of a possible solution that involved resetting
the database, which I was not happy to do, particularly because, depending on
the solution, it might have meant resetting the whole application DB.
But because the UserProfile model was fairly new and still empty,
I worked on the DB directly: I dropped the table by hand and
asked syncdb to rebuild it (kind of risky, though).
Now this is the diff of the sqlite dump:
294,298c290,294
< CREATE TABLE "myt_userdata" (
< "id" integer NOT NULL PRIMARY KEY,
< "user_id" integer NOT NULL UNIQUE REFERENCES "auth_user" ("id"),
< "url" varchar(200),
< "birthday" datetime
---
> CREATE TABLE myt_userdata (
> "id" INTEGER NOT NULL,
> "user_id" INTEGER NOT NULL,
> "url" VARCHAR(200),
> "birthday" DATETIME
Please note that both versions are generated by django. The ">" version was generated with a simple model definition which had indeed the connection with the user table via:
user = models.ForeignKey(User, unique=True)
The new "<" version has much more information and it is working.
My question:
Why does Django complain that myt_userdata.id may not be null?
The subsidiary question:
Does Django try to relate to the underlying DB structure, and how?
(For example, does the NOT NULL message come from the model or from the DB?)
The additional question:
I have been a bit reluctant to use South: it seems too complicated, it is an additional module
that I would have to take care of between development and production, and it may not be that easy
if I want to switch DB engines (I am using sqlite only at the development stage; I plan to move to
MySQL).
South would probably have worked in this case. Would it? Would you suggest using it
anyway?
Edited, FYI:
This is my last model (the working one):
class UserData(models.Model):
    user = models.ForeignKey(User, unique=True)
    url = models.URLField("Website", blank=True, null=True)
    birthday = models.DateTimeField('Birthday', blank=True, null=True)

    def __unicode__(self):
        return self.user.username

User.profile = property(lambda u: UserData.objects.get_or_create(user=u, defaults={'birthday': '1970-01-01 00:00:00'})[0])
Why does Django complain that myt_userdata.id may not be null?
Because in that table id is not a primary key, so it is not populated automatically. You also don't provide it when creating the object, so the DB does not know what to do.
Does Django try to relate to the underlying DB structure, and how? (For example, does the NOT NULL message come from the model or from the DB?)
It's an error from the DB, not from Django.
You can use the sql command to understand exactly what is executed on syncdb. The variant above seems to be the correct table definition generated from a correct Django model, and I have no idea how you got the variant below. Write a correct and clear model, and you'll get a correct, working table schema after syncdb.
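The difference between the two table definitions can be reproduced with the standard library's sqlite3 module alone. In SQLite an INTEGER PRIMARY KEY column is filled in automatically on insert, while a plain INTEGER NOT NULL column is not. This is a minimal sketch: the table names broken and working and the helper try_insert are made up for the demonstration, and the columns are cut down to the two that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Like the ">" side of the diff: id is NOT NULL but not a primary key.
    CREATE TABLE broken (
        id INTEGER NOT NULL,
        user_id INTEGER NOT NULL
    );
    -- Like the "<" side: id is the primary key, so SQLite assigns it.
    CREATE TABLE working (
        id INTEGER NOT NULL PRIMARY KEY,
        user_id INTEGER NOT NULL UNIQUE
    );
""")


def try_insert(table):
    """Insert a row without an id, as get_or_create would; return (succeeded, detail)."""
    try:
        # The table name comes from this script, never from user input.
        cur = conn.execute("INSERT INTO %s (user_id) VALUES (?)" % table, (1,))
        return True, cur.lastrowid
    except sqlite3.IntegrityError as exc:
        return False, str(exc)


broken_result = try_insert("broken")
working_result = try_insert("working")
```

The insert into the first table is rejected by the database itself with an IntegrityError, which is the error that surfaces through Django; the exact wording of the message depends on the SQLite version. The insert into the second table succeeds and receives an id.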

Understanding ListProperty backend behavior in GAE

I'm trying to understand how you're supposed to access items in a GAE db.ListProperty(db.Key).
Example:
A Magazine db.Model entity has a db.ListProperty(db.Key) that contains 10 Article entities. I want to get the Magazine object and display the Article names and dates. Do I make 10 queries for the actual article objects? Do I do a batch query? What if there's 50 articles? (Don't batch queries rely on the IN operator, which is limited to 30 or fewer elements?)
So you are describing something like this:
class Magazine(db.Model):
    ArticleList = db.ListProperty(db.Key)

class Article(db.Model):
    ArticleName = db.StringProperty()
    ArticleDate = db.DateProperty()
In this case the simplest way to grab the listed articles is to use the Model.get() method, which looks for a key list.
m = Magazine.all().get()  # grab the first record
articles = Article.get(m.ArticleList)  # get Articles using key list
for a in articles:
    name = a.ArticleName
    date = a.ArticleDate
    # do something with this data
Depending on how you plan on working with the data you may be better off adding a Magazine reference property to your Article entities instead.
You need to read Modeling Entity Relationships, especially the part about one to many.

Why is my django bulk database population so slow and frequently failing?

I decided I'd like to use django's model system rather than coding raw SQL to interface with my database, but I am having a problem that surely is avoidable.
My models.py contains:
class Student(models.Model):
    student_id = models.IntegerField(unique=True)
    form = models.CharField(max_length=10)
    preferred = models.CharField(max_length=70)
    surname = models.CharField(max_length=70)
and I'm populating it by looping through a list as follows:
from models import Student

for id, frm, pref, sname in large_list_of_data:
    s = Student(student_id=id, form=frm, preferred=pref, surname=sname)
    s.save()
I don't really want to be saving this to the database each time but I don't know another way to get django to not forget about it (I'd rather add all the rows and then do a single commit).
There are two problems with the code as it stands.
It's slow -- about 20 students get updated each second.
It doesn't even make it through large_list_of_data, instead throwing a DatabaseError saying "unable to open database file". (Possibly because I'm using sqlite3.)
My question is: How can I stop these two things from happening? I'm guessing that the root of both problems is that I've got the s.save() but I don't see a way of easily batching the students up and then saving them in one commit to the database.
So it seems I should have looked harder before posing the question.
Some solutions are described in this stackoverflow question (the winning answer is to use django.db.transaction.commit_manually) and also in this one on aggregating saves.
Other ideas for speeding up this type of operation are listed in this stackoverflow question.
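The effect of committing once instead of once per row can be seen without Django at all. The sketch below uses the standard library's sqlite3 module and a cut-down student table; load_students is a hypothetical helper and the generated rows are made-up sample data, so this illustrates the idea rather than the ORM solution from the linked answers.

```python
import sqlite3


def load_students(conn, rows):
    """Insert all rows in one transaction: commit once, or roll back everything on error."""
    with conn:
        conn.executemany(
            "INSERT INTO student (student_id, form, preferred, surname) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER UNIQUE, form TEXT, preferred TEXT, surname TEXT)"
)

# Made-up sample data standing in for large_list_of_data.
rows = [(i, "10A", "Name%d" % i, "Surname%d" % i) for i in range(1000)]
load_students(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

In Django the same shape is what the linked answers reach by wrapping the loop in a manually committed transaction, and on Django 1.4 or later Student.objects.bulk_create(...) is another option. A side effect of the single transaction is that a failure part-way through leaves the table unchanged rather than half populated.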

Resources