Peewee: Add a column and define order within schema using `AFTER`

I'd like to use the playhouse migrator to make changes to my DB schema.
I'd like to add a column to a table, but with AFTER in the SQL statement so that I can define the column order in the table.
Is this possible with the Peewee/Playhouse migrator?
Thanks in advance!

There is no trigger support in Peewee. In 2015 the author stated that "I do not plan on supporting triggers at this time." However, Peewee does have signal support:
from peewee import IntegerField
from playhouse.signals import Model, post_save

class MyModel(Model):
    data = IntegerField()

@post_save(sender=MyModel)
def on_save_handler(model_class, instance, created):
    put_data_in_cache(instance.data)
Perhaps this could be used as a replacement.

Unfortunately, the schema migrator does not support the AFTER clause. You are left with subclassing the relevant migrator class, or using a custom field class and implementing a ddl() method on the field that includes the AFTER portion, as sketched below.
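A minimal sketch of the field-class route, assuming peewee 3.x (where Field.ddl(ctx) builds the column definition) and a MySQL backend, since AFTER is MySQL-specific. AfterField, its after argument, and the table/column names are all made up for illustration:

from peewee import CharField, NodeList, SQL
from playhouse.migrate import MySQLMigrator, migrate

class AfterField(CharField):
    # Hypothetical field whose column DDL ends with an AFTER clause.
    def __init__(self, *args, after=None, **kwargs):
        self.after = after  # name of the column this one should follow
        super().__init__(*args, **kwargs)

    def ddl(self, ctx):
        base = super().ddl(ctx)  # the normal column definition
        if self.after:
            return NodeList((base, SQL('AFTER `%s`' % self.after)))
        return base

migrator = MySQLMigrator(db)  # db: your peewee MySQL database instance
migrate(migrator.add_column('employee', 'nickname',
                            AfterField(null=True, after='first_name')))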

You can extend the field with a custom subclass and override the sort key to a large number to ensure the column is always pushed to the end of the table.
This is definitely not the best way, but it works.
class dbCustomDateTime(dbDateTime):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Define the sort key so that even if this field is declared in a
        # base class, the column goes to the end of the table.
        self._sort_key = 100, 100
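For illustration, a hypothetical base-class setup (dbDateTime stands in for whatever DateTimeField subclass you already use):

class BaseModel(Model):
    created = dbCustomDateTime()

class Invoice(BaseModel):
    number = CharField()
    amount = IntegerField()

# With the inflated _sort_key, 'created' should sort after 'number' and
# 'amount' in the generated CREATE TABLE, even though it is declared in
# the base class.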

Related

Instantiate new data on migration

Suppose I have a model Person. Now I create a new model:
class Ranking(models.Model):
    person = models.ForeignKey(Person)
    score = models.IntegerField(null=False, default=100)
    date_created = models.DateTimeField(auto_now_add=True)
The thing is, I want each Person to have at least one Ranking, so on creation of new Person objects I can just create a new Ranking for each object.
What I don't know is how to create a new default Ranking instance for each of the existing Person objects in the db.
In a Django script it would be as simple as something like:
for person in people:
    Ranking(person=person).save()
Is there a way to add that code to the South forwards migration file? Is there a better way of solving this problem? Any ideas?
First, auto-generate the migration: python manage.py schemamigration myapp --auto.
Then find something that resembles the following in the migration file (presumably myapp/migrations/00xx_auto__add__ranking.py):
def forwards(self, orm):
    # Adding model 'Ranking'
    db.create_table(u'myapp_ranking', (
        (u'id', self.gf('django.db.models.fields.AutoField')(primary_key=True)),
        ('person', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['myapp.Person'])),
        ('score', self.gf('django.db.models.fields.IntegerField')(default=100)),
        ('date_created', self.gf('django.db.models.fields.DateTimeField')(auto_now_add=True, blank=True)),
    ))
    db.send_create_signal(u'myapp', ['Ranking'])
After this, insert something like the following at the end of forwards():
    # Create 'blank' Ranking entries for all extant Person objects
    for person in orm.Person.objects.all():
        orm.Ranking.objects.create(person=person)
Other approaches include splitting this into three migrations (this is better, say, if you have a large dataset in a production environment):
1. add the model, with the person field not required
2. add a separate data migration (python manage.py datamigration myapp), and insert into it code to do what I suggested above (see the sketch below)
3. run the two migrations above (allow this to take time if necessary)
4. change the person field to be required once it's all populated, and run this final migration
The South docs have something along these lines. There's also a similar question here that might give insight.
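For the split approach, the standalone data migration would carry the same loop in its forwards(), plus a backwards() to undo it. A sketch (the file name is hypothetical):

# myapp/migrations/00xx_populate_rankings.py
from south.v2 import DataMigration

class Migration(DataMigration):

    def forwards(self, orm):
        # Give every existing Person a default Ranking.
        for person in orm.Person.objects.all():
            orm.Ranking.objects.create(person=person)

    def backwards(self, orm):
        orm.Ranking.objects.all().delete()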

Django Query Optimisation

I am currently working on a telecom analytics project and am a newbie at query optimisation. It takes a full minute to show results in the browser, while only 45,000 records need to be accessed. Could you please suggest ways to reduce the time it takes to show results?
I wrote the following query to find the call duration for people in an age group:
sigma = 0
popn = len(Demo.objects.filter(age_group=age))
card_list = [Demo.objects.filter(age_group=age)[i].card_no
             for i in range(popn)]
for card in card_list:
    dic = Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma += dic['duration__sum']
avgDur = sigma / popn
The above code is inside a for loop that iterates over the age groups.
The models are as follows:
class Demo(models.Model):
    card_no = models.CharField(max_length=20, primary_key=True)
    gender = models.IntegerField()
    age = models.IntegerField()
    age_group = models.IntegerField()

class Fact_table(models.Model):
    pri_key = models.BigIntegerField(primary_key=True)
    card_no = models.CharField(max_length=20)
    duration = models.IntegerField()
    time_8bit = models.CharField(max_length=8)
    time_of_day = models.IntegerField()
    isBusinessHr = models.IntegerField()
    Day_of_week = models.IntegerField()
    Day = models.IntegerField()
Thanks
Try this:
sigma = 0
demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list = [Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you run, and you should select only the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define one at all. Django automatically adds an indexed, auto-incrementing primary key field. If you need card_no to be a unique field, and you need to find rows based on it, use this:
class Demo(models.Model):
    card_no = models.SlugField(max_length=20, unique=True)
    ...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as if it were the primary key. This still allows other ways of accessing the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field supplied by Django, and it eases the use of the model in Django.
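For example, a lookup by the unique, indexed column is then served by the index (the card number here is a made-up value):

demo = Demo.objects.get(card_no='8612345678')  # index-backed lookup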
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, but also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
class Fact_table(models.Model):
    card = models.ForeignKey(Demo, related_name='facts')
    ...
The related_name argument allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
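For instance, assuming some Demo instance demo (a hypothetical illustration; Sum comes from django.db.models):

demo = Demo.objects.get(card_no='8612345678')
demo.facts.all()                       # all Fact_table rows for this card
demo.facts.aggregate(Sum('duration'))  # total call duration for this card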
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    print "Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg'])
.values('age_group') selects just the age_group field from Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), fetches all Fact_table objects related to any Demo object within that age_group, and calculates the average of all their duration fields, all in a single query.

NDB projection & caching questions

I have a couple of doubts regarding how NDB projection queries work and how caching behaves behind the scenes.
Given a model similar to:
from google.appengine.ext import ndb

class Users(ndb.Model):
    user_name = ndb.StringProperty(required=True)
    user_email = ndb.StringProperty(required=True)
    user_password = ndb.StringProperty(required=True)

    @classmethod  # THIS ONE DOES NOT WORK
    def get_profile_info(cls, id):
        return ndb.Key(Users, id).get(projection=[Users.user_name])

    @classmethod  # THIS ONE WORKS
    def get_profile_info(cls, id):
        return Users.query(Users.key == ndb.Key(Users, id)).get(projection=[Users.user_name])
Why does the first classmethod raise "TypeError: Unknown configuration option ('projection')"? Can't I simply apply a projection to a direct get of a key, instead of having to query for the key?
Secondly, regarding caching, I'm not sure I correctly understood this thread: NDB Caching When Using Projected Queries
Aren't projected queries cached? Does this mean it's better to simply call get() (and fetch the whole entity) so that it is cached, instead of projecting?
Thanks in advance!
As the error indicates, a projection makes no sense with a direct get(). From the docs: "It only gets values for those properties in the projection. It gets this data from the query index (and thus, properties in the projection must be indexed)." A plain key get() fetches the entity directly rather than reading property values from query indexes, so there is nothing for a projection to act on. Note also gvr's comment on caching in the question you referenced: projection query results are not cached by NDB.
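In code, the trade-off looks roughly like this (some_id is a placeholder):

# Whole-entity lookup: served from (and stored in) NDB's context cache /
# memcache on repeated access.
user = ndb.Key(Users, some_id).get()
name = user.user_name

# Projection query: answered from the index, returns a partial entity,
# and its result is not cached by NDB.
partial = Users.query(Users.key == ndb.Key(Users, some_id)).get(
    projection=[Users.user_name])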

How to create tables in multiple databases using Django models.py

I would like to create table "A" in one database (say, SQL Server 2008) and another table "B" in a different one (MySQL) using models.py in Django.
The structures of tables "A" and "B" may differ. I have verified that this can be achieved through router.py.
I want to do it without a router.py file.
Could anyone guide me on this, please?
Thanks,
Shiva.
Try using super() in the save() method. When save() runs, you can issue additional commands, such as writing to the other database.
For example:
You have :
class Chair(models.Model):
    name = models.CharField(max_length=30)
You can implement save() like this:
class Chair(models.Model):
    name = models.CharField(max_length=30)

    def save(self, *args, **kwargs):
        # <your cmd here>
        super(Chair, self).save(*args, **kwargs)
It may seem too bare like this, but you can use the arguments to do whatever you need.
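For example, a rough sketch that mirrors every save into a second configured database (this assumes a 'mysql_db' alias in settings.DATABASES; whether it fits depends on your two schemas):

class Chair(models.Model):
    name = models.CharField(max_length=30)

    def save(self, *args, **kwargs):
        super(Chair, self).save(*args, **kwargs)   # write to the default database
        super(Chair, self).save(using='mysql_db')  # mirror the row to the second database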

Prevent Django from updating identity column in MSSQL

I'm working with a legacy DB in MSSQL. We have a table that has two columns that are causing me problems:
class Emp(models.Model):
    empid = models.IntegerField(_("Unique ID"), unique=True, db_column=u'EMPID')
    ssn = models.CharField(_("Social security number"), max_length=10, primary_key=True, db_column=u'SSN')  # Field name made lowercase.
So the table has the ssn column as primary key, and the relevant part of the UPDATE statement generated by Django is this:
UPDATE [EMP] SET [EMPID] = 399,
.........
WHERE [EMP].[SSN] = 2509882579
The problem is that EMP.EMPID is an identity field in MSSQL and thus pyodbc throws this error whenever I try to save changes to an existing employee:
ProgrammingError: ('42000', "[42000] [Microsoft][SQL Native Client][SQL Server]Cannot update identity column 'EMPID'. (8102) (SQLExecDirectW); [42000] [Microsoft][SQL Native Client][SQL Server]Statement(s) could not be prepared. (8180)")
Having EMP.EMPID as an identity column is not crucial to anything in the program, so dropping the identity property by creating a temporary column, then copying, deleting, and renaming seems like the logical thing to do. That creates one extra step in transferring old customers into Django, so my question is: is there any way to prevent Django from generating the '[EMPID] = XXX' snippet whenever I do an update on this table?
EDIT
I've patched my model up like this:
def save(self, *args, **kwargs):
    if self.empid:
        # Drop 'empid' from the fields Django includes in the UPDATE.
        # Note: this mutates class-level state (_meta.local_fields), so it
        # affects all subsequent queries against this model.
        self._meta.local_fields = [f for f in self._meta.local_fields if f.name != 'empid']
    super().save(*args, **kwargs)
This works, taking advantage of the way Django builds its SQL statement in django/db/models/base.py (line 525). If anyone has a better way, or can explain why this is bad practice, I'd be happy to hear it!
This question is old and Sindri found a workable solution, but I wanted to provide a solution that I've been using in production for a few years that doesn't require mucking around in _meta.
I had to write a web application that integrated with an existing business database containing many computed fields. These fields, which usually compute the status of the record, are used with almost every object access across the entire application, and Django had to be able to work with them.
These types of fields can be made to work with a model manager that adds the required fields onto the query with an extra(select=...).
ComputedFieldsManager code snippet: https://gist.github.com/manfre/8284698
class Emp(models.Model):
    ssn = models.CharField(_("Social security number"), max_length=10, primary_key=True, db_column=u'SSN')  # Field name made lowercase.

    objects = ComputedFieldsManager(computed_fields=['empid'])

# The empid is added on to the model instance
Emp.objects.all()[0].empid
# You can also search on the computed field
Emp.objects.all().computed_field_in('empid', [1234])
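The core idea behind the linked manager, in a simplified hypothetical sketch (modern Django spelling; the real gist does more, including the computed_field_in() helper used above):

from django.db import models

class ComputedFieldsManager(models.Manager):
    def __init__(self, computed_fields=None):
        super(ComputedFieldsManager, self).__init__()
        self.computed_fields = computed_fields or []

    def get_queryset(self):
        qs = super(ComputedFieldsManager, self).get_queryset()
        if self.computed_fields:
            # Select each database-computed column verbatim; Django attaches
            # the value as an attribute on each instance without treating it
            # as a model field, so it is never included in UPDATE statements.
            qs = qs.extra(select={name: name for name in self.computed_fields})
        return qs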
