Prevent Django from updating identity column in MSSQL

I'm working with a legacy DB in MSSQL. We have a table that has two columns that are causing me problems:
class Emp(models.Model):
    empid = models.IntegerField(_("Unique ID"), unique=True, db_column=u'EMPID')
    ssn = models.CharField(_("Social security number"), max_length=10, primary_key=True, db_column=u'SSN')  # Field name made lowercase.
So the table has the ssn column as primary key, and the relevant part of the UPDATE statement generated by Django is this:
UPDATE [EMP] SET [EMPID] = 399,
.........
WHERE [EMP].[SSN] = 2509882579
The problem is that EMP.EMPID is an identity field in MSSQL and thus pyodbc throws this error whenever I try to save changes to an existing employee:
ProgrammingError: ('42000', "[42000] [Microsoft][SQL Native Client][SQL Server]Cannot update identity column 'EMPID'. (8102) (SQLExecDirectW); [42000] [Microsoft][SQL Native Client][SQL Server]Statement(s) could not be prepared. (8180)")
Having EMP.EMPID as an identity column is not crucial to anything in the program, so dropping the identity property by creating a temporary column, copying, deleting and renaming seems like the logical thing to do. But that adds an extra step when transferring old customers into Django, so my question is: is there any way to prevent Django from generating the '[EMPID] = XXX' snippet whenever I'm doing an update on this table?
EDIT
I've patched my model up like this:
def save(self, *args, **kwargs):
    if self.empid:
        self._meta.local_fields = [f for f in self._meta.local_fields if f.name != 'empid']
    super().save(*args, **kwargs)
This works, taking advantage of the way Django builds its SQL statement in django/db/models/base.py (line 525). If anyone has a better way or can explain why this is bad practice, I'd be happy to hear it!
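One caveat: _meta.local_fields is class-level state shared by every instance and queryset, so removing the field in save() permanently hides empid from all later queries on the model. A variant of the same trick that restores the field list afterwards (an untested sketch):

def save(self, *args, **kwargs):
    if self.empid:
        original_fields = self._meta.local_fields
        # Temporarily hide the identity column from the UPDATE statement.
        self._meta.local_fields = [f for f in original_fields if f.name != 'empid']
        try:
            super().save(*args, **kwargs)
        finally:
            # Restore class-level state so other queries still see empid.
            self._meta.local_fields = original_fields
    else:
        super().save(*args, **kwargs)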

This question is old and Sindri found a workable solution, but I wanted to provide a solution that I've been using in production for a few years that doesn't require mucking around in _meta.
I had to write a web application that integrated with an existing business database containing many computed fields. These fields, usually computing the status of the record, are used with almost every object access across the entire application and Django had to be able to work with them.
These kinds of fields can be handled with a model manager that adds the required fields onto the query with an extra(select=...).
ComputedFieldsManager code snippet: https://gist.github.com/manfre/8284698
class Emp(models.Model):
    ssn = models.CharField(_("Social security number"), max_length=10, primary_key=True, db_column=u'SSN')  # Field name made lowercase.

    objects = ComputedFieldsManager(computed_fields=['empid'])

# the empid is added on to the model instance
Emp.objects.all()[0].empid
# you can also search on the computed field
Emp.objects.all().computed_field_in('empid', [1234])
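For reference, a minimal sketch of what such a manager might look like; this is an assumption based on the description above, not the actual gist code:

from django.db import models
from django.db.models.query import QuerySet

class ComputedFieldsQuerySet(QuerySet):
    def computed_field_in(self, field, values):
        # Filter on the computed column with a raw WHERE clause; the
        # field name is interpolated (trusted), the values are params.
        placeholders = ", ".join(["%s"] * len(values))
        return self.extra(where=["%s IN (%s)" % (field, placeholders)],
                          params=list(values))

class ComputedFieldsManager(models.Manager):
    def __init__(self, computed_fields=None):
        super().__init__()
        self.computed_fields = computed_fields or []

    def get_queryset(self):
        qs = ComputedFieldsQuerySet(self.model, using=self._db)
        # Select each computed column so it shows up as an instance attribute.
        return qs.extra(select={name: name for name in self.computed_fields})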

Related

pandas to_sql in django: insert foreign key into DB

Is there a way to insert foreign keys when using the pandas to_sql function?
I am processing uploaded Consultations (n=40k) with pandas in django, before adding them to the database (postgres). I got this working row by row, but that takes 15 to 20 minutes. This is longer than I want my users to wait, so I am looking for a more efficient solution.
I tried pandas to_sql, but I cannot figure out how to add the two foreign key relations as columns to my consultations dataframe before calling the to_sql function. Is there a way to add the Patient and Praktijk foreign keys as a column in the consultations dataframe?
More specifically, when inserting row by row, I use objects of type Patient or Praktijk when creating new consultations in the database. In a dataframe however, I cannot use these types, and therefore don't know how I could add the foreign keys correctly. Is there possibly a value of type object or int (a patient's id?) which can substitute a value of type Patient, and thereby set the foreign key?
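Concretely, I'm hoping something like the following could work, with plain pk values standing in for the objects (an illustrative sketch, not working code I have):

import pandas as pd

# Hypothetical: use integer pks in the *_id columns that Django creates
# for ForeignKey fields, instead of Patient/Praktijk objects.
consultations = pd.DataFrame({
    "patient_id": [17, 42],
    "praktijk_id": [3, 3],
    "patient_nr": [1001, 1002],
})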
The Consultation model:
class Consultation(models.Model):
    # the foreign keys
    patient = models.ForeignKey(Patient, on_delete=models.CASCADE, null=True, blank=True)
    praktijk = models.ForeignKey(Praktijk, on_delete=models.CASCADE, default='')

    # other fields which do not give trouble with to_sql
    patient_nr = models.IntegerField(blank=True, null=True)
    # etc
The to_sql call:
consultations.to_sql(Consultation._meta.db_table, engine, if_exists='append', index=False, chunksize=10000)
If above is not possible, any hints towards another more efficient solution?
I had the same problem and this is how I solved it. My answer isn't as straightforward, but I trust it helps.
Inspect your django project to be sure of two things:
Target table name
Table column names
In my case, I use class Meta when defining Django models to set explicit table names (Django otherwise names tables automatically). I will use the Django tutorial project to illustrate.
class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

    class Meta:
        db_table = "poll_questions"

class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

    class Meta:
        db_table = "question_choices"
Note: in the database, Django stores the Question foreign key as the pk of the Question object, in a column named question_id.
Assume I have a Question with pk 1, and a dataframe df that I wish to update the Question's choices with. My df must look like the one below when using pandas to batch insert into the database!
import pandas as pd

df = pd.DataFrame(
    {
        # Django stores the FK in a question_id column, so the dataframe
        # must use that column name.
        "question_id": [1, 1, 1, 1, 1],
        "choice_text": [
            "First Question",
            "Second Question",
            "Third Question",
            "Fourth Question",
            "Fifth Question"
        ],
        "votes": [5, 3, 10, 1, 13]
    }
)
I wish I could write the df as a table. Too bad that SO doesn't support usual markdown for tables
Nonetheless, we have our df; the next step is to create a database connection for inserting the records.
from django.conf import settings
from sqlalchemy import create_engine

# load database settings from django
user = settings.DATABASES['default']['USER']
passwd = settings.DATABASES['default']['PASSWORD']
dbname = settings.DATABASES['default']['NAME']

# create database connection string
conn_string = 'postgresql://{user}:{passwd}@localhost:5432/{dbname}'.format(
    user=user,
    passwd=passwd,
    dbname=dbname
)

# actual database connection object.
conn = create_engine(conn_string, echo=False)

# write df into db
df.to_sql("question_choices", con=conn, if_exists="append", index=False, chunksize=500, method="multi")
Voila!
We are done!
Note: Django supports bulk_create, which, however, isn't what you asked for.
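For completeness, a rough bulk_create equivalent of the same insert (an illustrative sketch reusing the df above; the app path is hypothetical):

from polls.models import Choice  # hypothetical app path

# Build unsaved Choice instances from the dataframe and insert them in
# batches; question_id sets the FK without fetching Question objects.
Choice.objects.bulk_create(
    [
        Choice(question_id=1, choice_text=text, votes=votes)
        for text, votes in zip(df["choice_text"], df["votes"])
    ],
    batch_size=500,
)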
I ran into a similar problem using SQLAlchemy, but I found a simple workaround.
What I did was define the database schema the way I wanted with SQLAlchemy (with all the datatypes and foreign keys I needed) and create an empty table; then I simply changed the if_exists parameter to append.
This appends all the data to the empty table.
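A sketch of that approach, with illustrative table and column names and a made-up connection string:

import pandas as pd
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, Table,
                        create_engine)

engine = create_engine("postgresql://user:passwd@localhost:5432/dbname")
metadata = MetaData()

# Define the target table explicitly, foreign keys included.
consultation = Table(
    "consultation", metadata,
    Column("id", Integer, primary_key=True),
    Column("patient_id", Integer, ForeignKey("patient.id"), nullable=True),
    Column("praktijk_id", Integer, ForeignKey("praktijk.id")),
    Column("patient_nr", Integer, nullable=True),
)
metadata.create_all(engine)  # creates the empty table with FKs in place

# With the schema already defined, to_sql only appends rows.
df = pd.DataFrame({"patient_id": [1], "praktijk_id": [1], "patient_nr": [42]})
df.to_sql("consultation", engine, if_exists="append", index=False)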

Peewee: Add a column and define order within schema using `AFTER`

I'd like to use the playhouse migrator to make changes to my db schema.
I'd like to add a column to a table, but with AFTER in the SQL statement so that I may define the column order in the table.
Is this possible with Peewee/Playhouse migrator?
Thanks in advance!
There is no trigger support in Peewee. In 2015 the author stated that
I do not plan on supporting triggers at this time.
However, Peewee has "Signal support".
from peewee import IntegerField
from playhouse.signals import Model, post_save

class MyModel(Model):
    data = IntegerField()

@post_save(sender=MyModel)
def on_save_handler(model_class, instance, created):
    put_data_in_cache(instance.data)
Perhaps this could be used as a replacement.
Unfortunately, the schema migrator does not support the AFTER clause. You are left with subclassing the relevant migrator class, or using a custom field class and implementing a ddl() method on the field which includes the AFTER portion.
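An untested sketch of the field-class route, assuming peewee 3's Field.ddl(ctx) hook and MySQL's AFTER syntax (names are illustrative):

from peewee import SQL, DateTimeField, NodeList

class PositionedDateTimeField(DateTimeField):
    # Illustrative field that appends AFTER <column> to its column DDL.
    def __init__(self, *args, after=None, **kwargs):
        self.after = after
        super().__init__(*args, **kwargs)

    def ddl(self, ctx):
        # Extend the normal column definition with an AFTER clause.
        column_ddl = super().ddl(ctx)
        if self.after:
            return NodeList([column_ddl, SQL('AFTER `%s`' % self.after)])
        return column_ddl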
You can extend the fields with custom ones and override the _sort_key with a large value to ensure such columns are always pushed to the end of the table.
This is definitely not the best way, but it works.
class dbCustomDateTime(dbDateTime):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Defining the sort key to ensure that even if this is used in a
        # base class, this column will go to the end of the table.
        self._sort_key = 100, 100

conversion of dash in column name

I have to work with a database containing columns with a dash in their name, for example a-name. When generating the model for the table with peewee, the dash is carried over into the attribute name, which is an illegal identifier, so Python complains about a misplaced operator.
For a table with 2 columns, id and a-name, the result would be
from peewee import *

database = MySQLDatabase('databasename', **{'password': 'pwd', 'host': 'ip', 'user': 'username'})

class BaseModel(Model):
    class Meta:
        database = database

class ATable(BaseModel):
    id = PrimaryKeyField()
    a-name = CharField()  # illegal identifier generated from the column name

    class Meta:
        db_table = 'aTable'
I found a temporary workaround by changing the dash to an underscore and using the optional parameter db_column, like
a_name = CharField(db_column='a-name')
Is there another possibility for this issue, as I do not want to make manual changes every time I download the models from the database server?
I should add that I have no control over the database server; I merely have an account with read-only permissions.
Greetings,
Luc
a_name = CharField(db_column='a-name')
This is the correct way to solve the problem. Python does not allow dashes in identifiers, so if your column uses them then specify the column name explicitly and use a nice name for the column.
I suppose you could look into modifying the playhouse.reflection.Introspector.make_column_name method, as well.
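If you do go that route, a rough, untested sketch (the exact signature of make_column_name is an assumption):

from playhouse.reflection import Introspector

class DashFriendlyIntrospector(Introspector):
    def make_column_name(self, column_name, *args, **kwargs):
        # Map dashes to underscores before the normal name mangling;
        # generation should then emit column_name='a-name' automatically
        # because the attribute name differs from the column name.
        return super().make_column_name(column_name.replace('-', '_'),
                                        *args, **kwargs)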

Performance issue with django exclude

I have a Django 1.8 application, and I am using an MSSQL database with pyodbc as the db backend (via the django-pyodbc-azure module).
I have the following models:
class Branch(models.Model):
    name = models.CharField(max_length=30)
    startTime = models.DateTimeField()

class Device(models.Model):
    uid = models.CharField(max_length=100, primary_key=True)
    type = models.CharField(max_length=20)
    firstSeen = models.DateTimeField()
    lastSeen = models.DateTimeField()

class Session(models.Model):
    device = models.ForeignKey(Device)
    branch = models.ForeignKey(Branch)
    start = models.DateTimeField()
    end = models.DateTimeField(null=True, blank=True)
I need to query the session model, and I want to exclude some records with specific device values. So I issue the following query:
sessionCount = (Session.objects.filter(branch=branch)
                .exclude(device__in=badDevices)
                .filter(end__gte=F('start') + timedelta(minutes=30))
                .count())
badDevices is a pre-filled list of device ids with around 60 items.
badDevices = ['id-1', 'id-2', ...]
This query takes around 1.5 seconds to complete. If I remove the exclude from the query, it takes around 250 milliseconds.
I printed the generated SQL for this queryset and tried it in my database client. There, both versions executed in around 250 milliseconds.
This is the generated SQL:
SELECT [session].[id], [session].[device_id], [session].[branch_id], [session].[start], [session].[end]
FROM [session]
WHERE ([session].[branch_id] = my-branch-id AND
NOT ([session].[device_id] IN ('id-1', 'id-2', 'id-3',...)) AND
DATEPART(dw, [session].[start]) = 1
AND [session].[end] IS NOT NULL AND
[session].[end] >= ((DATEADD(second, 600, CAST([session].[start] AS datetime)))))
So using exclude at the database level doesn't seem to affect query performance, but in Django the query runs 6 times slower when I add the exclude part. What could be causing this?
The general issue seems to be that django is doing some extra work to prepare the exclude clause. After that step and by the time the SQL has been generated and sent to the database, there isn't anything interesting happening on the django side that could cause such a significant delay.
In your case, one thing that might be causing this is some kind of pre-processing of badDevices. If, for instance, badDevices is a QuerySet then django might be executing the badDevices query just to prepare the actual query's SQL. Possibly something similar might be happening in the case where device has a non-default primary key.
The other thing that might delay the SQL preparation is, of course, django-pyodbc-azure. Maybe it's doing something strange while compiling the query and that becomes a bottleneck.
This is all wild speculation though, so if you're still having this issue then post the Device and Branch models as well, the exact content of badDevices and the SQL generated from the queries. Then maybe some scenarios can be at least eliminated.
EDIT: I think it must be the Device.uid field. Possibly django or pyodbc is getting confused by the non-default primary key and is fetching all the devices while generating the query. Try two things:
Replace device__in with device_id__in, device__pk__in and device__uid__in and check each one again. Maybe a more explicit query will be easier for django to translate into SQL. You can even try replacing branch with branch_id, just in case.
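For instance, a fully explicit variant might look like this (illustrative only):

sessionCount = (Session.objects.filter(branch_id=branch.id)
                .exclude(device_id__in=badDevices)
                .filter(end__gte=F('start') + timedelta(minutes=30))
                .count())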
If the above doesn't work, try replacing the exclude expression with a raw SQL where clause:
# add quotes (because of the hyphens) & join
badDevicesIdString = ", ".join(["'%s'" % id for id in badDevices])
# Replaces .exclude()
... .extra(where=['device_id NOT IN (%s)' % badDevicesIdString])
If neither works, then most likely the problem is with the whole query and not just exclude. There are some more options in that case but try the above first and I will update my answer later if necessary.
Just want to share a similar problem that I had with MySQL and exclude clause performance, and how it was fixed.
When running the exclude clause, the list for the "in" lookup was actually a QuerySet that I had obtained with the values_list method. Checking the exclude query executed by MySQL, the "in" operand was not a list of values but another query. This behavior was impacting performance on certain large queries.
To fix that, instead of passing the queryset, I flattened it into a Python list of values. That way each value is passed as an argument inside the in lookup, and performance was greatly improved.
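In code, the fix amounts to forcing evaluation before the lookup (a sketch using this question's models; the device filter is illustrative):

# Evaluate the subquery up front so exclude() receives plain values
# instead of embedding a nested SELECT.
badDevices = list(
    Device.objects.filter(type='kiosk').values_list('uid', flat=True)
)
sessionCount = (Session.objects.filter(branch=branch)
                .exclude(device__in=badDevices)
                .count())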

Django profile id may not be null using get_or_create, how does it relate to the db?

I loosely followed the steps in Django User Profiles - Simple yet powerful.
Not quite the same because I am in the middle of developing the idea.
From that site I used, in particular, this line:
User.profile = property(lambda u:
UserProfile.objects.get_or_create(user=u)[0])
I was always getting an error message on creating the object, typically '"XX" may not be null'. I solved part of the problems by playing with the models and (in my present case) sqliteman, till I got the same message on the id: 'xxx.id may not be null'.
On the net I found a description of a possible solution which involved resetting the database, which I was not that happy to do, in particular because for the different solutions it might have involved resetting the application db. But because the UserProfile model was kinda new and till now empty, I played with it on the DB directly: a hand-made drop of the table, then asking syncdb to rebuild it (kinda risky, though).
Now this is the diff of the sqlite dump:
294,298c290,294
< CREATE TABLE "myt_userdata" (
< "id" integer NOT NULL PRIMARY KEY,
< "user_id" integer NOT NULL UNIQUE REFERENCES "auth_user" ("id"),
< "url" varchar(200),
< "birthday" datetime
---
> CREATE TABLE myt_userdata (
> "id" INTEGER NOT NULL,
> "user_id" INTEGER NOT NULL,
> "url" VARCHAR(200),
> "birthday" DATETIME
Please note that both versions were generated by Django. The ">" version was generated from a simple model definition whose connection to the user table was:
user = models.ForeignKey(User, unique=True)
The new "<" version has much more information and is working.
My question:
Why does Django complain that myt_userdata.id may not be null?
The subsidiary question:
Does Django try to relate to the underlying DB structure, and how?
(For example, does the not-NULL message come from the model or from the DB?)
The additional question:
I have been a bit reluctant to use South: too complicated, additional modules which I might have to take care of between devel and production, and maybe not that easy if I want to switch DB engines (I am using SQLite only at the devel stage; I plan to move to MySQL).
Probably South would have worked in this case. Would it? Would you suggest its use anyway?
Edit, FYI:
This is my last model (the working one):
class UserData(models.Model):
    user = models.ForeignKey(User, unique=True)
    url = models.URLField("Website", blank=True, null=True)
    birthday = models.DateTimeField('Birthday', blank=True, null=True)

    def __unicode__(self):
        return self.user.username

User.profile = property(lambda u: UserData.objects.get_or_create(user=u, defaults={'birthday': '1970-01-01 00:00:00'})[0])
Why does Django complain that myt_userdata.id may not be null?
Because id is not a primary key in that table, so it is not populated automatically. Also, you don't provide it on model creation, so the DB does not know what to do.
Does Django try to relate to the underlying DB structure, and how? (For example, does the not-NULL message come from the model or from the DB?)
It's an error from the DB, not from Django.
You can use the sql management command to understand what exactly is executed on syncdb. The variant above seems to be a correct table definition made from a correct Django model, and I have no idea how you got the variant below. Write a correct and clear model, and you'll get a correct, working table schema after syncdb.
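For example (assuming the app is named myt, as the table prefix suggests; the sql command existed in syncdb-era Django and was removed in 1.9):

# Print the CREATE TABLE statements Django would issue for the app:
python manage.py sql myt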
