Are model save() methods lazy in django?
For instance, at what line in the following code sample will django hit the database?
my_model = MyModel()
my_model.name = 'Jeff Atwood'
my_model.save()
# Some code that is independent of my_model...
model_id = my_model.id
print (model_id)
It does not make much sense to have a lazy save, does it? Django's QuerySets are lazy; the model's save() method is not. The database is hit on line 3 of your sample, at my_model.save().
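A quick way to see this yourself (a sketch reusing the MyModel from your example; the primary key is only populated once the INSERT has run):

my_model = MyModel()
my_model.name = 'Jeff Atwood'
print(my_model.id)   # None -- nothing has hit the database yet
my_model.save()      # the INSERT runs here, synchronously
print(my_model.id)   # now set to the newly assigned primary key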
From the django source:
django/db/models/base.py, lines 424–437:
def save(self, force_insert=False, force_update=False, using=None):
    """
    Saves the current instance. Override this in a subclass if you want to
    control the saving process.

    The 'force_insert' and 'force_update' parameters can be used to insist
    that the "save" must be an SQL insert or update (or equivalent for
    non-SQL backends), respectively. Normally, they should not be set.
    """
    if force_insert and force_update:
        raise ValueError("Cannot force both insert and updating in \
model saving.")
    self.save_base(using=using, force_insert=force_insert,
                   force_update=force_update)

save.alters_data = True
Then, save_base does the heavy lifting (same file, lines 439–545):
...
transaction.commit_unless_managed(using=using)
...
And in django/db/transaction.py, lines 167–178, you'll find:
def commit_unless_managed(using=None):
    """
    Commits changes if the system is not in managed transaction mode.
    """
    ...
P.S. All line numbers apply to django version (1, 3, 0, 'alpha', 0).
I am new to Rails and am currently learning strong parameters in Rails 4, following the example below from the official documentation:
class PeopleController < ActionController::Base
  # Using "Person.create(params[:person])" would raise an
  # ActiveModel::ForbiddenAttributes exception because it'd
  # be using mass assignment without an explicit permit step.
  # This is the recommended form:
  def create
    Person.create(person_params)
  end

  # This will pass with flying colors as long as there's a person key in the
  # parameters, otherwise it'll raise an ActionController::MissingParameter
  # exception, which will get caught by ActionController::Base and turned
  # into a 400 Bad Request reply.
  def update
    redirect_to current_account.people.find(params[:id]).tap { |person|
      person.update!(person_params)
    }
  end

  private
    # Using a private method to encapsulate the permissible parameters is
    # just a good pattern since you'll be able to reuse the same permit
    # list between create and update. Also, you can specialize this method
    # with per-user checking of permissible attributes.
    def person_params
      params.require(:person).permit(:name, :age)
    end
end
Question 1:
What does current_account.people.find mean inside the update method?
Question 2:
Could someone please explain the person_params method. What is "params" inside the person_params method?
current_account is most likely a private method that returns an Account instance. current_account.people.find(params[:id]) searches the people table for a person that belongs to the current_account and has an ID of params[:id].

Object#tap is a Ruby method that yields the current object to a block and then returns that object. In this case, the Person instance is updated inside the block and then returned from tap.

Finally, redirect_to is a controller method that redirects the request to a different path. redirect_to can take many different types of arguments, including an ActiveRecord model, a string, or a symbol. Passing it an ActiveRecord model will redirect the request to the model's resource path, which is defined in routes.rb. In this case, that path will most likely be /people/:id.
The params object is a hash containing the request's parameter names and values. For example, the request /people?name=Joe&age=34 will result in the following params object: {name: 'Joe', age: '34'}. Inside person_params, params.require(:person) raises an exception unless a :person key is present, and .permit(:name, :age) returns a copy of that sub-hash containing only the whitelisted :name and :age keys, which is then safe to pass to mass assignment.
I want to scan all records to check whether there are any errors in the data.
How can I prevent BadValueError from breaking the scan when a required field is missing?
Note that I cannot change the StringProperty to not required, and in real code there can be tens of such properties, so that workaround is not practical.
class A(db.Model):
    x = db.StringProperty(required=True)

for instance in A.all():
    # check something
    if something(instance):
        instance.delete()
Can I use some function that reads the raw datastore.Entity directly, to avoid these problems with unneeded validation?
The solution I found for this problem was to use a resilient query: it ignores any exception thrown while iterating over a query. You can try this:
def resilient_query(query):
    query_iter = iter(query)
    while True:
        try:
            next_result = query_iter.next()
            # check something
            yield next_result
        except StopIteration:
            return
        except Exception, e:
            # this entity failed validation; skip it (or log/repair it here)
            pass

query = resilient_query(A.all())
If you use ndb, you can load all your models as an ndb.Expando, then modify the values. This doesn't appear to be possible in db because you cannot specify a kind for a Query in db that differs from your model class.
Even though your model is defined in db, you can still use ndb to fix your entities:
# Set up a new ndb connection with ndb.Expando as the default model.
conn = ndb.make_connection(default_model=ndb.Expando)
# Use this connection in our context.
ndb.set_context(ndb.make_context(conn=conn))

# Query for all A kinds
for a in ndb.Query(kind='A'):
    if a.x is None:
        a.x = 'A more appropriate value.'
        # Re-put the broken entity.
        a.put()
Also note that this (and the other solutions listed) will be subject to whatever time limits you are restricted to (i.e. 60 seconds on an App Engine frontend). If you are dealing with large amounts of data, you will most likely want to write a custom MapReduce job to do this.
Try setting a default property option to some distinct value that does not exist otherwise.
class A(db.Model):
    x = db.StringProperty(required=True, default=<distinct value>)
Then load properties and check for this value.
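A rough sketch of that check (assuming '__missing__' is a value that can never occur in your real data):

SENTINEL = '__missing__'  # the distinct default value from above

class A(db.Model):
    x = db.StringProperty(required=True, default=SENTINEL)

for instance in A.all():
    if instance.x == SENTINEL:
        # x was never actually set on this entity; repair or delete it
        instance.delete()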
You can override the _check_initialized(self) method of ndb.Model in your own Model subclass and replace the default logic with your own (or skip the check altogether as needed).
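A minimal sketch of that override (assuming it is acceptable to skip the check entirely while scanning):

from google.appengine.ext import ndb

class LenientModel(ndb.Model):
    def _check_initialized(self):
        # The default implementation raises BadValueError when required
        # properties are unset; skipping it lets broken entities be handled.
        pass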
I hope I didn't miss any topic that could answer my problem. I'm here now because I'm terribly frustrated and tired with the following task:
- I have a spreadsheet on Google Drive with lots of data in it
- I would like to create an application with wxPython that would pull data from this spreadsheet (in the easiest way possible)
- I would also like to get multiple data from a user who will access this application through a nice interface (panel, a.k.a. window)
- The data entered by the user should work with the data pulled from the spreadsheet, for example to see whether the data entered by the user is in the spreadsheet or not, plus some other operations on subsequent data entered by the user
- Finally, and most importantly, show the results to the user (later I would also like to add some functions to save the results somehow)
I hope I managed to express clearly what I would like to do. I'm new to Google APIs, Python and wxPython, but I have experience with C++, PHP and HTML.
I've spent two weeks now discovering Google Drive and learning Python and wxPython. I followed all the tutorials, made my notes, read tons of stackoverflow questions and answers, wiki.wxpython.org, etc. I learn every single day and I can now do many things separately, but I just couldn't work out how to combine all the functions I described above. At least please point me in the right direction. An awful lot of times I spent hours doing examples and getting nowhere. I have Python, the wxPython extension, the GoogleAppEngine Launcher and even a PyCharm demo. Please be kind, this is my first question ever.
Here's the mess I made so far, combining relevant examples:
import wx
import gdata.docs
import gdata.docs.service
import gdata.spreadsheet.service
import re, os

class Form(wx.Panel):
    def __init__(self, *args, **kwargs):
        super(Form, self).__init__(*args, **kwargs)
        self.createControls()
        self.bindEvents()
        self.doLayout()
        self.spreadsht()

    def createControls(self):
        self.logger = wx.TextCtrl(self, style=wx.TE_MULTILINE|wx.TE_READONLY)
        self.saveButton = wx.Button(self, label="Elvegzes")
        self.nameLabel = wx.StaticText(self, label="type Name1:")
        self.nameTextCtrl = wx.TextCtrl(self, value="type here")
        self.name2Label = wx.StaticText(self, label="type Name2:")
        self.name2TextCtrl = wx.TextCtrl(self, value="type here")

    def bindEvents(self):
        for control, event, handler in \
                [(self.saveButton, wx.EVT_BUTTON, self.onSave),
                 (self.nameTextCtrl, wx.EVT_TEXT, self.onNameEntered),
                 (self.nameTextCtrl, wx.EVT_CHAR, self.onNameChanged)]:
            control.Bind(event, handler)

    def doLayout(self):
        raise NotImplementedError

    def spreadsht(self):
        gd_client = gdata.spreadsheet.service.SpreadsheetsService()
        gd_client.email = 'my email address'
        gd_client.password = 'my password to it'
        gd_client.source = 'payne.org-example-1'
        gd_client.ProgrammaticLogin()
        q = gdata.spreadsheet.service.DocumentQuery()
        q['title'] = 'stationcenter'
        q['title-exact'] = 'true'
        feed = gd_client.GetSpreadsheetsFeed(query=q)
        spreadsheet_id = feed.entry[0].id.text.rsplit('/', 1)[1]
        feed = gd_client.GetWorksheetsFeed(spreadsheet_id)
        worksheet_id = feed.entry[0].id.text.rsplit('/', 1)[1]
        al1 = raw_input('Name1: ')
        print al1
        al2 = raw_input('Name2: ')
        print al2
        rows = gd_client.GetListFeed(spreadsheet_id, worksheet_id).entry
        for row in rows:
            for key in row.custom:
                if al1 == row.custom[key].text:
                    print ' %s: %s' % (key, row.custom[key].text)

    def onColorchanged(self, event):
        self.__log('User wants color: %s' % self.colors[event.GetInt()])

    def onReferrerEntered(self, event):
        self.__log('User entered referrer: %s' % event.GetString())

    def onSave(self, event):
        self.__log('User clicked on button with id %d' % event.GetId())

    def onNameEntered(self, event):
        self.__log('User entered name: %s' % event.GetString())

    def onNameChanged(self, event):
        self.__log('User typed character: %d' % event.GetKeyCode())
        event.Skip()

    def onInsuranceChanged(self, event):
        self.__log('User wants insurance: %s' % bool(event.Checked()))

    # Helper method(s):
    def __log(self, message):
        ''' Private method to append a string to the logger text control. '''
        self.logger.AppendText('%s\n' % message)

class FormWithSizer(Form):
    def doLayout(self):
        ''' Layout the controls by means of sizers. '''
        boxSizer = wx.BoxSizer(orient=wx.HORIZONTAL)
        gridSizer = wx.FlexGridSizer(rows=5, cols=2, vgap=10, hgap=10)
        # Prepare some reusable arguments for calling sizer.Add():
        expandOption = dict(flag=wx.EXPAND)
        noOptions = dict()
        emptySpace = ((0, 0), noOptions)
        # Add the controls to the sizers:
        for control, options in \
                [(self.nameLabel, noOptions),
                 (self.nameTextCtrl, expandOption),
                 (self.name2Label, noOptions),
                 (self.name2TextCtrl, expandOption),
                 emptySpace,
                 (self.saveButton, dict(flag=wx.ALIGN_CENTER))]:
            gridSizer.Add(control, **options)
        for control, options in \
                [(gridSizer, dict(border=5, flag=wx.ALL)),
                 (self.logger, dict(border=5, flag=wx.ALL|wx.EXPAND,
                                    proportion=1))]:
            boxSizer.Add(control, **options)
        self.SetSizerAndFit(boxSizer)

class FrameWithForms(wx.Frame):
    def __init__(self, *args, **kwargs):
        super(FrameWithForms, self).__init__(*args, **kwargs)
        notebook = wx.Notebook(self)
        form2 = FormWithSizer(notebook)
        notebook.AddPage(form2, 'CALTH')
        self.SetClientSize(notebook.GetBestSize())

if __name__ == '__main__':
    app = wx.App(0)
    frame = FrameWithForms(None, title='Relevant title')
    frame.Show()
    app.MainLoop()
Thank you again!
First, make sure you can download the data you want with just Python. Then create a wxPython GUI with a single button. In that button's handler, have it call the script that can download the data you want.
If that causes your GUI to become unresponsive, then you'll need to use a thread to do the downloading. I recommend the following articles if that's the case:
http://www.blog.pythonlibrary.org/2010/05/22/wxpython-and-threads/
http://wiki.wxpython.org/LongRunningTasks
Okay, so now you have the data downloading appropriately. Next, add a grid widget or a ListCtrl / ObjectListView widget; pick one of those. I prefer ObjectListView, which you can read about here. Then in your button handler you can call your downloader script or thread, and when that's done, you can load the widget with that data. If you're using a thread, then the thread will have to kick off the next step (i.e. the widget-loading bit).
Now you should have your data displayed. All that's left is making it look pretty and maybe putting the downloading part into a menu item.
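A minimal sketch of that button-plus-thread flow (fetch_spreadsheet_rows is a placeholder for your own gdata download code):

import threading
import wx

class MainFrame(wx.Frame):
    def __init__(self):
        super(MainFrame, self).__init__(None, title='Spreadsheet viewer')
        panel = wx.Panel(self)
        self.downloadButton = wx.Button(panel, label='Download data')
        self.downloadButton.Bind(wx.EVT_BUTTON, self.onDownload)

    def onDownload(self, event):
        # Disable the button so the user cannot start two downloads at once.
        self.downloadButton.Disable()
        worker = threading.Thread(target=self.downloadData)
        worker.daemon = True
        worker.start()

    def downloadData(self):
        rows = fetch_spreadsheet_rows()  # placeholder for your gdata code
        # Never touch widgets from a worker thread; hand the result
        # back to the GUI thread with wx.CallAfter.
        wx.CallAfter(self.onDataReady, rows)

    def onDataReady(self, rows):
        self.downloadButton.Enable()
        # Load rows into your grid / ObjectListView here.

if __name__ == '__main__':
    app = wx.App(False)
    MainFrame().Show()
    app.MainLoop()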
In my app, in one of the handlers, I need to get a bunch of entities and execute a function for each one of them.
I have the keys of all the entities I need. After fetching them I need to execute one or two instance methods for each of them, and this slows my app down quite a bit: doing this for 100 entities takes around 10 seconds, which is way too slow.
I'm trying to find a way to get the entities and execute those functions in parallel to save time, but I'm not really sure which way is best.
I tried the _post_get_hook, but then I have a Future object and need to call get_result() and execute the function in the hook. That works kind of OK in the SDK, but I get a lot of 'maximum recursion depth exceeded while calling a Python object' errors, and I can't really understand why; the error message is not very elaborate.
Is the Pipeline API or ndb.Tasklets what I'm searching for?
At the moment I'm going by trial and error, but I would be happy if someone could point me in the right direction.
EDIT
My code is something similar to a filesystem: every Folder contains other folders and files. The path of a Collection is set on another entity, so to serialize a Collection entity I need to get the referenced entity and read its path. On a Collection, the serialized_assets() function gets slower the more entities it contains. If I could execute a serialize function for each contained asset side by side, it would speed things up quite a bit.
class Index(ndb.Model):
    path = ndb.StringProperty()

class Folder(ndb.Model):
    label = ndb.StringProperty()
    index = ndb.KeyProperty()
    # contents is a list of keys of contained Folders and Files
    # (KeyProperty, so the keys can be passed straight to ndb.get_multi)
    contents = ndb.KeyProperty(repeated=True)

    def serialized_assets(self):
        assets = ndb.get_multi(self.contents)
        serialized_assets = []
        for asset in assets:
            kind = asset._get_kind()
            assetdict = asset.to_dict()
            if kind == 'Collection':
                assetdict['path'] = asset.path
                # other operations ...
            elif kind == 'File':
                assetdict['another_prop'] = asset.another_property
                # ...
            serialized_assets.append(assetdict)
        return serialized_assets

    @property
    def path(self):
        return self.index.get().path

class File(ndb.Model):
    filename = ndb.StringProperty()
    # other properties....

    @property
    def another_property(self):
        # compute something here
        return computed_property
EDIT 2:

@ndb.tasklet
def serialized_assets(self, keys=None):
    assets = yield ndb.get_multi_async(keys)
    raise ndb.Return([asset.serialized for asset in assets])

Is this tasklet code OK?
Since most of the execution time of your functions is spent waiting for RPCs, NDB's async and tasklet support is your best bet. That's described in some detail here. The simplest usage for your requirements is probably to use the ndb.map function, like this (from the docs):
@ndb.tasklet
def callback(msg):
    acct = yield msg.author.get_async()
    raise ndb.Return('On %s, %s wrote:\n%s' % (msg.when, acct.nick(), msg.body))

qry = Message.query().order(-Message.when)
outputs = qry.map(callback, limit=20)
for output in outputs:
    print output
The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using _async methods and yield to do them asynchronously), returning the result when it's done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.
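Applied to the Folder model from the question, the same idea might look roughly like this (a sketch, not tested against your models):

@ndb.tasklet
def serialize_asset(key):
    # Each call returns a Future; NDB runs these callbacks in parallel
    # and batches the underlying datastore gets where it can.
    asset = yield key.get_async()
    raise ndb.Return(asset.to_dict())

def serialized_assets(folder):
    futures = [serialize_asset(key) for key in folder.contents]
    return [f.get_result() for f in futures]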
The pipeline API is overkill for what you want to do. Is there any reason why you couldn't just use a taskqueue?
Use the initial request to get all of the entity keys, and then enqueue a task for each key, having each task execute the two functions for its entity. The concurrency will then be based on the number of concurrent requests configured for that taskqueue.
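A rough sketch of that fan-out; '/process_entity' and 'entity-queue' are placeholder names for a handler and queue you would define:

from google.appengine.api import taskqueue

# In the initial request: enqueue one task per entity key.
for key in entity_keys:  # entity_keys: the ndb keys you already have
    taskqueue.add(url='/process_entity',
                  params={'key': key.urlsafe()},
                  queue_name='entity-queue')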
Currently my application caches models in memcache like this:
memcache.set("somekey", aModel)
But Nick's post at http://blog.notdot.net/2009/9/Efficient-model-memcaching suggests that first converting it to protocol buffers is a lot more efficient. After running some tests I found out it's indeed smaller in size, but actually slower (~10%).
Do others have the same experience or am I doing something wrong?
Test results: http://1.latest.sofatest.appspot.com/?times=1000
import pickle
import time
import uuid

from google.appengine.ext import webapp
from google.appengine.ext import db
from google.appengine.ext.webapp import util
from google.appengine.datastore import entity_pb
from google.appengine.api import memcache

class Person(db.Model):
    name = db.StringProperty()

times = 10000

class MainHandler(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        m = Person(name='Koen Bok')

        t1 = time.time()
        for i in xrange(int(self.request.get('times', 1))):
            key = uuid.uuid4().hex
            memcache.set(key, m)
            r = memcache.get(key)
        self.response.out.write('Pickle took: %.2f' % (time.time() - t1))

        t1 = time.time()
        for i in xrange(int(self.request.get('times', 1))):
            key = uuid.uuid4().hex
            memcache.set(key, db.model_to_protobuf(m).Encode())
            r = db.model_from_protobuf(entity_pb.EntityProto(memcache.get(key)))
        self.response.out.write('Proto took: %.2f' % (time.time() - t1))

def main():
    application = webapp.WSGIApplication([('/', MainHandler)], debug=True)
    util.run_wsgi_app(application)

if __name__ == '__main__':
    main()
The memcache call still pickles the object, with or without protobuf; pickling is faster with a protobuf object since it has a very simple model.
Plain pickled objects are larger than protobuf+pickle objects, so the protobuf versions save time on the memcache transfer, but cost more processor time for the protobuf conversion.
Therefore, in general, either method works out about the same... but
The reason you should use protobuf is that it can handle changes between versions of the models, whereas pickle will raise errors. This problem will bite you one day, so it's best to handle it sooner.
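For reference, a minimal pair of helpers implementing the protobuf approach described above (the function names are my own):

from google.appengine.api import memcache
from google.appengine.datastore import entity_pb
from google.appengine.ext import db

def cache_model(key, model):
    # Store the encoded protobuf rather than the pickled model instance,
    # so cached values still decode after the model class changes.
    memcache.set(key, db.model_to_protobuf(model).Encode())

def get_cached_model(key):
    data = memcache.get(key)
    if data is None:
        return None
    return db.model_from_protobuf(entity_pb.EntityProto(data))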
Both pickle and protobufs are slow in App Engine since they're implemented in pure Python. I've found that writing my own, simple serialization code using methods like str.join tends to be faster since most of the work is done in C. But that only works for simple datatypes.
One way to do it more quickly is to turn your model into a dictionary and use the native eval / repr functions as your (de)serializers -- with caution of course, as always with the evil eval, but it should be safe here given that there is no external step.
Below is an example of a class, Fake_entity, implementing exactly that.
You first create your dictionary through fake = Fake_entity(entity), then you can simply store your data via memcache.set(key, fake.serialize()). serialize() is a simple call to the native dictionary method repr, with some additions if you need them (e.g. adding an identifier at the beginning of the string).
To fetch it back, simply use fake = Fake_entity(memcache.get(key)). The Fake_entity object is a simple dictionary whose keys are also accessible as attributes. You can access your entity properties normally, except that ReferenceProperties give keys instead of fetching the object (which is actually quite useful). You can also get() the actual entity with fake.get(), or, more interestingly, change it and then save it with fake.put().
It does not work with lists (if you fetch multiple entities from a query), but could easily be adjusted with join/split functions using an identifier like '### FAKE MODEL ENTITY ###' as the separator. Use with db.Model only; it would need small adjustments for Expando.
class Fake_entity(dict):
    def __init__(self, record):
        # simple case: a string, we eval it to rebuild our fake entity
        if isinstance(record, basestring):
            import datetime  # <----- put all relevant eval imports here
            from google.appengine.api import datastore_types
            # strip the identifier line added by serialize() before eval'ing
            if record.startswith('### FAKE MODEL ENTITY ###'):
                record = record.split('\n', 1)[1]
            self.update(eval(record))  # careful with external sources, eval is evil
            return None
        # serious case: we build the instance from the actual entity
        for prop_name, prop_ref in record.__class__.properties().items():
            self[prop_name] = prop_ref.get_value_for_datastore(record)  # to avoid fetching entities
        self['_cls'] = record.__class__.__module__ + '.' + record.__class__.__name__
        try:
            self['key'] = str(record.key())
        except Exception:  # the key may not exist if the entity has not been stored
            pass

    def __getattr__(self, k):
        return self[k]

    def __setattr__(self, k, v):
        self[k] = v

    def key(self):
        from google.appengine.ext import db
        return db.Key(self['key'])

    def get(self):
        from google.appengine.ext import db
        return db.get(self['key'])

    def put(self):
        _cls = self.pop('_cls')  # gets and removes the class name from the passed arguments
        # import xxxxxxx ---> put your model imports here if necessary
        Cls = eval(_cls)  # make sure that your model declarations are in scope here
        real_entity = Cls(**self)  # creates the entity
        real_entity.put()  # self explanatory
        self['_cls'] = _cls  # puts back the class name afterwards
        return real_entity

    def serialize(self):
        return '### FAKE MODEL ENTITY ###\n' + repr(self)
        # or simply repr, but I use the initial identifier to test and eval directly when getting from memcache
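A quick usage sketch (the login_entity variable and cache key format are just examples):

# Cache: wrap the fetched entity and store its string form.
fake = Fake_entity(login_entity)
memcache.set('login:' + str(login_entity.key()), fake.serialize())

# Fetch back: rebuild the fake entity from the cached string.
restored = Fake_entity(memcache.get('login:' + key_str))
print restored.first_name  # properties are available as attributes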
I would welcome speed tests on this; I would assume it is quite a bit faster than the other approaches. Plus, you do not run any risk if your models have changed somehow in the meantime.
Below is an example of what the serialized fake entity looks like. Take a particular look at the datetime (created) as well as the reference properties (subdomain):
### FAKE MODEL ENTITY ###
{'status': u'admin', 'session_expiry': None, 'first_name': u'Louis', 'last_name': u'Le Sieur', 'modified_by': None, 'password_hash': u'a9993e364706816aba3e25717000000000000000', 'language': u'fr', 'created': datetime.datetime(2010, 7, 18, 21, 50, 11, 750000), 'modified': None, 'created_by': None, 'email': u'chou@glou.bou', 'key': 'agdqZXJlZ2xlcgwLEgVMb2dpbhjmAQw', 'session_ref': None, '_cls': 'models.Login', 'groups': [], 'email___password_hash': u'chou@glou.bou+a9993e364706816aba3e25717000000000000000', 'subdomain': datastore_types.Key.from_path(u'Subdomain', 229L, _app=u'jeregle'), 'permitted': [], 'permissions': []}
Personally I also use static variables (faster than memcache) to cache my entities in the short term, and fall back to the datastore when the server has changed or its memory has been flushed for some reason (which happens quite often, in fact).
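A minimal sketch of that short-term static cache (the helper name is mine; each App Engine instance keeps its own copy, which vanishes whenever the instance is recycled):

from google.appengine.ext import db

_entity_cache = {}  # module-level: lives as long as this server instance

def get_cached(key_str):
    entity = _entity_cache.get(key_str)
    if entity is None:
        entity = db.get(db.Key(key_str))
        _entity_cache[key_str] = entity
    return entity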