Django - would these query sets be cached? - database

class UnassignedThread(models.Manager):
    def get_queryset(self):
        return super(UnassignedThread, self).get_queryset().filter(
            _irc_name__isnull=True)
Would results = ThreadVault.unassigned_threads.all() be cached? I am not certain whether filtering with _irc_name__isnull=True counts as evaluating the queryset (since evaluation is what populates the cache).
Also, if I have a model called ThreadVault and I want to look up whether threads #777 and #888 exist in the database, which way makes the best use of the cache for the lookup?
ThreadVault.objects.get(thread_id="777")
ThreadVault.objects.get(thread_id="888")
or
results = ThreadVault.objects.all()
for ticket in results:
    if ticket.thread_id == "777" or ticket.thread_id == "888":
        # do something

No, querysets are lazy until they are sliced or iterated. filter simply adds conditions to the query, but does not evaluate it.
For your second question, neither of these are great, although the first is vastly preferable to the second (which involves loading and iterating through every object in the table). Instead, you should use exists() in conjunction with an __in filter:
ThreadVault.objects.filter(thread_id__in=["777", "888"]).exists()
Neither of these questions has anything to do with caching.
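As a concrete illustration (reusing the manager and model names from the question), this is roughly when evaluation and the result cache come into play:
results = ThreadVault.unassigned_threads.all()   # lazy: no query has run yet
threads = list(results)        # iterating evaluates the queryset and fills its result cache
threads_again = list(results)  # reuses the cached rows, no second query
found = ThreadVault.objects.filter(thread_id__in=["777", "888"]).exists()  # one cheap query, no caching involved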

th_ids = ["777","888"]
ThreadVault.objects.filter(thread_id__in=th_ids).exists()
For caching your view:
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)
def my_view(request):
    ...

Related

Is it possible to bulk load an NDB child Entity in GAE?

At some point in the future I may need to bulk load migration data (e.g. from a CSV). Has anyone had exceptions raised doing the following? Also, is there any change in behaviour if the ndb.put_multi() function is used?
from google.appengine.ext import ndb

class X(ndb.Model):
    id = ndb.StringProperty()
    name = ndb.StringProperty()

class Y(ndb.Model):
    pass

def read_csv_row(line):
    """returns tuple"""

while True:
    id, name = read_csv_row(readline())
    if not id:
        break
    x = X(parent=ndb.Key('Y', 'static_id'))
    x.id, x.name = id, name
    x.put()
From my research, and thanks to the comments, it seems that the code above (once made into valid code) creates problems which would eventually lead to google.appengine.api.datastore_errors.Timeout exceptions being thrown.
See another question:
Datastore write limit tests - trying to break app engine, but it won't break ;)
The best suggestion I have so far is to use a Task Queue to rate-limit this. More information on:
blog.notdot.net/tag/deferred
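As a rough sketch of that suggestion (the put_batch/enqueue_csv names, the batch size, and the assumption that the CSV has already been parsed into (id, name) tuples are mine, not the asker's), the deferred library can push batched writes onto a task queue so they are retried and rate-limited, and ndb.put_multi() turns each batch into a single RPC:
from google.appengine.ext import deferred, ndb

BATCH_SIZE = 100  # assumed; tune to taste

def put_batch(rows):
    # Runs on the task queue, so failures are retried instead of timing out the request.
    parent = ndb.Key('Y', 'static_id')
    entities = []
    for row_id, name in rows:
        x = X(parent=parent)
        x.id, x.name = row_id, name
        entities.append(x)
    ndb.put_multi(entities)  # one batched RPC instead of one put() per row

def enqueue_csv(rows):
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            deferred.defer(put_batch, batch)
            batch = []
    if batch:
        deferred.defer(put_batch, batch)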

How to disable BadValueError (required field) in Google App Engine when scanning all records?

I want to scan all records to check whether there are errors in the data.
How can I disable BadValueError so that a missing required field does not break the scan?
Note that I cannot change the StringProperty to not be required, and there can be tens of such properties in the real code, so that workaround is not useful.
class A(db.Model):
    x = db.StringProperty(required=True)

for instance in A.all():
    # check something
    if something(instance):
        instance.delete()
Can I use some function to read the datastore.Entity directly, to avoid such problems with validation I don't need?
The solution I found for this problem was to use a resilient query; it ignores any exception thrown while iterating the query. You can try this:
def resilient_query(query):
    query_iter = iter(query)
    while True:
        try:
            next_result = query_iter.next()
            # check something
            yield next_result
        except StopIteration:
            return  # end of the result set
        except Exception, e:
            next_result.delete()

query = resilient_query(A.query())
If you use ndb, you can load all your models as an ndb.Expando, then modify the values. This doesn't appear to be possible in db because you cannot specify a kind for a Query in db that differs from your model class.
Even though your model is defined in db, you can still use ndb to fix your entities:
# Set up a new ndb connection with ndb.Expando as the default model.
conn = ndb.make_connection(default_model=ndb.Expando)
# Use this connection in our context.
ndb.set_context(ndb.make_context(conn=conn))

# Query for all A kinds.
for a in ndb.Query(kind='A'):
    if a.x is None:
        a.x = 'A more appropriate value.'
        # Re-put the broken entity.
        a.put()
Also note that this (and other solutions listed) will be subject to whatever time limits you are restricted to (i.e. 60 seconds on an App Engine frontend). If you are dealing with large amounts of data you will most likely want to write a custom map reduce job to do this.
Try setting a default property option to some distinct value that does not exist otherwise.
class A(db.Model):
    x = db.StringProperty(required=True, default=<distinct value>)
Then load properties and check for this value.
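A small sketch of that idea; MISSING is an arbitrary sentinel assumed not to occur in real data:
MISSING = '__missing__'

class A(db.Model):
    x = db.StringProperty(required=True, default=MISSING)

for instance in A.all():
    if instance.x == MISSING:
        instance.delete()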
You can override the _check_initialized(self) method of ndb.Model in your own Model subclass and replace the default logic with your own (or skip it altogether, as needed).
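A hedged sketch of that override, which simply skips the check; this assumes you are willing to load and re-put entities with missing required values during the cleanup:
from google.appengine.ext import ndb

class LenientA(ndb.Model):
    x = ndb.StringProperty(required=True)

    def _check_initialized(self):
        # Deliberately skip ndb's required-property validation so a cleanup
        # scan can handle broken entities without raising BadValueError.
        pass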

Datastore: Is it possible to only save to memcache when using ndb API?

Currently I'm using the ndb API to store some statistics. Unfortunately, this has become the major source of my cost. I'm thinking it should be much cheaper if I only save them to memcache; it doesn't matter if data is lost when the cache expires.
After reading the manual, I assume the _use_datastore class variable can be used to configure this behaviour:
class StaticModel(ndb.Model):
    _use_datastore = False

    userid = ndb.StringProperty()
    created_at = ndb.DateTimeProperty(auto_now_add=True)
May I know if the above is the right solution?
Cheers!
I think there are three ways to achieve what you want.
The first is to set _use_datastore = False on the NDB model class as per your question.
The second would be to pass use_datastore=False whenever you put / get / delete a StaticModel. An example would be:
model = StaticModel(userid="foo")
key = model.put(use_datastore=False)
n = key.get(use_datastore=False)
The third option would be to set a datastore policy in the NDB Context which returns false for any StaticModel keys. Something like:
context.set_datastore_policy(lambda key: key.kind() != 'StaticModel')
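For that third option, a minimal sketch of wiring the policy into the current NDB context (the kind name comes from the question):
from google.appengine.ext import ndb

ctx = ndb.get_context()
ctx.set_datastore_policy(lambda key: key.kind() != 'StaticModel')  # skip the datastore for StaticModel keys
ctx.set_memcache_policy(lambda key: True)                          # keep memcache enabled for everything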

Fetch query from Magento database -- mysql_num_rows

What function is equal to mysql_num_rows in Magento?
For Magento, the proper equivalent is PHP's count() function.
Why?
Magento usually uses Varien_Data_Collection instances to fetch result sets containing multiple records. Varien implements the Lazy Load Pattern for these collections, that is, no result set will be fetched before you really need it.
If you take a look at the Varien_Data_Collection class, you'll see that it implements PHP's Countable interface and the count() method that interface requires:
class Varien_Data_Collection implements IteratorAggregate, Countable
{
    // ...

    public function count()
    {
        $this->load();
        return count($this->_items);
    }

    // ...
}
If you're now asking yourself what lazy loading has to do with counting records, you need to know that querying a collection the usual Magento way, e.g. like this:
$collection = Mage::getModel('catalog/product')
    ->getCollection()
    ->addFieldToFilter(
        'status',
        Mage_Catalog_Model_Product_Status::STATUS_ENABLED
    );
does not fetch the result set at all. But how do you count the records of a result set that hasn't been fetched yet? Right, you can't, and neither can mysql_num_rows; it also needs the result set to be fetched first.
Now, when you call count() on the collection, e.g. by
$n = count($collection);
PHP's core count() function will detect that the passed argument $collection implements a Countable interface and has its own count() method defined, so it will call that one.
This leads to really fetching the result set* and storing it in $this->_items, which finally allows counting the records and returning the number.
* In Magento you can also iterate with foreach ($collection as $product) to really fetch the result set, but that's another story.

parallel code execution python2.7 ndb

In my app, for one of the handlers, I need to get a bunch of entities and execute a function for each one of them.
I have the keys of all the entities I need. After fetching them I need to execute one or two instance methods for each of them, and this slows my app down quite a bit: doing this for 100 entities takes around 10 seconds, which is way too slow.
I'm trying to find a way to get the entities and execute those functions in parallel to save time, but I'm not really sure which way is best.
I tried the _post_get_hook, but there I have a future object and need to call get_result() and execute the function in the hook. That works kind of OK in the SDK, but I get a lot of 'maximum recursion depth exceeded while calling a Python object' errors, and I can't really understand why; the error message is not very elaborate.
Is the Pipeline API or ndb.Tasklets what I'm searching for?
At the moment I'm going by trial and error, but I would be happy if someone could point me in the right direction.
EDIT
My code is something similar to a filesystem: every folder contains other folders and files. The path of a Collection is set on another entity, so to serialize a collection entity I need to get the referenced entity and read its path. On a Collection, the serialized_assets() function gets slower the more entities it contains. If I could execute a serialize function for each contained asset side by side, it would speed things up quite a bit.
class Index(ndb.Model):
    path = ndb.StringProperty()

class Folder(ndb.Model):
    label = ndb.StringProperty()
    index = ndb.KeyProperty()
    # contents is a list of keys of contained Folders and Files
    contents = ndb.KeyProperty(repeated=True)

    def serialized_assets(self):
        assets = ndb.get_multi(self.contents)
        serialized_assets = []
        for a in assets:
            kind = a._get_kind()
            assetdict = a.to_dict()
            if kind == 'Collection':
                assetdict['path'] = a.path
                # other operations ...
            elif kind == 'File':
                assetdict['another_prop'] = a.another_property
                # ...
            serialized_assets.append(assetdict)
        return serialized_assets

    @property
    def path(self):
        return self.index.get().path

class File(ndb.Model):
    filename = ndb.StringProperty()
    # other properties....

    @property
    def another_property(self):
        # compute something here
        return computed_property
EDIT2:
@ndb.tasklet
def serialized_assets(self, keys=None):
    assets = yield ndb.get_multi_async(keys)
    raise ndb.Return([asset.serialized for asset in assets])
Is this tasklet code OK?
Since most of the execution time of your functions is spent waiting for RPCs, NDB's async and tasklet support is your best bet. That's described in some detail here. The simplest usage for your requirements is probably the query's map() method, like this (from the docs):
@ndb.tasklet
def callback(msg):
    acct = yield msg.author.get_async()
    raise ndb.Return('On %s, %s wrote:\n%s' % (msg.when, acct.nick(), msg.body))

qry = Message.query().order(-Message.when)
outputs = qry.map(callback, limit=20)
for output in outputs:
    print output
The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using _async methods and yield to do them asynchronously), returning the result when it's done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.
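As a rough, untested adaptation to the Folder model from the question (assuming contents holds ndb.Key objects, as the comment in the question says, and glossing over the per-kind extras), the per-asset serialization could itself be a tasklet so the gets run in parallel:
@ndb.tasklet
def serialize_asset_async(key):
    asset = yield key.get_async()       # asynchronous get
    raise ndb.Return(asset.to_dict())   # add the per-kind extras here

@ndb.tasklet
def serialized_assets_async(folder):
    # Yielding a list of futures makes NDB wait on all of them in parallel.
    dicts = yield [serialize_asset_async(k) for k in folder.contents]
    raise ndb.Return(dicts)

# Usage: serialized = serialized_assets_async(folder).get_result()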
The pipeline API is overkill for what you want to do. Is there any reason why you couldn't just use a taskqueue?
Use the initial request to get all of the entity keys, and then enqueue a task for each key, having each task execute the two functions for its entity. The concurrency will then be based on the number of concurrent requests configured for that task queue.
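A rough sketch of that approach; the handler URL, queue name and parameter name below are made up for illustration:
from google.appengine.api import taskqueue
from google.appengine.ext import ndb

def enqueue_per_entity_work(keys):
    for key in keys:
        taskqueue.add(url='/tasks/serialize-asset',
                      params={'key': key.urlsafe()},
                      queue_name='serialize')

# In the /tasks/serialize-asset handler:
#   key = ndb.Key(urlsafe=self.request.get('key'))
#   entity = key.get()
#   ... run the one or two instance methods on this entity ...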
