I am creating the bulkloader.yaml automatically from my existing schema and have trouble downloading my data due to the repeated=True on my KeyProperty.
class User(ndb.Model):
    firstname = ndb.StringProperty()
    friends = ndb.KeyProperty(kind='User', repeated=True)
The automatically created bulkloader.yaml looks like this:
- kind: User
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: firstname
      external_name: firstname
      # Type: String Stats: 2 properties of this type in this kind.
    - property: friends
      external_name: friends
      # Type: Key Stats: 2 properties of this type in this kind.
      import_transform: transform.create_foreign_key('User')
      export_transform: transform.key_id_or_name_as_string
This is the error message I am getting:
google.appengine.ext.bulkload.bulkloader_errors.ErrorOnTransform: Error on transform. Property: friends External Name: friends. Code: transform.key_id_or_name_as_string Details: 'list' object has no attribute 'to_path'
What can I do please?
Possible Solution:
After Tony's tip I came up with this:
    - property: friends
      external_name: friends
      # Type: Key Stats: 2 properties of this type in this kind.
      import_transform: myfriends.stringToValue(';')
      export_transform: myfriends.valueToString(';')
myfriends.py
from google.appengine.ext.bulkload import transform

def valueToString(delimiter):
    def key_list_to_string(value):
        keyStringList = []
        if value == '' or value is None or value == []:
            return None
        for val in value:
            keyStringList.append(transform.key_id_or_name_as_string(val))
        return delimiter.join(keyStringList)
    return key_list_to_string
And this works! The exported file is UTF-8 encoded, though, so make sure to open it in LibreOffice as UTF-8 or you will see garbled content.
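To check the join logic outside App Engine, here is a stdlib-only sketch of the same export transform. It is an illustration under assumptions, not the bulkloader API: keys are modelled as (kind, id_or_name) tuples, and key_id_or_name is a hypothetical stand-in for transform.key_id_or_name_as_string.

```python
def key_id_or_name(key):
    """Hypothetical stand-in for transform.key_id_or_name_as_string:
    return the last id or name of a (kind, id_or_name) tuple as a string."""
    return str(key[-1])


def value_to_string(delimiter):
    """Build an export transform that joins a list of keys with `delimiter`."""
    def key_list_to_string(value):
        # '', None and [] all export as an empty cell (None).
        if not value:
            return None
        return delimiter.join(key_id_or_name(k) for k in value)
    return key_list_to_string


export = value_to_string(';')
print(export([('User', 'alice'), ('User', 42)]))  # alice;42
```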
The bigger challenge is the import. This is what I came up with, without any luck:
def stringToValue(delimiter):
    def string_to_key_list(value):
        keyvalueList = []
        if value == '' or value is None or value == []:
            return None
        for val in value.split(';'):
            keyvalueList.append(transform.create_foreign_key('User'))
        return keyvalueList
    return string_to_key_list
I get the error message:
BadValueError: Unsupported type for property friends: <type 'function'>
According to the Datastore viewer, I need to create something like this:
[datastore_types.Key.from_path(u'User', u'kave#gmail.com', _app=u's~myapp1')]
Update 2:
Tony, you are a real expert on the bulkloader. Thanks for your help. Your solution worked!
I have moved my other question to a new thread.
But one crucial problem remains. When I create new users in code, the friends field is shown as <missing> in the Datastore viewer, and everything works fine.
When I upload the data with your solution, however, users without any friends get a <null> entry instead. Unfortunately this seems to break the model, since friends can't be null.
Changing the model to reflect this seems to be ignored:
friends = ndb.KeyProperty(kind='User', repeated=True, required=False)
How can I fix this please?
Update:
Digging further into it:
When the status <missing> is shown in the Datastore viewer, the code sees friends = [].
However, when I upload the data via CSV I get <null>, which translates to friends = [None]. I know this because I exported the data into my local datastore and could follow it in code. Strangely enough, if I empty the list with del user.friends[:], it works as expected. There must be a better way to set it while uploading via CSV, though...
Final Solution
This turns out to be a bug that has remained unresolved for over a year.
In a nutshell, even when the CSV cell is empty, GAE creates a list containing None because the property expects a list. This is game-breaking, since retrieving such an entity crashes immediately.
The fix is to add a post_import_function that deletes the property when the import produced None (which would otherwise be read back as [None]).
In my case:
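def post_import(input_dict, instance, bulkload_state_copy):
    # An empty CSV cell leaves friends set to None; drop the property so the
    # entity is stored without it (read back as friends = []).
    if "friends" in instance and instance["friends"] is None:
        del instance["friends"]
    return instance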
Finally everything works as expected.
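If several repeated properties can be empty in the CSV, the same cleanup can be generated for all of them. The sketch below is a hypothetical generalization (make_post_import is not part of the bulkloader); it treats both None and [None] as "no values", and in the test a plain dict stands in for the datastore entity. You would expose something like post_import = make_post_import('friends') in your module and reference it from post_import_function.

```python
def make_post_import(*property_names):
    """Build a post_import_function that drops empty repeated properties.

    Hypothetical helper: for each named property whose imported value is
    None or [None], the property is deleted so the entity is stored without it.
    """
    def post_import(input_dict, instance, bulkload_state_copy):
        for name in property_names:
            if name in instance and instance[name] in (None, [None]):
                del instance[name]
        return instance
    return post_import


post_import = make_post_import('friends')
```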
When you use repeated properties and export to CSV, you need to concatenate the list into a single string that fits in one CSV cell. Please check the example here on import/export of a list of dates; I hope it helps.
EDIT: Adding the import transform suggestion from an earlier comment to this answer.
For import, please try something like:
from google.appengine.api import datastore

def stringToValue(delimiter):
    def string_to_key_list(value):
        keyvalueList = []
        if value == '' or value is None or value == []:
            return None
        for val in value.split(delimiter):
            keyvalueList.append(datastore.Key.from_path('User', val))
        return keyvalueList
    return string_to_key_list
If your keys use numeric IDs instead of names, convert each value first with val = int(val).
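Here is a stdlib-only sketch of the import side that honours the delimiter argument and applies the int conversion automatically. Two assumptions to flag: make_key is a hypothetical parameter standing in for datastore.Key.from_path, and any all-digit value is treated as a numeric ID, which is wrong if some of your key names consist only of digits.

```python
def string_to_value(kind, delimiter, make_key):
    """Build an import transform that turns 'alice;42' into a list of keys.

    make_key(kind, id_or_name) is a stand-in for datastore.Key.from_path;
    pass the real function when running inside the bulkloader.
    """
    def string_to_key_list(value):
        if not value:
            return None  # empty cell; see the post_import cleanup above
        keys = []
        for part in value.split(delimiter):
            part = part.strip()
            if not part:
                continue  # tolerate stray delimiters such as 'a;;b'
            # Assumption: all-digit values are numeric IDs, the rest are names.
            id_or_name = int(part) if part.isdigit() else part
            keys.append(make_key(kind, id_or_name))
        return keys
    return string_to_key_list


to_keys = string_to_value('User', ';', lambda kind, ident: (kind, ident))
print(to_keys('alice;42'))  # [('User', 'alice'), ('User', 42)]
```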
Related
Suppose I have the following Model:
from django.db import models

class myClassObj(models.Model):
    flag1 = models.NullBooleanField()
    flag2 = models.BooleanField()
Now also suppose I want the database to enforce the following constraint:
flag1 should be None if and only if flag2 is false
How can I write the constraints in this model so that this condition is checked any time a myClassObj is created or edited? I see some interesting information here. But I don't see how to specify an "iff" constraint as I described above.
For custom validation that needs access to multiple fields, the Django documentation recommends overriding Model.clean().
This example from the documentation shows how to validate that a news article still in the "draft" phase does not have a publication date.
def clean(self):
    import datetime
    from django.core.exceptions import ValidationError
    # Don't allow draft entries to have a pub_date.
    if self.status == 'draft' and self.pub_date is not None:
        raise ValidationError('Draft entries may not have a publication date.')
    # Set the pub_date for published items if it hasn't been set already.
    if self.status == 'published' and self.pub_date is None:
        self.pub_date = datetime.date.today()
For more detailed information, see the full reference here: https://docs.djangoproject.com/en/dev/ref/models/instances/#validating-objects
save() does not call clean() automatically, so to have it run every time you save the object you'll also need to override the save method (for example, by calling self.full_clean() before saving): https://docs.djangoproject.com/en/dev/topics/db/models/#overriding-model-methods.
If you only need to validate a single field, another useful reference is writing custom validators: https://docs.djangoproject.com/en/dev/ref/validators/
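The "iff" condition itself reduces to checking that two booleans agree: flag1 is None exactly when flag2 is false. The sketch below keeps that check in a plain function so it can be tested without Django; Flags is a hypothetical stand-in for myClassObj, and it raises ValueError where a real model's clean() would raise django.core.exceptions.ValidationError.

```python
def flags_consistent(flag1, flag2):
    """True when flag1 is None if and only if flag2 is false."""
    return (flag1 is None) == (not flag2)


class Flags(object):
    """Plain-Python stand-in for myClassObj (not a Django model)."""

    def __init__(self, flag1, flag2):
        self.flag1 = flag1  # None, True or False, like a NullBooleanField
        self.flag2 = flag2  # True or False, like a BooleanField

    def clean(self):
        # In Django, raise ValidationError here instead of ValueError.
        if not flags_consistent(self.flag1, self.flag2):
            raise ValueError('flag1 must be None if and only if flag2 is False.')
```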
I've been using appcfg.py to upload_data pretty successfully, but I'm not sure how to set up the import transform in bulkloader.yaml for repeated properties or how to structure the CSV. For example:
In a post model that looks like this:
class Post(ndb.Model):
    tags = ndb.StringProperty(repeated=True)
and a bulkloader.yaml looks like this:
transformers:
- kind: Post
  connector: csv
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: tags
      external_name: tags
      import_transform: ???
Is import_transform the right API to register for this? Or is there some other way to do this?
I've tried a two-step approach using import_transform that seems to work. First, create a module (essentially a custom transform file), say bulkmodify.py. Then, in bulkmodify, define a transform that converts the incoming value to a list:
def list_convert(value):
    output = [value]
    return output
Then in your bulkloader.yaml file specify the import transform for your repeated property:
import_transform: bulkmodify.list_convert
Also don't forget to add your module to the python_preamble import list at the top of your bulkloader.yaml file:
python_preamble:
- import: bulkmodify
In my input CSV, the tags are wrapped in doubled quotes so that the bulkloader reads them in as a single property holding multiple listed values:
key,"""tag1"",""tag2"",""tag3""", property3, etc.
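Note that list_convert as written wraps the whole cell in a one-element list, so unless the bulkloader splits it somewhere else, the three tags would be stored as a single string. If that is what you see, a variant that parses the cell itself with Python's csv module may help. This is a sketch of an alternative, not the answerer's code; it returns None for an empty cell to match the other transforms in this thread.

```python
import csv


def list_convert(value):
    """Split a cell like '"tag1","tag2","tag3"' into ['tag1', 'tag2', 'tag3'].

    The cell is parsed as one CSV line, so a quoted tag may itself contain commas.
    """
    if not value:
        return None
    return next(csv.reader([value]))


# The outer CSV reader turns the doubled quotes into a single cell value.
row = next(csv.reader(['key,"""tag1"",""tag2"",""tag3""",property3']))
print(row[1])                # "tag1","tag2","tag3"
print(list_convert(row[1]))  # ['tag1', 'tag2', 'tag3']
```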
I was looking at "Where are the reference pages of the Google App Engine bulkloader transform?" and figured out most of my bulkloader.yaml configuration, with the exception of one case.
One of my kinds, 'Product', has a property called site. If present, this is a deep key for a Customer kind and a Site kind. The problem I am having is with none_if_empty: in the case below it never creates the deep key and always returns None. If I remove transform.none_if_empty, the import fails because my input file has empty entries for some of these values. How can I make this work? How can I use none_if_empty with create_deep_key?
- property: site
  external_name: site
  export_transform: transform.key_id_or_name_as_string
  import_transform: transform.none_if_empty(transform.create_deep_key(('Customer', 'siteCustomer', True),
      ('Site', 'siteId', True)))
  export:
    - external_name: siteCustomer
      export_transform: transform.key_id_or_name_as_string_n(0)
    - external_name: siteId
      export_transform: transform.key_id_or_name_as_string_n(1)
Product Bulkloader File Example
name,siteCustomer,siteId
first,,
second,1,1
That should be:
  import_transform: transform.none_if_empty(transform.create_deep_key(
      ('Customer', 'siteCustomer', True),
      ('Site', transform.CURRENT_PROPERTY, True)))
Essentially, refer to the current property's import value as transform.CURRENT_PROPERTY.
I still don't know what I am missing here, but my workaround is this:
from google.appengine.ext.bulkload import transform

def create_deep_key(*path_info):
    f = transform.create_deep_key(*path_info)
    def create_deep_key_lambda(value, bulkload_state):
        # Fall back to None when the key cannot be built (e.g. empty columns).
        try:
            return f(value, bulkload_state)
        except Exception:
            return None
    return create_deep_key_lambda
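One possible explanation for none_if_empty always returning None (my reading of the example, not verified against the bulkloader source): none_if_empty looks at the value of the current property, site, and the example CSV has no site column, so that value is always empty, while create_deep_key builds the key from the siteCustomer and siteId columns. A guard that checks those columns instead is sketched below in plain Python; none_if_columns_empty and site_path are hypothetical names, the row is a dict and the key is a tuple rather than a datastore key.

```python
def none_if_columns_empty(columns, build_key):
    """Hypothetical wrapper: return None if any of `columns` is empty in the
    row; otherwise return build_key(row)."""
    def guarded(row):
        if any(not row.get(column) for column in columns):
            return None
        return build_key(row)
    return guarded


def site_path(row):
    # Stand-in for create_deep_key: a flat (kind, id, kind, id) tuple,
    # assuming both ids are numeric as in the example CSV.
    return ('Customer', int(row['siteCustomer']), 'Site', int(row['siteId']))


site_transform = none_if_columns_empty(['siteCustomer', 'siteId'], site_path)
print(site_transform({'name': 'first', 'siteCustomer': '', 'siteId': ''}))     # None
print(site_transform({'name': 'second', 'siteCustomer': '1', 'siteId': '1'}))  # ('Customer', 1, 'Site', 1)
```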
django nonrel's documentation states: "you have to manually write code for merging the results of multiple queries (JOINs, select_related(), etc.)".
Can someone point me to any snippets that manually add the related data? Nick Johnson has an excellent post showing how to do this with plain App Engine models, but I'm using django-nonrel.
For my particular use I'm trying to get the UserProfiles with their related User models. This should be just two simple queries, then match the data.
However, using django-nonrel, a new query gets fired off for each result in the queryset. How can I get access to the related items in a 'select_related' sort of way?
I've tried this, but it doesn't seem to work as I'd expect. Looking at the rpc stats, it still seems to be firing a query for each item displayed.
def get_matching_model(key, queryset):
    """Return the first model in queryset whose pk equals key, or None."""
    return next((model for model in queryset if model.pk == key), None)

all_profiles = UserProfile.objects.all()
user_pks = set()
for profile in all_profiles:
    user_pks.add(profile.user_id)  # a way to access the pk without triggering the query
users = User.objects.filter(pk__in=user_pks)
for profile in all_profiles:
    profile.user = get_matching_model(profile.user_id, users)
UPDATE:
Ick... I figured out what my issue was.
I was trying to improve the efficiency of the changelist_view in the Django admin. It seemed that the select_related logic above was still producing additional queries for each row in the result set when a foreign key was in my 'list_display'. However, I traced it to something else. The above logic does not produce multiple queries (though if you more closely mimic Nick Johnson's approach it will look a lot prettier).
The issue is that in django.contrib.admin.views.main, on line 117 inside the ChangeList class, there is the following code: result_list = self.query_set._clone(). So even though I was properly overriding the queryset in the admin and selecting the related objects, this line cloned the queryset, and the clone does NOT keep the attributes I had added to the models for my 'select related'. The result was an even less efficient page load than when I started.
Not sure what to do about it yet, but the code that selects related stuff is just fine.
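In get_matching_model, the generator scans the fetched users once per profile. Indexing the users by primary key in a dict makes each lookup constant time, which is the same idea as the ref_models dictionary in the get_with_related answer. The stdlib-only sketch below shows only that matching step; Profile, User and fetch_users are hypothetical stand-ins for the models and for User.objects.filter(pk__in=...).

```python
from collections import namedtuple

# Hypothetical stand-ins for the django-nonrel models.
User = namedtuple('User', 'pk username')


class Profile(object):
    def __init__(self, user_id):
        self.user_id = user_id
        self.user = None


def attach_users(profiles, fetch_users):
    """Attach each profile's User using a single batched fetch.

    fetch_users(pks) plays the role of User.objects.filter(pk__in=pks).
    """
    user_pks = set(profile.user_id for profile in profiles)
    users_by_pk = dict((user.pk, user) for user in fetch_users(user_pks))
    for profile in profiles:
        # None if the referenced user was not found.
        profile.user = users_by_pk.get(profile.user_id)
    return profiles
```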
I don't like answering my own question, but the answer might help others.
Here is my solution that will get related items on a queryset based entirely on Nick Johnson's solution linked above.
from collections import defaultdict

def get_with_related(queryset, *attrs):
    """
    Adds related attributes to a queryset in a more efficient way
    than simply triggering the new query on access at runtime.
    attrs must be foreign key or one-to-one fields on the queryset model.
    """
    # Makes a list of the entity and related attribute to grab for all possibilities
    fields = [(model, attr) for model in queryset for attr in attrs]
    # we'll need to make one query for each related attribute because
    # I don't know how to get everything at once. So, we make a list
    # of the attribute to fetch and pks to fetch.
    ref_keys = defaultdict(list)
    for model, attr in fields:
        ref_keys[attr].append(get_value_for_datastore(model, attr))
    # now make the actual queries for each attribute and store the results
    # in a dict of {pk: model} for easy matching later
    ref_models = {}
    for attr, pk_vals in ref_keys.items():
        related_queryset = queryset.model._meta.get_field(attr).rel.to.objects.filter(pk__in=set(pk_vals))
        ref_models[attr] = dict((x.pk, x) for x in related_queryset)
    # Finally put related items on their models
    for model, attr in fields:
        setattr(model, attr, ref_models[attr].get(get_value_for_datastore(model, attr)))
    return queryset

def get_value_for_datastore(model, attr):
    """
    Django's foreign key fields all have attributes 'field_id' where
    you can access the pk of the related field without grabbing the
    actual value.
    """
    return getattr(model, attr + '_id')
To make the admin use this select-related queryset, we have to jump through a couple of hoops. Here is what I've done. The only change to the 'get_results' method of 'AppEngineRelatedChangeList' is that I removed self.query_set._clone() and used self.query_set instead.
from django.contrib import admin
from django.contrib.admin.options import IncorrectLookupParameters
from django.contrib.admin.views.main import ChangeList, MAX_SHOW_ALL_ALLOWED
from django.core.paginator import InvalidPage

class UserProfileAdmin(admin.ModelAdmin):
    list_display = ('username', 'user', 'paid')
    select_related_fields = ['user']

    def get_changelist(self, request, **kwargs):
        return AppEngineRelatedChangeList

class AppEngineRelatedChangeList(ChangeList):
    def get_query_set(self):
        qs = super(AppEngineRelatedChangeList, self).get_query_set()
        related_fields = getattr(self.model_admin, 'select_related_fields', [])
        return get_with_related(qs, *related_fields)

    def get_results(self, request):
        paginator = self.model_admin.get_paginator(request, self.query_set, self.list_per_page)
        # Get the number of objects, with admin filters applied.
        result_count = paginator.count
        # Get the total number of objects, with no admin filters applied.
        # Perform a slight optimization: Check to see whether any filters were
        # given. If not, use paginator.hits to calculate the number of objects,
        # because we've already done paginator.hits and the value is cached.
        if not self.query_set.query.where:
            full_result_count = result_count
        else:
            full_result_count = self.root_query_set.count()
        can_show_all = result_count <= MAX_SHOW_ALL_ALLOWED
        multi_page = result_count > self.list_per_page
        # Get the list of objects to display on this page.
        if (self.show_all and can_show_all) or not multi_page:
            # No _clone() here, so the attributes added by get_with_related survive.
            result_list = self.query_set
        else:
            try:
                result_list = paginator.page(self.page_num+1).object_list
            except InvalidPage:
                raise IncorrectLookupParameters
        self.result_count = result_count
        self.full_result_count = full_result_count
        self.result_list = result_list
        self.can_show_all = can_show_all
        self.multi_page = multi_page
        self.paginator = paginator
I just have a hunch about this, but it feels like I'm doing it the wrong way. What I want is to use a db.StringProperty() as a unique identifier. I have a simple db.Model with the properties name and file. If I add another entry with the same "name" as an existing one, I want to update that entry instead.
As of now I look it up with:
template = Templates.all().filter('name = ', name)
Then I check whether there is already an entry:
if template.count() > 0:
Then I add or update it. But from what I've read, .count() is very expensive in CPU usage.
Is there a way to make the "name" property unique so that the datastore automatically updates the existing entity, or another better way to do this?
..fredrik
You can't make a property unique in the App Engine datastore. What you can do instead is to specify a key name for your model, which is guaranteed to be unique - see the docs for details.
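With the name used as the key name, "update if it exists, otherwise create" becomes a lookup by key rather than a query, so count() is no longer needed. In App Engine's db API, Model.get_or_insert(key_name, **kwds) performs the create-if-missing step in a transaction, so the real code would be roughly Templates.get_or_insert(name, name=name) followed by setting file and calling put(). The stdlib-only sketch below imitates that flow; TemplateStore and upsert are hypothetical, and a dict plus a lock stand in for the datastore and its transaction.

```python
import threading


class TemplateStore(object):
    """In-memory stand-in for the datastore, keyed by key_name."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # plays the role of a transaction

    def get(self, key_name):
        return self._entities.get(key_name)

    def get_or_insert(self, key_name, **defaults):
        """Return the entity for key_name, creating it from defaults if missing."""
        with self._lock:
            if key_name not in self._entities:
                self._entities[key_name] = dict(defaults)
            return self._entities[key_name]

    def upsert(self, name, file_data):
        """Create the template called `name`, or update it if it exists."""
        entity = self.get_or_insert(name, name=name)
        entity['file'] = file_data  # a real model would set the property and put()
        return entity
```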
I had the same problem and came up with the following, which I found to be the simplest solution:
from google.appengine.ext import db

class UniqueConstraintValidationException(Exception):
    pass

class Car(db.Model):
    name = db.StringProperty(required=True)

    def __init__(self, *args, **kwargs):
        super(Car, self).__init__(*args, **kwargs)
        loadingAnExistingCar = ("key" in kwargs.keys() or "key_name" in kwargs.keys())
        if not loadingAnExistingCar:
            self.__makeSureTheCarsNameIsUnique(kwargs['name'])

    def __makeSureTheCarsNameIsUnique(self, name):
        existingCarWithTheSameName = Car.GetByName(name)
        if existingCarWithTheSameName:
            raise UniqueConstraintValidationException("Car should be unique by name")

    @staticmethod
    def GetByName(name):
        return Car.all().filter("name", name).get()
It's important to note that I first check whether we are loading an existing entity.
For the complete solution : http://nicholaslemay.blogspot.com/2010/07/app-engine-unique-constraint.html
You can just try to get your entity and edit it, and if not found create a new one:
template = Templates.gql('WHERE name = :1', name).get()
if template is None:
    template = Templates()
# do your thing to set the entity's properties
template.put()
That way it will insert a new entry when it wasn't found, and if it was found it will update the existing entry with the changes you made (see documentation here).
An alternative solution is to create a model to store the unique values, and to store each one transactionally using the combination Model.property_name.value as its key name. Only if that value is created do you save your actual model. This solution is described (with code) here:
http://squeeville.com/2009/01/30/add-a-unique-constraint-to-google-app-engine/
I agree with Nick. But, if you do ever want to check for model/entity existence based on a property, the get() method is handy:
template = Templates.all().filter('name = ', name).get()
if template is None:
    pass  # doesn't exist
else:
    pass  # exists
I wrote some code to do this, and the idea is for it to be easy to use. You can do this:
if register_property_value('User', 'username', 'sexy_bbw_vixen'):
    return 'Successfully registered sexy_bbw_vixen as your username!'
else:
    return 'The username sexy_bbw_vixen is already in use.'
This is the code. There are a lot of comments, but it's actually only a few lines:
from google.appengine.ext import db

# This entity type is a registry. It doesn't hold any data, but
# each entity is keyed to an Entity_type-Property_name-Property-value
# this allows for a transaction to 'register' a property value. It returns
# 'False' if the property value is already in use, and thus cannot be used
# again. Or 'True' if the property value was not in use and was successfully
# 'registered'
class M_Property_Value_Register(db.Expando):
    pass

# This is the transaction. It returns 'False' if the value is already
# in use, or 'True' if the property value was successfully registered.
def _register_property_value_txn(in_key_name):
    entity = M_Property_Value_Register.get_by_key_name(in_key_name)
    if entity is not None:
        return False
    entity = M_Property_Value_Register(key_name=in_key_name)
    entity.put()
    return True

# This is the function that is called by your code, it constructs a key value
# from your Model-Property-Property-value trio and then runs a transaction
# that attempts to register the new property value. It returns 'True' if the
# value was successfully registered. Or 'False' if the value was already in use.
def register_property_value(model_name, property_name, property_value):
    key_name = model_name + '_' + property_name + '_' + property_value
    return db.run_in_transaction(_register_property_value_txn, key_name)
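To try the registry logic outside App Engine, the sketch below replaces the M_Property_Value_Register entities with a set and db.run_in_transaction with a lock (both assumptions, not App Engine behaviour). It keeps the same key-name scheme and the same True/False contract.

```python
import threading

# Hypothetical in-memory registry: a set replaces the
# M_Property_Value_Register entities and a lock replaces the transaction.
_registered = set()
_registry_lock = threading.Lock()


def register_property_value(model_name, property_name, property_value):
    """Return True if the value was free and is now reserved, else False."""
    key_name = model_name + '_' + property_name + '_' + property_value
    with _registry_lock:  # stands in for db.run_in_transaction
        if key_name in _registered:
            return False
        _registered.add(key_name)
        return True
```

One design caveat applies to the original as well: because the parts are joined with '_', ('User', 'user_name', 'x') and ('User_user', 'name', 'x') produce the same key name, so a separator that cannot occur in model or property names is safer.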