change a db from a certain point in time, when the change doesn't fit the already existing data - database

I have a model that looks like this:
class Report(models.Model):
updater = models.CharField(max_length=15)
pub_date = models.DateTimeField(auto_add_now=True)
identifier = models.CharField(max_length=100)
... and so on...
There are some more fields but they are irrelevant to the question. Now the site has very simple functions - the users can see older reports and their data, and can edit them or add new ones.
However, the identifier field is actually an integer that symbolizes a log file that is being reported. Most of the times, each report has one log. But sometimes it has more than one. I did it as a CharField because I built the site to replace an older sharepoint 2003 website, where that field was treated as simple text. So I want that in my next version, it would be like it should be, i.e. like this:
class Report(models.Model):
updater = models.CharField(max_length=15)
pub_date = models.DateTimeField(auto_add_now=True)
... and so on...
class Log(models.Model):
report = models.ForeignKey(Report)
identifier = models.IntegerField()
The problem is, since in the old site that field was a CharField, people used this as they liked. Meaning, even if they updated various logs in the same report they just did it like this <logid1>, <logid2>. Sometimes they added some text <logid1> which is related to <logid2>.
So I want to change this, but I don't want to lose all the old data, and I can't fix all those edge cases (the DB contains around 22 thousand reports). I thought about adding this to report:
def disp_id(self):
if self.pub_date < ... #the day I'll do the update
return self.identifier
else:
return ', '.join([log.identifier for log in self.log_set.all()])
But then I'm not really getting rid of the old field now am I? I'm just adding a new one and keeping the original null from a certain date.
As far as I know, what I want to do is impossible. I'm only asking because I know that maybe I'm not the first one to deal with this sort of thing and maybe there is a solution that I'm not aware of.
Hope my explanation is clear enough, thanks in advance!

class Report(models.Model):
updater = models.CharField(max_length=15)
pub_date = models.DateTimeField(auto_add_now=True)
identifier = models.CharField(null=True)
... and so on...
logs = models.ManyToManyField(Log,null=True)
class Log(models.Model):
identifier = models.IntegerField()
Make the above model , and then make a script as follow:
ident_list = []
for reports in Report.objects.all():
identifiers = reports.identifiers.split(',')
for idents in identifiers:
if not idents in ident_list:
log = Log.create(**{'identifier' : int(idents)})
ident_list.append(int(idents))
else:
log = Log.objects.get(identifier = int(idents))
report.log.add(log)
Check the data before removing the column identifiers from the table Report.
Does it solves your purpose now ?

Related

ndb query by KeyProperty

I'm struggling with a KeyProperty query, and can't see what's wrong.
My model is
class MyList(ndb.Model):
user = ndb.KeyProperty(indexed=True)
status = ndb.BooleanProperty(default=True)
items = ndb.StructuredProperty(MyRef, repeated=True, indexed=False)
I create an instance of MyList with the appropriate data and can run the following properly
cls = MyList
lists = cls.query().fetch()
Returns
[MyList(key=Key('MyList', 12), status=True, items=..., user=Key('User', 11))]
But it fails when I try to filter by user, i.e. finding lists where the user equals a particular entity; even when using the one I've just used for insert, or from the previous query result.
key = lists[0].user
lists = cls.query(cls.user=key).fetch()
Returns
[]
But works fine with status=True as the filter, and I can't see what's missing?
I should add it happens in a unit testing environment with the following v3_stub
self.policy = datastore_stub_util.PseudoRandomHRConsistencyPolicy(probability=0)
self.testbed.init_datastore_v3_stub(
require_indexes=True,
root_path="%s/../"%(os.path.dirname(__file__)),
consistency_policy=self.policy
)
user=Key('User', 11) is a key to a different class: User. Not MyList
Perhaps you meant:
user = ndb.KeyProperty(kind='User', indexed=True)
Your code looks fine, but I have noticed some data integrity issues when developing locally with NDB. I copied your model and code, and I also got the empty list at first, but then after a few more attempts, the data is there.
Try it a few times?
edit: possibly related?
google app engine ndb: put() and then query(), there is always one less item

App Engine Search API - Sort Results

I have several entities that I am searching across that include dates, and the Search API works great across all of them except for one thing - sorting.
Here's the data model for one of my entities (simplified of course):
class DepositReceipt(ndb.Expando):
#Sets creation date
creation_date = ndb.DateTimeProperty(auto_now_add=True)
And the code to create the search.Document where de is an instance of the entity:
document = search.Document(doc_id=de.key.urlsafe(),
fields=[search.TextField(name='deposit_key', value=de.key.urlsafe()),
search.DateField(name='created', value=de.creation_date),
search.TextField(name='settings', value=de.settings.urlsafe()),
])
This returns a valid document.
And finally the problem line. I took this snippet from the official GAE Search API tutorial and just changed the direction of the sort to DESCENDING and changed the search expression to created (the date property from the Document above).
expr_list = [search.SortExpression(
expression="created", default_value='',
direction=search.SortExpression.DESCENDING)]
I don't think this is important, but the rest of the search code looks like this:
sort_opts = search.SortOptions(expressions=expr_list)
query_options = search.QueryOptions(
cursor=query_cursor,
limit=_NUM_RESULTS,
sort_options=sort_opts)
query_obj = search.Query(query_string=query, options=query_options)
search_results = search.Index(name=index_name).search(query=query_obj)
In production, I get this error message:
InvalidRequest: Failed to parse search request "settings:ag5zfmdoaWRvbmF0aW9uc3IQCxIIU2V0dGluZ3MYmewDDA"; failed to parse date
Changing the expression="created" to anything else works perfectly fine. This also happens across my other entity types that use dates, so I have no idea what's going on. Advice?
I think default_value needs to be a valid date, rather than '' as you have it.

Grails/GORM domain saving - transient object workaround

I found a work around to a problem I had, and I want to know if it is valid or not. It is a similar problem to: Grails Gorm : Object references an unsaved transient instance
Lets assume I have two domain Objects (names changed to protect the guilty).
public class Shelf {
String name
Set<Book> books = [] as Set
static hasMany = [books: Book]
}
and
public class Book {
String title
Shelf shelf
}
So this means that 1 Shelf contains 0 to many Books, and one Book can be on only one Shelf.
This Shelf is very large. And at some point, it contains 80,000 Books. All stored nicely in the DB. Of course, adding new Books is getting slower and slower.
This is done by:
Book book1 = new Book("Awesome Title")
existingShelf.addToBooks(book1)
existingShelf.save(flush: true) // super slow
This is slow. Mainly (I assume) because GORM has to confirm the other 80,000 records.
So I did this to try to work around the slow point.
Book book2 = new Book("Awesome Title 2")
book2.save(flush: true)
This gives me an "Object references an unsaved transient instance", which I guess makes sense - the "shelf" value is empty.
So I did something a little weird:
Book book3 = new Book("Awesome Title 3")
book3.shelf = new Shelf()
book3.shelf.id = <known/valid id here>
book2.save(flush: true)
This works. It saves. There are no referential errors. Further code that depends on this... works.
I just made a call that last minutes and reduced it down to seconds.
But that seems too easy. I'm sure I worked around Grails magic some how. And probably broke something in the process.
Advice? Explanations?
Yes, using addTo* methods can be slow. If you look at the generated SQL you'll understand why. Doing the following:
new Book(title: "GORM Performance", shelf: grailsShelf).save()
will be faster and there is technically nothing wrong with it. Just be aware of that your instance of grailsShelf.books will not contain the new book until you've refreshed the collection from the database. This is part of what the addTo* method does for you.
Side note:
Set<Book> books = [] as Set
is unnecessary.

How do I SubmitChanges on multiple tables that are related in LinqToSql?

I'm making a Windows Phone 7.1 application, and I'm having a lot of trouble submitting changes to my database. Here is the structure of the tables in my database:
Day <-1-----*-> TrainingSession <-many-----1-> Sport
So, a single day can have many training sessions, and a training session has one sport. A single sport can naturally be in many different training sessions.
The primary keys look like this:
Day - DateTime
TrainingSession - int (DB generated)
Sport - nvarchar(200)
Sports will simply have attributes sportName, and an iconFileName.
I've set up Associations by putting EntitySet in both Day and Sport, and TrainingSession has EntityRef and EntityRef. I'm not 100% sure if Sport needs the EntitySet, so please correct me if I'm wrong. For the moment, I just hard-coded some sports in my Sport class for testing, and you'll see me retrieving an ObservableCollection to get those out.
Here is how I am trying to create a collection of days with training sessions, each training session having different sports:
public void CreateDay(DateTime date)
{
FitPlanDataContext calendarDatabase = new FitPlanDataContext(FitPlanDataContext.ConnectionString);
DateTime firstDate = new DateTime(date.Year, date.Month, 1);
DayItem dayItem = new DayItem();
dayItem.DateTime = firstDate;
fillTestDayItemWithRandomData(dayItem);
calendarDatabase.DayItems.InsertOnSubmit(dayItem);
calendarDatabase.SubmitChanges();
}
private void fillTestDayItemWithRandomData(DayItem dayItem)
{
ObservableCollection<SportArt> sportArtCollection = SportArtController.GetAllSports();
dayItem.TrainingSessions = new EntitySet<TrainingSession>();
ObservableCollection<TrainingSession> trainingSessionCollection = new ObservableCollection<TrainingSession>();
TrainingSession trainingSession1 = new TrainingSession();
trainingSession1.DayItem = dayItem;
trainingSession1.SportArt = sportArtCollection[1];
trainingSessionCollection.Add(trainingSession1);
TrainingSession trainingSession2 = new TrainingSession();
trainingSession2.DayItem = dayItem;
trainingSession2.SportArt = sportArtCollection[2];
trainingSessionCollection.Add(trainingSession2);
FitPlanDataContext calendarDatabase = new FitPlanDataContext(FitPlanDataContext.ConnectionString);
calendarDatabase.TrainingSessions.InsertAllOnSubmit<TrainingSession>(trainingSessionCollection);
}
This code is not working for me, and it is giving me the following error:
NotSupportedException was Unhandled:
An attempt has been made to Attach or Add an entity that is not new, perhaps having been loaded from another DataContext. This is not supported.
Before I got this error, I was also getting NullReferenceExceptions.
I've been looking around for a solution, and I saw some people used Detach or workarounds with Attach, but I havent figured out how I could implement it to my code. Could anyone give me a helping hand with this?
Also, I thought the NullReferenceException could be coming from the fact that I'm not saving any sports to the database, could this be so?
So I messed around with it a lot, and today I finally found the solution I was looking for.
It seems I asked the question wrong. I didn't include the query from the database, which is probably important to add. I actually omitted a lot of the code to keep things simple in my question, but looks like I omitted too much.
Anyways, it turned out the way I setup the database structure was correct, and nothing had to be changed there.
So here's what I did to get it working:
-The call to the method that fills the day with training sessions needed to go after submitting changes about the day. This is because days have training sessions, and I cant save training sessions without the day already in the database.
-I added using statements around the places where I need to use the datacontext instead of just creating an instance of the datacontext with a local variable. This ensures that the datacontext lives only in the scope of the using statment.
(I changed the DateTime of the day to be the date given as the parameter to the method)
public void CreateDay(DateTime date)
{
DayItem dayItem = new DayItem();
dayItem.DateTime = date;
using (FitPlanDataContext calendarDatabase = new FitPlanDataContext(FitPlanDataContext.ConnectionString))
{
calendarDatabase.DayItems.InsertOnSubmit(dayItem);
calendarDatabase.SubmitChanges();
}
fillTestDayItemWithRandomData(dayItem);
}
Then, the changes to the method that fills the day with training sessions go like this:
-I open a using statement where I instantiate a new datacontext. Then I access the database to retrieve a list of all the sports, and also the day that I need to update. I find the day I need to update by dayItemParameter. (Remember that retrieving from the database will give you a collection.)
-I create my new training sessions and fill their properties. Note that the day I retrieved from the database is the value of a training session's property because the training session is a child of day, and needs to know who its parent day is.
-I removed the instantiation of EntitySet because I realized that I already instantiate it in the constructor of the DayItem class.
-Lastly, I add all the new training sessions into a collection, and save them all to the database at once using InsertAllOnSubmit(collection).
private void fillTestDayItemWithRandomData(DayItem dayItemParameter)
{
using (FitPlanDataContext calendarDatabase = new FitPlanDataContext(FitPlanDataContext.ConnectionString))
{
ObservableCollection<SportArt> sportArtCollection;
var sportArts = (from SportArt sportArt in calendarDatabase.SportArts
select sportArt);
sportArtCollection = new ObservableCollection<SportArt>(sportArts);
ObservableCollection<DayItem> dayItemCollection;
var dayItems = (from DayItem dayItem in calendarDatabase.DayItems
where dayItem.DateTime == dayItemParameter.DateTime
select dayItem);
dayItemCollection = new ObservableCollection<DayItem>(dayItems);
DayItem foundDayItem = dayItemCollection[0];
ObservableCollection<TrainingSession> trainingSessionCollection = new ObservableCollection<TrainingSession>();
TrainingSession trainingSession1 = new TrainingSession();
trainingSession1.DayItem = foundDayItem;
trainingSession1.SportArt = sportArtCollection[1];
trainingSessionCollection.Add(trainingSession1);
TrainingSession trainingSession2 = new TrainingSession();
trainingSession2.DayItem = foundDayItem;
trainingSession2.SportArt = sportArtCollection[2];
trainingSessionCollection.Add(trainingSession2);
calendarDatabase.TrainingSessions.InsertAllOnSubmit<TrainingSession>(trainingSessionCollection);
calendarDatabase.SubmitChanges();
}
}
Conclusion:
The main problem I was having was that I was trying to save training sessions to a day that wasn't submitted to the database. The next big problem (that I think many others have) is that reading and updating of an entity has to be in the same datacontext. So, you can't create a datacontext to retrieve a day, then use another datacontext to add a training session to that day (even if you saved the value of the day to a local variable). You need to retrieve the day and save training sessions to it all in the same data context.
At the moment, my application is working, but it is quite sluggish. In this question, I'm asking about just one day, but in my actual program, I'm creating hundreds of days, which means a lot of opening and closing of the database. If anyone has suggestions to how I can
optimize the process, I'm open ears.
I realize and apologize that this post got so long, but writing it helped me to understand the situation with more depth, and I really hope that it'll help others too.

Google App Engine - Is this also a Put method? Or something else

Was wondering if I'm unconsciously using the Put method in my last line of code ( Please have a look). Thanks.
class User(db.Model):
name = db.StringProperty()
total_points = db.IntegerProperty()
points_activity_1 = db.IntegerProperty(default=100)
points_activity_2 = db.IntegerProperty(default=200)
def calculate_total_points(self):
self.total_points = self.points_activity_1 + self.points_activity_2
#initialize a user ( this is obviously a Put method )
User(key_name="key1",name="person1").put()
#get user by keyname
user = User.get_by_key_name("key1")
# QUESTION: is this also a Put method? It worked and updated my user entity's total points.
User.calculate_total_points(user)
While that method will certainly update the copy of the object that is in-memory, I do not see any reason to believe that the change will be persisted to the the datastore. Datastore write operations are costly, so they are not going to happen implicitly.
After running this code, use the datastore viewer to look at the copy of the object in the datastore. I think that you may find that it does not have the changed total_point value.

Resources