Get all of the instance names from a many-to-many object? - database

I tried querying the Db with this bit of code:
Makesite.objects.values_list('ref_id', flat=True)
and it returned [1, 2, None], which I found moderately confusing. I assumed that Python saved the instances by their own names, not by numbers it just assigns to them. Any help with the code, or an explanation of why Python saves them as numbers and not their names, would be awesome. Thanks.
In models.py
class Makesite(models.Model):
    sitename = models.CharField(max_length=100, unique=True)
    siteinfo = models.ManyToManyField(Siteinfo)
    ref_id = models.ManyToManyField(RefID)
    report = models.ManyToManyField(Report)

Django doesn't save m2m relations by their "names". It uses their primary keys (in your case, integers). You can check this by viewing your DB tables.
When you access a related object through an instance, e.g. makesite.ref_id.all(), Django makes another query to fetch the related rows.
But when you use values_list you don't want extra queries or joins, so Django returns data from a single table, and all that it can get from there is a primary key.
If you want to optimize your related queries, take a look at the select_related / prefetch_related methods, or use caching.
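For example (a sketch - 'ref_name' is a hypothetical field on RefID, substitute whatever CharField your model actually has), you can either traverse the relation in values_list or prefetch the related rows up front:
# 'ref_name' is a hypothetical field on RefID - use your real field name
Makesite.objects.values_list('ref_id__ref_name', flat=True)

# or fetch all related rows in one extra query instead of one per instance
for makesite in Makesite.objects.prefetch_related('ref_id'):
    ref_names = [r.ref_name for r in makesite.ref_id.all()]  # no extra query here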

Related

How to change GUID of entity before saving back to database (in C#)

We have code to serialize entities to file:
formatter.Serialize(fileStream, session); // BinaryFormatter
and to deserialize:
session = (Session)binaryFormatter.Deserialize(innerStream);
The entity structure is quite complicated and if necessary I can come up with a simpler example, however the crux of the problem is this. There is a Patient table that has multiple "Sessions". The Patient has an associated table "Cities", and therein lies the problem. Each computer has its own database (SQL Server Express) and has a table Cities which contains exactly the same data (London, Madrid, Berlin). However, on each machine, the unique key for the city (a GUID) is different! When I deserialize a session on a different machine from the one it was serialized on, I want to use the same city based on name, NOT GUID.
The deserialization works fine. It's saving the deserialized entity back to the database that causes me grief. To get the City GUID correct, I use:
session.Patient.City.CityGUID = tempCity.CityGUID;
session.Patient.CityGUID = tempCity.CityGUID;
tempCity is the entity from the database with the matching name (like 'London').
I can do this, but on the line:
context.Patients.Attach(session.Patient);
I get an exception of the form:
[System.InvalidOperationException] = {System.InvalidOperationException: The property 'CityGUID' is part of the object's key information and cannot be modified.
at System.Data.Entity.Core.Objects.EntityEntry.DetectChangesInProperty(Int32 ordinal, Boolean detectOnlyComplexProperties, Boolea...
Any ideas on how to fix this? I can create whole new objects, but this is problematic because if I deserialize two sessions that both have the same patient, they should remain the same patient in the new database. By creating new entities, I end up with 2 new patients.
I think the best solution is to make sure the Cities have the same GUIDs on different machines, perhaps by modifying the install set, but I'm wondering if there is a simple fix.
Opinions?
Dave
In fact, I needed to replace the entire entity, not just its GUID:
session.Patient.City = tempCity;

Fetching entities from datastore where Entity.key.IN([keys...])

I'm trying to fetch a long list of entities, and those entities all refer to one of a few different related entities. It's explained in the comments, but basically many "items" reference a few "Company" entities. I don't want to make a separate query for each key in unique_keys (i.e. key.get()), so I thought the following would work, but it's returning an empty list. Pray tell, what am I doing wrong? Or is there a better way to accomplish this many-referencing-few relationship while minimizing calls to the DB? (I'm new to the App Engine Datastore.)
Notice, this is in Python, using the ndb library offered by App Engine.
# "items" is a list of entities that have a property "parenty_company"
# parent_company is a string of the Company key
# I get a unique list of all Key strings and convert them to Keys
# I then query for where the Company Key is in my unique list
unique_keys = list(set([ndb.Key(Company, prop.parent_company) for prop in items]))
companies = Company.query(Company.key.IN(unique_keys)).fetch()
You definitely should use ndb.get_multi(unique_keys). It will fetch all keys asynchronously in a single batch.
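A minimal sketch of that, assuming parent_company holds the same string id the Company keys were created with:
unique_keys = list({ndb.Key(Company, item.parent_company) for item in items})
companies = ndb.get_multi(unique_keys)  # one batched call; entries are None for missing keys
company_by_key = {key: ent for key, ent in zip(unique_keys, companies) if ent is not None}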

NDB Modeling One-to-one with KeyProperty

I'm quite new to ndb, but I've already understood that I need to rewire a certain area of my brain to create models. I'm trying to create a simple model - just for the sake of understanding how to design an ndb database - with a one-to-one relationship: for instance, a user and his info. After searching around a lot - documentation exists, but it was hard to find varied examples - and experimenting a bit (modeling and querying in a couple of different ways), this is the solution I found:
from google.appengine.ext import ndb

class Monster(ndb.Model):
    name = ndb.StringProperty()

    @classmethod
    def get_by_name(cls, name):
        return cls.query(cls.name == name).get()

    def get_info(self):
        return Info.query(Info.monster == self.key).get()

class Info(ndb.Model):
    monster = ndb.KeyProperty(kind='Monster')
    address = ndb.StringProperty()

a = Monster(name="Dracula")
a.put()
b = Info(monster=a.key, address="Transilvania")
b.put()

print Monster.get_by_name("Dracula").get_info().address
NDB doesn't support joins, so the "join" we want has to be emulated using class methods and properties. With the above scheme I can easily reach a property in the second model (Info) through a unique property in the first (in this case "name" - suppose no two monsters share the same name).
However, if I want to print a list of 100 monster names and their respective addresses, the second model (Info) will be queried 100 times.
Question: is there a better way to model this to increase performance?
If it's truly a one-to-one relationship, why are you creating 2 models? Given your example, the Address entity cannot be shared with any other Monster, so why not put the address details in the Monster?
There are some reasons why you wouldn't:
Address could become large, making it less efficient to retrieve hundreds of properties when you only need a couple - though projection queries may help there.
You change your mind and want to see all monsters that live in Transylvania - in which case you would create the Address entity, and the Monster would have a key property that points to the Address. This obviously fails when you work out that some monsters can live in multiple places (werewolves - London, Transylvania, New York ;-), in which case you either have a repeated KeyProperty on the Monster or an intermediate entity that points to both the monster and the address. In your case I don't think monsters on the whole have that many documented addresses ;-)
Also, if you are uniquely identifying monsters by name, you should consider storing the name as part of the key. Doing Monster.get_by_id("dracula") is quicker than a query by name.
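A sketch of that pattern, assuming monster names are unique and never change (a key name cannot be renamed later):
dracula = Monster(id="dracula", name="Dracula")  # the key name doubles as the identifier
dracula.put()
monster = Monster.get_by_id("dracula")  # direct key fetch - no query needed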
As I wrote (poorly) in the comment: if point 1 above holds and it is a true one-to-one relationship, I would then create Address as a child entity (Monster is the parent/ancestor in the key) when creating the address. This allows you to (a sketch follows below):
1. let other entities point to the Address;
2. fetch a bunch of child entities with a single ancestor query;
3. fetch a Monster and its owned entities, again with an ancestor query.
If you have a bunch of entities that should only exist if the Monster instance exists and they are not children, then you have to run queries on all the entity types with KeyProperty's matching the key; and if these entities are not PolyModels, then you have to perform a query for each entity type (and know that you need to perform the query on a given entity type, which involves a registry of some kind, or hard-coding things).
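A sketch of the parent/child variant from the list above (note that Info here drops the KeyProperty, since the parent key carries the link - this is a restructuring, not the asker's original model):
from google.appengine.ext import ndb

class Monster(ndb.Model):
    name = ndb.StringProperty()

class Info(ndb.Model):
    address = ndb.StringProperty()  # no KeyProperty: the parent key links it to its Monster

dracula = Monster(id="dracula", name="Dracula")
dracula.put()
Info(parent=dracula.key, address="Transilvania").put()

infos = Info.query(ancestor=dracula.key).fetch()  # all of Dracula's children in one ancestor query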
I suspect what you may be trying to do could be achieved using the elements described in the link below.
Have a look at "Operations on Multiple Keys or Entities", "Expando Models" and "Model Hooks":
https://developers.google.com/appengine/docs/python/ndb/entities
(This is probably more a comment than an answer)

Variable table name in Django

Can I use a variable table name for DB-mapped objects? For example, there are n objects of the same structure and I want to store them in different tables, to raise performance for some operations.
Let's say I've got class defined as:
class Measurement(models.Model):
    slave_id = models.IntegerField()
    tag = models.CharField(max_length=40)
    value = models.CharField(max_length=16)
    timestamp = models.DateTimeField()

    class Meta:
        db_table = 'measurements'
Now all objects are stored in the table 'measurements'. I would like to make the table name dependent on the 'slave_id' value - for example, to handle data from tables 'measurements_00001', 'measurements_00002', etc.
Is it possible to achieve this using the Django ORM, or is the only solution to drop to SQL level?
In the vast majority of cases, this shouldn't buy you any performance advantage. Any RDBMS worth its salt should handle immense tables effortlessly.
If it's needed, there could be some sharding of the table - again, managed by the DB server; at the SQL level (and in the ORM) it should still be seen as a single table. Ideally the discrimination is handled automatically; if not, most RDBMSs let you specify it at table definition time (or tune it later with ALTER TABLE).
If you choose to define the sharding method, each RDBMS has its own non-standard methods. Best not to tie your Python code to that; do the tuning once on the DB server instead.
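That said, if you do decide to stay at the ORM level, Django model classes can be created dynamically, so per-slave tables are technically possible. A sketch, not a recommendation - the factory and naming scheme are mine, not a standard Django facility:
from django.db import models

def measurement_model(slave_id):
    # slave_id is implicit in the table name, so the column is dropped
    class Meta:
        db_table = 'measurements_%05d' % slave_id
    return type('Measurement%05d' % slave_id, (models.Model,), {
        '__module__': __name__,  # outside an app you may also need Meta.app_label
        'Meta': Meta,
        'tag': models.CharField(max_length=40),
        'value': models.CharField(max_length=16),
        'timestamp': models.DateTimeField(),
    })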

Indexing URL's in SQL Server 2005

What is the best way to deal with storing and indexing URL's in SQL Server 2005?
I have a WebPage table that stores metadata and content about Web Pages. I also have many other tables related to the WebPage table. They all use URL as a key.
The problem is URLs can be very large, and using them as a key makes the indexes larger and slower. How much slower I don't know, but I have read many times that using large fields for indexing is to be avoided. Assuming a URL is nvarchar(400), these are enormous fields to use as a primary key.
What are the alternatives?
How much pain would there likely be with using the URL as a key instead of a smaller field?
I have looked into giving the WebPage table an identity column, and then using this as the primary key for a web page. This keeps all the associated indexes smaller and more efficient, but it makes importing data a bit of a pain: each import for the associated tables has to first look up the id of a URL before inserting data in the tables.
I have also played around with using a hash of the URL to create a smaller index, but am still not sure if it is the best way of doing things. It wouldn't be a unique index and would be subject to a small number of collisions, so I am unsure what foreign key would be used in this case...
There will be millions of records about web pages stored in the database, and there will be a lot of batch updating. There will also be quite a lot of activity reading and aggregating the data.
Any thoughts?
I'd use a normal identity column as the primary key. You say:
This keeps all the associated indexes smaller and more efficient but it makes importing data a bit of a pain. Each import for the associated tables has to first lookup what the id of a url is before inserting data in the tables.
Yes, but the pain is probably worth it, and the techniques you learn in the process will be invaluable on future projects.
On SQL Server 2005, you can create a user-defined function GetUrlId that looks something like
CREATE FUNCTION GetUrlId (@Url nvarchar(400))
RETURNS int
AS BEGIN
    DECLARE @UrlId int
    SELECT @UrlId = Id FROM Url WHERE Url = @Url
    RETURN @UrlId
END
This will return the ID for URLs already in your URL table, and NULL for any URL not already recorded. You can then call this function inline in your import statements - something like
INSERT INTO
    UrlHistory(UrlId, Visited, RemoteIp)
VALUES
    (dbo.GetUrlId('http://www.stackoverflow.com/'), @Visited, @RemoteIp)
This is probably slower than a proper join statement, but for one-time or occasional import routines it might make things easier.
Break the URL up into columns based on the bits you're concerned with, and use the RFC as a guide. Reverse the host and domain info so an index can group like domains (Google does this).
stackoverflow.com -> com.stackoverflow
blog.stackoverflow.com -> com.stackoverflow.blog
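The reversal is cheap to compute at write time; a quick sketch of the transformation shown above:
def reverse_host(host):
    # 'blog.stackoverflow.com' -> 'com.stackoverflow.blog'
    return '.'.join(reversed(host.split('.')))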
Google has a paper that outlines what they do, but I can't find it right now.
http://en.wikipedia.org/wiki/Uniform_Resource_Locator
I would stick with the hash solution. It generates a compact key with a fairly low chance of collision.
An alternative would be to create a GUID and use that as the key.
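A sketch of both options in Python terms (MD5 is used purely as a compact fingerprint here, not for security; the column types on the SQL side are up to you):
import hashlib
import uuid

url = 'http://www.stackoverflow.com/'
hash_key = hashlib.md5(url.encode('utf-8')).digest()  # 16 bytes, deterministic, rare collisions
guid_key = uuid.uuid4().bytes                         # 16 bytes, random, unrelated to the URL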
I totally agree with Dylan. Use an IDENTITY column or a GUID column as the surrogate key in your WebPage table. That's a clean solution. The lookup of the id while importing isn't that painful, I think.
Using a big varchar column as the key column wastes a lot of space and hurts insert and query performance.
Not so much a solution, more another perspective.
Storing the total unique URI of a page perhaps defeats part of the point of URI construction. Each forward slash is supposed to refer to a unique semantic space within the domain (whether that space is actual or logical). Unless the URIs you intend to store are something along the lines of www.somedomain.com/p.aspx?id=123456789, it might be better to break a single URI meta-table into tables representing the subdomains you have on your site.
For example, if you're going to hold a number of "News" section URIs in the same table as the "Reviews" URIs, you're missing a trick: have a "Sections" table whose content contains meta-information about the section and whose own ID acts as a parent to all the URIs within it.
