Database design for variable number of attributes - database

I am trying to come up with a schema for an Oracle db. The problem is as follows:
The database should map a result (say, a URL) to combinations of attributes and their values.
For Example:
The database should have the mapping:
If attribute-X has value-X ==> Url-X
If attribute-Y has value-Y ==> Url-Y
If attribute-X has value-X && attribute-Y has value-Y ==> Url-XY
Also, the number of attributes is not fixed, so they cannot each correspond to a column in the db.
The workaround I have thought of is to store each mapping as a combined name/value pair and use that to look it up in the database.
For example:
**attribute**              **Key**            **Value**
attribute-X                value-X            Url-X
attribute-Y                value-Y            Url-Y
attribute-X&attribute-Y    value-X&value-Y    Url-XY
I am new to databases and I am aware that this is not a neat representation of the data model. Is there a better way to represent this?

You could model this with one table, but then you would need to add a new column for each new attribute, which is not a good design.
You can, however, model this so that you can add attribute types first and then add attributes dynamically.
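As a rough illustration of that idea, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The table names (attribute_type, url_rule, rule_condition) and the helper functions add_rule and lookup are my own assumptions, not something from the question. Each URL is a rule, and each attribute/value pair that selects it is one row, so new attributes never require a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE url_rule (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL
);
CREATE TABLE rule_condition (
    rule_id           INTEGER NOT NULL REFERENCES url_rule(id),
    attribute_type_id INTEGER NOT NULL REFERENCES attribute_type(id),
    attr_value        TEXT NOT NULL,
    PRIMARY KEY (rule_id, attribute_type_id)
);
""")


def add_rule(url, conditions):
    """Store one URL together with the attribute/value pairs that select it."""
    rule_id = conn.execute(
        "INSERT INTO url_rule (url) VALUES (?)", (url,)).lastrowid
    for name, value in conditions.items():
        # Attribute types are rows, so a new attribute needs no schema change.
        conn.execute(
            "INSERT OR IGNORE INTO attribute_type (name) VALUES (?)", (name,))
        (type_id,) = conn.execute(
            "SELECT id FROM attribute_type WHERE name = ?", (name,)).fetchone()
        conn.execute(
            "INSERT INTO rule_condition VALUES (?, ?, ?)",
            (rule_id, type_id, value))
    return rule_id


def lookup(given):
    """Return the URL whose conditions are exactly the given pairs, else None."""
    if not given:
        return None
    match = " OR ".join("(t.name = ? AND c.attr_value = ?)" for _ in given)
    flat = [x for pair in given.items() for x in pair]
    # A rule qualifies when it has as many conditions as the input
    # and every one of them matches an input pair.
    row = conn.execute(
        f"""
        SELECT r.url
        FROM url_rule r
        JOIN rule_condition c ON c.rule_id = r.id
        JOIN attribute_type t ON t.id = c.attribute_type_id
        GROUP BY r.id
        HAVING COUNT(*) = ?
           AND SUM(CASE WHEN {match} THEN 1 ELSE 0 END) = ?
        """,
        [len(given)] + flat + [len(given)],
    ).fetchone()
    return row[0] if row else None


add_rule("Url-X", {"attribute-X": "value-X"})
add_rule("Url-Y", {"attribute-Y": "value-Y"})
add_rule("Url-XY", {"attribute-X": "value-X", "attribute-Y": "value-Y"})

print(lookup({"attribute-X": "value-X"}))
print(lookup({"attribute-X": "value-X", "attribute-Y": "value-Y"}))
```

The exact-match rule in lookup is a design choice for this sketch; if a partial match should also count, the HAVING clause would need to change.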

Projection query with new fields/properties ignores entries that haven't set those properties yet

I have an Article type structured like this:
type Article struct {
	Title   string
	Content string `datastore:",noindex"`
}
In an administrative portion of my site, I list all of my Articles. The only property I need in order to display this list is Title; grabbing the content of the article seems wasteful. So I use a projection query:
q := datastore.NewQuery("Article").Project("Title")
Everything works as expected so far. Now I decide I'd like to add two fields to Article so that some articles can be unlisted in the public article list and/or unviewable when access is attempted. Understanding the datastore to be schema-less, I think this might be very simple. I add the two new fields to Article:
type Article struct {
	Title      string
	Content    string `datastore:",noindex"`
	Unlisted   bool
	Unviewable bool
}
I also add them to the projection query, since I want to indicate in the administrative article list when an article is publicly unlisted and/or unviewable:
q := datastore.NewQuery("Article").Project("Title", "Unlisted", "Unviewable")
Unfortunately, this only returns entries that have explicitly included Unlisted and Unviewable when Put into the datastore.
My workaround for now is to simply stop using a projection query:
q := datastore.NewQuery("Article")
All entries are returned, and the entries that never set Unlisted or Unviewable have them set to their zero value as expected. The downside is that the article content is being passed around needlessly.
In this case, that compromise isn't terrible, but I expect similar situations to arise in the future, and it could be a big deal not being able to use projection queries. Projection queries and adding new properties to datastore entries seem like they're not fitting together well. I want to make sure I'm not misunderstanding something or missing the correct way to do things.
It's not clear to me from the documentation that projection queries should behave this way (ignoring entries that don't have the projected properties rather than including them with zero values). Is this the intended behavior?
Are the only options in scenarios like this (adding new fields to structs / properties to entries) to either forgo projection queries or run some kind of "schema migration", Getting all entries and then Putting them back, so they now have zero-valued properties and can be projected?
Projection queries source the data for fields from the indexes, not from the entity. When you add new properties, pre-existing records do not appear in the indexes that the projection query runs against. They will need to be re-indexed.
You are asking for those specific properties and they don't exist for the older entries, hence the current behaviour.
You should probably think of a projection query as a request for entities that have a value in each requested index, in addition to any filter you place on the query.
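To make that mental model concrete, here is a toy in-memory sketch in Python. It is not the datastore API; the functions build_index and project and the sample data are assumptions made up for illustration, and a real datastore index is more involved. It only shows why an entity without an index entry for a projected property drops out of the result, and why re-saving it with explicit values brings it back.

```python
# Toy model only: entities keyed by id, with Content excluded from indexing.
entities = {
    1: {"Title": "Old article", "Content": "body text"},  # saved before the new fields existed
    2: {"Title": "New article", "Content": "body text",
        "Unlisted": False, "Unviewable": True},
}
NOINDEX = {"Content"}


def build_index(entities):
    """One index entry per (entity, indexed property) that actually has a value."""
    return {(key, prop): value
            for key, props in entities.items()
            for prop, value in props.items()
            if prop not in NOINDEX}


def project(index, *props):
    """Return rows only for entities with an index entry for every projected property."""
    keys = sorted({key for key, _ in index})
    return [{p: index[(k, p)] for p in props}
            for k in keys
            if all((k, p) in index for p in props)]


index = build_index(entities)
titles_only = project(index, "Title")                      # both entities
before = project(index, "Title", "Unlisted", "Unviewable")  # only entity 2

# "Schema migration": re-save the old entity with explicit zero values.
entities[1].update(Unlisted=False, Unviewable=False)
index = build_index(entities)
after = project(index, "Title", "Unlisted", "Unviewable")   # both entities

print(len(titles_only), len(before), len(after))
```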

Django Query Optimisation

I am currently working on a telecom analytics project and am new to query optimisation. Showing the result in the browser takes a full minute, although only 45,000 records are accessed. Could you please suggest ways to reduce the time it takes to show results?
I wrote the following query to find the average call duration per person for an age group:
sigma=0
popn=len(Demo.objects.filter(age_group=age))
card_list=[Demo.objects.filter(age_group=age)[i].card_no
           for i in range(popn)]
for card in card_list:
    dic=Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma+=dic['duration__sum']
avgDur=sigma/popn
The above code is inside a for loop that iterates over age groups.
The models are as follows:
class Demo(models.Model):
    card_no=models.CharField(max_length=20,primary_key=True)
    gender=models.IntegerField()
    age=models.IntegerField()
    age_group=models.IntegerField()

class Fact_table(models.Model):
    pri_key=models.BigIntegerField(primary_key=True)
    card_no=models.CharField(max_length=20)
    duration=models.IntegerField()
    time_8bit=models.CharField(max_length=8)
    time_of_day=models.IntegerField()
    isBusinessHr=models.IntegerField()
    Day_of_week=models.IntegerField()
    Day=models.IntegerField()
Thanks
Try this:
from django.db.models import Sum

demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for-loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you use, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
    card_no = models.SlugField(max_length=20, unique=True)
    ...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
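class Fact_table(models.Model):
    # on_delete is required from Django 2.0 onwards; CASCADE matches the older default
    card = models.ForeignKey(Demo, related_name='facts', on_delete=models.CASCADE)
    ...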
The related_name argument allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    # The two values must be passed to % as a tuple
    print("Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg']))
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
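For intuition, here is roughly the kind of grouped JOIN such a query boils down to, sketched with Python's built-in sqlite3 module instead of Django. The table names (demo, fact) and the sample rows are assumptions for illustration, and the SQL Django actually generates will differ in detail. The second column is included only for comparison: AVG averages over individual calls, whereas the original loop divided the total duration by the number of cards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo (
    card_no   TEXT PRIMARY KEY,
    age_group INTEGER NOT NULL
);
CREATE TABLE fact (
    id       INTEGER PRIMARY KEY,
    card_no  TEXT NOT NULL REFERENCES demo(card_no),
    duration INTEGER NOT NULL
);
INSERT INTO demo VALUES ('A', 1), ('B', 1), ('C', 2);
INSERT INTO fact (card_no, duration)
VALUES ('A', 10), ('A', 30), ('B', 20), ('C', 50);
""")

# One round trip to the database: join, group, and aggregate there.
rows = conn.execute("""
    SELECT d.age_group,
           AVG(f.duration)                                  AS avg_per_call,
           SUM(f.duration) * 1.0 / COUNT(DISTINCT d.card_no) AS avg_per_card
    FROM demo d
    LEFT JOIN fact f ON f.card_no = d.card_no
    GROUP BY d.age_group
    ORDER BY d.age_group
""").fetchall()

for age_group, avg_per_call, avg_per_card in rows:
    print(age_group, avg_per_call, avg_per_card)
```

If the per-card figure is the one you need, check which of the two averages your query is actually computing before comparing results with the old loop.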

cakephp setting select options and values at Model

In my database model, my attribute is set as type INT.
On the front end, I want to display a select field with representative values for the respective Integer values.
eg: [1 = Home, 2 = About]
I am currently using an external plugin for administering content, and its select values only allow integers. So my idea is to achieve this in the respective Model. Is it possible?
Generally, yes.
You should be able to attach the results of Model->find('list') to the select field. Of course, your model should have a name or title field for the descriptive values (Home, About).
Sounds like the kind of enum representation that I always use.
Try this solution:
http://www.dereuromark.de/2010/06/24/static-enums-or-semihardcoded-attributes/
It basically uses array matching to resolve those ints into strings in a clean way, using the model. The result can be the whole array for select fields or just the specific string for output in the view/index.
It's also fully form and bake-template capable.
If you name the field "attribute" in your table and name the method "attributes()", you can easily have "cake bake" bake this via custom templates.
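The general shape of that static-enum idea, sketched in Python rather than CakePHP/PHP, looks something like this. The class name Page and its constants are hypothetical and only mirror the [1 = Home, 2 = About] example from the question; the linked article describes the actual CakePHP implementation.

```python
class Page:
    """Integer codes stored in the database, with labels resolved in the model."""
    HOME = 1
    ABOUT = 2

    _LABELS = {HOME: "Home", ABOUT: "About"}

    @classmethod
    def attributes(cls, value=None):
        """Return all labels (for a select field) or one label (for output)."""
        if value is None:
            return dict(cls._LABELS)
        return cls._LABELS.get(value)


print(Page.attributes())            # options for a select field
print(Page.attributes(Page.ABOUT))  # single label for display
```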

Is it possible to use a database sequence for a non-PK field in a Grails app?

In my application I need a unique value for a specific field in the database. This field has to be of type Integer. So I was wondering if it is possible to use a sequence for the field? And how do I implement that in my GORM domain class?
See the Grails docs on how to use sequences. The mapping depends on the source of the sequence (a database sequence number generator, as in Oracle/Postgres, or a database table):
static mapping = {
    intField1 generator:'sequence', params:[sequence:'sequence_name']
    intField2 generator:'sequence', params:[table: 'hi_value', column: 'next_value', max_lo: 100]
}

pull Drupal field values with db_query() or db_select()

I've created a content type in Drupal 7 with 5 or 6 fields. Now I want to use a function to query them in a hook_view call back. I thought I would query the node table but all I get back are the nid and title. How do I get back the values for my created fields using the database abstraction API?
Drupal stores the fields in other tables and can automatically join them in. The storage varies depending on how the field is configured, so the easiest way to access them is by using an EntityFieldQuery. It'll handle the complexity of joining all your fields in. There are some good examples of how to use it here: http://drupal.org/node/1343708
But if you're working in hook_view, you should already be able to access the values; they're loaded into the $node object that's passed in as a parameter. Try running:
debug($node);
in your hook and you should see all the properties.
If you already know the IDs of the nodes (nid) you want to load, you should use node_load_multiple() to load them. This will load the complete nodes with all field values. To search for node IDs, EntityFieldQuery is the recommended way, but it has some limitations. You can also use the database API to query the node table for the nid (and revision ID, vid) of your nodes, then load them using node_load_multiple().
Loading a complete node can have performance impacts, since it will load far more data than you need. If this proves to be an issue, you can try to access the field storage tables directly (if your field values are stored in your SQL database). The schema of these tables is built dynamically depending on the field types, cardinality and other settings. You will have to dig into your database schema to figure it out, and it will probably change as soon as you change something on your fields.
Another solution is to build stub node entities and use field_attach_load() with an $options['field_id'] value to load only the value of a specific field. But this requires a good knowledge and understanding of the Field API.
See How to use EntityFieldQuery article in Drupal Community Documentation.
Creating A Query
Here is a basic query looking for all articles with a photo that are
tagged as a particular faculty member and published this year. In the
last 5 lines of the code below, the $result variable is populated with
an associative array with the first key being the entity type and the
second key being the entity id (e.g., $result['node'][12322] = partial
node data). Note the $result won't have the 'node' key when it's
empty, thus the check using isset, this is explained here.
Example:
<?php
$query = new EntityFieldQuery();
$query->entityCondition('entity_type', 'node')
  ->entityCondition('bundle', 'article')
  ->propertyCondition('status', 1)
  ->fieldCondition('field_news_types', 'value', 'spotlight', '=')
  ->fieldCondition('field_photo', 'fid', 'NULL', '!=')
  ->fieldCondition('field_faculty_tag', 'tid', $value)
  ->fieldCondition('field_news_publishdate', 'value', $year. '%', 'like')
  ->fieldOrderBy('field_photo', 'fid', 'DESC')
  ->range(0, 10)
  ->addMetaData('account', user_load(1)); // Run the query as user 1.
$result = $query->execute();
if (isset($result['node'])) {
  $news_items_nids = array_keys($result['node']);
  $news_items = entity_load('node', $news_items_nids);
}
?>
Other resources
EntityFieldQuery on api.drupal.org
Building Energy.gov without Views
