I'm designing a database to house scientific test data, using SQLAlchemy. I've hit a problem that I can't figure out.
In my test data, each Observation has a State (position, velocity, acceleration), and a State has an associated Time (time at which the state applies). So far, so good. I made a separate table for Times because I deal with different kinds of times, and I wanted to use a reference table to indicate what kind of time each time is (state time, observation time, etc). And the types of times I deal with might change, so normalizing in this way I think will let me add new kinds of times in the future, since they're just rows in a reference table.
So far this part works (using declarative style):
class Observation(Base):
    __tablename__ = 'tbl_observations'
    id = Column(Integer, primary_key=True)
    state_id = Column(Integer, ForeignKey('tbl_states.id'))
    state = relationship('State', uselist=False)

class State(Base):
    __tablename__ = 'tbl_states'
    id = Column(Integer, primary_key=True)
    time_id = Column(Integer, ForeignKey('tbl_times.id'))
    time = relationship('Time', uselist=False)

class Time(Base):
    __tablename__ = 'tbl_times'
    id = Column(Integer, primary_key=True)
    time_type_id = Column(Integer, ForeignKey('ref_tbl_time_types.id'))
    time_type = relationship('TimeType', uselist=False)
    time_value = Column(Float)

class TimeType(Base):
    __tablename__ = 'ref_tbl_time_types'
    id = Column(Integer, primary_key=True)
    desc = Column(String)
The wrinkle is that observations themselves can have different kinds of times. When I try to create a one-to-many relationship between Observation and Time, I get a circular dependency error:
class Observation(Base):
    __tablename__ = 'tbl_observations'
    id = Column(Integer, primary_key=True)
    state_id = Column(Integer, ForeignKey('tbl_states.id'))
    state = relationship('State', uselist=False)
    # Added this line:
    times = relationship('Time')

class Time(Base):
    __tablename__ = 'tbl_times'
    id = Column(Integer, primary_key=True)
    time_type_id = Column(Integer, ForeignKey('ref_tbl_time_types.id'))
    time_type = relationship('TimeType', uselist=False)
    time_value = Column(Float)
    # Added this line:
    observation_id = Column(Integer, ForeignKey('tbl_observations.id'))
I'm guessing this breaks because the original Observation -> State -> Time chain has a reference right back up to Observation.
Is there any way to fix this? Have I gotten my design all screwed up? Am I doing something wrong in sqlalchemy? I'm new to all of this so it could be any of the above. Any help you can give would be very much appreciated.
P.S. I tried doing what was recommended here: Trying to avoid a circular reference but either I did it wrong or it didn't solve my particular problem.
The other answers here regarding reconsideration of your use case are valuable, and you should consider those. However, as far as SQLAlchemy is concerned, the circular dependency issue due to multiple FKs is solved by the use_alter/post_update combination, documented at http://docs.sqlalchemy.org/en/rel_0_7/orm/relationships.html#rows-that-point-to-themselves-mutually-dependent-rows . Here is the model using that:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Observation(Base):
    __tablename__ = 'tbl_observations'
    id = Column(Integer, primary_key=True)
    state_id = Column(Integer, ForeignKey('tbl_states.id'))
    state = relationship('State', uselist=False)
    times = relationship('Time')

class State(Base):
    __tablename__ = 'tbl_states'
    id = Column(Integer, primary_key=True)
    time_id = Column(Integer, ForeignKey('tbl_times.id'))
    # post_update is preferable on the many-to-one
    # only to reduce the number of UPDATE statements
    # versus it being on a one-to-many.
    # It can be on Observation.times just as easily.
    time = relationship('Time', post_update=True)

class Time(Base):
    __tablename__ = 'tbl_times'
    id = Column(Integer, primary_key=True)
    time_type_id = Column(Integer, ForeignKey('ref_tbl_time_types.id'))
    time_type = relationship('TimeType', uselist=False)
    time_value = Column(Float)
    observation_id = Column(Integer, ForeignKey('tbl_observations.id',
                                                use_alter=True, name="fk_time_obs_id"))

class TimeType(Base):
    __tablename__ = 'ref_tbl_time_types'
    id = Column(Integer, primary_key=True)
    desc = Column(String)

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)
s = Session(e)
tt1 = TimeType(desc="some time type")
t1, t2, t3, t4, t5 = Time(time_type=tt1, time_value=40), \
    Time(time_type=tt1, time_value=50), \
    Time(time_type=tt1, time_value=60), \
    Time(time_type=tt1, time_value=70), \
    Time(time_type=tt1, time_value=80)
s.add_all([
    Observation(state=State(time=t1), times=[t1, t2]),
    Observation(state=State(time=t2), times=[t1, t3, t4]),
    Observation(state=State(time=t2), times=[t2, t3, t4, t5]),
])
s.commit()
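To make the effect of post_update more concrete, here is a rough sketch of the statement order it leads to, written against the standard-library sqlite3 module rather than SQLAlchemy. The table names come from the model above; the literal ids and the exact statement order are assumptions for illustration, not SQLAlchemy's actual output. The idea is that the row whose foreign key closes the cycle is inserted with NULL first and filled in by a separate UPDATE once the other rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the foreign keys

conn.executescript("""
CREATE TABLE tbl_states (
    id INTEGER PRIMARY KEY,
    time_id INTEGER REFERENCES tbl_times(id)   -- nullable, filled in later
);
CREATE TABLE tbl_observations (
    id INTEGER PRIMARY KEY,
    state_id INTEGER REFERENCES tbl_states(id)
);
CREATE TABLE tbl_times (
    id INTEGER PRIMARY KEY,
    time_value REAL,
    observation_id INTEGER REFERENCES tbl_observations(id)
);
""")

# 1. Insert the State with time_id left NULL, so it does not need a Time yet.
conn.execute("INSERT INTO tbl_states (id, time_id) VALUES (1, NULL)")
# 2. The Observation can now point at the State.
conn.execute("INSERT INTO tbl_observations (id, state_id) VALUES (1, 1)")
# 3. The Time can now point at the Observation.
conn.execute(
    "INSERT INTO tbl_times (id, time_value, observation_id) VALUES (1, 40.0, 1)"
)
# 4. Close the cycle with a separate UPDATE (the "post update" step).
conn.execute("UPDATE tbl_states SET time_id = 1 WHERE id = 1")
conn.commit()

row = conn.execute("""
    SELECT o.id, s.time_id, t.observation_id
    FROM tbl_observations o
    JOIN tbl_states s ON s.id = o.state_id
    JOIN tbl_times t ON t.id = s.time_id
""").fetchone()
print(row)
```

SQLite accepts a REFERENCES clause that names a table created later, which is why no ALTER TABLE is needed in this sketch; use_alter exists for databases where the constraint has to be added after both tables are created.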
You have a many to one relationship between Observations and States. So one State can have many Observations, and every Observation has one State.
You also have a many to one relationship between States and Times. So one Time can have many States, and every State has one Time.
You are correct in that the problem is the reference back to observations from Times. You are forcing each Time to have an Observation, which in turn has to have a State, which in turn has to have a Time (and then the loop repeats forever).
To break this you need to figure out what you are actually trying to portray in these relationships. If an Observation has a State, which has a Time, then the Observation has a Time (you can get the Time from the State).
So the real question you need to answer is: What does it mean to say that a Time has an Observation? how would you be using that in your application?
I don't completely understand the model names in your object model and how they correspond to the real world, but I will try to guess. First, I doubt that the model Time (which looks rather basic and almost logic-free) should have a ForeignKey to a higher-level model class such as Observation. In light of this, I see your model not as a chain of n-1 relationships, but rather as a kind of ternary relationship. So I could see your model looking like the following:
class Base(object):
    id = Column(Integer, primary_key=True)

class Observation(Base):
    __tablename__ = 'tbl_observations'

class ObservationInstance(Base):
    __tablename__ = 'tbl_observation_instances'
    observation_id = Column(Integer, ForeignKey('tbl_observations.id'))
    state_id = Column(Integer, ForeignKey('tbl_states.id'))
    time_id = Column(Integer, ForeignKey('tbl_times.id'))
    # relationships
    observation = relationship('Observation', backref="instances")
    state = relationship('State')
    time = relationship('Time')

class State(Base):
    __tablename__ = 'tbl_states'

class Time(Base):
    __tablename__ = 'tbl_times'
    time_type_id = Column(Integer, ForeignKey('ref_tbl_time_types.id'))
    time_type = relationship('TimeType', uselist=False)
    time_value = Column(Float)

class TimeType(Base):
    __tablename__ = 'ref_tbl_time_types'
    desc = Column(String)
I hope this makes sense and fits the real world you are trying to model. I assumed that your model represents some kind of (scientific) experiment. In that case I would rename Observation -> Experiment and ObservationInstance -> Observation.
Maybe I am misunderstanding something. I am trying to create 2 tables in one database, but I also need some temporary values in one class that I don't want to write to the database.
I am trying to switch to peewee and have read the documentation, but I can't find a solution on my own.
Without peewee I would write an __init__ method and set my attributes there. Where do I have to put them now?
from peewee import *
import datetime

db = SqliteDatabase('test.db', pragmas={'foreign_keys': 1})

class BaseModel(Model):
    class Meta:
        database = db

class Sensor(BaseModel):
    id = IntegerField(primary_key=True)
    sort = IntegerField()
    name = TextField()

    #def __init__(self):
        #self.sometemporaryvariable = "blabla"

    def meineparameter(self, hui):
        self.hui = hui
        print(self.hui)

class Sensor_measure(BaseModel):
    id = ForeignKeyField(Sensor, backref="sensorvalues")
    timestamp = DateTimeField(default=datetime.datetime.now)
    value = FloatField()

    class Meta:
        primary_key = CompositeKey("id", "timestamp")

db.connect()
db.create_tables([Sensor_measure, Sensor])
sensor1 = Sensor.create(id=2, sort=20, name="Sensor2")
#sensor1.sometemporaryvariable = "not so important to write to the database"
sensor1.save()
Remember to call super() whenever overriding a method in a subclass:
class Sensor(BaseModel):
    id = IntegerField(primary_key=True)
    sort = IntegerField()
    name = TextField()

    def __init__(self, **kwargs):
        self.sometemporaryvariable = "blabla"
        super().__init__(**kwargs)
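The same pattern can be tried without a database. The sketch below uses a hypothetical Record base class as a stand-in for an ORM model (it is not peewee's Model and does not reproduce its internals); it only illustrates that an attribute set in an overridden __init__ lives on the instance, while only the declared fields are part of what would be stored.

```python
class Record:
    """Hypothetical stand-in for an ORM base class; not peewee's Model."""

    def __init__(self, **fields):
        # The base class owns the values that would be persisted.
        self.fields = dict(fields)

    def to_row(self):
        # Only the persisted fields would end up in the database row.
        return dict(self.fields)


class Sensor(Record):
    def __init__(self, **fields):
        # Let the base class do its own setup first...
        super().__init__(**fields)
        # ...then attach a temporary, non-persisted attribute.
        self.sometemporaryvariable = "blabla"


sensor = Sensor(id=2, sort=20, name="Sensor2")
print(sensor.to_row())
print(sensor.sometemporaryvariable)
```

Forgetting the super().__init__() call in this sketch would leave sensor.fields undefined, which is the kind of breakage the advice above is meant to prevent.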
I have a sqlite table defined as:
class HourlyUserWebsite(Base):
    __tablename__ = 'hourly_user_website'
    id = Column(Integer, primary_key=True)
    user = Column(String(600), index=True)
    domain = Column(String(600))
    time_secs = Column(Integer, index=True)

    def __repr__(self):
        return "HourlyUserWebsite(user='%s', domain='%s', time_secs=%d)" % \
            (self.user, self.domain, self.time_secs)
and I add elements to it with a class method as:
def add_elements_to_hourly_db(self, data, start_secs, end_secs, engine):
    session = self._get_session(engine)
    for el in data:
        session.add(el)
    session.commit()
    return
As the data is a time series, I expect to always add elements with an increasing or equal time_secs value (never decreasing).
I get the data from the table with a query like:
session.query(HourlyUserWebsite)
I'd like to have the results from the query sorted by time_secs and by user.
Is there any way I can do it? Can the data be stored in such a way that query for sorted data is optimised keeping in mind that it is a time series?
session.query(HourlyUserWebsite).order_by(HourlyUserWebsite.user,HourlyUserWebsite.time_secs.desc()).all()
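On the storage half of the question, one common option is a composite index on the sort columns, which can let the database return rows in index order instead of sorting them on every query. The sketch below shows the idea with the standard-library sqlite3 module; the index name ix_user_time and the sample rows are made up for illustration, and whether the index is actually used depends on the query planner. In SQLAlchemy the equivalent declaration would be an Index over the user and time_secs columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hourly_user_website (
        id INTEGER PRIMARY KEY,
        user TEXT,
        domain TEXT,
        time_secs INTEGER
    )
""")
# Composite index covering both sort columns (the name is hypothetical).
conn.execute(
    "CREATE INDEX ix_user_time ON hourly_user_website (user, time_secs)"
)

sample = [
    ("bob", "b.org", 7200),
    ("alice", "a.com", 3600),
    ("alice", "a.com", 7200),
    ("bob", "b.org", 3600),
]
conn.executemany(
    "INSERT INTO hourly_user_website (user, domain, time_secs) VALUES (?, ?, ?)",
    sample,
)

# Same ordering as the order_by() call above: user ascending, time_secs descending.
ordered = conn.execute(
    "SELECT user, time_secs FROM hourly_user_website "
    "ORDER BY user, time_secs DESC"
).fetchall()
print(ordered)
```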
I have a model named Post, which has a boolean field called is_question. If the is_question field of a Post is True, it's a "question"; otherwise, it's an "answer". I want to create the following question-answer relationship:
One "question" may have many "answer"s, but one "answer" has and only has one "question". Due to the fact that both "question" and "answer" are essentially Posts, I think the relationship must be self-referencing.
Here is what I've tried:
class Post(db.Model):
    __tablename__ = 'posts'
    id = db.Column(db.Integer, primary_key=True)
    is_question = db.Column(db.Boolean)
    post_id = db.Column(db.Integer, db.ForeignKey('posts.id'))
    question = db.relationship('Post', backref=db.backref('answer', lazy='dynamic'), uselist=False, lazy='dynamic')
The error is:
ArgumentError: Post.question and back-reference Post.answer are both
of the same direction symbol('ONETOMANY'). Did you mean to set
remote_side on the many-to-one side ?
You need to add the remote_side argument to create a self-referential relationship. More information is in the documentation.
UPDATED: By the way, I think you don't need the boolean flag is_question, because you can tell questions and answers apart by checking whether the post_id field is NULL.
class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey('posts.id'))
    question = relationship('Post', remote_side=[id], backref=backref('answers'), uselist=False)
Test:
session.add(
    Post(
        id=1,
        post_id=None
    )
)
session.add(
    Post(
        id=2,
        post_id=1
    )
)
session.add(
    Post(
        id=3,
        post_id=1
    )
)
session.commit()
question = session.query(Post).get(1)
print(question.answers)  # output [post2, post3]
answer = session.query(Post).get(2)
print(answer.question.id)  # output 1
# Receive all answers
print(session.query(Post).filter(Post.post_id.isnot(None)).all())
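The point about dropping the is_question flag can also be checked at the SQL level. This is a minimal sketch using the standard-library sqlite3 module with the same three rows as the test above; it assumes nothing beyond the posts table and its self-referencing post_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES posts(id)  -- NULL for questions
    )
""")
conn.executemany(
    "INSERT INTO posts (id, post_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1)],
)

# Questions are the rows with no parent post.
questions = [r[0] for r in conn.execute(
    "SELECT id FROM posts WHERE post_id IS NULL ORDER BY id")]
# Answers are the rows that point at a parent post.
answers = [r[0] for r in conn.execute(
    "SELECT id FROM posts WHERE post_id IS NOT NULL ORDER BY id")]
# Answers belonging to question 1.
answers_to_1 = [r[0] for r in conn.execute(
    "SELECT id FROM posts WHERE post_id = ? ORDER BY id", (1,))]

print(questions, answers, answers_to_1)
```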
You can use the below question and answer table.
class Answer(Base):
    __tablename__ = "answers"
    id = Column(Integer, primary_key=True)
    mcq_id = Column(Integer, ForeignKey('questions.id'))
    answer_text = Column(Text())
    is_correct = Column(Boolean, nullable=False, default=False)

class Question(Base):
    __tablename__ = "questions"
    id = Column(Integer, primary_key=True)
    question_text = Column(Text())
    answer_explanation = Column(Text())
    answer_choices = relationship('Answer',
                                  primaryjoin="and_(Question.id == Answer.mcq_id )",
                                  cascade="all, delete-orphan",
                                  foreign_keys=[Answer.mcq_id])

    # If you have more than one answers then define this function in your model.
    def has_more_than_one_correct_answer(self):
        count = 0
        for choice in self.answer_choices:
            if choice.is_correct:
                count = count + 1
        if count > 1:
            return True
        else:
            return False
The answer_choices relationship links the two tables. If you are using SQLAlchemy, you can load it eagerly with joinedload (or joinedload_all on older SQLAlchemy versions).
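The counting method can be written more compactly. The sketch below uses plain dataclasses named Choice and Question as hypothetical stand-ins for the mapped classes, so it runs without a database; the same one-line body should work on the SQLAlchemy model as long as answer_choices is iterable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Choice:
    # Stand-in for the Answer model (hypothetical, no database involved).
    answer_text: str
    is_correct: bool = False


@dataclass
class Question:
    # Stand-in for the Question model.
    question_text: str
    answer_choices: List[Choice] = field(default_factory=list)

    def has_more_than_one_correct_answer(self) -> bool:
        # Count the correct choices and compare once at the end.
        return sum(1 for c in self.answer_choices if c.is_correct) > 1


single = Question("2 + 2?", [Choice("4", True), Choice("5")])
multi = Question("Pick the primes", [Choice("2", True), Choice("3", True), Choice("4")])
print(single.has_more_than_one_correct_answer())
print(multi.has_more_than_one_correct_answer())
```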
Using Django, I am creating a database that will keep track of unanswered posts in a forum and which employee (operator), if any, is assigned to each post.
The models Operator and ThreadVault are permanent, while Thread is intermediate/temporary.
I will make an API call to the forums every ten minutes to get a list of the unanswered posts. I will then check whether the thread ID already exists in the model ThreadVault. If not, it will be added to ThreadVault. Then I will have a temporary/intermediate table Thread that contains the unanswered posts for the past 10 minutes. Every 10 minutes, the table Thread will be cleared out and refreshed with a new batch of unanswered threads.
An operator/employee may or may not be assigned to the thread. To do this, I am having ThreadVault's operator_user_name point to the Operator model.
class Operator:
    operator_ldap = models.ForeignKey(settings.AUTH_USER_MODEL,
                                      related_name='operator_requester')
    operator_irc_name = models.CharField(max_length="25")
    operator_user_name = models.CharField(max_length="25")

class ThreadVault:
    thread_id = models.CharField(max_length="50")
    url = models.CharField(max_length="200")
    operator_user_name = models.ForeignKey(Operator) ## Can be Empty

#intermediate table
#Thread model clears out once every
#10 minutes when API repopulates data
class Thread:
    url = models.ForeignKey(ThreadVault)
    author_username = models.CharField(max_length="50")
    author_name = models.CharField(max_length="50")
    thread_id = models.ForeignKey(ThreadVault)
    forum_id = models.CharField(max_length="50")
    subject = models.CharField(max_length="200")
    reply_count = models.CharField(max_length=("3"))
    latest_post_date = models.CharField(max_length=("50"))
    operator_user_name = models.ForeignKey(ThreadVault) ## Can be Empty
I know at this point I am not doing this correctly. How can I do this?
This worked out perfectly:
class Operator(models.Model):
    operator_ldap = models.ForeignKey(settings.AUTH_USER_MODEL,
                                      related_name='operator_requester')
    operator_irc_name = models.CharField(max_length=25,
                                         blank=True, null=True)
    operator_user_name = models.CharField(max_length=25,
                                          blank=True, null=True)

class ThreadVault(models.Model):
    thread_id = models.CharField(max_length=50)
    url = models.CharField(max_length=200)
    operator_user_name = models.ForeignKey(Operator, blank=True, null=True)  # can be empty

# Intermediate table:
# the Thread model is cleared out once every
# 10 minutes when the API repopulates the data.
class Thread(models.Model):
    url = models.ForeignKey(ThreadVault,
                            related_name="url_vault")
    author_username = models.CharField(max_length=50)
    author_name = models.CharField(max_length=50)
    thread_id = models.ForeignKey(ThreadVault,
                                  related_name="thread_vault")
    forum_id = models.CharField(max_length=50)
    subject = models.CharField(max_length=200)
    reply_count = models.CharField(max_length=3)
    latest_post_date = models.CharField(max_length=50)
    operator_user_name = models.ForeignKey(ThreadVault,
                                           related_name="operator_user_name_vault",
                                           blank=True, null=True)  # can be empty
I have 2 models:
class Category(models.Model):
    name = models.CharField(max_length=30)
    no_of_posts = models.IntegerField(default=0)  # a denormalised field to store post count

class Post(models.Model):
    category = models.ForeignKey(Category)
    title = models.CharField(max_length=100)
    desc = models.TextField()
    user = models.ForeignKey(User)
    pub_date = models.DateTimeField(null=True, blank=True)
    first_save = models.BooleanField()
Since I always want to show the number of posts along with each category, I count and store them every time a user creates or deletes a post, like this:
## inside Post model ##
def save(self):
    if not self.pub_date and self.first_save:
        self.pub_date = datetime.datetime.now()
    # counting & saving category posts when a post is 1st published
    category = self.category
    super(Post, self).save()
    category.no_of_posts = Post.objects.filter(category=category).count()
    category.save()

def delete(self):
    category = self.category
    super(Post, self).delete()
    category.no_of_posts = Post.objects.filter(category=category).count()
    category.save()
........
My question is whether, instead of counting every object, we could use something like:
category.no_of_posts += 1  # in save(), and
category.no_of_posts -= 1  # in delete()
Or is there a better solution?
Oh, I missed that! I updated the post model to include the relationship!
Yes, a much better solution:
from django.db.models import Count

class CategoryManager(models.Manager):
    # Note: this method is named get_queryset in Django 1.6 and later.
    def get_query_set(self, *args, **kwargs):
        qs = super(CategoryManager, self).get_query_set(*args, **kwargs)
        return qs.annotate(no_of_posts=Count('post'))

class Category(models.Model):
    ...
    objects = CategoryManager()
Since you didn't show the relationship between Post and Category, I guessed at the Count('post') part. You might have to fiddle with that.
Oh, and you'll want to get rid of the no_of_posts field from the model. It's not necessary with this. Or, you can just change the name of the annotation.
You'll still be able to get the post count with category.no_of_posts but you're making the database do the legwork for you.
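Roughly speaking, the annotation asks the database for a grouped count instead of keeping a counter in Python. The sketch below shows that kind of query with the standard-library sqlite3 module; the table and column names (category, post, category_id) and the sample rows are assumptions for illustration, and the SQL that Django actually generates may differ in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    category_id INTEGER REFERENCES category(id),
    title TEXT
);
INSERT INTO category (id, name) VALUES (1, 'news'), (2, 'empty');
INSERT INTO post (category_id, title) VALUES (1, 'a'), (1, 'b');
""")

# LEFT JOIN keeps categories without posts; COUNT(p.id) ignores the NULLs
# produced for them, so they are reported with a count of 0.
counts = conn.execute("""
    SELECT c.name, COUNT(p.id) AS no_of_posts
    FROM category c
    LEFT JOIN post p ON p.category_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()
print(counts)
```

Because the count is computed at query time, it cannot drift out of sync with the post table the way a manually incremented counter can.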