What is the best way to represent a constrained many-to-many relationship within a relational database? - database

I would like to establish a many-to-many relationship with a constraint that only one or no entity from each side of the relationship can be linked at any one time.
A good analogy to the problem is cars and parking garage spaces. There are many cars and many spaces. A space can contain one car or be empty; a car can only be in one space at a time, or no space (not parked).
We have a Cars table and a Spaces table (and possibly a linking table). Each row in the cars table represents a unique instance of a car (with license, owner, model, etc.) and each row in the Spaces table represents a unique parking space (with address of garage floor level, row and number). What is the best way to link these tables in the database and enforce the constraint describe above?
(I am using C#, NHibernate and Oracle.)

If you're not worried about history - ie only worried about "now", do this:
create table parking (
car_id references car,
space_id references space,
unique car_id,
unique space_id
);
By making both car and space references unique, you restrict each side to a maximum of one link - ie a car can be parked in at most one space, and a space can has at most one car parked in it.

in any relational database, many to many relationships must have a join table to represent the combinations. As provided in the answer (but without much of the theoretical background), you cannot represent a many to many relationship without having a table in the middle to store all the combinations.
It was also mentioned in the solution that it only solves your problem if you don't need history. Trust me when I tell you that real world applications almost always need to represent historical data. There are many ways to do this, but a simple method might be to create what's called a ternary relationship with an additional table. You could, in theory, create a "time" table that also links its primary key (say a distinct timestamp) with the inherited keys of the other two source tables. this would enable you to prevent errors where two cars are located in the same parking spot during the same time. using a time table can allow you the ability to re-use the same time data for multiple parking spots using a simple integer id.
So, your data tables might look like so
table car
car_id (integers/numbers are fastest to index)
...
table parking-space
space_id
location
table timeslot
time_id integer
begin_datetime (don't use seconds unless you must!)
end_time (don't use seconds unless you must!)
now, here's where it gets fun. You add the middle table with a composite primary key that is made up of car.car_id + parking_space.space_id + time_id. There are other things you could add to optimize here, but you get the idea, I hope.
table reservation
car_id PK
parking_space_id PK
time_id PK (it's an integer - just try to keep it as highly granular as possible - 30 minute increments or something - if you allow this to include seconds / milliseconds /etc the advantages are cancelled out because you can't re-use the same value from the time table)
(this would also be the place to store variable rates, discounts, etc distinct to this particular account, reservation, etc).
now, you can reduce the amount of data because you aren't replicating the timestamp in the join table (reservation). By using an integer, you can re-use that timeslot for multiple parking spaces, but you could also apply a constraint preventing two cars from renting that given spot for the same "timeslot" for a given day / timeframe. This would also make it easier to store some history about the customers - who knows, you might want to see reports on customers who rent more often and offer them discounts or something.
By using the ternary relationship model, you are making each spot unique to a given timeslot (perhaps with some added validation rules), so the system can only store one car in one parking spot for one given time period.
By using integers as keys instead of timestamps, you are assured that the database won't need to do any heavy lifting to index the keys and sort / query against. This is a common practice in data warehousing / OLAP reporting when you have massive datasets and you need efficiency. I think it applies here as well.

create a third table.
parking
--------
car_id
space_id
start_dt
end_dt
for the constraint, i guess the problem with your situation is that you need to check a complex rule against the intersection table itself. if you try this in a trigger, it will report a mutation.
one way to avoid this would be to replicate the table, and query against this replication for the constraint.

Related

one to many relationship vs. multiple records in a single table

I'm designing a payment system. Which of the following two designs is more practical, generally implemented and considered a good practice?
Design 1
Consider two entities — order and credit_card_details.
A credit card might be used for payment of several orders. So we have a 1:M relationship between credit_card_details and order. Keep in mind that each record in credit_card_details is unique with the attributes like card_holder_name, cvv, expiry_date, etc. These are filled in a form while making the payment. This design requires that whenever a payment is made, I would need to lookup the credit_card_details table to check whether a new/old credit card is being used. If the credit card is —
Old: The corresponding FK is added to the order table.
New: A new record is added in credit_card_details and then the corresponding FK is added to the order table
Design 2
This is relatively simpler. I use a single order table where all the attributes of credit_card_details from the previous design are merged to the former table. Whenever an order is placed, I need not check for the existence of the entered credit card details and I simply insert them in order table. However, it comes with the cost of possible duplicate credit card details.
Personally option one makes sense, option 2 does not give you 3NF, and the data is denormalized and hence you may have duplicated data. What if the customer returns the order and you want to make a reverse payment and the card has expired? These are just some common curveballs I am throwing up. It all depends on the given scenarios.
Also how imagine that you wanted a history of all the credit cards associated to a user and against the orders???, what would be a logical way to store these in the database? Surely a separate table right?
So a given user may have 0 to many cards.
A card can be associated to 1 or many orders
And an order is always associated to one card.
Consider possible searching options as well, and look up speed, better to have a unique foreign key in the order table.
A third option might be to have an Order table, Card table and OrderCard table although personally again it depends on your domain, although I think option three may be overkill?
Hope this helps in your design

How to design parking street database?

I try to design database which contains data about street parking. Parking have gps coordinates, time restriction by day, day of week rules (some days are permitted, other restricted), free or paid status. In the end, I need to do some queries that can specify parking by criteria.
For first overdraw I try to do something like this:
Pakring
-------
parkingId
Lat
Long
Days (1234567)
Time -- already here comes trouble
But it`s not normalized and quickly overflow database. How to design data in the best way?
Update For now I have two approaches
The first one is:
I try to use restrictions tables with many-to-many links.(This is example for days and months). But queries will be complicated and I don`t now how to link time with day.
The second approach is:
Using one restricted table with Type field, that will have priority. But this solution also not normalized.
Just to be clear what data I have.
PakingId Coords String Description(NO PARKING11:30AM TO 1PM THURS)
And I want to show user where he can find street parking by area, time and day.
Thanks to all for your help and time.
This seems like a difficult task. Just a few thoughts.
Are you only concerned with street parking? Parking houses have multiple floors so GPS coordinates won't work unless you stay on the streets.
What is the accuracy of the coordinates? Would it be easier to identify each parking space individually by some other standard. Like unique identifiers of the painted parking squares. (But what happens if people don't park into squares? Or the GPS coordinates accuraycy fails/is not exact enough because of illegal parking? Do you intend to keep records of the parking tickets too?)
Some thought for the tables or information you need to take into account:
time: opening hours, days
price: maybe a different price for different time intervals?
exceptions: holidays, maintenance (maybe not so important, you could just make parking space status active/inactive)
parking slot: id (GPS/random id), status
Three or four tables above could be linked by an intermediate table which reveals the properties of a parking space for every possible parking time (like a prototype for all possible combinations). That information could be linked into another table where you keep records of a actual parking events (so you can for example keep records of people who have or have not paid their bills if you need to).
There are lots of stuff that affect your implementation so you really need to list all the rules of the parking space (and event?). Database structure can be done (and redone) later after you have an understanding of the properties of the events you need to keep records of. And thats the key to everything: understanding what you need to do so you can design and create the implementation. If the implementation (application) doesn't work change the implementation. If the design is faulty redesign. If you don't undestand the whole process (what you really need), everything you do is bound to fail. (Unless you are incredibly lucky but I wouldn't count on luck...)
Try using two tables with an intersection entity between them.
Table parking will have parking_id, lat and long columns. Table Restrictions will have all the type of restrictions that you have in your scenario with something like restriction_id, restriction_day, restriction_time and restriction_status and maybe restriction_type.
Then you can link the two tables with foreign key constraints in the intersection entity.
Example parking_id has restriction_id.
This way a parking can have more than one restriction and a restriction can be applied to more than one parking.
As you seem to have heard of normalization, and following the comment from Damien, you should use different tables to represent different things.
You should then think about how to link those tables together, and in the process define the type of relationship between the 2. Could be one-to-one (this one is the one where you could be tempted to put everything in the same table, but a simple foreign key in a linked table is cleaner), one-to-many (this is where the trouble would begin if you put everything in one table, cause now there will be several lines in the linked table with the same foreign key, and if everything was in the same table, you'd have to myltiply the fields in that table), or many to many (where you would need to add a table only to make the link between 2 other tables, thus with 2 foreign key fields pointing to records in both tables).
For example, in your case, a Parking table could hold the parking name, coordinates, etc.
A second table TimeTable could hold the opening days/time for each parking, with a foreign key to the parkingId (making it a one-to-many rlationship, 1 parking can have many opening frames). The fields of this table could for example be DayOfWeek (number indicating the day), openingTime, closingTime. This would allow you to define several timeframes on the same day, or a single one (if it's always open for example), giving in this case 7 records in this table for this parking (=> one-to-many relationship).
You could then imagine a 3rd table Price where you put data concerning the price of that parking (probably a one-to-many too, with records for hourly rates/long stay/..., and so on depending on the needs and the different "objects" you would need to represent.
Please note these are only rough examples. Database design can sometimes be very tricky and that's a matter I'm not specialist in, but I think these advises can help you go further and come back with another question if you get stuck.
Good luck !

When I should use one to one relationship?

Sorry for that noob question but is there any real needs to use one-to-one relationship with tables in your database? You can implement all necessary fields inside one table. Even if data becomes very large you can enumerate column names that you need in SELECT statement instead of using SELECT *. When do you really need this separation?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting-off the "0..1" portion into a separate table might save some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so a "cut-off point" where this strategy becomes viable can vary considerably.
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end-up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion or rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only the whole rows. By splitting the row, you are allowing a lock to take place on only one of its halfs.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
My 2 cents.
I work in a place where we all develop in a large application, and everything is a module. For example, we have a users table, and we have a module that adds facebook details for a user, another module that adds twitter details to a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds their own table with 1:1 relationships to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint users foreign key ...)
create table users_twdata ( id int primary key, ..., constraint users foreign key ...)
If you place two one-to-one tables in one, its likely you'll have semantics issue. For example, if every device has one remote controller, it doesn't sound quite good to place the device and the remote controller with their bunch of characteristics in one table. You might even have to spend time figuring out if a certain attribute belongs to the device or the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes, and only 4 of them are used occasionally, it makes sense to break the table into 2 tables for performance issues.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!
If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in future, if the separate data needs to be related to some other entity, also.
The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.
First, I think it is a question of modelling and defining what consist a separate entity. Suppose you have customers with one and only one single address. Of course you could implement everything in a single table customer, but if, in the future you allow him to have 2 or more addresses, then you will need to refactor that (not a problem, but take a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, you have customers with a single address each, but this time it is optional to have an address. Of course you could implement that as a bunch of NULL-able columns such as ZIP,state,street. But suppose that given that you do have an address the state is not optional, but the ZIP is. How to model that in a single table? You could use a constraint on the customer table, but it is much easier to divide in another table and make the foreign_key NULLable. That way your model is much more explicit in saying that the entity address is optional, and that ZIP is an optional attribute of that entity.
not very often.
you may find some benefit if you need to implement some security - so some users can see some of the columns (table1) but not others (table2)..
of course some databases (Oracle) allow you to do this kind of security in the same table, but some others may not.
You are referring to database normalization. One example that I can think of in an application that I maintain is Items. The application allows the user to sell many different types of items (i.e. InventoryItems, NonInventoryItems, ServiceItems, etc...). While I could store all of the fields required by every item in one Items table, it is much easier to maintain to have a base Item table that contains fields common to all items and then separate tables for each item type (i.e. Inventory, NonInventory, etc..) which contain fields specific to only that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below, is an article on normalization.
http://support.microsoft.com/kb/283878
As with all design questions the answer is "it depends."
There are few considerations:
how large will the table get (both in terms of fields and rows)? It can be inconvenient to house your users' name, password with other less commonly used data both from a maintenance and programming perspective
fields in the combined table which have constraints could become cumbersome to manage over time. for example, if a trigger needs to fire for a specific field, that's going to happen for every update to the table regardless of whether that field was affected.
how certain are you that the relationship will be 1:1? As This question points out, things get can complicated quickly.
Another use case can be the following: you might import data from some source and update it daily, e.g. information about books. Then, you add data yourself about some books. Then it makes sense to put the imported data in another table than your own data.
I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.
The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff and/or a customer. The staff and customer inherit the person attributes. The advantage being if a person is a staff AND a customer their details are stored only once, in the generic person table. The child tables have details specific to staff and customers.
In my time of programming i encountered this only in one situation. Which is when there is a 1-to-many and an 1-to-1 relationship between the same 2 entities ("Entity A" and "Entity B").
When "Entity A" has multiple "Entity B" and "Entity B" has only 1 "Entity A"
and
"Entity A" has only 1 current "Entity B" and "Entity B" has only 1 "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. - I borrowed this example from #Steve Fenton's answer
Where a Driver can drive multiple Cars, just not at the same time. So the Car and Driver entities are 1-to-many or many-to-many. But if we need to know who the current driver is, then we also need the 1-to-1 relation.
Another use case might be if the maximum number of columns in the database table is exceeded. Then you could join another table using OneToOne

Why use a 1-to-1 relationship in database design?

I am having a hard time trying to figure out when to use a 1-to-1 relationship in db design or if it is ever necessary.
If you can select only the columns you need in a query is there ever a point to break up a table into 1-to-1 relationships. I guess updating a large table has more impact on performance than a smaller table and I'm sure it depends on how heavily the table is used for certain operations (read/ writes)
So when designing a database schema how do you factor in 1-to-1 relationships? What criteria do you use to determine if you need one, and what are the benefits over not using one?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in different pattern than the others, for example:
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating so you can still modify the other from your trigger (but there are other ways to work-around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...
See also: When I should use one to one relationship?
Separation of duties and abstraction of database tables.
If I have a user and I design the system for each user to have an address, but then I change the system, all I have to do is add a new record to the Address table instead of adding a brand new table and migrating the data.
EDIT
Currently right now if you wanted to have a person record and each person had exactly one address record, then you could have a 1-to-1 relationship between a Person table and an Address table or you could just have a Person table that also had the columns for the address.
In the future maybe you made the decision to allow a person to have multiple addresses. You would not have to change your database structure in the 1-to-1 relationship scenario, you only have to change how you handle the data coming back to you. However, in the single table structure you would have to create a new table and migrate the address data to the new table in order to create a best practice 1-to-many relationship database structure.
Well, on paper, normalized form looks to be the best. In real world usually it is a trade-off. Most large systems that I know do trade-offs and not trying to be fully normalized.
I'll try to give an example. If you are in a banking application, with 10 millions passbook account, and the usual transactions will be just a query of the latest balance of certain account. You have table A that stores just those information (account number, account balance, and account holder name).
Your account also have another 40 attributes, such as the customer address, tax number, id for mapping to other systems which is in table B.
A and B have one to one mapping.
In order to be able to retrieve the account balance fast, you may want to employ different index strategy (such as hash index) for the small table that has the account balance and account holder name.
The table that contains the other 40 attributes may reside in different table space or storage, employ different type of indexing, for example because you want to sort them by name, account number, branch id, etc. Your system can tolerate slow retrieval of these 40 attributes, while you need fast retrieval of your account balance query by account number.
Having all the 43 attributes in one table seems to be natural, and probably 'naturally slow' and unacceptable for just retrieving single account balance.
It makes sense to use 1-1 relationships to model an entity in the real world. That way, when more entities are added to your "world", they only also have to relate to the data that they pertain to (and no more).
That's the key really, your data (each table) should contain only enough data to describe the real-world thing it represents and no more. There should be no redundant fields as all make sense in terms of that "thing". It means that less data is repeated across the system (with the update issues that would bring!) and that you can retrieve individual data independently (not have to split/ parse strings for example).
To work out how to do this, you should research "Database Normalisation" (or Normalization), "Normal Form" and "first, second and third normal form". This describes how to break down your data. A version with an example is always helpful. Perhaps try this tutorial.
Often people are talking about a 1:0..1 relationship and call it a 1:1. In reality, a typical RDBMS cannot support a literal 1:1 relationship in any case.
As such, I think it's only fair to address sub-classing here, even though it technically necessitates a 1:0..1 relationship, and not the literal concept of a 1:1.
A 1:0..1 is quite useful when you have fields that would be exactly the same among several entities/tables. For example, contact information fields such as address, phone number, email, etc. that might be common for both employees and clients could be broken out into an entity made purely for contact information.
A contact table would hold common information, like address and phone number(s).
So an employee table holds employee specific information such as employee number, hire date and so on. It would also have a foreign key reference to the contact table for the employee's contact info.
A client table would hold client information, such as an email address, their employer name, and perhaps some demographic data such as gender and/or marital status. The client would also have a foreign key reference to the contact table for their contact info.
In doing this, every employee would have a contact, but not every contact would have an employee. The same concept would apply to clients.
Just a few samples from past projects:
a TestRequests table can have only one matching Report. But depending on the nature of the Request, the fields in the Report may be totally different.
in a banking project, an Entities table hold various kind of entities: Funds, RealEstateProperties, Companies. Most of those Entities have similar properties, but Funds require about 120 extra fields, while they represent only 5% of the records.

How do I organize such database in SQLite?

folks! I need some help with organizing database for application and I have no idea how to do it. Suppose following:
There is a list of academic subjects. For each subject we need to have a list of academic groups, which attend this subject. Then, for each group we need to have a list of dates. And for each date we need to have a list of students, and whether this student was present that day or not.
I have ugly data structures in my mind, will appreciate any help.
UPDATE
How do I see it:
Table1(the first col is date and second is list student's id, who were present)
10/10/11 | id1, id2, id3
10/11/11 | id1, 1d3, id5
Table2:
subject1 | id1 id2 id3
subject2 | id3 id2
And again, ids are id of groups. Dont know how to connect those tables.
There are many considerations to balance when designing a database, but based on the information you provided so far, something like this might be a good start:
This ER model uses a lot identifying relationships (i.e. "migrating" parent's primary key into child's PK) and results in natural primary keys, as opposed to non-identifying relationships that would require usage of surrogate keys. A lot of people like surrogate keys these days, but the truth is that both design strategies have pros and cons. In particular:
Natural keys are bigger (they "accumulate" fields over multiple levels of parent-child relationships).
But also, natural keys require less JOINing.
Natural keys can enforce constraints better in some special cases (such as diamond-shaped dependencies).
You will design one table for each kind of "thing" (subjects, groups, students, meetings) in your database. Each table will have one column for each datum (piece of information) you need to store about the thing. Additionally, there must be one column, or a predictable combination of columns, that will allow you to uniquely identify each thing (row) that you store in the table.
Then, you will decide how the things (subjects, groups, students, meetings) are related to each other and make sure that you have the correct columns in each table to store those relationships. You will discover that in some cases this can be done by adding one or more columns to the tables you already defined. In other cases, you will need to add a completely new table that doesn't store a "thing", per se, but rather a relationship between two things.
Once you have your list of tables and columns, if you feel that fails to represent some part of the problem correctly, post another question with the work you've already done and I'm sure you'll find someone to help you complete the assignment.
Response to your update:
You are on the wrong track. It is a bad idea (and contrary to correct relational database design) to ever store two values in a single field. So each of the tables you wrote about should have two columns (as you said), but the second column should store one and only one id. Instead of one row in table1 for 10/10/11, you would have three separate rows in your table.
But, before you start worrying about the "relationships", create tables to hold the "things".
I also suggest you pick up a basic guide to relational databases.

Resources