I need to store several date values in a database field. These values will be tied to a "User" such that each user will have their own unique set of these several date values.
I could use a one-to-many relationship here, but each user will have exactly 4 date values, so I feel that a one-to-many table would be overkill (in many ways, e.g. speed). On the other hand, if I needed to query against them, I would need those 4 values to be in different fields, e.g. MyDate1, MyDate2, etc., and then the SQL command to fetch a row would have to check 4 fields each time.
So the one-to-many relationship would probably be the best solution, but is there a better/cleaner/faster/whatever another way around? Am I designing it correctly?
The platform is MS SQL 2005 but solution on any platform will do, I'm mostly looking for proper db designing techniques.
EDIT: The 4 fields represent 4 instances of the same thing.
If you do it as four separate fields, then you don't have to join. To save the query syntax from being too horrible, you could write:
SELECT * FROM MyTable WHERE 'DateLiteral' IN (MyDate1, MyDate2, MyDate3, MyDate4);
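As mentioned in comments, the IN operator is pretty specific when it comes to date fields (down to the last (milli)second). You can always apply date/time functions to the values being compared (date_trunc below is the PostgreSQL function; other platforms have their own equivalents), but BETWEEN is unusable: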
SELECT * FROM MyTable WHERE date_trunc('hour', 'DateLiteral')
IN (date_trunc('hour', MyDate1), date_trunc('hour', MyDate2), date_trunc('hour', MyDate3), date_trunc('hour', MyDate4));
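The four-column IN lookup can be tried out with a minimal sketch like the one below. It assumes SQLite through Python's sqlite3 module rather than MS SQL, stores dates as ISO text, and uses a hypothetical UserId column and made-up sample rows.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (UserId INTEGER PRIMARY KEY, "
    "MyDate1 TEXT, MyDate2 TEXT, MyDate3 TEXT, MyDate4 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2009-01-05", "2009-02-10", "2009-03-15", "2009-04-20"),
        (2, "2009-05-01", "2009-06-01", "2009-07-01", "2009-08-01"),
    ],
)


def users_with_date(day):
    """Return the ids of users that have `day` in any of the four columns."""
    rows = conn.execute(
        "SELECT UserId FROM MyTable "
        "WHERE ? IN (MyDate1, MyDate2, MyDate3, MyDate4) ORDER BY UserId",
        (day,),
    )
    return [r[0] for r in rows]


matches = users_with_date("2009-03-15")
print(matches)
```

The match is exact, so values with a time part would need truncating on both sides first, as described above.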
Some databases like Firebird have array datatype, which does exactly what you described. It is declared something like this:
alter table t1 add MyDate date[4];
For what it's worth, the normalized design would be to store the dates as rows in a dependent table.
Storing multiple values in a single column is not a normalized design; normalization explicitly means each column has exactly one value.
You can make sure no more than four rows are inserted into the dependent table this way:
CREATE TABLE ThisManyDates (n INT PRIMARY KEY);
INSERT INTO ThisManyDates VALUES (1), (2), (3), (4);
CREATE TABLE UserDates (
User_ID INT REFERENCES Users,
n INT REFERENCES ThisManyDates,
Date_Value DATE NOT NULL,
PRIMARY KEY (User_ID, n)
);
However, this design doesn't allow you to make the date values mandatory.
How about having 4 fields along with the User ID (if you are sure it won't exceed that)?
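A minimal sketch of how the four-row cap behaves, assuming SQLite through Python's sqlite3 module (where foreign keys must be switched on explicitly) and a hypothetical one-column Users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Users (User_ID INTEGER PRIMARY KEY);  -- hypothetical parent table
CREATE TABLE ThisManyDates (n INTEGER PRIMARY KEY);
INSERT INTO ThisManyDates VALUES (1), (2), (3), (4);
CREATE TABLE UserDates (
    User_ID INTEGER REFERENCES Users,
    n INTEGER REFERENCES ThisManyDates,
    Date_Value TEXT NOT NULL,
    PRIMARY KEY (User_ID, n)
);
INSERT INTO Users VALUES (1);
""")

# Fill all four allowed slots for user 1.
for slot in (1, 2, 3, 4):
    conn.execute(
        "INSERT INTO UserDates VALUES (1, ?, ?)", (slot, f"2009-0{slot}-01")
    )


def try_insert(slot):
    """Attempt one more date for user 1; report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO UserDates VALUES (1, ?, '2009-12-31')", (slot,))
        return True
    except sqlite3.IntegrityError:
        return False


fifth_slot_ok = try_insert(5)      # slot 5 does not exist in ThisManyDates
duplicate_slot_ok = try_insert(2)  # slot 2 is already taken for user 1
row_count = conn.execute("SELECT COUNT(*) FROM UserDates").fetchone()[0]
print(fifth_slot_ok, duplicate_slot_ok, row_count)
```

The foreign key to ThisManyDates rejects a fifth slot and the composite primary key rejects a reused slot, so a user can never exceed four rows. Nothing forces a user to have all four.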
Create four date fields and store the dates in the fields. The date fields might be part of your user table, or they might be in some other table joined to the user table in a one-to-one relationship. It's your call.
Related
I am wondering what are the possibilities for storing this kind of data in an efficient way.
Let's say I have 100 different kinds of messages that I need to store. All messages have common data such as message ID, message name, sender, receiver, insert date, etc. Every kind of message has its own unique columns that we want to store and index (for quick queries).
We don't want to use 100 different tables, it will be impossible to work with.
The best way that I could come up with is to use 2-3 tables:
1. for the common data.
2. for the extra unique data, where every column has a generic name like: foreign key, column1, column2, ..., column20. There will be an index on every column plus the foreign key (20 indexes for 20 columns).
3. optional, metadata table to describe the generic columns for every unique message.
UPDATE:
Let's say that I am a backbone for passing data, and there are 100 different kinds of data (messages). I want to store every message that comes through me, but not as bulk data, because later I would like to query the data based on the unique columns of every message type.
Is there a better way out there that I don't know about?
Thanks.
BTW the database is oracle.
UPDATE 2:
Can a NoSQL database give me a better solution?
The approach of 2 or 3 tables that you suggest seems reasonable to me.
An alternative way would be to just store all those unique columns in the same table along with common data. That should not prevent you from creating indexes for them.
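The two-to-three-table layout from the question might look roughly like the sketch below. It assumes SQLite through Python's sqlite3 module instead of Oracle, and every table name, column name, and sample row here is hypothetical. Only two generic columns are shown instead of twenty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1. Common data shared by every message type.
CREATE TABLE message (
    message_id INTEGER PRIMARY KEY,
    message_type TEXT NOT NULL,
    sender TEXT, receiver TEXT, insert_date TEXT
);
-- 2. Type-specific data in generically named, indexed columns.
CREATE TABLE message_extra (
    message_id INTEGER PRIMARY KEY REFERENCES message(message_id),
    column1 TEXT,
    column2 TEXT
);
CREATE INDEX message_extra_c1 ON message_extra (column1);
CREATE INDEX message_extra_c2 ON message_extra (column2);
-- 3. Metadata: what each generic column means for each message type.
CREATE TABLE message_column_meta (
    message_type TEXT NOT NULL,
    generic_column TEXT NOT NULL,
    meaning TEXT NOT NULL,
    PRIMARY KEY (message_type, generic_column)
);
INSERT INTO message_column_meta VALUES ('tracking', 'column1', 'parcel_code');
INSERT INTO message VALUES (1, 'tracking', 'a', 'b', '2010-01-01');
INSERT INTO message VALUES (2, 'tracking', 'a', 'c', '2010-01-02');
INSERT INTO message_extra VALUES (1, 'P-100', NULL);
INSERT INTO message_extra VALUES (2, 'P-200', NULL);
""")


def find_messages(message_type, meaning, value):
    """Find messages of one type by a named type-specific attribute."""
    row = conn.execute(
        "SELECT generic_column FROM message_column_meta "
        "WHERE message_type = ? AND meaning = ?",
        (message_type, meaning),
    ).fetchone()
    column = row[0]  # read from our own metadata table, never from user input
    sql = (
        "SELECT m.message_id FROM message m "
        "JOIN message_extra x ON x.message_id = m.message_id "
        f"WHERE m.message_type = ? AND x.{column} = ? ORDER BY m.message_id"
    )
    return [r[0] for r in conn.execute(sql, (message_type, value))]


found = find_messages("tracking", "parcel_code", "P-200")
print(found)
```

The trade-off is that the database only sees column1 and column2; their real meaning and type live in the metadata table and in application code.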
Here is your answer:
You can store as many messages as you want against each message type.
UPDATE:
This is exactly what you want. There are different message types, as you stated, e.g. tracking, cargo or whatever. These categories will be stored in the msg_type table. You can store as many categories in msg_type as you want.
Then say each msg_type has numerous messages that will be stored in the Messages table. You can store as many messages here as you want, without any limit.
Here is your database SQL:
create table msg_type(
type_id number(14) primary key,
type varchar2(50)
);
create sequence msg_type_seq
start with 1 increment by 1;
create or replace trigger msg_type_trig
before insert on msg_type
referencing new as new
for each row
begin
select msg_type_seq.nextval into :new.type_id from dual;
end;
/
create table Messages(
msg_id number(14) primary key,
type_id number(14) constraint Messages_fk references msg_type(type_id),
msg_date timestamp(0) default sysdate,
msg varchar2(3900));
create sequence Messages_seq
start with 1 increment by 1;
create or replace trigger Messages_trig
before insert on Messages
referencing new as new
for each row
begin
select Messages_seq.nextval into :new.msg_id from dual;
end;
/
Are tables with lots of columns indicative of bad design? For example say I have the following table that stores user information and user settings:
[Users table]
userId
name
address
somesetting1
...
somesetting50
As the site requires more settings the table gets larger. In my mind this table is normalized, all the settings are dependent on the userId.
I have a thing against tables with lots of columns; it just seems wrong to me. But then I remembered that you can select what data to return from the table, so if the table is large I could still break it into several different objects in code. For example
[User object]
[UserSetting object]
and return only the data to fill those objects.
Is the above common practice, or are there other techniques for dealing with tables with lots of columns that are more suitable to use?
I think you should use multiple tables like this:
[Users table]
userId
name
address
[Settings table]
settingId
userId
settingKey
settingValue
The tables are related by the userId column which you can use to retrieve the settings for the user you need to.
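A minimal sketch of this key/value layout, assuming SQLite through Python's sqlite3 module. The UNIQUE constraint and the sample users and settings are illustrative additions, not part of the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (userId INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Settings (
    settingId INTEGER PRIMARY KEY,
    userId INTEGER NOT NULL REFERENCES Users(userId),
    settingKey TEXT NOT NULL,
    settingValue TEXT,
    UNIQUE (userId, settingKey)  -- assumption: one value per key per user
);
INSERT INTO Users VALUES (1, 'Ann', '1 Main St'), (2, 'Bob', '2 Side St');
INSERT INTO Settings (userId, settingKey, settingValue) VALUES
    (1, 'theme', 'dark'), (1, 'language', 'en'), (2, 'theme', 'light');
""")


def settings_for(user_id):
    """Load all settings of one user as a key -> value dictionary."""
    rows = conn.execute(
        "SELECT settingKey, settingValue FROM Settings WHERE userId = ?",
        (user_id,),
    )
    return dict(rows.fetchall())


ann = settings_for(1)
print(ann)
```

A new setting is then a new row rather than a new column, at the cost of every value being stored as text.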
I would say that it is bad table design. If a user doesn't have an entry for 47 of those 50 settings, then you will have a large number of NULLs in the table, which isn't good practice and will also slow down performance (NULLs have to be handled in a special way).
Instead, have the following:
USER TABLE
Id,
FirstName
LastName
etc
SETTINGS
Id,
SettingName
USER SETTINGS
Id,
SettingId,
UserId,
SettingValue
You then have a many-to-many join, and eliminate NULLs.
First, don't put spaces in table names! All the [braces] will be a real pain!
If you have 50 columns, how meaningful will all that data be for each user? Will there be lots of NULLs? Most data may not even apply to any given user. Think 1-to-1 tables, where you break down the "settings" into logical groups:
Users: --main table where most values will be stored
userId
name
address
somesetting1 ---please note that I'm using "somesetting1", don't
... --- name the columns like this, use meaningful names!!
somesetting5
UserWidgets --all widget settings for the user
userId
somesetting6
....
somesetting12
UserAccounting --all accounting settings for the user
userId
somesetting13
....
somesetting23
--etc..
You only need to have a Users row for each user, and then a row in each table where that data applies to the given user. If a user doesn't have any widget settings, then there is no row for that user. You can LEFT JOIN each table as necessary to get all the settings as needed. Usually you only need to work on a subset of settings based on which part of the application is running, which means you won't need to join in all of the tables, just the one or two that you need at that time.
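A minimal sketch of one such 1-to-1 settings group, assuming SQLite through Python's sqlite3 module. The widgetColor column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (userId INTEGER PRIMARY KEY, name TEXT);
-- Sharing the primary key with Users makes this a 1-to-1 (optional) extension.
CREATE TABLE UserWidgets (
    userId INTEGER PRIMARY KEY REFERENCES Users(userId),
    widgetColor TEXT  -- hypothetical, meaningfully named setting
);
INSERT INTO Users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO UserWidgets VALUES (1, 'blue');  -- Bob has no widget settings
""")

# LEFT JOIN keeps users without a UserWidgets row; their settings come back as NULL.
rows = conn.execute(
    "SELECT u.name, w.widgetColor FROM Users u "
    "LEFT JOIN UserWidgets w ON w.userId = u.userId ORDER BY u.userId"
).fetchall()
print(rows)
```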
You could consider an attributes table. As long as your indexes are good, then you wouldn't have too much of a performance issue:
[AttributeDef]
AttributeDefId int (primary key)
GroupKey varchar(50)
ItemKey varchar(50)
...
[AttributeVal]
AttributeValId int (primary key)
AttributeDefId int (FK -> AttributeDef.AttributeDefId)
UserId int (probably FK to users table?)
Val varchar(255)
...
Basically you're "pivoting" your table with many columns into 2 tables with fewer columns. You can write views and table functions around this structure to give you data for a group of related items or just a specific item, etc. You could also add other things to the attribute definition table to indicate required data elements, restrictions on the data elements, etc.
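One way a view over the attribute tables could look, as a sketch only: it assumes SQLite through Python's sqlite3 module, and the 'display' group, the 'theme' and 'font_size' items, the view name, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AttributeDef (
    AttributeDefId INTEGER PRIMARY KEY,
    GroupKey TEXT NOT NULL,
    ItemKey TEXT NOT NULL
);
CREATE TABLE AttributeVal (
    AttributeValId INTEGER PRIMARY KEY,
    AttributeDefId INTEGER NOT NULL REFERENCES AttributeDef(AttributeDefId),
    UserId INTEGER NOT NULL,
    Val TEXT
);
-- Turn the rows of one attribute group back into columns, one row per user.
CREATE VIEW UserDisplaySettings AS
SELECT v.UserId,
       MAX(CASE WHEN d.ItemKey = 'theme' THEN v.Val END) AS theme,
       MAX(CASE WHEN d.ItemKey = 'font_size' THEN v.Val END) AS font_size
FROM AttributeVal v
JOIN AttributeDef d ON d.AttributeDefId = v.AttributeDefId
WHERE d.GroupKey = 'display'
GROUP BY v.UserId;
INSERT INTO AttributeDef VALUES (1, 'display', 'theme'), (2, 'display', 'font_size');
INSERT INTO AttributeVal (AttributeDefId, UserId, Val) VALUES
    (1, 7, 'dark'), (2, 7, '14'), (1, 8, 'light');
""")

rows = conn.execute(
    "SELECT UserId, theme, font_size FROM UserDisplaySettings ORDER BY UserId"
).fetchall()
print(rows)
```

Callers of the view see ordinary columns again; an attribute the user never set simply comes back as NULL.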
What's your thought on this type of design?
Use several tables with matching indexes to get the best SELECT speed. Use the indexes as a way to relate the information between tables using a JOIN.
For example I have a table which stores details about properties. Which could have owners, value etc.
Is there a good design to keep the history of every change to owner and value. I want to do this for many tables. Kind of like an audit of the table.
What I thought was keeping a single table with fields
table_name, field_name, prev_value, current_val, time, user.
But it looks kind of hacky and ugly. Is there a better design?
Thanks.
There are a few approaches
Field based
audit_field (table_name, id, field_name, field_value, datetime)
This one can capture the history of all tables and is easy to extend to new tables. No changes to the structure are necessary for new tables.
Field_value is sometimes split into multiple fields to natively support the actual field type from the original table (but only one of those fields will be filled, so the data is denormalized; a variant is to split the above table into one table for each type).
Other meta data such as field_type, user_id, user_ip, action (update, delete, insert) etc.. can be useful.
The structure of such records will most likely need to be transformed to be used.
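A sketch of field-based auditing driven by triggers, assuming SQLite through Python's sqlite3 module. The properties table, its columns, and the trigger names are hypothetical, and one trigger per audited column is just one possible arrangement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (id INTEGER PRIMARY KEY, owner TEXT, value INTEGER);
CREATE TABLE audit_field (
    table_name TEXT, id INTEGER, field_name TEXT, field_value TEXT, datetime TEXT
);
-- One audit row per changed field; unchanged fields are skipped by the WHEN clause.
CREATE TRIGGER properties_owner_audit AFTER UPDATE OF owner ON properties
WHEN OLD.owner IS NOT NEW.owner
BEGIN
    INSERT INTO audit_field
    VALUES ('properties', NEW.id, 'owner', NEW.owner, datetime('now'));
END;
CREATE TRIGGER properties_value_audit AFTER UPDATE OF value ON properties
WHEN OLD.value IS NOT NEW.value
BEGIN
    INSERT INTO audit_field
    VALUES ('properties', NEW.id, 'value', NEW.value, datetime('now'));
END;
INSERT INTO properties VALUES (1, 'Ann', 100);
UPDATE properties SET owner = 'Bob' WHERE id = 1;
UPDATE properties SET value = 120, owner = 'Bob' WHERE id = 1;  -- owner unchanged
""")

rows = conn.execute(
    "SELECT field_name, field_value FROM audit_field ORDER BY rowid"
).fetchall()
print(rows)
```

Note that the numeric value ends up as text in field_value, which is the type-loss issue mentioned above.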
Record based
audit_table_name (timestamp, id, field_1, field_2, ..., field_n)
For each record type in the database create a generalized table that has all the fields as the original record, plus a versioning field (additional meta data again possible). One table for each working table is necessary. The process of creating such tables can be automated.
This approach provides you with semantically rich structure very similar to the main data structure so the tools used to analyze and process the original data can be easily used on this structure, too.
Log file
The first two approaches usually use tables which are very lightly indexed (or have no indexes at all and no referential integrity) so that the write penalty is minimized. Still, sometimes a flat log file might be preferred, but of course functionality is greatly reduced. (It basically depends on whether you want an actual audit/log that will be analyzed by some other system, or whether the historical records are part of the main system.)
A different way to look at this is to time-dimension the data.
Assuming your table looks like this:
create table my_table (
my_table_id number not null primary key,
attr1 varchar2(10) not null,
attr2 number null,
constraint my_table_ak unique (attr1, attr2) );
Then if you changed it like so:
create table my_table (
my_table_id number not null,
attr1 varchar2(10) not null,
attr2 number null,
effective_date date not null,
is_deleted number(1,0) default 0 not null,
constraint my_table_ak unique (attr1, attr2, effective_date),
constraint my_table_pk primary key (my_table_id, effective_date) );
You'd be able to have a complete running history of my_table, online and available. You'd have to change the paradigm of the programs (or use database triggers) to turn UPDATE activity into INSERT activity, and to change DELETE activity into updating the IS_DELETED boolean.
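An "as of" lookup against such a time-dimensioned table might look like the sketch below. It assumes SQLite through Python's sqlite3 module instead of Oracle, ISO-formatted text dates, and made-up sample rows; the unique constraint is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
    my_table_id INTEGER NOT NULL,
    attr1 TEXT NOT NULL,
    attr2 INTEGER,
    effective_date TEXT NOT NULL,
    is_deleted INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (my_table_id, effective_date)
);
-- An "update" is a new row; a "delete" is a new row with is_deleted = 1.
INSERT INTO my_table VALUES (1, 'red', 10, '2010-01-01', 0);
INSERT INTO my_table VALUES (1, 'blue', 10, '2010-03-01', 0);
INSERT INTO my_table VALUES (1, 'blue', 10, '2010-06-01', 1);
""")


def as_of(record_id, day):
    """Return (attr1, attr2) as the record stood on `day`, or None if absent."""
    row = conn.execute(
        "SELECT attr1, attr2, is_deleted FROM my_table "
        "WHERE my_table_id = ? AND effective_date <= ? "
        "ORDER BY effective_date DESC LIMIT 1",
        (record_id, day),
    ).fetchone()
    if row is None or row[2] == 1:
        return None  # not yet created, or deleted by that date
    return row[:2]


print(as_of(1, "2010-02-15"), as_of(1, "2010-04-01"), as_of(1, "2010-07-01"))
```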
Unreason:
You are correct that this solution is similar to record-based auditing; I read it initially as a concatenation of fields into a string, which I've also seen. My apologies.
The primary differences I see between the time-dimensioning the table and using record based auditing center around maintainability without sacrificing performance or scalability.
Maintainability: One needs to remember to change the shadow table if making a structural change to the primary table. Similarly, one needs to remember to make changes to the triggers which perform change-tracking, as such logic cannot live in the app. If one uses a view to simplify access to the tables, you've also got to update it, and change the instead-of trigger which would be against it to intercept DML.
In a time-dimensioned table, you make the structural change you need to, and you're done. As someone who's been the FNG on a legacy project, such clarity is appreciated, especially if you have to do a lot of refactoring.
Performance and Scalability: If one partitions the time-dimensioned table on the effective/expiry date column, the active records are in one "table", and the inactive records are in another. Exactly how is that less scalable than your solution? "Deleting" an active record involves row movement in Oracle, which is a delete-and-insert under the covers - exactly what the record-based solution would require.
The flip side of performance is that if the application is querying for a record as of some date, partition elimination allows the database to search only the table/index where the record could be; a view-based solution to search active and inactive records would require a UNION-ALL, and not using such a view requires putting the UNION-ALL in everywhere, or using some sort of "look-here, then look-there" logic in the app, to which I say: blech.
In short, it's a design choice; I'm not sure either's right or either's wrong.
In our projects we usually do it this way:
You have a table
properties(ID, value1, value2)
then you add table
properties_audit(ID, RecordID, timestamp or datetime, value1, value2)
ID - the id of the history record (not really required)
RecordID - points to the record in the original properties table.
When you update the properties table, you add a new record to properties_audit with the previous values of the record being updated in properties. This can be done using triggers or in your DAL.
After that you have latest value in properties and all the history(previous values) in properties_audit.
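A sketch of the trigger variant, assuming SQLite through Python's sqlite3 module. The changed_at column name, the trigger name, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (ID INTEGER PRIMARY KEY, value1 TEXT, value2 TEXT);
CREATE TABLE properties_audit (
    ID INTEGER PRIMARY KEY,
    RecordID INTEGER NOT NULL,
    changed_at TEXT NOT NULL,
    value1 TEXT,
    value2 TEXT
);
-- Copy the old version of the row into the audit table before each update.
CREATE TRIGGER properties_keep_history BEFORE UPDATE ON properties
BEGIN
    INSERT INTO properties_audit (RecordID, changed_at, value1, value2)
    VALUES (OLD.ID, datetime('now'), OLD.value1, OLD.value2);
END;
INSERT INTO properties VALUES (1, 'Ann', '100');
UPDATE properties SET value1 = 'Bob' WHERE ID = 1;
UPDATE properties SET value2 = '120' WHERE ID = 1;
""")

current = conn.execute(
    "SELECT value1, value2 FROM properties WHERE ID = 1"
).fetchone()
history = conn.execute(
    "SELECT value1, value2 FROM properties_audit WHERE RecordID = 1 ORDER BY ID"
).fetchall()
print(current, history)
```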
I think a simpler schema would be
table_name, field_name, value, time, userId
No need to save current and previous values in the audit tables. When you make a change to any of the fields you just have to add a row in the audit table with the changed value. This way you can always sort the audit table on time and know what was the previous value in the field prior to your change.
I have an event calendar application with a SQL database behind it, and right now I have 3 tables to represent the events:
Table 1: Holiday
Columns: ID, Date, Name, Location, CalendarID
Table 2: Vacation
Columns: Id, Date, Name, PersonId, WorkflowStatus
Table 3: Event
Columns: Id, Date, Name, CalendarID
So I have "generic events" which go into the Event table, and special events like holidays and vacations that go into these separate tables. I am debating consolidating these into a single table and just leaving columns like Location and PersonId blank for the generic events.
Table 1: Event:
Columns : Id, Date, Name, Location, PersonId, WorkflowStatus
Does anyone see any strong positives or negatives to each option? Obviously there will be records with columns that don't necessarily apply, but there is overlap between these three tables.
Either way you construct it, the application will have to cope with variant types. In such a situation I recommend that you use a single representation in the DBMS because the alternative is to require a multiplicity of queries.
So it becomes a question of where you stick the complexity and even in a huge organization, it's really hard to generate enough events to worry about DBMS optimization. Application code is more flexible than hardwired schemata. This is a matter of preference.
If it were my decision, I'd condense them into one table. I'd add a column called "EventType" and populate it as you import the data into the new table to specify the type of event.
That way, you only need to index one table instead of three (if you feel indexes are required), the data is all in one table, and the queries to get the data out would be a little more concise because you wouldn't need to union all three tables together to see what one person has done. I don't see any downside to having it all in one table (although there will probably be one that someone will bring up that I haven't thought of).
How about sub-typing special events to an Event supertype? This way it is easy to later add any new special events.
Data integrity is the biggest downside of putting them in one table. Since these all appear to be fields that would be required, you lose the ability to require them all by default and would have to write a trigger to make sure that data integrity was maintained properly (Yes, this must be maintained in the database and not, as some people believe, by the application. Unless of course you want to have data integrity problems.)
Another issue is that these are the events you need now; there may be more and more specialized events in the future, and breaking the code for one type of event because you added another specialized field that only applies to something else is a big risk. When you make a change to add some required vacation information, will you be sure to check that it doesn't break the application concerning holidays? Or worse, that it doesn't error out but shows information you didn't want? Are you going to look at the actual screen every time? Unit testing of the code alone may not pick up this type of thing, especially if someone was foolish enough to use select * or failed to specify columns in an insert. And frankly, not every organization actually has a really thorough automated test process in place (the risk could be lower if you do).
I personally would tend to go with Damir Sudarevic's solution: an event table for all the common fields (making it easy to at least get a list of all events) and specialized tables for the fields not held in common, making it simpler to write code that affects only one event type and allowing the database to maintain its integrity.
Keep them in 3 separate tables and do a UNION ALL in a view if you need to merge the data into one resultset for consumption. How you store the data on disk need not be identical to how you need to consume the data so long as the performance is adequate.
As you have it now there are no columns that do not apply for any of the presented entities. If you were to merge the 3 tables into one you'd have to add a field at the very least to know which columns to expect to be populated and reduce your performance. Now when you query for a holiday alone you go to a subset of the data that you would have to sift through / index to get at the same data in a merged storage table.
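A sketch of such a UNION ALL view over the three existing tables, assuming SQLite through Python's sqlite3 module. The AllEvents view name, the EventType label column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Holiday (ID INTEGER PRIMARY KEY, Date TEXT, Name TEXT,
                      Location TEXT, CalendarID INTEGER);
CREATE TABLE Vacation (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT,
                       PersonId INTEGER, WorkflowStatus TEXT);
CREATE TABLE Event (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT,
                    CalendarID INTEGER);
-- Storage stays split; the view exposes only the shared columns plus a type label.
CREATE VIEW AllEvents AS
    SELECT 'Holiday' AS EventType, ID AS Id, Date, Name FROM Holiday
    UNION ALL
    SELECT 'Vacation', Id, Date, Name FROM Vacation
    UNION ALL
    SELECT 'Event', Id, Date, Name FROM Event;
INSERT INTO Holiday VALUES (1, '2010-01-01', 'New Year', 'Office', 1);
INSERT INTO Vacation VALUES (1, '2010-02-10', 'Ski trip', 7, 'Approved');
INSERT INTO Event VALUES (1, '2010-01-15', 'Team meeting', 1);
""")

rows = conn.execute(
    "SELECT EventType, Name FROM AllEvents ORDER BY Date"
).fetchall()
print(rows)
```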
If you did not already have these tables defined you could consider creating one table with the following signature...
create table EventBase (
Id int PRIMARY KEY,
Date date,
Name varchar(50)
)
...and, say, the holiday table with the following signature.
create table holiday (
Id int PRIMARY KEY,
EventId int,
Location varchar(50),
CalendarId int
)
...and join the two when you needed to do so. Choosing between this and the 3 separate tables you already have depends on how you plan on using the tables and volume but I would definitely not throw all into a single table as is and make things less clear to someone looking at the table definition with no other initiation.
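The base-plus-subtype join could be sketched as follows, assuming SQLite through Python's sqlite3 module. The NOT NULL UNIQUE foreign key on EventId is an added assumption (one holiday row per base event), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE EventBase (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT);
CREATE TABLE holiday (
    Id INTEGER PRIMARY KEY,
    EventId INTEGER NOT NULL UNIQUE REFERENCES EventBase(Id),  -- assumption
    Location TEXT,
    CalendarId INTEGER
);
INSERT INTO EventBase VALUES (1, '2010-01-01', 'New Year'),
                             (2, '2010-01-15', 'Team meeting');
INSERT INTO holiday VALUES (1, 1, 'Office', 3);
""")

# Holiday-specific queries join the subtype to the shared columns.
holidays = conn.execute(
    "SELECT b.Name, b.Date, h.Location FROM holiday h "
    "JOIN EventBase b ON b.Id = h.EventId"
).fetchall()

# A list of every event needs only the base table.
all_events = [
    r[0] for r in conn.execute("SELECT Name FROM EventBase ORDER BY Date")
]
print(holidays, all_events)
```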
Or combine the common fields and separate out the unique ones:
Table 1: EventCommon
Columns: EventCommonID, Date, Name
Table 2: EventOrHoliday
Columns: EventCommonID, CalendarID, isHoliday
Table 3: Vacation
Columns: EventCommonID, PersonId, WorkflowStatus
with 1->many relationships between EventCommon and the other 2.
I need to make 100 or so tables. I have tables called PartStatsXXX and the tables to be made will all be called PartReviewXXX (they pair up with each other in a 1:n relationship).
Is it efficient to create one big table to store all product (product and part being the same term from a business perspective) reviews? Someone mentioned making a relationship from PartStatsXXX to PartsReview (one large table) with the value of XXX as part of the primary key from PartStatsXXX.
XXX is the name of the part type (eg battery, wiring loom, etc). So this will be varchar. Should I make a composite key? The part type wouldn't change names (though some part names can have multiple names depending on culture), but it's not really a candidate ID. It was then mentioned I could get several views for what I need depending on the value of XXX.
I hope this makes sense. What would be the best approach?
Thanks
Multi-table PartStatsXXX is a bad idea: hard to code properly or with a framework, harder to maintain, nightmare to query...
Use two tables: PartStats and PartsReview, with appropriate keys and indexes for performance.
It is more efficient to create tables based on what you want to store in each one. You do not need 100 tables for 100 products. You need 1 table for all products.
So for your needs I would create 2 tables:
products
========
id INT
name VARCHAR
product_reviews
===============
id INT
product_id INT (foreign key to products.id)
rating INT (example column)
Unless you are storing different types of data for each product's reviews (i.e., each table has a different set of columns), using a different table per product will be creating an unnecessary nightmare.
As a general rule, you never want to have more than one table with the same set of columns. As already suggested, one table with a "product_id" column is the way to go.
If you want to save yourself some pain in a quick-and-dirty way, use two tables.
CREATE TABLE PartStats (
...,
PartType VARCHAR(255),
...
);
CREATE TABLE PartReview (
...
PartType VARCHAR(255),
...
);
and then join them up via
SELECT ...
FROM PartStats ps JOIN PartReview pr
ON ps.PartType = pr.PartType;
This gets you out from having hundreds of tables, but sets you up for a different problem: Redundant data (PartType) that can get out of sync. A typo in a PartType can yield an orphaned review.
The solution here, assuming that you can have more than one PartStats entry for a given PartType, is to add a third table to be the sole holder of PartType names.
CREATE TABLE PartType (
ID INT ...,
PartType VARCHAR(255),
PRIMARY KEY (ID)
);
and arrange for PartStats and PartReview to use the ID of a PartType. For example,
CREATE TABLE PartStats (
...,
PartType_ID INT REFERENCES PartType(ID),
...
);
CREATE TABLE PartReview (
...
PartType_ID INT REFERENCES PartType(ID),
...
);
This will prevent your making a PartStats or a PartReview for a non-existent PartType.
If query performance becomes an issue, adding secondary indexes on PartType_ID will help.
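A sketch of the three-table layout in action, assuming SQLite through Python's sqlite3 module (foreign keys must be switched on explicitly there). The ID, Weight, and Rating columns stand in for the elided ones and are hypothetical, as are the index name and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE PartType (
    ID INTEGER PRIMARY KEY,
    PartType TEXT NOT NULL UNIQUE
);
CREATE TABLE PartStats (
    ID INTEGER PRIMARY KEY,
    PartType_ID INTEGER NOT NULL REFERENCES PartType(ID),
    Weight REAL  -- hypothetical stats column
);
CREATE TABLE PartReview (
    ID INTEGER PRIMARY KEY,
    PartType_ID INTEGER NOT NULL REFERENCES PartType(ID),
    Rating INTEGER  -- hypothetical review column
);
CREATE INDEX PartReview_PartType ON PartReview (PartType_ID);
INSERT INTO PartType VALUES (1, 'battery');
INSERT INTO PartStats VALUES (1, 1, 0.5);
INSERT INTO PartReview VALUES (1, 1, 4), (2, 1, 5);
""")


def add_review(part_type_id, rating):
    """Insert a review; return False if the part type does not exist."""
    try:
        conn.execute(
            "INSERT INTO PartReview (PartType_ID, Rating) VALUES (?, ?)",
            (part_type_id, rating),
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = add_review(99, 3)  # there is no PartType with ID 99
averages = conn.execute(
    "SELECT t.PartType, AVG(r.Rating) FROM PartType t "
    "JOIN PartReview r ON r.PartType_ID = t.ID GROUP BY t.ID"
).fetchall()
print(orphan_ok, averages)
```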
I can recommend a couple of decent books on database design (several months ago I decided to improve my database design skills, so I took a look at several different books and chose these two):
1) Pro SQL Server 2008 Relational Database Design and Implementation, by Louis Davidson
2) Relational Database Design Clearly Explained, by Jan Harrington
Good luck!