Best Practices for Computed Column in SQL Server - sql-server

I'm working with a user table and want to add an "end of probationary period" date. Basically, each new user gets 2 full months from when they join as their probation period. I saw that I can put a formula in the column definition for my user table, but I'm wondering if I should have a script that updates this instead, or if this is an acceptable time to use computed columns. I access this table for various things, and will occasionally update a user's row based on performance milestone achievements. The Application Date will never change/be updated.
My question is: Is using the computed column a good practice in this situation or will it recompute each time I update that row (even though I'm not going to update the App Date)? I don't want to create more overhead when I update the row in the future.
Formula I'm using in the column definition for the Probation End Date:
(dateadd(day,(-1),dateadd(month,(3),dateadd(day,(1)-datepart(day,[APP_DT]),[APP_DT]))))

Seeing that this date most likely will never change once it's set, it's probably not a good candidate for a computed column.
After all: once you insert a row into that table, you can easily calculate that "end of probation period" date right there and then (e.g. in a trigger), and once set, that date won't ever change.
So while you can definitely do it this way, I would probably prefer to use an AFTER INSERT trigger (or a stored procedure for the INSERT operation) that just calculates it once, and then stores that date.
Also, just as a heads-up: a computed column with just the formula is recalculated every time you access it - just be aware of that. That is, unless you specify the PERSISTED keyword; in that case, the result is stored alongside the other data in the row, and that would be a much better fit here - again, since that value, once calculated, is never going to change.
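For illustration, here is a minimal sketch of what a persisted computed column could look like, assuming a hypothetical dbo.AppUser table built around the APP_DT column from the question:
CREATE TABLE dbo.AppUser
(
    UserID  INT IDENTITY(1,1) PRIMARY KEY,
    APP_DT  DATE NOT NULL,
    -- last day of the second full month after the application month;
    -- PERSISTED stores the result with the row instead of recomputing it on every read
    PROBATION_END_DT AS
        DATEADD(DAY, -1, DATEADD(MONTH, 3, DATEADD(DAY, 1 - DATEPART(DAY, APP_DT), APP_DT)))
        PERSISTED
);
Because the expression only uses deterministic functions, SQL Server accepts it as PERSISTED, and it is recomputed only when APP_DT changes.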

If you want to later extend someone's probation period without having to change their application date, then a computed column is NOT the way to go. Why not just use a DEFAULT constraint for both columns?
USE tempdb;
GO
CREATE TABLE dbo.foo
(
MemberID INT IDENTITY(1,1),
JoinDate DATE NOT NULL DEFAULT SYSDATETIME(),
ProbationEndDate DATE NOT NULL DEFAULT
DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH,0,SYSDATETIME())+3, 0))
);
INSERT dbo.foo DEFAULT VALUES;
SELECT MemberID, JoinDate, ProbationEndDate FROM dbo.foo;
Results:
MemberID  JoinDate    ProbationEndDate
--------  ----------  ----------------
1         2013-04-05  2013-06-30
(Notice I used a slightly less convoluted approach to get the end of the month two months out.)

There's no overhead when you insert data; the values for this column are only computed when you read it. So I'd say your approach is correct.

Related

How to check the data in SQL Server if it is there for every minute

I have a table with
fields: Id, ChId, ChValue, ChLoggingDate
Data is saved to the database every minute. I need a query to check whether data exists for every minute in the table throughout the year for a particular weekday. That is, for all Mondays in 2013, if a Monday has complete data for that day, calculate the arithmetic mean for the year's Mondays.
You will need a table - or a table-valued function - that produces a timestamp for every minute; then you can join against it and check where the original table has no data.
That's the only way - otherwise, any query can only work with the data it has.
This is a common approach for any reporting.
http://www.kodyaz.com/t-sql/create-date-time-intervals-table-in-sql-server.aspx
explains how; you just have to expand it to a times table. If you separate date and time, you can get away with a separate time table (00:00:00 to 23:59:59) and a date table.
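A hedged sketch of that gap check for a single day, assuming the Id/ChLoggingDate columns from the question and a hypothetical dbo.Readings table name (any tally source with at least 1440 rows will do for generating the minutes):
DECLARE @Day DATETIME;
SET @Day = '20130107';   -- a Monday, for example

;WITH Nums AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
    FROM sys.all_objects                 -- used here only as a row source / tally
),
Minutes AS
(
    SELECT DATEADD(MINUTE, n, @Day) AS MinuteStamp
    FROM Nums
    WHERE n < 1440                       -- 1440 minutes in a day
)
SELECT m.MinuteStamp
FROM Minutes AS m
LEFT JOIN dbo.Readings AS r              -- hypothetical name for the table with Id, ChId, ChValue, ChLoggingDate
       ON r.ChLoggingDate >= m.MinuteStamp
      AND r.ChLoggingDate <  DATEADD(MINUTE, 1, m.MinuteStamp)
WHERE r.Id IS NULL                       -- minutes with no logged row
ORDER BY m.MinuteStamp;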

Help with hotel queries

Hello I have a database in Microsoft SQL Server with tables relevant to a reservation system for a hotel/villa and need help creating a few queries to obtain relevant data:
To be able to find out a list of guests checking out on a specific date, grouped by villa type and providing a total number for that day (i.e. a count).
For that query I think I'd have to use two relevant tables, the guest_reservation and reservation tables:
create table guest_reservation(confirm_no int,
agent_id int,
g_name varchar (30),
g_phone varchar (10));
create table reservation(
confirm_no int,
credit_card_no char (16),
res_checkin_date datetime,
res_checkout_date datetime,
default_villa_type char (1),
price_plan char (1));
I thought using a query like this would help, but it didn't seem to:
SELECT g_name, villa_type, COUNT(*)
FROM guest_reservation, reservation
WHERE guest_reservation.confirm_no = reservation.confirm_no
AND res_checkout_date = 'insert date for when you would want to check out here'
GROUP BY villa_type;
Ideas/help?
EDIT: I think I figured out the 1st question...
Another query I wanted help with: if a guest wants a certain type of room, check whether that type of room is available on the dates they want to stay.
I used JUST the Reservation table, but I'm not sure that would quite do what I want; here's what I currently have:
Select villa_type from reservation
where res_check_in_date not between '2011-10-08' and '2011-10-09'
and res_check_out_date not between '2011-10-08' and '2011-10-09'
Due to the nature of time spans, you may find it helpful, nay, necessary, to create a Calendar table which is simply a static list of days in the calendar. These are almost always created in BI projects for data analysis.
What this object allows you to do is join from your Reservation table to the Calendar table, giving you a list of distinct dates which make up the time span. Here's a code sample:
inner join calendar
on calendar.date between reservation.res_checkin_date and reservation.res_checkout_date
For example, if I'm checking in today (4/12) and checking out on Thursday (4/14), joining to the calendar table will produce 3 distinct rows, one each for 4/12, 4/13, 4/14. In terms of checking availability, this is a much clearer picture of who occupies what, when.
Additionally, there are some code nuances you may want to be aware of, such as using a DATE instead of DATETIME, to eliminate the possibility of times producing unexpected results. If you need a full DATETIME type, there are ways to efficiently query the DATE in a DATETIME.
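Putting that join to work, here is a minimal sketch of an occupancy-per-day query built on the reservation table from the question; dbo.calendar is the assumed helper table (one row per date), and it assumes the check-in/check-out values carry no time portion, per the DATE vs. DATETIME note above:
SELECT r.default_villa_type,
       c.[date],
       COUNT(*) AS villas_occupied
FROM reservation AS r
INNER JOIN dbo.calendar AS c            -- assumed helper: one row per calendar date
        ON c.[date] BETWEEN r.res_checkin_date AND r.res_checkout_date
GROUP BY r.default_villa_type, c.[date]
ORDER BY c.[date], r.default_villa_type;
Comparing villas_occupied against the number of villas of each type tells you whether a given type is available on a given date.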

What is the best way to keep changes history to database fields?

For example, I have a table which stores details about properties, which could have owners, a value, etc.
Is there a good design to keep the history of every change to owner and value? I want to do this for many tables. Kind of like an audit of the table.
What I thought was keeping a single table with fields
table_name, field_name, prev_value, current_val, time, user.
But it looks kind of hacky and ugly. Is there a better design?
Thanks.
There are a few approaches
Field based
audit_field (table_name, id, field_name, field_value, datetime)
This one can capture the history of all tables and is easy to extend to new tables. No changes to the structure are necessary for new tables.
Field_value is sometimes split into multiple fields to natively support the actual field type from the original table (but only one of those fields will be filled, so the data is denormalized; a variant is to split the above table into one table for each type).
Other meta data such as field_type, user_id, user_ip, action (update, delete, insert) etc.. can be useful.
The structure of such records will most likely need to be transformed to be used.
Record based
audit_table_name (timestamp, id, field_1, field_2, ..., field_n)
For each record type in the database, create a generalized table that has all the same fields as the original record, plus a versioning field (additional metadata is again possible). One table for each working table is necessary. The process of creating such tables can be automated.
This approach provides you with a semantically rich structure very similar to the main data structure, so the tools used to analyze and process the original data can easily be used on this structure, too.
Log file
The first two approaches usually use tables which are very lightly indexed (or not indexed at all, with no referential integrity) so that the write penalty is minimized. Still, sometimes a flat log file might be preferred, though of course functionality is greatly reduced. (Basically it depends on whether you want an actual audit/log that will be analyzed by some other system, or whether the historical records are part of the main system.)
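As a concrete illustration of the field-based approach, here is a minimal T-SQL sketch of such an audit table; the column names are illustrative, not from the question:
CREATE TABLE dbo.audit_field
(
    audit_id     INT IDENTITY(1,1) PRIMARY KEY,
    table_name   SYSNAME       NOT NULL,
    record_id    INT           NOT NULL,     -- id of the row in the original table
    field_name   SYSNAME       NOT NULL,
    field_value  NVARCHAR(MAX) NULL,         -- everything stored as text (denormalized)
    changed_at   DATETIME      NOT NULL DEFAULT GETDATE(),
    changed_by   SYSNAME       NOT NULL DEFAULT SUSER_SNAME(),
    audit_action CHAR(1)       NOT NULL      -- 'I' / 'U' / 'D'
);
One row is written per changed field, so reconstructing a full record at a point in time requires transforming these rows, as noted above.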
A different way to look at this is to time-dimension the data.
Assuming your table looks like this:
create table my_table (
my_table_id number not null primary key,
attr1 varchar2(10) not null,
attr2 number null,
constraint my_table_ak unique (attr1, attr2) );
Then if you changed it like so:
create table my_table (
my_table_id number not null,
attr1 varchar2(10) not null,
attr2 number null,
effective_date date not null,
is_deleted number(1,0) default 0 not null,
constraint my_table_ak unique (attr1, attr2, effective_date),
constraint my_table_pk primary key (my_table_id, effective_date) );
You'd be able to have a complete running history of my_table, online and available. You'd have to change the paradigm of the programs (or use database triggers) to intercept UPDATE activity and turn it into INSERT activity, and to change DELETE activity into updating the IS_DELETED flag.
Unreason:
You are correct that this solution is similar to record-based auditing; I read it initially as a concatenation of fields into a string, which I've also seen. My apologies.
The primary differences I see between the time-dimensioning the table and using record based auditing center around maintainability without sacrificing performance or scalability.
Maintainability: One needs to remember to change the shadow table if making a structural change to the primary table. Similarly, one needs to remember to make changes to the triggers which perform change-tracking, as such logic cannot live in the app. If one uses a view to simplify access to the tables, you've also got to update it, and change the instead-of trigger which would be against it to intercept DML.
In a time-dimensioned table, you make the structural change you need to, and you're done. As someone who's been the FNG on a legacy project, such clarity is appreciated, especially if you have to do a lot of refactoring.
Performance and Scalability: If one partitions the time-dimensioned table on the effective/expiry date column, the active records are in one "table", and the inactive records are in another. Exactly how is that less scalable than your solution? "Deleting" an active record involves row movement in Oracle, which is a delete-and-insert under the covers - exactly what the record-based solution would require.
The flip side of performance is that if the application is querying for a record as of some date, partition elimination allows the database to search only the table/index where the record could be; a view-based solution to search active and inactive records would require a UNION-ALL, and not using such a view requires putting the UNION-ALL in everywhere, or using some sort of "look-here, then look-there" logic in the app, to which I say: blech.
In short, it's a design choice; I'm not sure either's right or either's wrong.
In our projects we usually do it this way:
You have a table
properties(ID, value1, value2)
then you add table
properties_audit(ID, RecordID, timestamp or datetime, value1, value2)
ID - the id of the history record (not really required)
RecordID - points to the record in the original properties table.
When you update the properties table, you add a new record to properties_audit containing the previous values of the updated record. This can be done using triggers or in your DAL.
After that, you have the latest values in properties and all the history (previous values) in properties_audit.
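A hedged sketch of the trigger variant, assuming illustrative types for the properties(ID, value1, value2) table above:
CREATE TABLE dbo.properties_audit
(
    ID        INT IDENTITY(1,1) PRIMARY KEY,  -- id of the history record (not really required)
    RecordID  INT           NOT NULL,         -- points to properties.ID
    ChangedAt DATETIME      NOT NULL DEFAULT GETDATE(),
    value1    NVARCHAR(100) NULL,             -- types assumed; mirror the real columns
    value2    INT           NULL
);
GO
CREATE TRIGGER dbo.trg_properties_history
ON dbo.properties
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- copy the previous values of every updated row into the audit table
    INSERT dbo.properties_audit (RecordID, value1, value2)
    SELECT d.ID, d.value1, d.value2
    FROM deleted AS d;
END;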
I think a simpler schema would be
table_name, field_name, value, time, userId
No need to save both current and previous values in the audit tables. When you make a change to any of the fields, you just have to add a row to the audit table with the changed value. This way you can always sort the audit table on time and know what the value of the field was prior to your change.

Database name convention: DATETIME column

What is your naming convention for DATETIME columns (in my case, using MS SQL Server)
For a column that stores when the row was created, CreatedDatetime makes sense, or LastModifiedDatetime.
But for a simple table, let's say one called Event, would you create columns called:
TABLE Event
====================================================
EventID, // Primary key
EventDatetime, // When the event is happening
EventEnabled // Is the event on
or these column names
TABLE Event
====================================================
ID, // Primary key
Datetime, // When the event is happening
Enabled // Is the event on
If you'd use neither convention: Please provide the column name you would use.
I normally name DATETIME columns as ACTION_WORD_on: created_on, completed_on, etc.
The ACTION_WORD defines what the column represents, and the suffix (_on) indicates that the column represents time.
Other suffixes (or even prefixes) may be used to specify the data type (_at, _UTC, when_, etc).
Be descriptive. Be consistent.
Why call it EventDateTime, when you don't also use EventIDInt, or EventEnabledVarchar? Why include the data type in the column name? (My rule of thumb is, if they're accessing data in a table, they'd better know what the column data types are, 'cause otherwise they don't know what they're working with.)
These days I prefer what I think of as descriptive column names, such as:
CreateDate
DateCreated
CreatedAt
CreatedOn (if there's no time portion)
AddedOn (might be semantically more appropriate, depending on the data)
Picking a "label" and using it consistently in every table that requires that kind of data is also a good thing. For example, having a "CreateDate" column in (almost) every table is fine, because then you will always know which column in every table will tell you when a row was created. Don't get hung up on the "but they all have to have unique names" argument; if you're writing a query, you had better know which tables you're pulling each column from.
--Edit--
I just recalled an exception I've done in the past. If a DateTime (or SmallDateTime) column will contain no time portion, just the date, as a "reminder" I'd put "Date" in the column name, such as "BilledDate" instead of "Billed" or "BilledOn". This shouldn't apply when tracking when rows were added, since you'd want the time as well.
The name should communicate what the business meaning of the data in the column is... "DateTime" is just the type of the data. Is it when the event happened? When it was recorded? When it was stored in the DB? When the data was last modified?
If it efficiently communicates the meaning of what the column contains, the name is fine. "DateTime" is not fine. "EventDateTime" is only very slightly better. If the table holds events, then any datetime field in the table is an EventDateTime (It records some datetime related to the event). Although if there's only one datetime column in an "Events" table, then EventDateTime implies that it's when the event happened, so that's probably ok.
Choose or select the name so it communicates the meaning of the value...
Given edited question, some suggested names might be:
Occurred, or OccurredDateTime, or OccurredUTC, (or OccurredLocal), or, if events in your business model have duration, then perhaps StartedUtc, or BeganUtc, or InitiatedUtc, etc.
I prefer to create columns in the second form--although I'd probably want a more descriptive name than Datetime, depending on what its use would be.
Edit: In this sort of situation, I might actually go with a hybrid for that single field, and make it 'EventDate', 'StartDate', or something similar.
Maybe that's just me, but I don't believe you should name your columns after data types, nor replicate the table name all over the fields.
I would avoid using datatypes for column names (a DATETIME column called Datetime), so I vote for the first option.
I'd call the column HappensAt, because the row describe an event and the attribute (column) in question details when it happens. As a general rule I try to name my tables with singular nouns and my attributes with phrases that can be used to read, like
tablename(key) columname columnvalue
So I would then be able to say
event(131) HappensAt Dec 21, 2009, 21:30
However this isn't an inviolable rule. I'd still record the date someone was born in a BirthDate column, not a WasBornOn column. You have to bear in mind the common usages of natural language when you name things. Strive for natural usage and the rest will follow. Follow rules blindly and your readers will struggle for comprehension.
There are many good answers here, so I won't duplicate them. But remember: don't ever name a column a reserved word!
Also, I really like the column names in option 1.
I would likely use something like "WhenRaisedUtc".

Fixed-size array database field

I need to store several date values in a database field. These values will be tied to a "User" such that each user will have their own unique set of these several date values.
I could use a one-to-many relationship here, but each user will have exactly 4 date values tied to them, so I feel that a one-to-many table would be overkill (in many ways, e.g. speed). But if I needed to query against them, I would need those 4 values to be in different fields, e.g. MyDate1, MyDate2, ... etc., and then the SQL command to fetch them would have to check 4 fields each time.
So the one-to-many relationship would probably be the best solution, but is there a better/cleaner/faster way around this? Am I designing it correctly?
The platform is MS SQL 2005 but solution on any platform will do, I'm mostly looking for proper db designing techniques.
EDIT: The 4 fields represent 4 instances of the same thing.
If you do it as four separate fields, then you don't have to join. To save the query syntax from being too horrible, you could write:
SELECT * FROM MyTable WHERE 'DateLiteral' IN (MyDate1, MyDate2, MyDate3, MyDate4);
As mentioned in the comments, the IN operator is pretty specific when it comes to date fields (down to the last (milli)second). You can always use date/time functions on the columns, but BETWEEN is unusable:
SELECT * FROM MyTable WHERE date_trunc('hour', 'DateLiteral')
IN (date_trunc('hour', MyDate1), date_trunc('hour', MyDate2), date_trunc('hour', MyDate3), date_trunc('hour', MyDate4));
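Note that date_trunc is a PostgreSQL function; since the question mentions MS SQL 2005, a hedged sketch of the same idea in T-SQL would strip the time portion with DATEADD/DATEDIFF instead (the 'DateLiteral' placeholder is carried over from above):
-- compare on whole days by flooring each DATETIME to midnight
SELECT *
FROM MyTable
WHERE DATEADD(DAY, DATEDIFF(DAY, 0, 'DateLiteral'), 0)
   IN (DATEADD(DAY, DATEDIFF(DAY, 0, MyDate1), 0),
       DATEADD(DAY, DATEDIFF(DAY, 0, MyDate2), 0),
       DATEADD(DAY, DATEDIFF(DAY, 0, MyDate3), 0),
       DATEADD(DAY, DATEDIFF(DAY, 0, MyDate4), 0));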
Some databases like Firebird have array datatype, which does exactly what you described. It is declared something like this:
alter table t1 add MyDate[4] date;
For what it's worth, the normalized design would be to store the dates as rows in a dependent table.
Storing multiple values in a single column is not a normalized design; normalization explicitly means each column has exactly one value.
You can make sure no more than four rows are inserted into the dependent table this way:
CREATE TABLE ThisManyDates (n INT PRIMARY KEY);
INSERT INTO ThisManyDates VALUES (1), (2), (3), (4);
CREATE TABLE UserDates (
User_ID INT REFERENCES Users,
n INT REFERENCES ThisManyDates,
Date_Value DATE NOT NULL,
PRIMARY KEY (User_ID, n)
);
However, this design doesn't allow you to make the date values mandatory.
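For reading, the dependent table folds back into one row per user easily enough; a minimal sketch, assuming the Users table keys on User_ID:
-- one row per user, with the four dates pivoted back into columns
SELECT u.User_ID,
       MAX(CASE WHEN d.n = 1 THEN d.Date_Value END) AS MyDate1,
       MAX(CASE WHEN d.n = 2 THEN d.Date_Value END) AS MyDate2,
       MAX(CASE WHEN d.n = 3 THEN d.Date_Value END) AS MyDate3,
       MAX(CASE WHEN d.n = 4 THEN d.Date_Value END) AS MyDate4
FROM Users AS u
LEFT JOIN UserDates AS d ON d.User_ID = u.User_ID
GROUP BY u.User_ID;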
How about having 4 fields along with the User ID (if you are sure it won't exceed that)?
Create four date fields and store the dates in the fields. The date fields might be part of your user table, or they might be in some other table joined to the user table in a one-to-one relationship. It's your call.
