I understand that a view is only a saved SQL query that can be considered as a virtual table. However, it seems to me that there is not much difference between creating a new table using the SELECT INTO statement and creating a view. What are some of the major distinctions between the two methods? Under what situations would you use one or the other?
Let's start with aknowledging that by what you say about views, you are not talking about indexed views. Because these are actually implemented with existing tables in the background. So, we are talking about non-indexed views.
The two methods are very different - in fact, they have nothing in common. It is strange, because you both mention:
"view = only saved sql query"
"into = creating new table"
Isn't this a contradiction? The result of select into is actually a table, a view is not.
Not sure why are you asking about this or what are you trying to accomplish. In my experience, I use select into to fastly create logically temporary tables that have the same columns with the original without having to type all columns. This method of creating tables is generally inferior to the create table command and a subsequent insert, because no indexes and other stuff can be made - thus its use in adhoc queries or as a temporary entity.
Related
I have an existing database in MS SQL server and want to rename some tables and columns because the names currently used aren't accurate to what it represents.
I have multiple web and desktop applications that access the database, using Entity Framework (code first). Too many to update in one go and cannot afford for all apps to start working.
I was thinking it was nice is SQL server allowed a 'permanent' alias for tables and columns but I don't think this feature exists.
Or I was wondering if there was a way in EF to have two names for the same property?
For the tables, you could rename them and then create a synonym with the old name pointing to the new name.
For the columns, changing their name will break your application.You could create computed columns with the old name as well, that simply display the value of the new named column though (but this seems a little silly).
Note, however, that a computed column cannot reference another computed column, so you would have to duplicate the column in its entirety. That could lead to problems down the line if you don't update the definition of both columns.
A view containing a simple select statement acts exactly like a table. You really need to fix this properly across the database and applications. However if you want to go the view route, I suggest you do this:
Say you have a table called MyTable that you rename TheTable and with a column called MyColumn that you want to rename to TheColumn
Create a schema, say, new
Move the original table into it with this ALTER SCHEMA new TRANSFER MyTable
Rename the table and column.
Now you have a table called new.TheTable with a column called TheColumn. Everything is broken
Lastly, create a view that looks just like the old table
CREATE VIEW dbo.MyTable
AS
SELECT Column1, Column2, Column3, TheColumn As MyColumn
FROM new.TheTable;
Now everything works again.
All your fixed 'new' tables are in the new schema
However now everything is extra complicated
This is basically an illustration that you should just fix it properly across the whole app one at a time with careful change management. Definitely don't complicate it with triggers
Since you are using code first with multiple web and desktop applications, you are likely managing database changes from one place through migrations and ignoring changes other places.
You can create an empty migration and add code that will change the table name and column names to what you want. The migration should then create a view that will select from that table with the original table and column names. When you apply this migration, everything should still be working as normal from all applications. There are no model changes since you didn’t touch the model classes. Inserts, updates, and deletes will still happen through the view. There is no need for potentially buggy triggers or synonyms on the table in this option.
Now that you have the table changed, you can focus on the application code. If it helps, you can add annotations over the column and table names and start refactoring the code. You need to make sure you don’t make model changes that will break the other apps. If apps ignore model changes, you can get away with adding annotations over the columns and classes on all the apps before refactoring. You can get rid of the view sooner this way.
I'm looking at the best practice approach here. I have a web page that has several drop down options. The drop downs are not related, they are for misc. values (location, building codes, etc). The database right now has a table for each set of options (e.g. table for building codes, table for locations, etc). I'm wondering if I could just combine them all into on table (called listOptions) and then just query that one table.
Location Table
LocationID (int)
LocatValue (nvarchar(25))
LocatDescription (nvarchar(25))
BuildingCode Table
BCID (int)
BCValue (nvarchar(25))
BCDescription (nvarchar(25))
Instead of the above, is there any reason why I can't do this?
ListOptions Table
ID (int)
listValue (nvarchar(25))
listDescription (nvarchar(25))
groupID (int) //where groupid corresponds to Location, Building Code, etc
Now, when I query the table, I can pass to the query the groupID to pull back the other values I need.
Putting in one table is an antipattern. These are differnt lookups and you cannot enforce referential integrity in the datbase (which is the ciorrect place to enforce it as applications are often not the only way data gets changed) unless they are in separate tables. Data integrity is FAR more important than saving a few minutes of development time if you need an additonal lookup.
If you plan to use the values later in some referencing FKeys - better use separate tables.
But why do you need "all in one" table? Which problem it solves?
You could do this.
I believe that is your master data and it would not be having any huge amounts of rows that it might create and performance problems.
Secondly, why would you want to do it once your app is up and running. It should have thought about earlier. The tables might be used in a lot of places and it's might be a lot of coding and most importantly testing.
Can you throw further light into your requirements.
You can keep them in separate tables and have your stored procedure return one set of data with a "datatype" key that signifies which set of values go with what option.
However, I would urge you to consider a much different approach. This suggestion is based on years of building data driven websites. If these drop-down options don't change very often then why not build server-side include files instead of querying the database. We did this with most of our websites. Think about it, each time the page is presented you query the database for the same list of values... that data hardly ever changes.
In cases when that data did have the tendency to change, we simply added a routine to the back end admin that rebuilt the server-side include file whenever an add, change or delete was done to one of the lookup values. This reduced database I/O's and spead up the load time of all our websites.
We had approximately 600 websites on the same server all using the same instance of SQL Server (separate databases) our total server database I/O's were drastically reduced.
Edit:
We simply built SSI that looked like this...
<option value="1'>Blue</option>
<option value="2'>Red</option>
<option value="3'>Green</option>
With single table it would be easy to add new groups in favour of creating new tables, but for best practices concerns you should also have a group table so you can name those groups in the db for future maintenance
The best practice depends on your requirements.
Do the values of location and building vary frequently? Where do the values come from? Are they imported from external data? Do other tables refer the unique table (so that I need a two-field key to preper join the tables)?
For example, I use unique table with hetorogeneus data for constants or configuration values.
But if the data vary often or are imported from external source, I prefer use separate tables.
I know in SQL Server you can create indexes on a view, then the view saves the data from the underlying table. Then you can query the view. But, why I need to use view instead of table?
You may want to use a view to simplify on queries. In our projects, the consensus is on using views for interfaces, and especially "report interfaces".
Imagine you've got a client table, and the manager wants a report every morning with the client's name, and their account balance (or whatever). If you code your report against the table, you're creating a strong link between your report and your table, making later changes difficult.
On the other hand if your report hits a view, you can twist the database freely; as long as the view is the same the report works, the manager is happy and you're free to experiment with the database. You want to separate client metadata from the main client table? go for it, and join the two tables in the view. You want to denormalize the cart info for the client? no problem, the view can adapt...
To be honest, it's my view as a programmer but other uses will certainly be found by db gurus :)
One advantage of using an indexed view is for ordering results of 2 or more columns, where the columns are in different tables. ie, have a view which is the result of table1 and table2 sorted by table1.column1, table2.column2. You could then create an index on column1, column2 to optimise that query
A table is where the data is physically stored.
A view is where tables are summarized or grouped to make groups of tables easier to use.
An indexed view allows a query to use a view, and not need to get data from the underlying table, as the view already has the data, thus increasing performance.
You could not achieve the same result with just tables, without denormalizing your database, and thus potentially creating other issues.
Basically, use a view:
When you use the same complex query on many tables, multiple times.
When new system need to read old table data, but doesn't watch to change their perceived schema.
Indexed Views can improve performance by creating more specific index without increasing redundancy.
A view is simply a SELECT statement that has been given a name and stored in a database. The main advantage of a view is that once it's created, it acts like a table for any other SELECT statements that you want to write.
The select statement for the view can reference tables, other views and functions.
You can create an index on the view (indexed view) to improve performance. An indexed view is self-updating, immediately reflecting changes to the underlying tables.
If your indexed view only selects columns from one table, you could just as well place the index on that table and query that table directly, the view would only cause overhead for your database. However, if your SELECT statement covers multiple tables with joins etc. than you could gain a performance boost by placing an index on the view.
I am using SQL Server 2005 Express and Visual Studio 2008.
I have a database which has a table with 400 Columns. Things were (just about manageable) until I had to perform bi-directional sync between several databases.
I am wondering what arguments are for and against using 400 column database or 40 table database are?
The table in not normalised and comprises of mainly nvarchar(64) columns and some TEXT columns. (there are no datatypes as it was converted from text files).
There is one other table that links to this table and is a 1-1 relationship (i.e one entry relates to one entry in the 400 column table).
The table is a list files that contained parameters that are "plugged" into a application.
I look forward to your replies.
Thank you
Based on your process description I would start with something like this. The model is simplified, does not capture history, etc -- but, it is a good starting point. Note: parameter = property.
- Setup is a collection of properties. One setup can have many properties, one property belongs to one setup only.
- Machine can have many setups, one setup belongs to one machine only.
- Property is of a specific type (temperature, run time, spindle speed), there can be many properties of a certain type.
- Measurement and trait are types of properties. Measurement is a numeric property, like speed. Trait is a descriptive property, like color or some text.
For having a wide table:
Quick to report on as it's presumably denormalized and so no joins are needed.
Easy to understand for end-consumers as they don't need to hold a data model in their heads.
Against having a wide table:
Probably need to have multiple composite indexes to get good query performance
More difficult to maintain data consistency i.e. need to update multiple rows when data changes if that data is on multiple rows
As you're having to update multiple rows and maintain multiple indexes, concurrent performance for updates may become an issue as locks escalate.
You might end up with records with loads of nulls in columns if the attribute isn't relevant to the entity on that row which can make handling results awkward.
If lazy developers do a SELECT * from the table you end up dragging loads of data across the network, so you generally have to maintain suitable subset views.
So it all really depends on what you're doing. If the main purpose of the table is OLAP reporting and updates are infrequent and affect few rows then perhaps a wide, denormalized table is the right thing to have. In an OLTP environment then it's probably not and you should prefer narrower tables. (I generally design in 3NF and then denormalize for query performance as I go along.)
You could always take the approach of normalizing and providing a wide-view for readers if that's what they want to see.
Without knowing more about the situation it's not really possible to say more about the pros and cons in your particular circumstance.
Edit:
Given what you've said in your comments, have you considered just having a long & skinny name=value pair table so you'd just have UserId, PropertyName, PropertyValue columns? You might want to add in some other meta-attributes into it too; timestamp, version, or whatever. SQL Server is quite efficient at handling these sorts of tables so don't discount a simple solution like this out-of-hand.
I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track if any application uses a database or a bug tracking system so right now I have fields in the Applications table called
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases:
id
name
Table: BugTracking:
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of Application that use bug tracking" (although I guess either approach could generate this data.)
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get that statistics like this:
select
count(*) as TotalApplications,
count(Database_ID) as UsesDatabase,
count(BugTracking_ID) as UsesBugTracking
from
Applications
Why not get rid of the two Use fields and simply let a NULL value in the _ID fields indicate that the record does not use that application (bug tracking or database)
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 database (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though it might be considered evil by database people ^^) is to default them to 0 and add in an ID 0 data row in both bugtrack and database tables with a name of "None"... when you do the reports, you'll have to do some more work unless you present the "None" values as they are as well with a neat percentage...
To answer the edited question-
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).