SQL Server - Does column order matter?

SQL Server - Does column order matter? - sql-server

In terms of performance and optimizations:
When constructing a table in SQL Server, does it matter what order I put the columns in?
Does it matter if my primary key is the first column?
When constructing a multi-field index, does it matter if the columns are adjacent?
Using ALTER TABLE syntax, is it possible to specify in what position I want to add a column?
If not, how can I move a column to a difference position?

In SQL Server 2005, placement of nullable variable length columns has a space impact - placing nullable variable size columns at the end of the definition can result in less space consumption.
SQL Server 2008 adds the "SPARSE" column feature which negates this difference.
See here.

I would say the answer to all those questions is NO, altough my experience with MS-SQL goes as far as SQL2000. Might be a different story in SQL2005

For the fourth bullet: No you can't specify where you want to add the column. Here is the syntax for ALTER TABLE: https://msdn.microsoft.com/en-us/library/ms190273.aspx
In MySQL they offer an ALTER TABLE ADD ... AFTER ... but this doesn't appear in T-SQL.
If you want to reorder the columns you'll have to rebuild the table.
Edit: For your last last bullet point, you'll have to DROP the table and recreate it to reorder the columns. Some graphical tools that manipulate SQL databases will do this for you and make it look like you're reordering columns, so you might want to look into that.

No to the first 3 because the index will hold the data and no the last once also

Column order does not matter while creating a table. We can arrange the columns while retrieving the data from the database. Primary key can be set to any column or combination of columns.

For the first bullet:
Yes, column order does matter, at least if you are using the deprecated BLOBs image, text, or ntext, and using SQL Server <= 2005.
In those cases, you should have those columns at the 'end' of the table, and there is a performance hit every time you retrieve one.
If you're retrieving the data from such a table using a DAL, this is the perfect place for the Lazy Load pattern.

Related

Why can't columnar databases like Snowflake and Redshift change the column order?

I have been working with Redshift and now testing Snowflake. Both are columnar databases. Everything I have read about this type of databases says that they store the information by column rather than by row, which helps with the massive parallel processing (MPP).
But I have also seen that they are not able to change the order of a column or add a column in between existing columns (don't know about other columnar databases). The only way to add a new column is to append it at the end. If you want to change the order, you need to recreate the table with the new order, drop the old one, and change the name of the new one (this is called a deep copy). But this sometimes can't be possible because of dependencies or even memory utilization.
I'm more surprised about the fact that this could be done in row databases and not in columnar ones. Of course, there must be a reason why it's not a feature yet, but I clearly don't have enough information about it. I thought it was going to be just a matter of changing the ordinal of the tables in the information_schema but clearly is not that simple.
Does anyone know the reason of this?

Generally, column ordering within the table is not considered to be a first class attribute. Columns can be retrieved in whatever order you require by listing the names in that order.
Emphasis on column order within a table suggests frequent use of SELECT *. I'd strongly recommend not using SELECT * in columnar databases without an explicit LIMIT clause to minimize the impact.
If column order must be changed you do that in Redshift by creating a new empty version of the table with the columns in the desired order and then using ALTER TABLE APPEND to move the data into the new table very quickly.
https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE_APPEND.html

The order in which the columns are stored internally cannot be changed without dropping and recreating them.
Your SQL can retrieve the columns in any order you want.
General requirement to have columns listed in some particular order is for the viewing purpose.
You could define a view to be in the desired column order and use the view in the required operation.
CREATE OR REPLACE TABLE CO_TEST(B NUMBER,A NUMBER);
INSERT INTO CO_TEST VALUES (1,2),(3,4),(5,6);
SELECT * FROM CO_TEST;
SELECT A,B FROM CO_TEST;
CREATE OR REPLACE VIEW CO_VIEW AS SELECT A,B FROM CO_TEST;
SELECT * FROM CO_VIEW;
Creating a view to list the columns in the required order will not disturb the actual table underneath the view and the resources associated with recreation of the table is not wasted.

In some databases (Oracle especially) ordering columns on table will make difference in performance by storing NULLable columns at the end of the list. Has to do with how storage is beiing utilized within the data block.

Add column VS Recreating table

I was going through some SQL Scripts a coworker had wrote to upgrade a table to a newer version. It was simply adding a new column to the table. However, instead of a
ALTER TABLE [Table] ADD [Column] [DataType]
statement, he instead made a copy of the table with the new column, repopulated it with the existing data, deleted the old table, renamed the new one to the old table, and then re-added all the indexes and relationships.
My question is, is there any benefit to doing this rather than the simple Alter statement or is there any difference?
I can't imagine all that work for practically no difference other than the column's ordinal position being in the desired place.

When you use the SSMS GUI, it will sometimes take this approach. One possible reasons for doing it this way is "Inserting" a column rather than "appending" a column. (if you want the column to appear before some already existing columns.) In that case, adding a column won't work.
Essentially, any time a simple addition of a new column to the end of the table isn't what you're looking for. But my guess is that he used the GUI to add the column and chose the "generate SQL script" option.

Just a couple of differences on how that would be different. Hope this helps!
Adding new column to existing table:
Pros:
No need to recreate indices and constraints on the existing table.
Data on existing column remains intact.
Cons:
Huge table that has millions of records will need to be updated for
the new column.
Recreating a New Table:
Pros:
No need to worry for any kind of limits (number of columns, total
size of table) in the RDBMS
Cons:
Recreate indices and constraints
Reload data for all columns

Is there any way to index a varchar column as a datetime?

I'm querying a varchar column that contains all valid datetimes. In the query, I cast the column to datetime.
Keep in mind, changing the column's datatype definition to datetime is not an option. Just trust me, it's not an option.
Through query analysis, I see that the query would be much faster if I could add an index on this column. But it's varchar. If I added an index, it would index based on it's varchar value, right? But I'd want it to index on the datetime value, yes?
Is there anything I can do here for an index?

Valid data is the first priority, because any approach that converts the data to DATETIME will return a value of January 1, 1900 at midnight when the value doesn't match an accepted format.
SQL Server doesn't have function based indexes, so your options are:
A computed column, assuming you can use ALTER TABLE statements to add columns to the table. You could index this column after that.
An indexed view (AKA materialized view), assuming your data meets the requirements
A staging table, cloned from the original (and tweaked if you like) where you import data from the existing table so you can index and manipulate it

Have you considered building an indexed view on top of these generated tables? You could define your varchar-to-datetime conversion in the view, and then index that resulting column.
See Improving Performance with SQL Server 2005 Indexed Views for more information. I'm not an expert on using indexed views, I've only read about them. So I'll leave it up to you to figure out if it will be the right solution in your case.

SQL Server Add Column

I want to add a column to one of my tables in SQL Server. I don't want it to be at the end of the column listing in the table...I actually want it to be somewhere else (location wise) in the table. Is there another option besides dropping and rebuilding (populating) the table to accomplish this? I obviously don't want ot lose any of my data, but I would prefer it not have to have the column at the end of the table definition.
Thanks,
S

Column order in the table is irrelevant -- it's purely cosmetic.
There is no extension to ALTER TABLE that allows you to specify the ordinal position of a new column (either for adding a new column or moving an existing column).
For more on the subject, see:
Change Order of Column In Database Tables

Short answer:
I agree with OMG Ponies, column order isn't important. If you don't have a clustered index, rather drop and recreate the table than run an ALTER TABLE x ADD col.
Long answer:
If your table has a fair bit of data (50Mb comes to mind) then you will be better off recreating the table rather than ALTER TABLE x ADD col The data page allocation plan for the table is calculated at table creation time so when you add a column, SQL Server will typically put your new column's data in separate pages and put forward pointers from your existing data pages to the new data pages for the column you added. If you're going to use the new column extensively then your table IO will be quite poor since reading even 1 row will require reading at least 2 pages. Table scans will also perform poorly since forward pointers will always be followed, causing table scans that are normally sequential to jump back and forth on your disk during a read.
In this case it's better to rename the existing table, recreate your table with the new column, insert into table_name select col1, col2, 'null or default for new col', col3 from temp_renamed_table and finally drop the old table that you renamed. The data pages will be much better organised and your IO will be faster despite looking the same from a SQL developer's point of view than when ALTER TABLE is used. If you have a clustered index the table will be reorganised when you add the column and page splits are less likely. You could also run ALTER TABLE x REBUILD if you have SQL Server 2008, don't have a clustered index and lots of time when users aren't using your table. It's hard to comment on your indexing strategy without knowing much more.
This is a much better reason for recreating the table than something cosmetic like column order.

It is an extremely bad practice to rearrange columns in a table. Do not even consider trying to do such a thing. The column order is irrelevant if you have used correct coding practices (such as never and I do mean never) using select *.
If you have used select * and you change the order of the columns, you are even more at risk of breaking code because the query may not be expecting Price as the third column but as the second column and that could seriously mess up a lot of things.
Further the only way to do this is to create another table, move your data and then drop the old table and rename the first one. Of course if you have FKs, they too have to be dropped and recreated. This takes alot of time if you hav ea large data set and could cause problems for users.
There is no cuircumstance where you would ever consider doing this for a table that in on production as it is just too risky. If you are in the early stages of design, you could consider doing it.

ALTER TABLE my_table ADD COLUMN column_name VARCHAR(50) AFTER col_name;
substituting whatever def you want for VARCHAR(50)
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
Edit This is of course the right answer for a MySQL server... but this is not what the OP wants.

How to add a constant column when replicating a database?

I am using SQL Server 2000 and I have two databases that both replicate (transactional push subscription) to a single database. I need to know which database the records came from.
So I want to add a fixed column specified in the publication to my table so I can tell which database the row originated from.
How do I go about doing this?
I would like to avoid altering the main databases mostly due to the fact there are many tables I would need to do this to. I was hoping for some built in feature of replication that would do this for me some where. Other than that I would go with the view idea.

You could use a calculated column Use the following on the two databases:
ALTER TABLE TableName ADD
MyColumn AS 'Server1'
Then just define the single "master" database to use a VARCHAR column (or whatever you want) that you fill using the calculated columns value.

You can create a view, which adds the "constant" column, and use it as a replication source.

So the solution for me was to set up the replication publications to allow transformations and create a DTS package for each site that appends the siteid into the tables to keep the ids unique as I can't use guids.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight