SQL Server Add Column

I want to add a column to one of my tables in SQL Server. I don't want it to be at the end of the column listing in the table...I actually want it to be somewhere else (location-wise) in the table. Is there another option besides dropping and rebuilding (repopulating) the table to accomplish this? I obviously don't want to lose any of my data, but I would prefer not to have the column at the end of the table definition.

Column order in the table is irrelevant -- it's purely cosmetic.
There is no extension to ALTER TABLE that allows you to specify the ordinal position of a new column (either for adding a new column or moving an existing column).
For more on the subject, see:
Change Order of Column In Database Tables

Short answer:
I agree with OMG Ponies, column order isn't important. If you don't have a clustered index, it's better to drop and recreate the table than to run an ALTER TABLE x ADD col.
Long answer:
If your table has a fair bit of data (50 MB comes to mind) then you will be better off recreating the table rather than running ALTER TABLE x ADD col. The data page allocation plan for the table is calculated at table creation time, so when you add a column, SQL Server will typically put your new column's data in separate pages and put forward pointers from your existing data pages to the new data pages for the column you added. If you're going to use the new column extensively then your table IO will be quite poor, since reading even one row will require reading at least two pages. Table scans will also perform poorly, since forward pointers will always be followed, causing table scans that are normally sequential to jump back and forth on your disk during a read.
In this case it's better to rename the existing table, recreate your table with the new column, run insert into table_name select col1, col2, 'null or default for new col', col3 from temp_renamed_table, and finally drop the old table that you renamed. The data pages will be much better organised and your IO will be faster, even though everything looks the same from a SQL developer's point of view. If you have a clustered index, the table will be reorganised when you add the column and page splits are less likely. You could also run ALTER TABLE x REBUILD if you have SQL Server 2008, don't have a clustered index, and have lots of time when users aren't using your table. It's hard to comment on your indexing strategy without knowing much more.
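A minimal T-SQL sketch of that rename-and-recreate sequence (table and column names here are hypothetical, purely for illustration):
-- Rename the existing table out of the way
EXEC sp_rename 'dbo.MyTable', 'MyTable_old';
-- Recreate the table with the new column in the desired position
CREATE TABLE dbo.MyTable (
    Col1   INT         NOT NULL,
    NewCol VARCHAR(50) NULL,  -- the added column, placed where you want it
    Col2   INT         NOT NULL
);
-- Copy the data across, supplying NULL (or a default) for the new column
INSERT INTO dbo.MyTable (Col1, NewCol, Col2)
SELECT Col1, NULL, Col2 FROM dbo.MyTable_old;
-- Drop the renamed original once you've verified the copy
DROP TABLE dbo.MyTable_old;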
This is a much better reason for recreating the table than something cosmetic like column order.

It is an extremely bad practice to rearrange columns in a table. Do not even consider trying to do such a thing. The column order is irrelevant if you have used correct coding practices, such as never (and I do mean never) using select *.
If you have used select * and you change the order of the columns, you are even more at risk of breaking code, because a query may be expecting Price as the second column rather than the third, and that could seriously mess up a lot of things.
Further, the only way to do this is to create another table, move your data, drop the old table, and rename the new one. Of course, if you have FKs, they too have to be dropped and recreated. This takes a lot of time if you have a large data set and could cause problems for users.
There is no circumstance where you would ever consider doing this for a table that is in production, as it is just too risky. If you are in the early stages of design, you could consider doing it.

ALTER TABLE my_table ADD COLUMN column_name VARCHAR(50) AFTER col_name;
substituting whatever definition you want for VARCHAR(50)
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
Edit: This is of course the right answer for a MySQL server... but this is not what the OP wants.


Why can't columnar databases like Snowflake and Redshift change the column order?

I have been working with Redshift and am now testing Snowflake. Both are columnar databases. Everything I have read about this type of database says that they store the information by column rather than by row, which helps with massively parallel processing (MPP).
But I have also seen that they are not able to change the order of columns or add a column in between existing columns (I don't know about other columnar databases). The only way to add a new column is to append it at the end. If you want to change the order, you need to recreate the table with the new order, drop the old one, and rename the new one (this is called a deep copy). But sometimes this isn't possible because of dependencies or even memory utilization.
I'm more surprised by the fact that this can be done in row databases but not in columnar ones. Of course, there must be a reason why it's not a feature yet, but I clearly don't have enough information about it. I thought it was going to be just a matter of changing the ordinal of the columns in the information_schema, but clearly it's not that simple.
Does anyone know the reason for this?
Generally, column ordering within the table is not considered to be a first class attribute. Columns can be retrieved in whatever order you require by listing the names in that order.
Emphasis on column order within a table suggests frequent use of SELECT *. I'd strongly recommend not using SELECT * in columnar databases without an explicit LIMIT clause to minimize the impact.
If column order must be changed you do that in Redshift by creating a new empty version of the table with the columns in the desired order and then using ALTER TABLE APPEND to move the data into the new table very quickly.
https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE_APPEND.html
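A minimal sketch of that deep-copy pattern in Redshift (my_table and its columns are hypothetical; ALTER TABLE APPEND matches columns by name, so the new table can declare them in a different order):
-- Create an empty copy of the table with the columns in the desired order
CREATE TABLE my_table_new (col_b VARCHAR(20), col_a INT, col_c DATE);
-- Move (not copy) the rows from the old table into the new one
ALTER TABLE my_table_new APPEND FROM my_table;
-- Swap the tables
DROP TABLE my_table;
ALTER TABLE my_table_new RENAME TO my_table;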
The order in which the columns are stored internally cannot be changed without dropping and recreating them.
Your SQL can retrieve the columns in any order you want.
The general requirement to have columns listed in some particular order is for viewing purposes.
You could define a view to be in the desired column order and use the view in the required operation.
CREATE OR REPLACE TABLE CO_TEST(B NUMBER,A NUMBER);
INSERT INTO CO_TEST VALUES (1,2),(3,4),(5,6);
SELECT * FROM CO_TEST;
SELECT A,B FROM CO_TEST;
CREATE OR REPLACE VIEW CO_VIEW AS SELECT A,B FROM CO_TEST;
SELECT * FROM CO_VIEW;
Creating a view to list the columns in the required order will not disturb the actual table underneath the view, and the resources associated with recreating the table are not wasted.
In some databases (Oracle especially), ordering the columns in a table can make a difference in performance, by storing NULLable columns at the end of the list. This has to do with how storage is utilized within the data block.

What is the best way to implement soft deletion in a large relational Database? [duplicate]

Working on a project at the moment and we have to implement soft deletion for the majority of users (user roles). We decided to add an is_deleted='0' field on each table in the database and set it to '1' if particular user roles hit a delete button on a specific record.
For future maintenance now, each SELECT query will need to ensure they do not include records where is_deleted='1'.
Is there a better solution for implementing soft deletion?
Update: I should also note that we have an Audit database that tracks changes (field, old value, new value, time, user, ip) to all tables/fields within the Application database.
I would lean towards a deleted_at column that contains the datetime of when the deletion took place. Then you get a little bit of free metadata about the deletion. For your SELECTs, just get rows WHERE deleted_at IS NULL.
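A minimal sketch of the deleted_at approach (hypothetical users table, T-SQL syntax):
ALTER TABLE users ADD deleted_at DATETIME NULL;
-- Soft-delete a row: record when it happened instead of removing it
UPDATE users SET deleted_at = GETDATE() WHERE user_id = 42;
-- Normal reads only see live rows
SELECT * FROM users WHERE deleted_at IS NULL;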
You could perform all of your queries against a view that contains the WHERE IS_DELETED='0' clause.
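For example (a sketch against a hypothetical users table), the view hides the filter from every caller:
CREATE VIEW active_users AS
SELECT user_id, login, created_at
FROM users
WHERE is_deleted = 0;
-- Application code queries the view, never the base table
SELECT * FROM active_users;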
Having an is_deleted column is a reasonably good approach.
If it is in Oracle, to further increase performance I'd recommend partitioning the table by creating a list partition on the is_deleted column.
Then deleted and non-deleted rows will physically be in different partitions, though for you it'll be transparent.
As a result, if you type a query like
SELECT * FROM table_name WHERE is_deleted = 1
then Oracle will perform the 'partition pruning' and only look into the appropriate partition. Internally a partition is a different table, but it is transparent for you as a user: you'll be able to select across the entire table no matter if it is partitioned or not. But Oracle will be able to query ONLY the partition it needs. For example, let's assume you have 1000 rows with is_deleted = 0 and 100000 rows with is_deleted = 1, and you partition the table on is_deleted. Now if you include condition
WHERE ... AND IS_DELETED=0
then Oracle will ONLY scan the partition with 1000 rows. If the table weren't partitioned, it would have to scan 101000 rows (both partitions).
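A sketch of that Oracle layout (table and partition names are hypothetical; the syntax is standard Oracle list partitioning):
CREATE TABLE orders (
    order_id   NUMBER,
    customer   VARCHAR2(50),
    is_deleted NUMBER(1) DEFAULT 0
)
PARTITION BY LIST (is_deleted) (
    PARTITION p_live    VALUES (0),
    PARTITION p_deleted VALUES (1)
);
-- Queries filtering on is_deleted touch only the matching partition
SELECT * FROM orders WHERE is_deleted = 0;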
The best response, sadly, depends on what you're trying to accomplish with your soft deletions and the database you are implementing this within.
In SQL Server, the best solution would be to use a deleted_on/deleted_at column with a type of SMALLDATETIME or DATETIME (depending on the necessary granularity) and to make that column nullable. In SQL Server, the row header data contains a NULL bitmask for each of the columns in the table so it's marginally faster to perform an IS NULL or IS NOT NULL than it is to check the value stored in a column.
If you have a large volume of data, you will want to look into partitioning your data, either through the database itself or through two separate tables (e.g. Products and ProductHistory) or through an indexed view.
I typically avoid flag fields like is_deleted, is_archive, etc because they only carry one piece of meaning. A nullable deleted_at, archived_at field provides an additional level of meaning to yourself and to whoever inherits your application. And I avoid bitmask fields like the plague since they require an understanding of how the bitmask was built in order to grasp any meaning.
If the table is large and performance is an issue, you can always move 'deleted' records to another table, which has additional info like time of deletion, who deleted the record, etc.
That way you don't have to add another column to your primary table.
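In SQL Server you can do the move and the delete in one statement with an OUTPUT clause; a sketch with hypothetical table and column names:
-- users_deleted has the same key columns plus deletion metadata
DELETE FROM users
OUTPUT DELETED.user_id, DELETED.login, GETDATE(), SUSER_SNAME()
INTO users_deleted (user_id, login, deleted_at, deleted_by)
WHERE user_id = 42;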
That depends on what information you need and what workflows you want to support.
Do you want to be able to:
know what information was there (before it was deleted)?
know when it was deleted?
know who deleted it?
know in what capacity they were acting when they deleted it?
be able to un-delete the record?
be able to tell when it was un-deleted?
etc.
If the record was deleted and un-deleted four times, is it sufficient for you to know that it is currently in an un-deleted state, or do you want to be able to tell what happened in the interim (including any edits between successive deletions!)?
Careful of soft-deleted records causing uniqueness constraint violations.
If your DB has columns with unique constraints then be careful that the prior soft-deleted records don’t prevent you from recreating the record.
Think of the cycle:
create user (login=JOE)
soft-delete (set deleted column to non-null.)
(re) create user (login=JOE). ERROR. LOGIN=JOE is already taken
Second create results in a constraint violation because login=JOE is already in the soft-deleted row.
Some techniques:
1. Move the deleted record to a new table.
2. Make your uniqueness constraint across the login and deleted_at timestamp column
My own opinion is +1 for moving to a new table. It takes a lot of discipline to maintain the AND deleted_at IS NULL across all your queries (for all of your developers).
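For technique 2, SQL Server's filtered indexes (2008 and later) give a cleaner variant: enforce uniqueness only among rows that are still live. A sketch with hypothetical names:
-- Only non-deleted rows participate in the uniqueness check,
-- so 'JOE' can be recreated after a soft delete
CREATE UNIQUE INDEX UQ_users_login_live
ON users (login)
WHERE deleted_at IS NULL;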
You will definitely have better performance if you move your deleted data to another table like Jim said, as well as having record of when it was deleted, why, and by whom.
Adding where deleted=0 to all your queries will slow them down significantly, and hinder the usage of any indexes you may have on the table. Avoid having "flags" in your tables whenever possible.
You don't mention what product, but SQL Server 2008 and PostgreSQL (and others, I'm sure) allow you to create filtered indexes, so you could create a covering index where is_deleted=0, mitigating some of the negatives of this particular approach.
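For example, a filtered covering index in SQL Server might look like this (hypothetical table and columns):
-- Indexes only live rows; queries that filter on is_deleted = 0
-- and read these columns never touch the base table
CREATE NONCLUSTERED INDEX IX_orders_live
ON orders (customer_id)
INCLUDE (order_date, amount)
WHERE is_deleted = 0;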
Something that I use on projects is a statusInd tinyint not null default 0 column
Using statusInd as a bitmask allows me to perform data management (delete, archive, replicate, restore, etc.). Using this in views, I can then do the data distribution, publishing, etc. for the consuming applications. If performance is a concern regarding views, use small fact tables to support this information; dropping the fact drops the relation and allows for scaled deletes.
Scales well and is data centric, keeping the data footprint pretty small -- key for 350 GB+ DBs with realtime concerns. Using alternatives (tables, triggers) has some overhead that, depending on the need, may or may not work for you.
SOX-related audits may require more than a field in your case, but this may help.
Use a view, function, or procedure that checks is_deleted = 0; i.e. don't select directly on the table in case the table needs to change later for other reasons.
And index the is_deleted column for larger tables.
Since you already have an audit trail, tracking the deletion date is redundant.
I prefer to keep a status column, so I can use it for several different configs, i.e. published, private, deleted, needsApproval...
Create another schema and grant it all on your data schema.
Implement VPD on your new schema so that each and every query will have the predicate allowing selection of only the non-deleted rows appended to it.
http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/cmntopc.htm#CNCPT62345
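A sketch of the VPD plumbing, assuming a hypothetical is_deleted column (DBMS_RLS is Oracle's row-level security package; all schema, table, and function names here are illustrative):
-- Policy function: returns the predicate to append to every query
CREATE OR REPLACE FUNCTION not_deleted_predicate (
    p_schema IN VARCHAR2,
    p_table  IN VARCHAR2
) RETURN VARCHAR2 AS
BEGIN
    RETURN 'is_deleted = 0';
END;
/
-- Attach the policy to the table
BEGIN
    DBMS_RLS.ADD_POLICY(
        object_schema   => 'APP_DATA',
        object_name     => 'USERS',
        policy_name     => 'hide_deleted',
        function_schema => 'APP_DATA',
        policy_function => 'NOT_DELETED_PREDICATE',
        statement_types => 'SELECT');
END;
/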
@AdditionalCriteria("this.status <> 'deleted'")
Put this on top of your @Entity.
http://wiki.eclipse.org/EclipseLink/Examples/JPA/SoftDelete

Adding column at specific location vs at the end of data table

In SQL Server is adding a new column at the end of the table faster than adding it at a specific location?
I have to add 20 new columns to a table that already has 250 columns and roughly 2M records. The new columns are nullable float columns. Adding a single column took me more than 10 minutes. I'm looking for any way to speed up the process.
If you're using SSMS, you should know that under the hood it is creating a new table with the new columns, inserting the data from the original table, redoing all the keys and constraints, dropping the old table, and then renaming the new table to use the old table's name.
It is probably better practice to add the column at the end; you can then use T-SQL to return the columns in whatever order you want on retrieval.
Another note: if you don't lock the db or put it in single-user mode, it will take longer to run.
tl;dr: do not add a column at a specific position unless you have time to waste.
Are you using?:
ALTER TABLE table_name
ADD column_name datatype
Or are you using the SSMS designer? SSMS designer may drop and recreate the table (a temporary table is created to hold the data temporarily) - this may be why it is taking a longer time.
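One thing that may help either way: all 20 nullable float columns can be added in a single ALTER TABLE statement rather than 20 separate ones, and adding a nullable column with no default is largely a metadata change, so much of the observed time may be blocking rather than data movement. A sketch with hypothetical names:
ALTER TABLE big_table
ADD new_col_01 FLOAT NULL,
    new_col_02 FLOAT NULL,
    new_col_03 FLOAT NULL;  -- ...and so on for the remaining columns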

Add column VS Recreating table

I was going through some SQL scripts a coworker had written to upgrade a table to a newer version. It was simply adding a new column to the table. However, instead of an
ALTER TABLE [Table] ADD [Column] [DataType]
statement, he instead made a copy of the table with the new column, repopulated it with the existing data, deleted the old table, renamed the new one to the old table, and then re-added all the indexes and relationships.
My question is, is there any benefit to doing this rather than the simple Alter statement or is there any difference?
I can't imagine all that work for practically no difference other than the column's ordinal position being in the desired place.
When you use the SSMS GUI, it will sometimes take this approach. One possible reason for doing it this way is "inserting" a column rather than "appending" one (if you want the column to appear before some already existing columns). In that case, simply adding a column won't work.
Essentially, any time a simple addition of a new column to the end of the table isn't what you're looking for. But my guess is that he used the GUI to add the column and chose the "generate SQL script" option.
Just a couple of differences between the two approaches. Hope this helps!
Adding new column to existing table:
Pros:
No need to recreate indices and constraints on the existing table.
Data in existing columns remains intact.
Cons:
A huge table that has millions of records will need to be updated for the new column.
Recreating a new table:
Pros:
No need to worry about any kind of limits (number of columns, total size of table) in the RDBMS.
Cons:
Recreate indices and constraints.
Reload data for all columns.

T-SQL Add Column In Specific Order

I'm a bit new to T-SQL. Coming from a MySQL background, I'm still adapting to the different nuances in the syntax.
I'm looking to add a new column AFTER a specific one. I've found out that AFTER is a valid keyword, but I don't think it's the right one for the job.
ALTER TABLE [dbo].[InvStockStatus]
ADD [Abbreviation] [nvarchar](32) DEFAULT '' NOT NULL ;
This is my current query, which works well, except that it adds the field at the end of the table; I'd prefer to add it after [Name]. What's the syntax I'm looking for to represent this?
You can't do it like that.
For example, if you have a table like this
create table TestTable(id1 int,id3 int)
and you want to add another column id2 between id1 and id3, then here is what SQL Server does behind the scenes if you use the designer:
BEGIN TRANSACTION
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET NUMERIC_ROUNDABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
COMMIT
BEGIN TRANSACTION
GO
CREATE TABLE dbo.Tmp_TestTable
(
id1 int NULL,
id2 int NULL,
id3 int NULL
) ON [PRIMARY]
GO
ALTER TABLE dbo.Tmp_TestTable SET (LOCK_ESCALATION = TABLE)
GO
IF EXISTS(SELECT * FROM dbo.TestTable)
EXEC('INSERT INTO dbo.Tmp_TestTable (id1, id3)
SELECT id1, id3 FROM dbo.TestTable WITH (HOLDLOCK TABLOCKX)')
GO
DROP TABLE dbo.TestTable
GO
EXECUTE sp_rename N'dbo.Tmp_TestTable', N'TestTable', 'OBJECT'
GO
COMMIT
As you can see, if you have a lot of data this can be problematic. Why does it matter where the column is located? Just use
select col1,col2,col3 from table
The sequence of columns is really irrelevant in a strict (functional) sense, in any RDBMS - it's just a "nicety" to have for documentation or humans to look at.
SQL Server doesn't support any T-SQL commands to order the columns in any way. So there is no syntax in T-SQL to accomplish this.
The only way to change that is to use the visual table designer in SSMS, which really recreates the whole table from scratch, when you move around columns or insert columns in the middle of a table.
From a strictly functional database perspective it is correct that the tuples can appear in any order.
However, a database does not exist inside a vacuum. There is always a human (DBAs, devs) who will want to read the table schema, get their head around it, and maintain or write queries against it.
In the past, I've used conventions for a table's columns such as ordering a table with
primary key(s) first
then foreign keys
then frequently used columns
then other columns
and lastly audit related columns
and these help when scanning a table. Unfortunately, it appears you have to jump through hoops to maintain any order, so now I have to question whether it's worth having and maintaining these conventions. My new rule is to just add it to the end.
If you are really worried about order from a readability perspective, you should create your own 'readability' views (perhaps in a different schema) in any order you feel like. You could have multiple views of the same table (one for just the core columns and another including stuff that usually isn't relevant).
It would be nice to be able to re-order columns in SQL Server database diagrams (as a display thing only), but this isn't possible.
You should always add fields only at the end. You should select fields in the order you want, but never restructure an existing table to add a column in the middle. This is likely to break some things (where people did dumb things like select * or inserts without specifying the columns; granted, people shouldn't do those things, but they do).
Recreating the table can be a long time-consuming process for no gain whatsoever and can cause lots of user complaints and lockups while it is going on.
The schema comparison tools I have seen will create a new table with the desired ordering and then copy the data from the old table to the new one (with some renaming magic to make the new one resemble the old). Given how awkward this approach is, I figure there isn't a T-SQL statement to add a new column in a specific place.
This is a safe workaround without using a temp table. After you add the column towards the end, simply go to SQL Server Management Studio, click on the table, select Design (or Modify), and drag the last column to whatever position you want.
This way you do not have to worry about losing data and indexes. The only other option is to recreate the table, which can be problematic if you have large data.
This answer is for helping other people and not intended to be accepted as answer.
Technically, or perhaps I should say, academically, the order in which columns are added to a table, or the order in which they are stored in the database's internal storage model, should not be of any concern to you. You can simply list the columns in the Select clause of your SQL queries to control the order that columns or computed expressions appear in the output of any query you run. Internally the database is free to store the actual data any way it sees fit to optimize storage, and or help align data elements with disk and/or memory boundaries.
A fair amount of time has passed since this question was posted, but it's worth noting that while the underlying raw SQL code to place columns in a specific order hasn't changed, the process of generating scripts to do it is taken care of if you choose to use SQL Server Data Tools within Visual Studio to manage your database. The database you deploy to will always have the columns in the order that you specify in your project.
