Permanently sorting a table in SQLServer based on pre-existing data - sql-server

I have made a table in SQL Server based on pre-existing data:
SELECT pre_existing_data
INTO new_table
FROM existing_table
I am trying to get the output to permanently sort by a particular field once the table is created. I thought this would be as simple as adding an ORDER BY clause at the end of the chunk of code that makes the table, but the data still won't sort properly.

There is no way to permanently sort a table in SQL.
You can create an index on the table and queries which use the index (in an ORDER BY clause) will be returned quicker, but the order the data is stored on the disk is not controllable.

You can create an index-organized table by using a CLUSTERED INDEX, which stores the data on disk in an ordered fashion on the clustering key. Then if you ORDER BY in your query based on the clustering key, data should come out very fast. Note that you have to use the ORDER BY in your query no matter what.

I have made a new table on SQL Server on pre-existing Schema
insert into new_table
select * from old_table
ORDER BY col ASC|DSC;
After it drop old_table and rename new table to old_table_name
drop table old_table_name;
rename new_table_name to old_table_name;
Try this trick to short your data in the table permanently

Related

Inserting new rows in a table

When I add news rows using the Insert into select code, the new rows get added randomly in between the already existing rows, instead of getting added to the end of the table.
I'm using, Insert into Table1 (Name1) select Name from Table2.
SQL tables are modeled after unordered sets, and hence you should not assume that there is any order to your data in the table. The only order which exists is what you specify when you query using ORDER BY, e.g.
SELECT Name1
FROM Table1
ORDER BY Name1
An index can also be thought of a way of ordering your records, but these two are mostly distinct entities from your actual table.
I agree with Tim's answer. But if you still want the data inserted in the way you want, then you can try to add the primary key yourself which is incremental (like 1,2,3 ... or 10,20,30 ...).
Although I don't recommend it, but I think following can help you if you don't want to handle the primary key yourself.
How do I add a auto_increment primary key in SQL Server database?

How to do incremental load in SQL server

I have DB tables where there are no identity column. We have client data fetched from DB2 to SQL Server and unfortunately DB2 design doesn't have identity columns.
Now we have some data inserted, updated and deleted from source (DB2/SQL Server) and these data I want to load to destination (SQL Server) using some incremental load concept.
I tried SSIS lookups in Dataflow task however it's taking huge time to simply insert one new record. Please note that, in "lookup transformation editor" I'm mapping all "available input columns" to available "available lookup columns " as there is no identity column. I think, this is why it's taking time. I have few tables having around 20 million records.
Is there any faster method /ways available to do this, specially when table does not have identity column? Is except or SQL merge will help?
I'm open to have any other approach other than SSIS.
Look up is SSIS takes some time, so you can use ESQL Task and call the merge procedures.
I think what you can do is use merge procedures there you can create a column in your source table and update the records in the column like
merge desination
using
{
source columns from source s}
join desination d
on s.primarykey=d.primary key
when matched then
s.updatedrecord=1
when not matched then
insert into desination columns.
from the above the query you new records will be inserted and the updated records with the help of updatedrecord column you can update or insert them in your destination table successfully.
you can go to the following link for merge procedures.
https://www.sqlservercentral.com/Forums/Topic1042053-392-1.aspx
https://msdn.microsoft.com/en-us/library/bb510625.aspx
If your source is a SQL query from DB2 for instance, try adding a new column to this. It will be a checksum value over the columns you select "expect to change or want to monitor changes over".
SELECT
BINARY_CHECKSUM(
Column1
,Column2
,Column3)AS ChecksumValue
,Column1
,Column2
,Column3
FROM #TEMP
You would have to add this to your existing table in SQL as well to be able to start comparing.
If you have this, then you can do the lookup on the checksum value rater than on the columns. Since number lookups are a lot quicker than varchar comparisons over multiple columns. I am guessing since there is no key, you would then have to split the data between checksum matches (which should be no change existing records) and non matches. The non matches Could be new rows or just updates. But your set should be smaller to work with.
Good luck. HTH

How To change the column order of An Existing Table in SQL Server 2008

I have situation where I need to change the order of the columns/adding new columns for existing Table in SQL Server 2008.
Existing column
MemberName
MemberAddress
Member_ID(pk)
and I want this order
Member_ID(pk)
MemberName
MemberAddress
I got the answer for the same ,
Go on SQL Server → Tools → Options → Designers → Table and Database Designers and unselect Prevent saving changes that require table re-creation
2- Open table design view and that scroll your column up and down and save your changes.
It is not possible with ALTER statement. If you wish to have the columns in a specific order, you will have to create a newtable, use INSERT INTO newtable (col-x,col-a,col-b) SELECT col-x,col-a,col-b FROM oldtable to transfer the data from the oldtable to the newtable, delete the oldtable and rename the newtable to the oldtable name.
This is not necessarily recommended because it does not matter which order the columns are in the database table. When you use a SELECT statement, you can name the columns and have them returned to you in the order that you desire.
If your table doesn't have any records you can just drop then create your table.
If it has records you can do it using your SQL Server Management Studio.
Just click your table > right click > click Design then you can now arrange the order of the columns by dragging the fields on the order that you want then click save.
Best Regards
I tried this and dont see any way of doing it.
here is my approach for it.
Right click on table and Script table for Create and have this on
one of the SQL Query window,
EXEC sp_rename 'Employee', 'Employee1' -- Original table name is Employee
Execute the Employee create script, make sure you arrange the columns in the way you need.
INSERT INTO TABLE2 SELECT * FROM TABLE1.
-- Insert into Employee select Name, Company from Employee1
DROP table Employee1.
Relying on column order is generally a bad idea in SQL. SQL is based on Relational theory where order is never guaranteed - by design. You should treat all your columns and rows as having no order and then change your queries to provide the correct results:
For Columns:
Try not to use SELECT *, but instead specify the order of columns in the select list as in: SELECT Member_ID, MemberName, MemberAddress from TableName. This will guarantee order and will ease maintenance if columns get added.
For Rows:
Row order in your result set is only guaranteed if you specify the ORDER BY clause.
If no ORDER BY clause is specified the result set may differ as the Query Plan might differ or the database pages might have changed.
Hope this helps...
This can be an issue when using Source Control and automated deployments to a shared development environment. Where I work we have a very large sample DB on our development tier to work with (a subset of our production data).
Recently I did some work to remove one column from a table and then add some extra ones on the end. I then had to undo my column removal so I re-added it on the end which means the table and all references are correct in the environment but the Source Control automated deployment will no longer work because it complains about the table definition changing.
The real problem here is that the table + indexes are ~120GB and the environment only has ~60GB free so I'll need to either:
a) Rename the existing columns which are in the wrong order, add new columns in the right order, update the data then drop the old columns
OR
b) Rename the table, create a new table with the correct order, insert to the new table from the old and delete from the old as I go along
The SSMS/TFS Schema compare option of using a temp table won't work because there isn't enough room on disc to do it.
I'm not trying to say this is the best way to go about things or that column order really matters, just that I have a scenario where it is an issue and I'm sharing the options I've thought of to fix the issue
SQL query to change the id column into first:
ALTER TABLE `student` CHANGE `id` `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT FIRST;
or by using:
ALTER TABLE `student` CHANGE `id` `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT AFTER 'column_name'

INSERT INTO vs SELECT INTO

What is the difference between using
SELECT ... INTO MyTable FROM...
and
INSERT INTO MyTable (...)
SELECT ... FROM ....
?
From BOL [ INSERT, SELECT...INTO ], I know that using SELECT...INTO will create the insertion table on the default file group if it doesn't already exist, and that the logging for this statement depends on the recovery model of the database.
Which statement is preferable?
Are there other performance implications?
What is a good use case for SELECT...INTO over INSERT INTO ...?
Edit: I already stated that I know that that SELECT INTO... creates a table where it doesn't exist. What I want to know is that SQL includes this statement for a reason, what is it? Is it doing something different behind the scenes for inserting rows, or is it just syntactic sugar on top of a CREATE TABLE and INSERT INTO.
They do different things. Use INSERT when the table exists. Use SELECT INTO when it does not.
Yes. INSERT with no table hints is normally logged. SELECT INTO is minimally logged assuming proper trace flags are set.
In my experience SELECT INTO is most commonly used with intermediate data sets, like #temp tables, or to copy out an entire table like for a backup. INSERT INTO is used when you insert into an existing table with a known structure.
EDIT
To address your edit, they do different things. If you are making a table and want to define the structure use CREATE TABLE and INSERT. Example of an issue that can be created: You have a small table with a varchar field. The largest string in your table now is 12 bytes. Your real data set will need up to 200 bytes. If you do SELECT INTO from your small table to make a new one, the later INSERT will fail with a truncation error because your fields are too small.
Which statement is preferable? Depends on what you are doing.
Are there other performance implications? If the table is a permanent table, you can create indexes at the time of table creation which has implications for performance both negatively and positiviely. Select into does not recreate indexes that exist on current tables and thus subsequent use of the table may be slower than it needs to be.
What is a good use case for SELECT...INTO over INSERT INTO ...? Select into is used if you may not know the table structure in advance. It is faster to write than create table and an insert statement, so it is used to speed up develoment at times. It is often faster to use when you are creating a quick temp table to test things or a backup table of a specific query (maybe records you are going to delete). It should be rare to see it used in production code that will run multiple times (except for temp tables) because it will fail if the table was already in existence.
It is sometimes used inappropriately by people who don't know what they are doing. And they can cause havoc in the db as a result. I strongly feel it is inappropriate to use SELECT INTO for anything other than a throwaway table (a temporary backup, a temp table that will go away at the end of the stored proc ,etc.). Permanent tables need real thought as to their design and SELECT INTO makes it easy to avoid thinking about anything even as basic as what columns and what datatypes.
In general, I prefer the use of the create table and insert statement - you have more controls and it is better for repeatable processes. Further, if the table is a permanent table, it should be created from a separate create table script (one that is in source control) as creating permanent objects should not, in general, in code are inserts/deletes/updates or selects from a table. Object changes should be handled separately from data changes because objects have implications beyond the needs of a specific insert/update/select/delete. You need to consider the best data types, think about FK constraints, PKs and other constraints, consider auditing requirements, think about indexing, etc.
Each statement has a distinct use case. They are not interchangeable.
SELECT...INTO MyTable... creates a new MyTable where one did not exist before.
INSERT INTO MyTable...SELECT... is used when MyTable already exists.
The primary difference is that SELECT INTO MyTable will create a new table called MyTable with the results, while INSERT INTO requires that MyTable already exists.
You would use SELECT INTO only in the case where the table didn't exist and you wanted to create it based on the results of your query. As such, these two statements really are not comparable. They do very different things.
In general, SELECT INTO is used more often for one off tasks, while INSERT INTO is used regularly to add rows to tables.
EDIT:
While you can use CREATE TABLE and INSERT INTO to accomplish what SELECT INTO does, with SELECT INTO you do not have to know the table definition beforehand. SELECT INTO is probably included in SQL because it makes tasks like ad hoc reporting or copying tables much easier.
Actually SELECT ... INTO not only creates the table but will fail if it already exists, so basically the only time you would use it is when the table you are inserting to does not exists.
In regards to your EDIT:
I personally mainly use SELECT ... INTO when I am creating a temp table. That to me is the main use. However I also use it when creating new tables with many columns with similar structures to other tables and then edit it in order to save time.
I only want to cover second point of the question that is related to performance, because no body else has covered this. Select Into is a lot more faster than insert into, when it comes to tables with large datasets. I prefer select into when I have to read a very large table. insert into for a table with 10 million rows may take hours while select into will do this in minutes, and as for as losing indexes on new table is concerned you can recreate the indexes by query and can still save a lot more time when compared to insert into.
SELECT INTO is typically used to generate temp tables or to copy another table (data and/or structure).
In day to day code you use INSERT because your tables should already exist to be read, UPDATEd, DELETEd, JOINed etc. Note: the INTO keyword is optional with INSERT
That is, applications won't normally create and drop tables as part of normal operations unless it is a temporary table for some scope limited and specific usage.
A table created by SELECT INTO will have no keys or indexes or constraints unlike a real, persisted, already existing table
The 2 aren't directly comparable because they have almost no overlap in usage
Select into creates new table for you at the time and then insert records in it from the source table. The newly created table has the same structure as of the source table.If you try to use select into for a existing table it will produce a error, because it will try to create new table with the same name.
Insert into requires the table to be exist in your database before you insert rows in it.
The simple difference between select Into and Insert Into is:
--> Select Into don't need existing table. If you want to copy table A data, you just type Select * INTO [tablename] from A. Here, tablename can be existing table or new table will be created which has same structure like table A.
--> Insert Into do need existing table.INSERT INTO [tablename] SELECT * FROM A;.
Here tablename is an existing table.
Select Into is usually more popular to copy data especially backup data.
You can use as per your requirement, it is totally developer choice which should be used in his scenario.
Performance wise Insert INTO is fast.
References :
https://www.w3schools.com/sql/sql_insert_into_select.asp
https://www.w3schools.com/sql/sql_select_into.asp
The other answers are all great/correct (the main difference is whether the DestTable exists already (INSERT), or doesn't exist yet (SELECT ... INTO))
You may prefer to use INSERT (instead of SELECT ... INTO), if you want to be able to COUNT(*) the rows that have been inserted so far.
Using SELECT COUNT(*) ... WITH NOLOCK is a simple/crude technique that may help you check the "progress" of the INSERT; helpful if it's a long-running insert, as seen in this answer).
[If you use...]
INSERT DestTable SELECT ... FROM SrcTable
...then your SELECT COUNT(*) from DestTable WITH (NOLOCK) query would work.
Select into for large datasets may be good only for a single user using one single connection to the database doing a bulk operation task. I do not recommend to use
SELECT * INTO table
as this creates one big transaction and creates schema lock to create the object, preventing other users to create object or access system objects until the SELECT INTO operation completes.
As proof of concept open 2 sessions, in first session try to use
select into temp table from a huge table
and in the second section try to
create a temp table
and check the locks, blocking and the duration of second session to create a temp table object. My recommendation it is always a good practice to create and Insert statement and if needed for minimal logging use trace flag 610.

Best way to move data between tables and generate mapping of old to new identity values

I need to merge data from 2 tables into a third (all having the same schema) and generate a mapping of old identity values to new ones. The obvious approach is to loop through the source tables using a cursor, inserting the old and new identity values along the way. Is there a better (possibly set-oriented) way to do this?
UPDATE: One additional bit of info: the destination table already has data.
Create your mapping table with an IDENTITY column for the new ID. Insert from your source tables into this table, creating your mapping.
SET IDENTITY_INSERT ON for your target table.
Insert into the target table from your source tables joined to the mapping table, then SET IDENTITY_INSERT OFF.
I created a mapping table based on the OUTPUT clause of the MERGE statement. No IDENTITY_INSERT required.
In the example below, there is RecordImportQueue and RecordDataImportQueue, and RecordDataImportQueue.RecordID is a FK to RecordImportQueue.RecordID. The data in these staging tables needs to go to Record and RecordData, and FK must be preserved.
RecordImportQueue to Record is done using a MERGE statement, producing a mapping table from its OUTPUT, and RecordDataImportQueue goes to RecordData using an INSERT from a SELECT of the source table joined to the mapping table.
DECLARE #MappingTable table ([NewRecordID] [bigint],[OldRecordID] [bigint])
MERGE [dbo].[Record] AS target
USING (SELECT [InstanceID]
,RecordID AS RecordID_Original
,[Status]
FROM [RecordImportQueue]
) AS source
ON (target.RecordID = NULL) -- can never match as RecordID is IDENTITY NOT NULL.
WHEN NOT MATCHED THEN
INSERT ([InstanceID],[Status])
VALUES (source.[InstanceID],source.[Status])
OUTPUT inserted.RecordID, source.RecordID_Original INTO #MappingTable;
After that, you can insert the records in a referencing table as folows:
INSERT INTO [dbo].[RecordData]
([InstanceID]
,[RecordID]
,[Status])
SELECT [InstanceID]
,mt.NewRecordID -- the new RecordID from the mappingtable
,[Status]
FROM [dbo].[RecordDataImportQueue] AS rdiq
JOIN #MappingTable AS mt
ON rdiq.RecordID = mt.OldRecordID
Although long after the original post, I hope this can help other people, and I'm curious for any feedback.
I think I would temporarily add an extra column to the new table to hold the old ID. Once your inserts are complete, you can extract the mapping into another table and drop the column.

Resources