I have a requirement wherein there are 2 tables (Staging & Target) in the same database.
Every time, data is first loaded into the Staging table, and on the next run data is again loaded into the Staging table first. I want to flip the tables with a SQL query so that, after data is loaded into the Staging table, these changes are made:
Staging becomes (flips to) Target
Target becomes (flips to) Staging
So we will always see both tables, but in practice only one of them has the latest data at any given time.
Before opting for the flip-tables approach I tried sp_rename, but that results in a deadlock if someone queries the Target table while it is being dropped and renamed.
Example:
IF OBJECT_ID('[dbo].[Target]','U') IS NOT NULL DROP TABLE [dbo].[Target] ;
EXEC sp_rename '[dbo].[Staging]','Target';
With the flip approach there should be minimal chance of a lock. While trying to understand this flip-tables concept, one approach I came across is some kind of flag setting in SQL, but I'm not sure how to do it. Any help on this would be really appreciated.
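A minimal sketch of the flip done as three renames inside one transaction, assuming both tables live in dbo and that the name Target_old (hypothetical) is free to use as a temporary name. This is a plain rename swap rather than the flag-based approach: sp_rename still needs a schema-modification lock, so the flip waits for running queries and briefly blocks new ones, but nothing is dropped and the window is only as long as the three renames. The demo setup drops and recreates the tables, so run it in a scratch database.

```sql
-- Demo setup (scratch database only): drops and recreates both tables.
IF OBJECT_ID('dbo.Staging', 'U') IS NOT NULL DROP TABLE dbo.Staging;
IF OBJECT_ID('dbo.Target', 'U') IS NOT NULL DROP TABLE dbo.Target;
IF OBJECT_ID('dbo.Target_old', 'U') IS NOT NULL DROP TABLE dbo.Target_old;
CREATE TABLE dbo.Target  (Id int NOT NULL, LoadName varchar(20) NOT NULL);
CREATE TABLE dbo.Staging (Id int NOT NULL, LoadName varchar(20) NOT NULL);
INSERT INTO dbo.Target  (Id, LoadName) VALUES (1, 'previous load');
INSERT INTO dbo.Staging (Id, LoadName) VALUES (1, 'latest load');

-- The flip: all three renames commit together or not at all.
SET XACT_ABORT ON;   -- any error rolls back the whole flip
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.Target',     'Target_old';   -- hypothetical temporary name
    EXEC sp_rename 'dbo.Staging',    'Target';
    EXEC sp_rename 'dbo.Target_old', 'Staging';
COMMIT TRANSACTION;

-- The previous data now sits in dbo.Staging; empty it before the next load.
TRUNCATE TABLE dbo.Staging;
```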
I am struggling to find the best way to migrate some varchar columns to nvarchar. One of the options I am using is to add a new nvarchar column, update it with the values from the original column, drop the original column, and rename the new one to the old name.
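The four-step migration described above, sketched for a single column on a hypothetical table dbo.t (names and sizes are assumptions; run in a scratch database). The GO separators matter: a batch that references new_col_1 has to be compiled after the ALTER TABLE that adds it. Note that after the rename, the migrated col_1 is the last column of the table.

```sql
-- Demo setup (scratch database only): a table with one varchar column.
IF OBJECT_ID('dbo.t', 'U') IS NOT NULL DROP TABLE dbo.t;
GO
CREATE TABLE dbo.t (id int NOT NULL, col_1 varchar(50) NULL);
INSERT INTO dbo.t (id, col_1) VALUES (1, 'abc');
INSERT INTO dbo.t (id, col_1) VALUES (2, NULL);
GO
-- 1. Add the new nvarchar column.
ALTER TABLE dbo.t ADD new_col_1 nvarchar(50) NULL;
GO
-- 2. Copy the values (implicit varchar -> nvarchar conversion).
UPDATE dbo.t SET new_col_1 = col_1;
-- 3. Drop the original column.
ALTER TABLE dbo.t DROP COLUMN col_1;
-- 4. Give the new column the old name.
EXEC sp_rename 'dbo.t.new_col_1', 'col_1', 'COLUMN';
GO
```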
I know it will generate a lot of UNDO and REDO data. Still, I have other limitations (mostly by SQL Server not supporting parallel DDLs and multi-column ALTER TABLE operations), so let's focus on how to run the update statement faster.
My Oracle experience is telling me to use internal parallelism, but is it available in SQL Server?
I am not able to get this statement to run in parallel, although I specifically created the table as a heap (no clustered index).
update t
set new_col_1 = col_1,
new_col_2 = col_2,
...,
new_col_N = col_N ;
Thanks in advance!!
I read the documentation but have not found clear proof that parallel update plans are possible.
I constantly run into this problem. I am working in a data warehouse and I cannot find out what is populating a table. Typically the table is populated daily from either another table in the warehouse or an Oracle database. I have tried the query below and can confirm the updates, but I cannot see what is doing them. I searched the known SSIS packages, stored procedures with similar names, and SQL jobs, but found nothing.
select object_name(object_id, database_id) as TableName, last_user_update, *
from sys.dm_db_index_usage_stats
where database_id = DB_ID('Warehouse')
and object_id=object_id('PAYMENTS_DAILY')
I only have the most basic SQL Server tools available so no fancy search tools :(
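A sketch of the search described in the question, using only catalog views that SSMS can query: module definitions in the current database, then SQL Server Agent job step commands in msdb. It assumes it is run in the Warehouse database; procedures in other databases need the first query run there, and SSIS packages will usually not show up, because a job step that runs a package normally contains the package path rather than the table name.

```sql
-- Run in the Warehouse database. The _ in the name is a LIKE wildcard,
-- which only makes the search slightly broader.
DECLARE @name sysname;
SET @name = N'PAYMENTS_DAILY';

-- Procedures, triggers, views and functions whose definition mentions the table.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
       OBJECT_NAME(m.object_id)        AS ObjectName,
       o.type_desc
FROM sys.sql_modules AS m
JOIN sys.objects AS o ON o.object_id = m.object_id
WHERE m.definition LIKE N'%' + @name + N'%';

-- SQL Server Agent job steps whose command text mentions the table.
SELECT j.name AS JobName, s.step_id, s.step_name, s.subsystem
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobsteps AS s ON s.job_id = j.job_id
WHERE s.command LIKE N'%' + @name + N'%';
```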
There is no way to tell, after data has been inserted into a table, where it came from unless you have some sort of logging.
SSIS logging, triggers on the tables, change data capture, audit columns, etc. are some of the many ways to do this.
Frequently, if you know when the row was added, that can help you figure out what process is adding it. Add a new "InsertedDatetime" column to your warehouse table and give it a default value of getdate(). If you know that the rows always come in at 11:15 AM, you can use that to narrow your search.
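A sketch of the InsertedDatetime column on a stand-in table (dbo.PAYMENTS_DAILY_demo and its columns are hypothetical). Making the column nullable means existing rows stay NULL instead of all getting today's date, and it assumes the loading process lists its columns: an INSERT without a column list would start failing once the column exists.

```sql
-- Demo setup (scratch database only): a stand-in for the warehouse table.
IF OBJECT_ID('dbo.PAYMENTS_DAILY_demo', 'U') IS NOT NULL DROP TABLE dbo.PAYMENTS_DAILY_demo;
CREATE TABLE dbo.PAYMENTS_DAILY_demo (PaymentID int NOT NULL, Amount money NOT NULL);
GO
-- Nullable with a default: existing rows stay NULL, every new row gets its insert time.
ALTER TABLE dbo.PAYMENTS_DAILY_demo
    ADD InsertedDatetime datetime NULL
        CONSTRAINT DF_PAYMENTS_DAILY_demo_InsertedDatetime DEFAULT (GETDATE());
GO
-- The loader does not need to change; it simply doesn't supply the column.
INSERT INTO dbo.PAYMENTS_DAILY_demo (PaymentID, Amount) VALUES (1, 10.00);

-- After a few loads: at what time of day (hh:mm) do the rows arrive?
SELECT CONVERT(char(5), InsertedDatetime, 108) AS ArrivalTime,
       COUNT(*) AS RowsInserted
FROM dbo.PAYMENTS_DAILY_demo
WHERE InsertedDatetime IS NOT NULL
GROUP BY CONVERT(char(5), InsertedDatetime, 108)
ORDER BY ArrivalTime;
```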
That will probably be enough information, but if that doesn't help you track down the process, then you can add additional columns that contain everything from a source IP address to a calling object name.
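A sketch of such extra audit columns, again on a hypothetical stand-in table. SUSER_SNAME(), HOST_NAME() and APP_NAME() are built-in functions evaluated in the inserting session; the host and application names are reported by the client, so treat them as hints rather than proof. This sketch does not capture the source IP address.

```sql
-- Demo setup (scratch database only): a stand-in for the warehouse table.
IF OBJECT_ID('dbo.PAYMENTS_DAILY_demo', 'U') IS NOT NULL DROP TABLE dbo.PAYMENTS_DAILY_demo;
GO
CREATE TABLE dbo.PAYMENTS_DAILY_demo (PaymentID int NOT NULL, Amount money NOT NULL);
GO
-- Each default is evaluated in the session doing the insert.
ALTER TABLE dbo.PAYMENTS_DAILY_demo ADD
    InsertedByLogin  nvarchar(128) NULL CONSTRAINT DF_PAYMENTS_DAILY_demo_Login DEFAULT (SUSER_SNAME()),
    InsertedFromHost nvarchar(128) NULL CONSTRAINT DF_PAYMENTS_DAILY_demo_Host  DEFAULT (HOST_NAME()),
    InsertedByApp    nvarchar(128) NULL CONSTRAINT DF_PAYMENTS_DAILY_demo_App   DEFAULT (APP_NAME());
GO
-- Simulate a load, then look at who did it.
INSERT INTO dbo.PAYMENTS_DAILY_demo (PaymentID, Amount) VALUES (1, 10.00);
SELECT PaymentID, InsertedByLogin, InsertedFromHost, InsertedByApp
FROM dbo.PAYMENTS_DAILY_demo;
```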
As a last resort, you could rename your table and create a view named the same and then use an Instead Of Insert trigger on it that just holds open the connection so you can examine the currently executing processes to figure out where it's coming from.
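A sketch of this last-resort trap on a hypothetical stand-in table, dbo.PaymentsDemo. The delay is one second for the demo; while hunting for the real loader it would be minutes, and during that time the loader's transaction and its locks stay open, so readers of the table will block as well. The final query is meant to be run from a second session while an insert is waiting; for SQL Server Agent jobs, program_name typically identifies the job step.

```sql
-- Demo setup (scratch database only).
IF OBJECT_ID('dbo.PaymentsDemo', 'V') IS NOT NULL DROP VIEW dbo.PaymentsDemo;
IF OBJECT_ID('dbo.PaymentsDemo', 'U') IS NOT NULL DROP TABLE dbo.PaymentsDemo;
IF OBJECT_ID('dbo.PaymentsDemo_base', 'U') IS NOT NULL DROP TABLE dbo.PaymentsDemo_base;
GO
CREATE TABLE dbo.PaymentsDemo (PaymentID int NOT NULL, Amount money NOT NULL);
GO
-- 1. Move the real table out of the way...
EXEC sp_rename 'dbo.PaymentsDemo', 'PaymentsDemo_base';
GO
-- 2. ...and put a view with the old name in its place, so the loader keeps working.
CREATE VIEW dbo.PaymentsDemo
AS
SELECT PaymentID, Amount FROM dbo.PaymentsDemo_base;
GO
-- 3. Hold each insert open for a while, then pass the rows through.
CREATE TRIGGER dbo.trg_PaymentsDemo_hold
ON dbo.PaymentsDemo
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    WAITFOR DELAY '00:00:01';   -- demo value; use minutes (e.g. '00:05:00') while hunting
    INSERT INTO dbo.PaymentsDemo_base (PaymentID, Amount)
    SELECT PaymentID, Amount FROM inserted;
END;
GO
-- 4. From a second session, while an insert is waiting: who is behind it?
SELECT s.session_id, s.login_name, s.host_name, s.program_name
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s ON s.session_id = r.session_id
WHERE r.wait_type = N'WAITFOR';
```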
I bet you can figure it out from the time alone though.
Every day a company drops a text file with potentially many records (350,000) onto our secure FTP. We've created a Windows service that runs early in the morning to read the text file into our SQL Server 2005 DB tables. We don't do a BULK INSERT because the data is relational and we need to check it against what's already in our DB to make sure the data remains normalized and consistent.
The problem is that the service can take a very long time (hours). This is problematic because it is inserting into and updating tables that constantly need to be queried and scanned by our application, which could affect the performance of both the DB and the application.
One solution we've thought of is to run the service on a separate DB with the same tables as our live DB. When the service is finished we can do a BCP into the live DB so it mirrors all of the new records created by the service.
I've never worked with handling millions of records in a DB before and I'm not sure what a standard approach to something like this is. Is this an appropriate way of doing this sort of thing? Any suggestions?
One mechanism I've seen is to insert the values into a temporary table - with the same schema as the target table. Null IDs signify new records and populated IDs signify updated records. Then use the SQL Merge command to merge it into the main table. Merge will perform better than individual inserts/updates.
Doing it row by row, you incur index maintenance on the table for every statement, which can be costly if the table is tuned for selects. I believe MERGE does it as a single bulk operation.
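A sketch of the temporary-table-plus-MERGE idea, assuming SQL Server 2008 or later and a hypothetical dbo.MyTable with an identity ID (run in a scratch database).

```sql
-- Demo setup (scratch database only); columns are hypothetical.
IF OBJECT_ID('dbo.MyTable', 'U') IS NOT NULL DROP TABLE dbo.MyTable;
IF OBJECT_ID('tempdb..#Incoming') IS NOT NULL DROP TABLE #Incoming;
GO
CREATE TABLE dbo.MyTable (ID int IDENTITY(1,1) PRIMARY KEY, val1 int, val2 int, val3 int);
INSERT INTO dbo.MyTable (val1, val2, val3) VALUES (1, 1, 1);

-- Same shape as the target: NULL ID = new row, populated ID = update.
CREATE TABLE #Incoming (ID int NULL, val1 int, val2 int, val3 int);
INSERT INTO #Incoming (ID, val1, val2, val3) VALUES (1, 9, 9, 9), (NULL, 2, 2, 2);

-- One set-based statement instead of one INSERT/UPDATE per row.
MERGE dbo.MyTable AS t
USING #Incoming AS s
    ON t.ID = s.ID                  -- a NULL ID never matches, so those rows are inserted
WHEN MATCHED THEN
    UPDATE SET val1 = s.val1, val2 = s.val2, val3 = s.val3
WHEN NOT MATCHED BY TARGET THEN
    INSERT (val1, val2, val3) VALUES (s.val1, s.val2, s.val3);   -- MERGE must end with a semicolon

DROP TABLE #Incoming;
```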
It's touched upon here:
What's a good alternative to firing a stored procedure 368 times to update the database?
There are MSDN articles about SQL merging, so Googling will help you there.
Update: it turns out you cannot use MERGE in SQL Server 2005 (you can in 2008). Your idea of having another database is usually handled by SQL replication. Again, I've seen in production a copy of the current database used to perform a long-running action (reporting and aggregation of data in this instance); however, this wasn't merged back in. I don't know what merging capabilities are available in SQL Replication, but it would be a good place to look.
Either that, or resolve the reason why you cannot bulk insert/update.
Update 2: as mentioned in the comments, you could stick with the temporary table idea to get the data into the database, and then insert/update by joining onto this table to populate your main table. The difference is that SQL Server is now working with a set, so it can handle the index maintenance accordingly; it should be faster, even with the join.
Update 3: you could possibly remove the data checking from the insert process and move it to the service. If you can stop inserts into your table while this happens, then this will allow you to solve the issue stopping you from bulk inserting (i.e., you are checking for duplicates based on column values, as you don't yet have the luxury of an ID). Alternatively, with the temporary table idea, you can add a WHERE condition to first see if the row exists in the database, something like:
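A sketch of the join-based version from Update 2, which needs nothing newer than SQL Server 2005: one UPDATE joined to the temporary table for rows that already have an ID, then one INSERT for rows whose ID is NULL. The table and column names are hypothetical; SET XACT_ABORT ON plus the transaction keeps the two statements all-or-nothing.

```sql
-- Demo setup (scratch database only); columns are hypothetical.
IF OBJECT_ID('dbo.MyTable', 'U') IS NOT NULL DROP TABLE dbo.MyTable;
IF OBJECT_ID('tempdb..#Tempo') IS NOT NULL DROP TABLE #Tempo;
GO
CREATE TABLE dbo.MyTable (ID int IDENTITY(1,1) PRIMARY KEY, val1 int, val2 int, val3 int);
INSERT INTO dbo.MyTable (val1, val2, val3) VALUES (1, 1, 1);

-- The temporary table: NULL ID = new row, populated ID = existing row to update.
CREATE TABLE #Tempo (ID int NULL, val1 int, val2 int, val3 int);
INSERT INTO #Tempo (ID, val1, val2, val3) VALUES (1, 9, 9, 9);
INSERT INTO #Tempo (ID, val1, val2, val3) VALUES (NULL, 2, 2, 2);

SET XACT_ABORT ON;   -- an error rolls back both statements
BEGIN TRANSACTION;
    -- One set-based UPDATE for all rows that already exist...
    UPDATE t
    SET    val1 = s.val1, val2 = s.val2, val3 = s.val3
    FROM   dbo.MyTable AS t
    JOIN   #Tempo AS s ON s.ID = t.ID;

    -- ...and one set-based INSERT for all new rows.
    INSERT INTO dbo.MyTable (val1, val2, val3)
    SELECT val1, val2, val3
    FROM   #Tempo
    WHERE  ID IS NULL;
COMMIT TRANSACTION;
```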
INSERT INTO MyTable (val1, val2, val3)
SELECT s.val1, s.val2, s.val3 FROM #Tempo AS s
WHERE NOT EXISTS
(
SELECT *
FROM MyTable t
WHERE t.val1 = s.val1 AND t.val2 = s.val2 AND t.val3 = s.val3 -- s. prefix needed: a bare val1 would resolve to t.val1
);
We do much larger imports than that all the time. Create an SSIS package to do the work. Personally I prefer to create a staging table, clean it up, and then do the update or import. But SSIS can do all the cleaning in memory before inserting, if you want.
Before you start mirroring and replicating data, which is complicated and expensive, it would be worthwhile to check your existing service to make sure it is performing efficiently.
Maybe there are table scans you can get rid of by adding an index, or lookup queries you can get rid of by doing smart error handling? Analyze your execution plans for the queries that your service performs and optimize those.
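One way to start looking for such indexes, as a sketch: the missing-index DMVs (SQL Server 2005 and later) record indexes the optimizer wanted while compiling queries since the last restart. Treat the output as suggestions to check against the service's execution plans, not as indexes to create blindly. Reading these DMVs requires VIEW SERVER STATE permission.

```sql
-- Top missing-index suggestions, weighted by how often and how much they would help.
SELECT TOP (20)
       DB_NAME(d.database_id) AS DatabaseName,
       d.statement            AS TableName,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,
       gs.avg_user_impact      -- estimated % reduction in query cost
FROM sys.dm_db_missing_index_details     AS d
JOIN sys.dm_db_missing_index_groups      AS g  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs ON gs.group_handle = g.index_group_handle
ORDER BY gs.user_seeks * gs.avg_user_impact DESC;
```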
I have a situation where I need to change the order of the columns and add new columns to an existing table in SQL Server 2008. It won't let me do this without dropping and re-creating the table, but this is a production system and the table contains data. I can back up the data, drop the existing table, change the order/add the new columns, re-create it, and insert the backed-up data into the new table.
Is there a better way to do this without dropping and re-creating? I think SQL Server 2005 allows this kind of change to an existing table structure without dropping and re-creating it.
Thanks
You can't really change the column order in a SQL Server 2008 table - it's also largely irrelevant (at least it should be, in the relational model).
With the visual designer in SQL Server Management Studio, as soon as you make too big a change, the only reliable way to do this for SSMS is to re-create the table in the new format, copy the data over, and then drop the old table. There's really nothing you can do about this to change it.
What you can do at all times is add new columns to a table or drop existing columns from a table using SQL DDL statements:
ALTER TABLE dbo.YourTable
ADD NewColumn INT NOT NULL ........
ALTER TABLE dbo.YourTable
DROP COLUMN OldColumn
That'll work, but you won't be able to influence the column order. But again: for your normal operations, column order in a table is totally irrelevant - it's at best a cosmetic issue on your printouts or diagrams..... so why are you so fixated on a specific column order??
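A sketch of those two ALTER TABLE statements on a hypothetical table in a scratch database, with one assumption added: a NOT NULL column on a table that already has rows needs a default (0 here is arbitrary) to fill the existing rows. The catalog query at the end shows the new column landing at the end of the column list.

```sql
-- Demo setup (scratch database only); names are placeholders.
IF OBJECT_ID('dbo.YourTable', 'U') IS NOT NULL DROP TABLE dbo.YourTable;
GO
CREATE TABLE dbo.YourTable (Id int NOT NULL, OldColumn varchar(10) NULL, Name varchar(50) NULL);
INSERT INTO dbo.YourTable (Id, OldColumn, Name) VALUES (1, 'x', 'first row');
GO
-- The default fills the existing rows; without it this ALTER fails on a non-empty table.
ALTER TABLE dbo.YourTable
    ADD NewColumn int NOT NULL CONSTRAINT DF_YourTable_NewColumn DEFAULT (0);
ALTER TABLE dbo.YourTable
    DROP COLUMN OldColumn;
GO
-- The new column is always appended after the existing ones.
SELECT name, column_id
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.YourTable')
ORDER BY column_id;
```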
There is a way to do it by updating a SQL Server system table:
1) Connect to SQL server in DAC mode
2) Run queries that will update columns order:
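update syscolumns
set colorder = 3
where name='column2'
and id = object_id('YourTable') -- hypothetical table name; without it every column named column2 is changed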
But this approach is not recommended, because you can destroy something in the DB.
One possibility would be to not bother about reordering the columns in the table and simply modify it by adding the columns. Then, create a view which has the columns in the order you want, assuming that the order is truly important. The view can be easily changed to reflect any ordering that you want. Since I can't imagine that the order would be important for programmatic applications, the view should suffice for those manual queries where it might be important.
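A sketch of the view idea, with hypothetical table, view and column names: the new column Code is appended to the end of the table, and the view presents it in the desired position.

```sql
-- Demo setup (scratch database only); names are placeholders.
IF OBJECT_ID('dbo.YourTable_Ordered', 'V') IS NOT NULL DROP VIEW dbo.YourTable_Ordered;
IF OBJECT_ID('dbo.YourTable', 'U') IS NOT NULL DROP TABLE dbo.YourTable;
GO
CREATE TABLE dbo.YourTable (Id int NOT NULL, Name varchar(50) NULL);
GO
-- The new column goes to the end of the table...
ALTER TABLE dbo.YourTable ADD Code varchar(10) NULL;
GO
-- ...and the view lists the columns in the order people want to see them.
CREATE VIEW dbo.YourTable_Ordered
AS
SELECT Id, Code, Name
FROM dbo.YourTable;
GO
```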
As the other posters have said, there is no way without re-writing the table (but SSMS will generate scripts which do that for you).
If you are still in design/development, I certainly advise making the column order logical - nothing worse than having a newly added column become part of a multi-column primary key and having it nowhere near the other columns! But you'll have to re-create the table.
One time I used a 3rd party system which always sorted their columns in alphabetical order. This was great for finding columns in their system, but whenever they revved their software, our procedures and views became invalid. This was in an older version of SQL Server, though. I think since 2000, I haven't seen much problem with incorrect column order. When Access used to link to SQL tables, I believe it locked in the column definitions at time of table linking, which obviously has problems with almost any table definition changes.
I think the simplest way would be to re-create the table the way you want it with a different name, copy the data over from the existing table, drop it, and rename the new table.
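A sketch of that re-create, copy, drop and rename sequence with hypothetical names, in one transaction so that a failure leaves the original table in place. It does not carry over indexes, constraints, triggers or permissions from the old table; those have to be scripted onto the new one.

```sql
-- Demo setup (scratch database only); names are placeholders.
IF OBJECT_ID('dbo.YourTable_new', 'U') IS NOT NULL DROP TABLE dbo.YourTable_new;
IF OBJECT_ID('dbo.YourTable', 'U') IS NOT NULL DROP TABLE dbo.YourTable;
GO
CREATE TABLE dbo.YourTable (Id int NOT NULL, Name varchar(50) NULL, Code varchar(10) NULL);
INSERT INTO dbo.YourTable (Id, Name, Code) VALUES (1, 'first row', 'A');
GO
SET XACT_ABORT ON;   -- any error rolls everything back
BEGIN TRANSACTION;
    -- 1. New table with the desired column order (and any new columns).
    CREATE TABLE dbo.YourTable_new (
        Id   int         NOT NULL,
        Code varchar(10) NULL,
        Name varchar(50) NULL
    );
    -- 2. Copy the data; TABLOCKX keeps writers out of the old table meanwhile.
    INSERT INTO dbo.YourTable_new (Id, Code, Name)
    SELECT Id, Code, Name FROM dbo.YourTable WITH (TABLOCKX);
    -- 3. Drop the old table and give the new one its name.
    DROP TABLE dbo.YourTable;
    EXEC sp_rename 'dbo.YourTable_new', 'YourTable';
COMMIT TRANSACTION;
GO
```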
Would it perhaps be possible to script the table with all its data?
Edit the script file in something like Notepad++,
thus re-creating the table with the new columns but the same data.
Just a suggestion, but it might take a while to accomplish this,
unless you write yourself a small C# application that can work with the file and apply rules to it.
If only Notepad++ supported a find-and-move operation.
I have data coming in from DataStage that is being put in our SQL Server 2008 database in a table: stg_table_outside_data. The outside source is putting the data into that table every morning. I want to move the data from stg_table_outside_data to table_outside_data, where I keep multiple days' worth of data.
I created a stored procedure that inserts the data from stg_table_outside_Data into table_outside_data and then truncates stg_table_outside_Data. The outside DataStage process is outside of my control, so I have to do this all within SQL Server 2008. I had originally planned on using a simple AFTER INSERT trigger, but DataStage is doing a commit after every 100,000 rows. The trigger would run after the first commit and cause a deadlock error for the DataStage process.
Is there a way to set up an AFTER INSERT trigger to wait 30 minutes and then make sure there wasn't a new commit within that time frame? Is there a better solution to my problem? The goal is to get the data out of the staging table and into the working table without duplicates and then truncate the staging table for the next morning's load.
I appreciate your time and help.
One way you could do this is take advantage of the new MERGE statement in SQL Server 2008 (see the MSDN docs and this blog post) and just schedule that as a SQL job every 30 minutes or so.
The MERGE statement allows you to easily just define operations (INSERT, UPDATE, DELETE, or nothing at all) depending on whether the source data (your staging table) and the target data (your "real" table) match on some criteria, or not.
So in your case, it would be something like:
MERGE table_outside_data AS target
USING stg_table_outside_data AS source
ON (target.ProductID = source.ProductID) -- whatever join makes sense for you
WHEN NOT MATCHED THEN
INSERT VALUES(.......);
-- no WHEN MATCHED clause: matching rows are left alone (MERGE must end with a semicolon)
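A runnable version of that MERGE, assuming hypothetical columns (ProductID, Qty, LoadDate) and that a row counts as already loaded when the same product and load date are already in table_outside_data. Leaving out the WHEN MATCHED clause is how you get "do nothing" for matching rows. The demo setup drops both tables, so run it in a scratch database.

```sql
-- Demo setup (scratch database only); columns are hypothetical.
IF OBJECT_ID('dbo.table_outside_data', 'U') IS NOT NULL DROP TABLE dbo.table_outside_data;
IF OBJECT_ID('dbo.stg_table_outside_data', 'U') IS NOT NULL DROP TABLE dbo.stg_table_outside_data;
GO
CREATE TABLE dbo.table_outside_data (
    ProductID int NOT NULL, Qty int NOT NULL, LoadDate date NOT NULL,
    PRIMARY KEY (ProductID, LoadDate)
);
CREATE TABLE dbo.stg_table_outside_data (ProductID int NOT NULL, Qty int NOT NULL, LoadDate date NOT NULL);
INSERT INTO dbo.table_outside_data     (ProductID, Qty, LoadDate) VALUES (1, 5, '2024-01-01');
INSERT INTO dbo.stg_table_outside_data (ProductID, Qty, LoadDate) VALUES (1, 5, '2024-01-01'), (2, 3, '2024-01-02');
GO
MERGE dbo.table_outside_data AS target
USING dbo.stg_table_outside_data AS source
    ON  target.ProductID = source.ProductID
    AND target.LoadDate  = source.LoadDate    -- "already loaded" = same product, same day
WHEN NOT MATCHED THEN                         -- rows that match are simply skipped
    INSERT (ProductID, Qty, LoadDate)
    VALUES (source.ProductID, source.Qty, source.LoadDate);
```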
You shouldn't be using a trigger to do this; you should use a scheduled job.
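A sketch of such a scheduled job created with the msdb stored procedures, using hypothetical job, step and schedule names and a placeholder procedure, dbo.usp_MoveOutsideData. The 10:00 start time is an assumption; pick a time after the DataStage load normally finishes.

```sql
-- The job step runs in the database that holds the staging and working tables.
DECLARE @db sysname;
SET @db = DB_NAME();

-- Start clean if the (hypothetical) job already exists.
IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE name = N'Move outside data')
    EXEC msdb.dbo.sp_delete_job @job_name = N'Move outside data';

EXEC msdb.dbo.sp_add_job @job_name = N'Move outside data';
EXEC msdb.dbo.sp_add_jobstep
     @job_name      = N'Move outside data',
     @step_name     = N'Move staging rows',
     @subsystem     = N'TSQL',
     @database_name = @db,
     @command       = N'EXEC dbo.usp_MoveOutsideData;';   -- placeholder procedure
EXEC msdb.dbo.sp_add_jobschedule
     @job_name          = N'Move outside data',
     @name              = N'Daily at 10:00',
     @freq_type         = 4,        -- daily
     @freq_interval     = 1,        -- every day
     @active_start_time = 100000;   -- HHMMSS, i.e. 10:00:00
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Move outside data';   -- run on this server
```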
Maybe build a procedure that moves all the data from stg_table_outside_Data to table_outside_data once a day, or use the job scheduler.
Do a row count in the trigger: if the count is less than 100,000, do nothing. Otherwise, run your process.
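A sketch of such a procedure with hypothetical columns (ProductID, Qty, LoadDate): it copies only rows that are not already in table_outside_data, then truncates the staging table, all in one transaction. The TABLOCKX hint on the staging table is there so that no new rows can arrive between the copy and the truncate. The demo setup drops both tables, so run it in a scratch database.

```sql
-- Demo setup (scratch database only); columns are hypothetical.
IF OBJECT_ID('dbo.usp_MoveOutsideData', 'P') IS NOT NULL DROP PROCEDURE dbo.usp_MoveOutsideData;
IF OBJECT_ID('dbo.table_outside_data', 'U') IS NOT NULL DROP TABLE dbo.table_outside_data;
IF OBJECT_ID('dbo.stg_table_outside_data', 'U') IS NOT NULL DROP TABLE dbo.stg_table_outside_data;
GO
CREATE TABLE dbo.table_outside_data (
    ProductID int NOT NULL, Qty int NOT NULL, LoadDate date NOT NULL,
    PRIMARY KEY (ProductID, LoadDate)
);
CREATE TABLE dbo.stg_table_outside_data (ProductID int NOT NULL, Qty int NOT NULL, LoadDate date NOT NULL);
GO
CREATE PROCEDURE dbo.usp_MoveOutsideData
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- an error rolls back both the copy and the truncate
    BEGIN TRANSACTION;
        -- Copy only rows that are not already in the working table.
        INSERT INTO dbo.table_outside_data (ProductID, Qty, LoadDate)
        SELECT s.ProductID, s.Qty, s.LoadDate
        FROM dbo.stg_table_outside_data AS s WITH (TABLOCKX)   -- no new staging rows until COMMIT
        WHERE NOT EXISTS (SELECT 1
                          FROM dbo.table_outside_data AS t
                          WHERE t.ProductID = s.ProductID
                            AND t.LoadDate  = s.LoadDate);
        -- Empty the staging table for the next morning's load.
        TRUNCATE TABLE dbo.stg_table_outside_data;
    COMMIT TRANSACTION;
END;
GO
-- Usage with sample rows: one already loaded, one new.
INSERT INTO dbo.table_outside_data     (ProductID, Qty, LoadDate) VALUES (1, 5, '2024-01-01');
INSERT INTO dbo.stg_table_outside_data (ProductID, Qty, LoadDate) VALUES (1, 5, '2024-01-01'), (2, 3, '2024-01-02');
EXEC dbo.usp_MoveOutsideData;
```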