Following is the requirement for my table, say "Orders":
1) On day 1, I send the full data as a Unicode text file using the bcp command.
2) Every day after that, I need to send only the delta data for the transactions that happened that day.
What is the best way to implement the delta? I would like to avoid changing the current table design, and not all tables have timestamp fields.
Look into SQL Server change tracking. It does what you want.
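A minimal sketch of what that looks like (the database name, the OrderId key, and the bookkeeping around @last_sync_version are assumptions; note that change tracking requires a primary key on the table):

    -- Enable change tracking at the database level, then per table.
    ALTER DATABASE OrdersDb
        SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING;

    -- Daily export: read everything changed since the version saved
    -- after the previous run.
    DECLARE @last_sync_version bigint = 0;  -- load from your own bookkeeping table

    SELECT ct.SYS_CHANGE_OPERATION, o.*
    FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct
    LEFT JOIN dbo.Orders AS o ON o.OrderId = ct.OrderId;

    -- Persist this version for tomorrow's run.
    SELECT CHANGE_TRACKING_CURRENT_VERSION();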
You could also snapshot the PK values and a hash of each row at midnight. The next night you snapshot again and build the diff with a full join.
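A sketch of that approach, assuming an OrderId primary key and some made-up columns to feed the hash:

    -- Nightly snapshot: PK plus a hash over the columns you care about.
    SELECT OrderId,
           HASHBYTES('SHA2_256',
                     CONCAT(CustomerId, '|', Amount, '|', Status)) AS RowHash
    INTO dbo.Orders_Snapshot_Today
    FROM dbo.Orders;

    -- Diff against yesterday's snapshot with a full join.
    SELECT COALESCE(t.OrderId, y.OrderId) AS OrderId,
           CASE WHEN y.OrderId IS NULL THEN 'INSERT'
                WHEN t.OrderId IS NULL THEN 'DELETE'
                ELSE 'UPDATE'
           END AS ChangeType
    FROM dbo.Orders_Snapshot_Today AS t
    FULL JOIN dbo.Orders_Snapshot_Yesterday AS y ON y.OrderId = t.OrderId
    WHERE y.OrderId IS NULL
       OR t.OrderId IS NULL
       OR y.RowHash <> t.RowHash;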
You've already excluded the best way. Now you are limited to manually performing a diff based on the previous day's snapshot.
I upload an Excel file using BCP (I truncate the current table in the DB every day and BCP in from the Excel file to repopulate it). It is important for me to keep a log of all the changes made to the rows, whether row additions or changes to columns of existing rows.
I have read a few articles online saying you can create a log table and a trigger (I have no idea how to do this): a table of logs with columns like
Date | Field | Old Value | New Value.
Firstly, how do I do this?
Secondly, what's a smarter way to avoid logging the truncation of the table and log only the actual changes? I'm thinking of creating a temp table (tbl_Excefile_Temp) into which I will import the file, and then UPDATE the current table (tbl_Excefile) from tbl_Excefile_Temp, so that all the changes made to the current table get logged automatically in the logs table, as in the sketch below.
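Roughly what I have in mind (a sketch; the Id key and the Amount column are made up, and an AFTER UPDATE trigger on tbl_Excefile would write the old/new values to the log table):

    -- Import lands in the staging table, then update only rows that differ,
    -- so the trigger fires just for real changes.
    UPDATE t
    SET    t.Amount = s.Amount
    FROM   dbo.tbl_Excefile AS t
    JOIN   dbo.tbl_Excefile_Temp AS s ON s.Id = t.Id
    WHERE  t.Amount <> s.Amount
       OR (t.Amount IS NULL AND s.Amount IS NOT NULL)
       OR (t.Amount IS NOT NULL AND s.Amount IS NULL);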
I know it's a big use case; could you please guide me?
If you are using SQL Server 2016 or higher, I would advise you to look into temporal tables. If you stop truncating and use a MERGE statement instead, you get a very easy way of keeping a log: whenever you make a change, SQL Server writes the old values away and records the datetimes between which the old row was valid.
With temporal tables you can query the table as it was at a specific datetime. In regular use there is no difference from a non-temporal table.
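A minimal sketch, with assumed table and column names:

    -- Make the existing table system-versioned (SQL Server 2016+).
    ALTER TABLE dbo.tbl_Excefile ADD
        ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN
            DEFAULT SYSUTCDATETIME(),
        ValidTo datetime2 GENERATED ALWAYS AS ROW END HIDDEN
            DEFAULT CONVERT(datetime2, '9999-12-31 23:59:59.9999999'),
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

    ALTER TABLE dbo.tbl_Excefile
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.tbl_Excefile_History));

    -- Instead of truncate + reload, merge the staged file in.
    MERGE dbo.tbl_Excefile AS target
    USING dbo.tbl_Excefile_Temp AS source ON target.Id = source.Id
    WHEN MATCHED THEN UPDATE SET target.Amount = source.Amount
    WHEN NOT MATCHED BY TARGET THEN INSERT (Id, Amount) VALUES (source.Id, source.Amount)
    WHEN NOT MATCHED BY SOURCE THEN DELETE;

    -- Query the table as it was at any point in time.
    SELECT * FROM dbo.tbl_Excefile
    FOR SYSTEM_TIME AS OF '2024-01-15T08:00:00';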
...is changing over time. I know Time Travel does not work on the information schema, so I wanted to know if there is an alternative approach.
Yes, the alternative approach is to schedule a task that runs your query against the information schema and inserts the results, together with a timestamp, into a regular table.
https://docs.snowflake.net/manuals/user-guide/tasks-intro.html
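For example (a sketch; the warehouse, the schedule, and the snapshot table are all assumptions):

    -- Table to accumulate the timestamped snapshots.
    CREATE TABLE IF NOT EXISTS table_row_counts_history (
        captured_at TIMESTAMP_LTZ,
        table_name  STRING,
        row_count   NUMBER
    );

    -- Task that appends a snapshot every hour.
    CREATE OR REPLACE TASK snapshot_information_schema
        WAREHOUSE = my_wh
        SCHEDULE = '60 MINUTE'
    AS
        INSERT INTO table_row_counts_history
        SELECT CURRENT_TIMESTAMP(), table_name, row_count
        FROM information_schema.tables
        WHERE table_schema = 'PUBLIC';

    -- Tasks are created suspended; start it explicitly.
    ALTER TASK snapshot_information_schema RESUME;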
I have two tables (T_1 and T_2) with the same fields. What I need: every hour, T_2 should contain only the data that was inserted into T_1 within that hour (the previous hour's data is erased). I am using SQL Server. Please help me.
Why would you set up two tables to do this?
Your use case seems like a canonical case for table partitioning, which is a way of storing data in separate "units" (files). You seem to want T_1 to have its data split by hour.
Then you can directly access the data for a particular hour. This will be as efficient from an access perspective as copying the data into a separate table.
If you really wanted to, you could copy the most recent partition to another table every hour, swapping the new data in for the older data. But that seems unnecessary in practice.
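A sketch of an hourly partitioned table (boundary values and columns are illustrative; in practice you would add new boundaries over time with ALTER PARTITION FUNCTION ... SPLIT RANGE):

    -- Partition function/scheme splitting rows by hour.
    CREATE PARTITION FUNCTION pf_hourly (datetime2)
        AS RANGE RIGHT FOR VALUES
        ('2024-01-01T00:00', '2024-01-01T01:00', '2024-01-01T02:00');

    CREATE PARTITION SCHEME ps_hourly
        AS PARTITION pf_hourly ALL TO ([PRIMARY]);

    CREATE TABLE dbo.T_1 (
        InsertedAt datetime2 NOT NULL,
        Payload    nvarchar(100)
    ) ON ps_hourly (InsertedAt);

    -- Read one hour's data straight from its partition.
    SELECT *
    FROM dbo.T_1
    WHERE $PARTITION.pf_hourly(InsertedAt) = 2;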
I have a SQL Server database of about 150 GB which stores data for analysis. Each day new data comes in and we need to delete old data (based on date). Recently the daily data volume has increased a lot; it will soon be about 8-9 GB per day.
Currently we delete in small batches, which takes a very long time to finish. Is there a general guide to making it faster? I tried dropping/disabling the indexes before the delete and rebuilding them afterwards, but it did not help much.
Or will this totally depend on the actual data?
Thanks
Given the amount of data, I would use a partitioned table with one partition per day.
Swapping partitions in and out is going to be the fastest way to delete all data for one day.
EDIT: since truncating a partition is not as trivial as it should be in SQL Server, I figured I'd provide more details in case you're not familiar with partitions.
In SQL Server 2016 you can simply TRUNCATE TABLE ... WITH (PARTITIONS (n)). On earlier versions you have to proceed as follows:
The quickest way to delete a day of data in your database is to have the table partitioned by day and then:
Switch out the partition that you want to delete to another table: ALTER TABLE partitioned SWITCH PARTITION n TO otherTableToDelete.
TRUNCATE TABLE otherTableToDelete.
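Putting the two steps together (a sketch; the staging table must have the same columns, indexes, and filegroup as the partitioned table):

    -- Metadata-only move of partition 3 out of the main table.
    ALTER TABLE dbo.partitioned
        SWITCH PARTITION 3 TO dbo.otherTableToDelete;

    -- Deleting the day is now a cheap truncate, not a row-by-row delete.
    TRUNCATE TABLE dbo.otherTableToDelete;

    -- On SQL Server 2016+ the staging table is unnecessary:
    TRUNCATE TABLE dbo.partitioned WITH (PARTITIONS (3));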
I am using JasperReports to generate reports from SQL Server on a daily basis. The problem is that every day the report reads the data from the beginning, but I want it to exclude records read earlier and include only the new rows. The database is old and its tables don't have timestamp columns, so there is no way to identify which records are 'new' and which are 'old'.
I am not allowed to modify it either.
Please suggest any other way if possible.
You can create a new table, and every time you print records on your report, insert those records into it. The report query can then select from the original table with a NOT EXISTS condition against the new table, as in the sketch below.
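Something like this (a sketch; printed_records, the RecordId key, and the source table name are assumed):

    -- Report query: only rows not already recorded as printed.
    SELECT s.*
    FROM dbo.SourceTable AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.printed_records AS p
                      WHERE p.RecordId = s.RecordId);

    -- After the report runs, remember what was printed.
    INSERT INTO dbo.printed_records (RecordId)
    SELECT s.RecordId
    FROM dbo.SourceTable AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.printed_records AS p
                      WHERE p.RecordId = s.RecordId);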
The obvious drawbacks of this approach are the space consumed in the DB and the extra work of inserting the records into the new table, but if you cannot modify the original table, it's the only solution.
Otherwise, Alex K's suggestion is very good.