I am using JasperReports to generate reports from SQL Server on a daily basis. The problem is that every day the report reads the data from the beginning, but I want it to exclude records read earlier and include only new rows. The database is old and its tables don't have timestamp columns, so there is no way to identify which records are 'new' and which are 'old'.
I am not allowed to modify it either.
Please suggest any other way if possible.
You can create a new table and, every time you print records on your report, insert those records into it. Then your report query can select from the original table with a NOT EXISTS condition against the new table.
The obvious drawbacks of this approach are the space consumed in the DB and the extra work of inserting records into the new table, but if you cannot modify the original table it's the only solution.
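As a rough sketch of that approach (the SourceTable and Reported_Records names and the Id key column are assumptions, not from the question):

-- tracking table holding the keys already printed on a report
CREATE TABLE dbo.Reported_Records (
    Id INT NOT NULL PRIMARY KEY,
    ReportedOn DATETIME NOT NULL DEFAULT GETDATE()
);

-- report query: only rows not yet reported
SELECT s.*
FROM dbo.SourceTable AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Reported_Records AS r WHERE r.Id = s.Id);

-- after printing, remember what was reported
INSERT INTO dbo.Reported_Records (Id)
SELECT s.Id
FROM dbo.SourceTable AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Reported_Records AS r WHERE r.Id = s.Id);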
Otherwise the Alex K suggestion is very good.
I upload an Excel file using BCP (I truncate the current table in the DB every day and BCP in from the Excel file to repopulate it). It is important for me to keep a log of all the changes made to the rows, whether row additions or changes to columns of existing rows.
I have read a few articles online saying we can create a log table and a trigger (I have no idea how to do that): a log table with columns like
Date | Field | Old Value | New Value.
Firstly, how do I do this?
Secondly, what's a smarter way to log only the actual changes and not the truncating of the table? I'm thinking of creating a temp table (tbl_Excefile_Temp) into which I import the file and then UPDATE the current table (tbl_Excefile) from tbl_Excefile_Temp. This way all the changes made to the current table would get logged automatically in the logs table.
I know it's a big use case; could you please guide me?
If you are using SQL Server 2016 or higher I would advise you to look into temporal tables. If you stop truncating and use a MERGE statement instead, you have a very easy way of keeping a log: whenever you make a change, SQL Server writes the old values away and records the datetimes between which the old row was valid.
With temporal tables you can also query the table as it was at a specific datetime. In regular use there is no difference from a non-temporal table.
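For illustration, here is a minimal sketch of a system-versioned (temporal) table and a point-in-time query. The tbl_Excefile name comes from the question; the Id and SomeValue columns and the history table name are assumptions.

CREATE TABLE dbo.tbl_Excefile (
    Id INT NOT NULL PRIMARY KEY,
    SomeValue NVARCHAR(100) NULL,   -- hypothetical data column
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.tbl_Excefile_History));

-- query the table as it looked at a given point in time
SELECT * FROM dbo.tbl_Excefile
FOR SYSTEM_TIME AS OF '2020-01-01T00:00:00';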
I have two tables (T_1 and T_2) with the same fields. What I need: after every hour, T_2 should contain only the data that was inserted into T_1 within that hour (the previous hour's data should be erased). I am using SQL Server. Please help me.
Why would you set up two tables to do this?
Your use-case seems like a canonical case for table partitioning. This is a way of storing data in separate "units" (files). You seem to want T_1 to have its data split by hour.
Then you can directly access the data for a particular hour. This will be as efficient from an access perspective as copying the data into a separate table.
If you really wanted to, you could copy the most recent partition to another table every hour -- swapping in the new data for the older data. But that seems unnecessary in practice.
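As a rough illustration, here is a minimal sketch of hourly range partitioning. All names, the InsertedAt column, and the boundary values are assumptions; in practice you would keep splitting in new hourly boundaries (or use a sliding-window scheme) as time moves on.

CREATE PARTITION FUNCTION pf_ByHour (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('2024-06-01T00:00:00', '2024-06-01T01:00:00', '2024-06-01T02:00:00');

CREATE PARTITION SCHEME ps_ByHour
AS PARTITION pf_ByHour ALL TO ([PRIMARY]);

CREATE TABLE dbo.T_1 (
    Id INT NOT NULL,
    Payload NVARCHAR(200) NULL,
    InsertedAt DATETIME2(0) NOT NULL,
    CONSTRAINT PK_T_1 PRIMARY KEY (Id, InsertedAt)
) ON ps_ByHour (InsertedAt);

-- reading one hour touches only that partition
SELECT * FROM dbo.T_1
WHERE InsertedAt >= '2024-06-01T01:00:00' AND InsertedAt < '2024-06-01T02:00:00';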
BUSINESS SCENARIO, SEEKING A WAY TO PROGRAM THIS:
Every night, I have to update table ABC in the data warehouse database from the production database. The table is millions of rows, so I want to do this efficiently.
The table doesn't have any sort of timestamp marker (e.g. a LastUpdated date/time column).
The database was created by our vendor whose software we run, and they are giving us visibility into our data. We may not have much leverage in terms of asking for new columns to house information such as LastUpdate DateTime stamp.
Is there a way, absent such information, to identify the rows that have been changed or added?
For example, is there such a thing as a queryable physical row number associated with a table record that might help us work towards a solution? If that could be queried, and perhaps scanned sequentially, then maybe there is a way to find the inserted rows.
Updated rows, I am not so sure.
Just entertaining ideas at this point in time to see if there is an efficient solution for this scenario.
Ideally, the solution will be geared towards a stored procedure we can have run every night by a job.
Thank you.
I saw this comment but I am not so sure that the solution is efficient:
Find changed rows (composite key with nulls)
Please check the MERGE statement. You can create a SQL Server Agent job that executes a MERGE script to check for and apply the changes, if any.
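A minimal sketch of such a nightly MERGE, assuming the warehouse copy is local and the production table is reachable (e.g. through a linked server or a staged copy); all database, table, and column names here are assumptions:

MERGE WarehouseDB.dbo.ABC AS target
USING ProdDB.dbo.ABC AS source
    ON target.Id = source.Id
WHEN MATCHED AND (target.Col1 <> source.Col1 OR target.Col2 <> source.Col2)  -- NULLs need extra handling
    THEN UPDATE SET Col1 = source.Col1, Col2 = source.Col2
WHEN NOT MATCHED BY TARGET
    THEN INSERT (Id, Col1, Col2) VALUES (source.Id, source.Col1, source.Col2)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;

Note that this still compares every row rather than detecting changes incrementally, so it solves the "apply the differences" part, not the "avoid scanning the whole table" part.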
I have a database table with more than 1 million records uniquely identified by a GUID column. I want to find out which of these rows were selected or retrieved in the last 5 years. The select can happen from multiple places: sometimes a row is returned on its own, sometimes as part of a set of rows. A SELECT query does the fetching over a JDBC connection from Java code, and a SQL procedure also fetches data from the table.
My intention is to clean up the table: I want to delete all rows which were never used (retrieved via a SELECT query) in the last 5 years.
Does Oracle have any built-in metadata which can give me this information?
My alternative solution was to add a LAST_ACCESSED column and update it whenever I select a row from this table. But that is a costly operation in terms of the time taken for the whole process: at least 1,000 - 10,000 records are selected from the table in a single operation. Is there a more efficient way to do this than updating the table after reading it? Mine is a multi-threaded application, so updating such a large data set may result in deadlocks or long waits for the next read query.
Any elegant solution to this problem?
Oracle Database 12c introduced a new feature called Automatic Data Optimization that brings you Heat Maps to track table access (modifications as well as read operations). Careful: the feature currently has to be licensed under the Advanced Compression Option or the In-Memory Option.
Heat Maps track whenever a database block has been modified and whenever a segment, i.e. a table or table partition, has been accessed. They do not track select operations per individual row, nor per individual block, because the overhead would be too heavy (data is generally read often and concurrently; keeping a counter for each row would quickly become very costly). However, if you have your data partitioned by date, e.g. you create a new partition for every day, you can over time easily determine which days are still read and which ones can be archived or purged. Note that Partitioning is also an option that needs to be licensed.
Once you have reached that conclusion you can then either use In-Database Archiving to mark rows as archived or just go ahead and purge the rows. If you happen to have the data partitioned you can do easy DROP PARTITION operations to purge one or many partitions rather than having to do conventional DELETE statements.
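A minimal sketch of what this can look like, assuming the relevant options are licensed and enabled; the schema and table names, the last_needed column, and the partition name are hypothetical:

ALTER SYSTEM SET heat_map = ON;

-- segment-level last read/write times collected by Heat Map
SELECT object_name, subobject_name, segment_write_time, segment_read_time, full_scan
FROM dba_heat_map_segment
WHERE owner = 'MYSCHEMA' AND object_name = 'MY_TABLE';

-- In-Database Archiving: mark rows as archived instead of deleting them
ALTER TABLE myschema.my_table ROW ARCHIVAL;
UPDATE myschema.my_table SET ora_archive_state = '1'
WHERE last_needed < ADD_MONTHS(SYSDATE, -60);

-- or, if the table is partitioned by date, purge whole partitions
ALTER TABLE myschema.my_table DROP PARTITION p_2018;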
I couldn't use any built-in solutions. I tried the solutions below:
1) The DB audit feature for SELECT statements.
2) Adding a trigger to update a date column whenever a SELECT query is executed on the table.
Both were discarded: auditing uses up a lot of space and has a performance hit, and the trigger likewise hurt performance.
Finally I resolved the issue by maintaining a separate table into which entries older than 5 years that are still used (selected in a query) are inserted. While deleting, I cross-check this table and avoid deleting entries present in it.
I need to keep a daily statistic of the count of records in a table.
Is there a way to automate counting the records daily and writing the result into another table? Maybe using a SQL Agent Job or something like that?
I'm using SQL Server 2008.
Thank you!
Edit:
If I delete today all records from 1/1/2010, the statistic still needs to show that at 1/1/2010 there were 500 records at the end of the day. So solely using GetDate() and summing up doesn't work, as I'd get 0 records with that method for 1/1/2010.
Add a column to your table like so:
ALTER TABLE My_Table
ADD insert_date DATETIME NOT NULL DEFAULT GETDATE()
You can then query against that as SQL intended.
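For example, the daily statistic can then be computed directly from that column (a sketch against the My_Table name used above):

SELECT CAST(insert_date AS DATE) AS count_date, COUNT(*) AS record_count
FROM My_Table
GROUP BY CAST(insert_date AS DATE)
ORDER BY count_date;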
Insert trigger: increment the counting-table record for today (insert it if not already created).
Delete trigger: decrement the counting-table record for today (insert it if not already created).
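A minimal sketch of that pair of triggers, assuming a counting table Daily_Counts and a base table My_Table (all names are hypothetical):

CREATE TABLE dbo.Daily_Counts (CountDate DATE NOT NULL PRIMARY KEY, RecordCount INT NOT NULL);
GO
CREATE TRIGGER trg_My_Table_Count_Insert ON dbo.My_Table AFTER INSERT AS
BEGIN
    SET NOCOUNT ON;
    -- add the number of inserted rows to today's counter
    UPDATE dbo.Daily_Counts
        SET RecordCount = RecordCount + (SELECT COUNT(*) FROM inserted)
        WHERE CountDate = CAST(GETDATE() AS DATE);
    IF @@ROWCOUNT = 0
        INSERT INTO dbo.Daily_Counts (CountDate, RecordCount)
        SELECT CAST(GETDATE() AS DATE), COUNT(*) FROM inserted;
END;
GO
CREATE TRIGGER trg_My_Table_Count_Delete ON dbo.My_Table AFTER DELETE AS
BEGIN
    SET NOCOUNT ON;
    -- subtract the number of deleted rows from today's counter
    UPDATE dbo.Daily_Counts
        SET RecordCount = RecordCount - (SELECT COUNT(*) FROM deleted)
        WHERE CountDate = CAST(GETDATE() AS DATE);
    IF @@ROWCOUNT = 0
        INSERT INTO dbo.Daily_Counts (CountDate, RecordCount)
        SELECT CAST(GETDATE() AS DATE), -COUNT(*) FROM deleted;
END;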
In my opinion you answered your own question with the best option: create a job that just calls a stored procedure which gets the count and stamps it with the date.
The other option mentioned by Tom H. is a better choice, but if you can't alter the table for whatever reason, the job is a good option.
Another option could be to place an insert trigger on that table to increment a count somewhere, but that could affect performance depending on how you implement it.
Setting up the job is simple through the SQL Management studio interface with a schedule of how often to run and what stored procedure to call. You can even just write the command directly in the command window of the step instead of calling a sp.
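A minimal sketch of what the job's stored procedure could look like (the history table, procedure, and My_Table names are hypothetical):

CREATE TABLE dbo.Record_Count_History (SnapshotDate DATE NOT NULL PRIMARY KEY, RecordCount INT NOT NULL);
GO
CREATE PROCEDURE dbo.usp_Snapshot_Record_Count
AS
BEGIN
    SET NOCOUNT ON;
    -- take tonight's snapshot of the row count
    INSERT INTO dbo.Record_Count_History (SnapshotDate, RecordCount)
    SELECT CAST(GETDATE() AS DATE), COUNT(*) FROM dbo.My_Table;
END;

The job step then just runs EXEC dbo.usp_Snapshot_Record_Count on a nightly schedule.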
Tom's answer with OMG_Ponies' addendum about tombstoning instead of deleting is the best answer. If you are concerned about how many records were in the table on a certain day, there is a good possibility that someone will one day ask for information about those records on that day.
If that is a no-go, then, as others have said, create a second table that stores the PK of the last record counted for each day along with that day's count, and create a job that runs at the end of each day, counts all records with OriginalTable.PK > MAX(NewCountTable.Last_PK_Field), and adds that row (Last_PK_Field, Count) to NewCountTable.
SQL Job is good -- yes.
Or you could add a date column to the table defaulted to GETDATE(). This wouldn't work if you don't want your daily counts to be affected by folks deleting records after the fact.