Is there a way to reset the table SYS.COL_USAGE$? Do the numbers keep going up forever?
Of course, I could truncate the table or run DML against it, but this is a SYSTEM table and I would prefer not to do that.
Background: We have an unusual data warehouse setup with two databases: a warehouse database that the overnight ETL writes to, and a customer-facing user database that is cloned from the warehouse database before the start of each day. We gather stats in the warehouse database, and they get copied to the user database as part of the clone.
However, I realized that the contents of SYS.COL_USAGE$, which drives histogram creation, are based entirely on ETL queries and not user queries.
DBMS_STATS.RESET_COL_USAGE is your friend here.
https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_STATS.html#GUID-0ED25A41-8642-46E4-AB5C-AAC08E622A8F
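As a rough sketch of how that call might be used (the schema name WAREHOUSE and table name SALES_FACT are hypothetical placeholders, not names from the question), you could inspect the recorded usage first and then reset it for one table:

```sql
-- Hypothetical owner and table names; substitute your own.
-- Show the column usage currently recorded for the table (returns a CLOB).
SELECT DBMS_STATS.REPORT_COL_USAGE('WAREHOUSE', 'SALES_FACT') FROM dual;

-- Clear the recorded column usage for that table.
BEGIN
  DBMS_STATS.RESET_COL_USAGE(ownname => 'WAREHOUSE', tabname => 'SALES_FACT');
END;
/
```

Per the linked documentation, passing NULL for tabname applies the reset to the tables in the given schema, so it is worth testing on a single table first.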
Related
We are building a system in our company that will need temporal tables in SQL Server and might need log shipping as well. Are there any unexpected impacts of log shipping on a temporal table that wouldn't happen with a normal table?
I would expect no impact in either direction (that is, log shipping won't change the temporal table, nor will the temporal table change log shipping). At its core, log shipping is just restoring transaction logs on another server. And temporal tables are (more or less) a trigger that maintains another table on data mutations. That extra work will be present in the log backup and will restore just fine at the log shipping secondary.
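For illustration, a minimal system-versioned table might look like the following (the table and column names are made up for this sketch; temporal tables require SQL Server 2016 or later). Every update or delete on dbo.Customer also writes a row to dbo.CustomerHistory, and both writes are ordinary logged operations that end up in the log backups:

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE dbo.Customer
(
    CustomerId INT           NOT NULL PRIMARY KEY CLUSTERED,
    Name       NVARCHAR(100) NOT NULL,
    -- Period columns maintained by the engine.
    ValidFrom  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));
```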
Previously, our company used temporary tables, but as the volume of business data grew we had to rewrite those queries using WITH (common table expressions):
WITH table_alias AS (SELECT ...)
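A fuller sketch of that pattern, assuming hypothetical orders and customers tables, replaces the temporary table with a named subquery that exists only for the duration of the statement:

```sql
-- Hypothetical tables and columns, for illustration only.
WITH recent_orders AS (
    -- This part would previously have been loaded into a temporary table.
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY customer_id
)
SELECT c.customer_name, r.total_amount
FROM customers c
JOIN recent_orders r ON r.customer_id = c.customer_id;
```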
We are using Redshift at my workplace, and over the last week I have been working through a series of requests to change the schema of a certain table. This has become a very tedious daily process, involving updates to ETL jobs and Redshift views.
The process can be summarized as:
1. Change the ETL job that produces the raw data before loading it into Redshift.
2. Temporarily modify the Redshift view that uses the underlying table so that the table can be altered.
3. Modify the table (e.g. add/change/remove column(s)).
4. Modify the view back to use the updated table.
Of course, in the process there's testing involved and other time-consuming steps.
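The view and table steps of the process above might look roughly like this in Redshift. All schema, table, view, and column names here are hypothetical, and this sketch drops and recreates the view, which is one way of temporarily detaching it:

```sql
-- Hypothetical names throughout.
-- Detach the dependent view so the table can be altered.
DROP VIEW reporting.orders_v;

-- Change the table: remove one column, add another.
ALTER TABLE raw_data.orders DROP COLUMN legacy_code;
ALTER TABLE raw_data.orders ADD COLUMN discount_code VARCHAR(32);

-- Recreate the view against the updated table.
-- Grants on the dropped view would need to be reissued.
CREATE VIEW reporting.orders_v AS
SELECT order_id, customer_id, amount, discount_code
FROM raw_data.orders;
```

Keeping these statements in a versioned script per change makes the sequence repeatable rather than something retyped by hand each day.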
How often is it "natural" for a table schema to change? What are the best practices for dealing with this without losing too much time or having to repeat the whole "mechanical" process every time?
Thanks!
This is one of the reasons that data warehouse automation tools exist. We know that users will change their mind when they see the warehouse, or as business requirements change. Automating the process means that everything you asked for could be delivered in a few clicks of a mouse.
You'll find a list of all the data warehouse automation products we know, on our web site, at http://ajilius.com/competitors/
We have 70+ SQL Server 2008 databases that need to be copied from an OLTP environment to a separate reporting server. Once the DBs are copied, we will do some partial data transformation: de-normalization, row-level security, etc.
SSRS Reports will be written based on these static denormalized tables and views.
We have a small nightly window for copying and transforming all 70 databases (3 hours).
Currently databases average about 10GB.
Options:
1. Transactional replication:
We would need to create 100+ static denormalized tables on each reporting database.
Doing this for all 70 databases almost reaches our nightly time limit.
As the databases grow we will exceed the time limit. We thought of mixing denormalized tables with views to speed up transformation. But then there would be some dynamic and some static data which is not a solution we can use.
Also with 70 databases using transactional replication we are concerned about bandwidth usage.
2. Snapshot replication:
Copy the entire database each night.
This means we could have a mixture of denormalized tables and views so the data transformation process is quicker.
But the snapshot is a full data copy, so as the DB grows, we will exceed our time limit for completing copy and transformation.
3. Log shipping:
In our nightly window, we could use the log shipping to update the reporting databases, then truncate and repopulate the denormalized tables and use some views.
However, I understand that with log shipping, extra tables and views cannot be added to the subscribing database.
4. Mirroring:
Mirroring is being deprecated, and the mirror database is also not available for reporting until failover.
5. SQL Server 2012 AlwaysOn.
We don't have SQL Server 2012 yet. Can this be configured to do an update once a day instead of in real time?
And can extra tables and views be created on the subscribing database (our reporting databases)?
6. Merge replication:
This is meant to be for combining multiple data sources into one database.
But it looks like it allows for a scheduled update (once per day) and only updates the subscriber DB with the latest changes rather than doing an entire snapshot.
It requires adding a rowversion column to every table, but we could handle this. Also, with this solution, could additional tables be created on the subscriber database without the update getting out of sync?
The final option is that we use SSIS to select only the data we need from the OLTP databases. I think this option creates more risk, as we would have to handle inserts/updates/deletes to our denormalized tables rather than just dropping and recreating the denormalized tables daily.
Any help on our options would be greatly appreciated.
If I've made any incorrect assumptions, please say.
If it were me, I'd go with transactional replication that runs continuously and have views (possibly indexed) at the subscriber. This has the advantage of not having to wait for the data to come over since it's always coming over.
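As a sketch of what an indexed view at the subscriber could look like (dbo.Orders and its columns are hypothetical, and Amount is assumed to be declared NOT NULL, since an indexed view cannot SUM a nullable expression):

```sql
-- Hypothetical replicated table dbo.Orders(CustomerId, Amount NOT NULL).
CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT o.CustomerId,
       COUNT_BIG(*)  AS OrderCount,   -- required when the view uses GROUP BY
       SUM(o.Amount) AS TotalAmount
FROM dbo.Orders AS o
GROUP BY o.CustomerId;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals
    ON dbo.vOrderTotals (CustomerId);
GO
```

The engine then keeps the materialized result up to date as replicated changes arrive, which takes the place of the nightly rebuild of a static denormalized table.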
We have an audit database (Oracle) that holds monitoring information for all activities performed by services (about 100) deployed on application servers. As you may imagine, the audit database is really huge because of the volume of requests the services process. The only write transaction that occurs on this database is services writing audit information in real time.
As the audit database started growing (more than a million records per day), querying the required data (for example, selecting all errors that occurred with service A for requests between a start date and an end date) quickly became nearly impossible.
To address this, some "smart kids" decided to devise a batch job that copies data from the database over to another database (say, audit_archives) and deletes records so that only 2 days' worth of audit data is retained in the audit database.
This initially looked neat, but whenever the "batch" process runs, the audit process that inserts data into the audit database becomes very slow, and sometimes the "batch" process also fails due to database contention.
What is a better way to design this archival so that it runs efficiently and has the least impact on both the audit process and the batch job?
You might want to look into partitioning your base table.
Create a mirror table (as the target of the "historic" data) and create the same partitioning scheme on that one (most probably on a per-date basis).
Then you can simply exchange the "old" partitions (using ALTER TABLE the_table EXCHANGE PARTITION) from one table to the other. It should only take a few seconds to "move" a partition. The actual performance will depend on the indexes defined (local, global).
This technique is usually used the other way round (to prepare new data to be fed into a reporting table in a data warehouse environment) but should work for "archiving" as well.
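A rough sketch of the exchange follows. The table and partition names (audit_log, audit_archive, audit_swap, p_2024_01_01) are hypothetical. Because Oracle exchanges a partition with a non-partitioned table, this sketch assumes an intermediate swap table with the same column layout, and it assumes the archive table already has an empty partition covering the same date range:

```sql
-- Hypothetical names; one-time setup of an empty, non-partitioned swap table.
CREATE TABLE audit_swap AS
SELECT * FROM audit_log WHERE 1 = 0;

-- Move the old partition's rows out of the live table (dictionary-only swap).
ALTER TABLE audit_log
  EXCHANGE PARTITION p_2024_01_01 WITH TABLE audit_swap
  WITHOUT VALIDATION;

-- Move the same rows into the matching partition of the archive table.
ALTER TABLE audit_archive
  EXCHANGE PARTITION p_2024_01_01 WITH TABLE audit_swap
  WITHOUT VALIDATION;

-- The live partition is now empty and can be dropped.
ALTER TABLE audit_log DROP PARTITION p_2024_01_01;
```

Indexes are excluded from the exchange by default, so the affected local index partitions would need rebuilding afterwards, or the swap table would need matching indexes and the INCLUDING INDEXES clause.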
I. Easy way
Delete old records in batches, ideally with a FORALL statement.
Copy data in batches, ideally with FORALL.
Add partitioning based on the day of the week.
II. Queues
Delete old records in batches, ideally with a FORALL statement.
Fill audit_archives with a trigger on the audit table; in the trigger, use a queue to avoid long-running DML.
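The batched copy-and-delete with FORALL could be sketched as below. The names audit_log, audit_archive_log, and created_at are hypothetical, and the archive table is assumed to be reachable from the same schema with an identical column layout; a batch size of 10,000 is an arbitrary starting point:

```sql
DECLARE
  -- Rows older than two days, identified by ROWID.
  CURSOR c_old IS
    SELECT ROWID
    FROM audit_log
    WHERE created_at < TRUNC(SYSDATE) - 2;

  TYPE t_rowids IS TABLE OF ROWID;
  l_rowids t_rowids;
BEGIN
  OPEN c_old;
  LOOP
    -- Work on a limited batch so each transaction stays short.
    FETCH c_old BULK COLLECT INTO l_rowids LIMIT 10000;
    EXIT WHEN l_rowids.COUNT = 0;

    -- Copy the batch to the archive table.
    FORALL i IN 1 .. l_rowids.COUNT
      INSERT INTO audit_archive_log
      SELECT * FROM audit_log WHERE ROWID = l_rowids(i);

    -- Then delete the same rows from the live table.
    FORALL i IN 1 .. l_rowids.COUNT
      DELETE FROM audit_log WHERE ROWID = l_rowids(i);

    COMMIT;  -- release locks between batches
  END LOOP;
  CLOSE c_old;
END;
/
```

Committing per batch keeps lock durations short, which is the point of working in batches while the services keep inserting.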
I am new to designing ETL processes. Currently I have two databases: one is the live database that the application uses for everyday transactions, and the other is the data warehouse.
I have a table in the live database that regularly has new data inserted into it. The goal is that every night the ETL process will transfer the data from the live database to the data warehouse and then delete the data in the live database.
Due to my lack of knowledge, the solution I came up with is to implement something called a rolling table. Basically, on the live database I have two tables with the same structure, which I call tblLive1 and tblLive2. I also have a synonym called tblLive. All inserts are done through the synonym, which points at one of the tables.
When I run the ETL process, a stored procedure drops the synonym and recreates it pointing at tblLive2. This allows the ETL process to transform data from tblLive1 without affecting the application. The assumption is that the ETL process takes an hour to run, and I don't want it to lock the table and prevent the application from inserting new data.
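For reference, the synonym swap described above might look like this in T-SQL (assuming the objects live in the dbo schema, which the question does not state):

```sql
-- Repoint the synonym so new inserts go to tblLive2.
-- Wrapping both statements in a transaction avoids a window
-- in which the synonym does not exist.
BEGIN TRANSACTION;
    DROP SYNONYM dbo.tblLive;
    CREATE SYNONYM dbo.tblLive FOR dbo.tblLive2;
COMMIT TRANSACTION;

-- The ETL can now read dbo.tblLive1 without competing with the application.
-- Once the ETL has finished, empty the table it processed:
-- TRUNCATE TABLE dbo.tblLive1;
```

The next night's run would do the same in the opposite direction, pointing the synonym back at tblLive1.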
This solution should theoretically work, but it is not elegant.
I am sure this is a common problem. Are there any other solutions out there?
To add to Bob's answer (above): it is usual in DWH/BI applications for all necessary tables to be copied, essentially as-is, into a "staging" database or a "staging" schema on your DWH database (depending on the number of tables, their size, etc.). For a DWH implementation of any size, these would ordinarily be on a different server from your OLTP system.
To answer the question on performance impact, it depends on your server spec/io configuration.
Is data being inserted into the OLTP system 24 hours a day, or are there downtimes or low-traffic periods?
It might be worthwhile using database compression as IO is going to be your biggest enemy and this will help considerably.
Read the table into a staging area and process the staging table. You usually want to spend as little time on the production system as you have to, especially if it is in use.
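One possible way to do that in T-SQL, sketched under several assumptions: a staging table dbo.tblLiveStage with matching columns exists in the same database, and the live table has hypothetical columns Id, CreatedAt, and Payload. The DELETE ... OUTPUT form moves rows in small batches, so each statement only briefly locks the live table, and the copied rows are exactly the deleted rows:

```sql
-- Hypothetical columns and staging table.
DECLARE @Cutoff DATETIME = GETDATE();  -- only move rows that existed at start
DECLARE @Rows   INT      = 1;

WHILE @Rows > 0
BEGIN
    -- Remove a small batch from the live table and write the
    -- removed rows to the staging table in the same statement.
    DELETE TOP (5000) FROM dbo.tblLive
    OUTPUT DELETED.Id, DELETED.CreatedAt, DELETED.Payload
    INTO dbo.tblLiveStage (Id, CreatedAt, Payload)
    WHERE CreatedAt < @Cutoff;

    SET @Rows = @@ROWCOUNT;
END;
```

The hour-long transformation can then run against dbo.tblLiveStage (or a copy of it on the warehouse server) without touching the live table.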
You may also want to look into using tables loaded by a trigger, or Change Data Capture if you are on SQL Server 2008.
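A minimal sketch of enabling Change Data Capture on the live table, assuming it is dbo.tblLive (so the default capture instance is named dbo_tblLive) and that SQL Server Agent is running, since the capture job depends on it:

```sql
-- Enable CDC for the current database, then for the table.
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'tblLive',
    @role_name     = NULL;   -- no gating role for reading change data
GO

-- Later, the ETL reads the captured changes for an LSN range.
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_tblLive');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_tblLive(@from_lsn, @to_lsn, N'all');
```

In practice the ETL would remember the last LSN it processed and use that as the start of the next range rather than the minimum LSN.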