How are DDL changes replicated in PostgreSQL?

In PostgreSQL 9.1, streaming replication works by streaming WAL records, which are generated by UPDATEs and DELETEs on the master's data.
How are DDL changes replicated? Those don't appear to be part of the WAL.

PostgreSQL's Write-Ahead Log (WAL) does contain DDL. In PostgreSQL, DDL is transactional, just like DML. Everything goes through the WAL.
See http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis
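For example, because DDL is transactional, a CREATE TABLE can be rolled back just like a row change. A minimal sketch (the table name is made up):
BEGIN;
CREATE TABLE ddl_demo (id integer PRIMARY KEY);  -- the catalog changes and file creation are WAL-logged
INSERT INTO ddl_demo VALUES (1);
ROLLBACK;  -- both the table and the row are gone; a standby replaying this WAL ends up in the same state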

To elaborate on Colin's answer, almost everything goes through the write-ahead log. It is a block-level journal that records every write that will be made to any database structure. Every change to any part of the data directory is first recorded in the WAL. That's because the primary purpose of the WAL is to allow replay of changes if the system crashes or loses power, so it needs to record every single planned disk write.
In PostgreSQL, tables, views, etc are just entries in the system catalog tables. Changes to these catalogs get write-ahead logged along with everything else. The same is true of database creation; a db is just an entry in pg_database and the corresponding directory structure.
Changes made to tables by VACUUM, CLUSTER, TRUNCATE, etc. also go through the WAL, either as block-level change records or as special WAL entries that describe the operation.
Only a few non-durable things don't go through the WAL, for example:
changes to UNLOGGED and TEMPORARY tables (see the example just below)
temp files for on-disk sorts
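For instance, an UNLOGGED table skips the WAL for its data, which also means its contents never reach streaming-replication standbys (illustrative only; the table name is invented):
CREATE UNLOGGED TABLE scratch_counts (k text, n bigint);
-- inserts into this table generate no WAL: faster, but the table is truncated after a crash
-- and cannot be read on a hot-standby replica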

Related

Temporal Tables and Log shipping

We are building a system in our company which will need temporal tables in SQL Server and might need log shipping as well. I wanted to know if there are any unexpected impacts of log shipping on a temporal table that wouldn't happen on a normal table.
I would expect no impact in either direction (that is, log shipping won't change the temporal table, nor will the temporal table change log shipping). At its core, log shipping is just restoring transaction logs on another server. And temporal tables are (more or less) a trigger that maintains another table on data mutations. That extra work will be present in the log backup and will restore just fine at the log shipping secondary.
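For reference, a system-versioned temporal table (SQL Server 2016+) is declared roughly like the sketch below; the table and column names are made up:
CREATE TABLE dbo.Product
(
    ProductId int NOT NULL PRIMARY KEY CLUSTERED,
    Price     decimal(10,2) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));
-- every UPDATE/DELETE also writes the old row version to dbo.ProductHistory, and all of that
-- goes through the transaction log, so log shipping replays it on the secondary unchanged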
Previously in the company we used temporary tables, but as the volume of business data grew, we had to rewrite the queries to use WITH (common table expressions):
WITH table_alias AS (SELECT ...)

Disable transactions on SQL Server

I need some light here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger that stores all changes in another database (on the same server), in one single table, 'tbSysMasterLog'. Yes, the application's log is stored in another database.
The problem is that before any insert/update/delete command on the application database, a transaction is started, and therefore the log database's table is locked until the transaction is committed or rolled back. So anyone else who tries to write to any other table of the application will be blocked.
So... is there any way to disable transactions on a particular database or on a particular table?
You cannot turn off the log; everything gets logged. You can set the recovery model to "Simple", which will limit the amount of log data kept once the records are committed.
" the table of the log database is locked": why that?
Normally you log changes by inserting records. The insert of records should not lock the complete table, normally there should not be any contention in insertion.
If you do more than inserts, perhaps you should consider changing that. Perhaps you should look at the indices defined on log, perhaps you can avoid some of them.
It sounds from the question that you begin a transaction at the start of your triggers, and that you are logging to the other database prior to the commit.
Normally you do not need to have explicit transactions in SQL server.
If you do need explicit transactions, you could put the data to be logged into variables, commit the transaction, and then insert it into your log table.
Normally inserts are fast and can happen in parallel without locking. There are certain things like identity columns that require ordering, but they are very lightweight structures and can be avoided by generating GUIDs so inserts are non-blocking; for something like your log table, though, a primary-key identity column gives you a clear sequence that is probably helpful in working out the order of events.
Obviously if you log after the transaction, this may not be in the same order as the transactions occurred due to the different times that transactions take to commit.
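A minimal sketch of that idea, with the procedure, table, and column names (other than tbSysMasterLog) invented:
CREATE PROCEDURE dbo.ShipOrder @OrderId int
AS
BEGIN
    BEGIN TRANSACTION;
        UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderId = @OrderId;
    COMMIT TRANSACTION;
    -- the log insert happens after the commit, so the log database is never held by the transaction
    INSERT INTO LogDb.dbo.tbSysMasterLog (TableName, RecordId, Action, LoggedAt)
    VALUES (N'Orders', @OrderId, N'UPDATE', SYSDATETIME());
END;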
We normally log into individual tables with a similar name to the master table e.g. FooHistory or AuditFoo
There are other options. A very lightweight method is to use a trace: this is what is used for performance tuning, it gives you a copy of every statement run on the database (including those fired by triggers), and you can send it to a different database server. It is a good idea to log to a different server if you are tracing a heavily used server, since the volume of data is massive when you trace, say, 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file and then load it into a table (better performance), and script the starting, stopping, and loading of traces.
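Loading the trace file into a table afterwards is a one-liner with fn_trace_gettable (the file path and target table here are made up):
SELECT * INTO dbo.TraceResults
FROM sys.fn_trace_gettable(N'E:\Traces\app_trace.trc', DEFAULT);
-- reads the .trc file produced by a server-side trace and materialises it as a regular table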
The load on the server that is getting the trace log is minimal and I have never had a locking problem on the server receiving the trace, so I am pretty sure that you are doing something to cause the locks.

SQL Server - Tempdb vs. Database Log usage

This may be a very basic question, but how can you determine beforehand whether a large operation will end up using database log or tempdb space?
For instance, one large insert/update operation I did used the database log to the point where we needed to employ SSIS and bulk operations just so the space wouldn't run out, because all the changes in the script had to be deployed at one time.
So now I'm working with a massive delete operation that would fill the log 10 times over. I created a script that checks the space used by the database log file and deletes the rows in smaller batches, the idea being that once the log file grows large enough, the script aborts and then continues from that point the next day (allowing normal usage to continue until the next backup, without risk of the log running out of space).
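A sketch of that kind of batched delete (the real script's names differ; the table, filter, and batch size here are invented):
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) FROM dbo.BigAuditTable
    WHERE CreatedAt < '20120101';
    SET @rows = @@ROWCOUNT;
    -- the real script also checked the log file size here and aborted past a threshold
END;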
Now, instead of filling the log, the latter query started filling up tempdb. Tempdb data file, not log file, to be specific. So I'm thinking there's a huge hole where my understanding of these two should be. :)
Thanks for any advice!
Edit:
To clarify, the question here is: why does the first example use the database log, while the latter uses the tempdb data file, to store the changes? And in general, by which logic are DML operations stored in either tempdb or the log? Normally the log should store all DB changes, while tempdb is only used to store processed data during an operation when explicitly requested (i.e., temp objects) or when the server runs out of RAM, right?
There is actually quite a bit that goes on behind the scenes when deleting records from a table. This MSDN Blog link may help shed some light on why tempdb is filling up when you try and delete. Either way, the delete will fill up the transaction logs as well, it just sounds like tempdb is filling up before it gets to the step of logging the transaction(s).
I'm not entirely sure what your requirements are, but the following links could be somewhat enlightening on your transaction logging issues. These are all set for SQL Server 2008 R2, but you can switch to whatever version you are running.
Recovery Model Overview
Considerations for Switching from the Simple Recovery Model
Considerations for Switching from the Full or Bulk-Logged Recovery Model
You also have the option of truncating the table, but that depends on a few things. If you don't need every row deletion to be fully logged and you're deleting all the records from the table, you can truncate. If you are doing some sort of conditional delete, but you're deleting more than you're keeping, you could always insert all of the records you want to keep into another "staging" table and then truncate the original. Then you can re-insert the records from the staging table back into the original. However, that really only works when you have no foreign key relationships on that table.
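A rough sketch of that keep-and-truncate pattern (the names and the keep condition are hypothetical; an identity column would additionally need SET IDENTITY_INSERT):
SELECT * INTO dbo.BigTable_keep          -- copy only the rows you want to keep
FROM dbo.BigTable
WHERE CreatedAt >= '20130101';
TRUNCATE TABLE dbo.BigTable;             -- deallocates the pages with minimal logging
INSERT INTO dbo.BigTable
SELECT * FROM dbo.BigTable_keep;         -- put the kept rows back
DROP TABLE dbo.BigTable_keep;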

Archiving a huge database (Oracle) without impacting processes that insert records into it

We have an audit database (oracle) that holds monitor information of all activities performed by services (about 100) deployed on application servers. As you may imagine the audit database is really huge because of the volume of requests the services process. And the only write transaction that occurs on this database is services writing audit information in real-time.
As the audit database started growing (more than a million records per day), querying required data (for example select all errors occurred with service A for requests between start date and end date) quickly became nearly impossible.
To address this, some "smart kids" decided to devise a batch job that copies data from the database over to another database (say, audit_archives) and then deletes records, so that only 2 days' worth of audit data is retained in the audit database.
This initially looked neat but whenever the "batch" process runs, the audit process that inserts data to audit database starts to become very slow - and sometimes the "batch" process also fails due to database contention.
What is a better way to design this scenario so that the archival is performed in the most efficient way, with the least impact on both the audit process and the batch?
You might want to look into partitioning your base table.
Create a mirror table (as the target of the "historic" data) and create the same partitioning scheme on that one (most probably on a per-date basis).
Then you can simply exchange the "old" partitions (using ALTER TABLE the_table EXCHANGE PARTITION ...) from one table to the other. It should only take a few seconds to "move" the partition. The actual performance will depend on the indexes defined (local vs. global).
This technique is usually used the other way round (to prepare new data to be fed into a reporting table in a data warehouse environment) but should work for "archiving" as well.
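A hedged sketch of the exchange itself (the table, partition, and staging-table names are invented):
-- swaps the segment of one old daily partition with an identically structured staging table;
-- it is a dictionary operation, so it takes seconds regardless of how many rows the partition holds
ALTER TABLE audit_log
  EXCHANGE PARTITION p_20190115 WITH TABLE audit_archive_stage
  INCLUDING INDEXES WITHOUT VALIDATION;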
I. The easy way:
delete old records partially, best done with a FORALL statement (see the sketch after this list)
copy data over partially, again with FORALL
add partitioning based on the day of the week
II. Queues:
delete old records partially, best done with a FORALL statement
fill audit_archives with a trigger on audit; in the trigger, use a queue to avoid long DML
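A minimal PL/SQL sketch of such a partial, FORALL-based delete (the table and column names are invented; the 2-day retention follows the question):
DECLARE
  TYPE t_rowids IS TABLE OF ROWID;
  v_rowids t_rowids;
BEGIN
  LOOP
    SELECT rowid BULK COLLECT INTO v_rowids
      FROM audit_log
     WHERE audit_date < SYSDATE - 2
       AND ROWNUM <= 10000;             -- small batches keep undo and lock time short
    EXIT WHEN v_rowids.COUNT = 0;
    FORALL i IN 1 .. v_rowids.COUNT
      DELETE FROM audit_log WHERE rowid = v_rowids(i);
    COMMIT;                             -- release locks so the real-time audit inserts are not starved
  END LOOP;
END;
/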

Where should transaction records go? Flat file or Database

I'm developing a Java Enterprise Application which needs to write transaction records either to flat files or directly to a relational database. Transaction records are records that show when the transaction starts and ends, the transaction status (success/failure), and also data unique to that transaction.
These transaction records will then be used to generate reports. The report-generating tool reads data from a database and generates them.
If a flat file is used, the records will eventually be loaded into the database for report generation. This adds an extra step.
If the database is used directly, there will be no flat file. My concern is that if the database is down, some records will be missing. Thus this approach is not as safe as the flat-file one.
So, I cannot decide. Maybe there are other things I didn't consider? What's your view?
Thanks in advance.
If you DO use a flat file, you'll need to worry about locking and flushing and all of that garbage. Furthermore, it can only live in one place, which makes it a pain if you ever want the app to scale. Go with the database unless downtime is a REALLY big concern.
