We are using python snowflake connector to execute queries. We recently made code changes to encapsulate few queries to execute under a single transaction. As part of the testing, we want to make sure those queries are all executed under a single transaction. We've checked under history tab. We just see the queriy id, sql text, session id and other non-transactional details.
Is there any way to know the list of queries executed under a transaction? Thanks!
You should just make sure that the queries are running within the same Session and that you've started the transaction by issuing a begin statement in the first query.
Once you've done that you can run the select current_transaction(); and print it to the screen at the beginning and the end of your python script to debug check if the transaction IDs are the same.
Alternatively you should be able to use a python context manager as in this example from the Snowflake documentation to manage the transaction. Just makes sure you set the connection parameter autocommit = False
If you are using python connector, you could leverage the query_tag to "mark" a transaction within a session. You'd just need to set the value at the beginning and end of your transaction. This would allow you to query for a specific query_tag in query_history, or maybe group statements to report on duration or other properties.
https://docs.snowflake.net/manuals/user-guide/python-connector-example.html#setting-session-parameters
There is nothing that provides transaction information. At this point you're best bet to log the information yourself.
Related
I have some SQL statements in a batch that I want to profile for performance. To that end, I have created a stored procedure that logs execution times.
However, I also want to be able to roll back the changes the main batch performs, while still retaining the performance logs.
The alternative is to run the batch, copy the performance data to another database, restore the database from backup, re-apply all the changes made that I want to profile, plus any more, and start again. That is rather more time-consuming than not including the act of logging in the transaction.
Let us say we have this situation:
BEGIN TRANSACTION
SET #StartTime = SYSDATETIME
-- Do stuff here
UPDATE ABC SET x = fn_LongRunningFunction(x)
EXECUTE usp_Log 'Do stuff', #StartTime
SET #StartTime = SYSDATETIME
-- Do more stuff here
EXEC usp_LongRunningSproc()
EXECUTE usp_Log 'Do more stuff', #StartTime
ROLLBACK
How can I persist the results that usp_Log saves to a table without rolling them back along with the changes that take place elsewhere in the transaction?
It seems to me that ideally usp_Log would somehow not enlist itself into the transaction that may be rolled back.
I'm looking for a solution that can be implemented in the most reliable way, with the least coding or work possible, and with the least impact on the performance of the script being profiled.
EDIT
The script that is being profiled is extremely time-consuming - taking from an hour to several days - and I need to be able to see intermediate profiling results before the transaction completes or is rolled back. I cannot afford to wait for the end of the batch before being able to view the logs.
You can use a table variable for this. Table variables, like normal variables, are not affected by ROLLBACK. You would need to insert your performance log data into a table variable, then insert it into a normal table at the end of the procedure, after all COMMIT and ROLLBACK statements.
It might sound a bit overkill (given the purpose), but you can create a CLR stored procedure which will take over the progress logging, and open a separate connection inside it for writing log data.
Normally, it is recommended to use context connection within CLR objects whenever possible, as it simplifies many things. In your particular case however, you wish to disentangle from the context (especially from the current transaction), so regular connection is a way to go.
Caveat emptor: if you never dabbled with CLR programming within SQL Server before, you may find the learning curve a bit too steep. That, and the amount of server reconfiguration (both the SQL Server instance and the underlying OS) required to make it work might also seem to be prohibitively expensive, and not worth the hassle. Still, I would consider it a viable approach.
So, as Roger mentions above, SQLCLR is one option. However, since SQLCLR is not permitted, you are out of luck.
In SQL Server 2017 there is another option and that is to use the SQL Server extensibility framework and the support for Python.
You can use this to have Python code which calls back into your SQL Server instance and executes the usp_log procedure.
Another, rather obscure, option is to bind other sessions to the long-running transaction for monitoring.
At the beginning of the transaction call sp_getbindtoken and display the bind token.
Then in another session call sp_bindsession, and you can examine the intermediate state of the transaction.
Or you can read the logs with (NOLOCK).
Or you can use RAISERROR WITH LOG to send debug messages to the client and mirror them to the SQL Log.
Or you can use custom user-configurable trace events, and monitor them in SQL Trace or XEvents.
Or you can use a Loopback linked server configured to not propagate the transaction.
I'm a last year college student and I'm doing my thesis right now. My title is "Index Suggestion based on Log Analysis". This project will analyze the PostgreSQL transaction log to give index recommendation to the database that will be tested.
This research will develop an index recommender tool by analyzing the attribute that is frequently accessed (using SELECT statement).
But, I found it's hard to find the PostgreSQL log file. My question is, where can I find PostgreSQL log transaction dataset? Or maybe other database log transaction dataset?
You are mixing up the transaction log (WAL) and the regular text log file.
The latter does contain statements (if the configuration is set like that), while the transaction log doesn't contain statements at all, just binary information about what has changed in which block.
You won't be able to recommend an index just from looking at the query, I can't do that either.
I have a suggestion for you: if you want to write a tool that suggests indexes, it should take the output of EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT /* your query */ as input.
Moreover, the tool will have to be connected to the database to query table and index metadata (and perhaps statistics). That makes you dependent on the database version, because metadata can change (and do – see partitioned tables), but that won't concern you so much in a thesis paper.
The task is still not simple (query optimization is AI), but then you have at least a chance.
A bit late to the party here, but the thing you would probably want in practice is pg_stat_statements. Use it to list the queries with the highest total_exec_time, and look at their query plans. Then you would consider adding indexes that would speed up joins or scans in those queries.
This should be possible to automate to some extent. Similarly, recommending indexes to drop should be possible to do using index usage statistics. Personally, I'd love to have a tool that does this kind of suggestions automatically, and it would be a great example of profile guided optimization.
You need to run the query below then restart PostgreSQL to enable logging persistently. *The parameter with ALTER SYSTEM SET is set to postgresql.auto.conf rather than postgresql.conf:
ALTER SYSTEM SET log_statement = 'all';
And, you need to run either of the queries below then restart PostgreSQL to disable logging persistently:
ALTER SYSTEM RESET log_statement;
Or:
ALTER SYSTEM SET log_statement = 'none';
You can also run the query below then need to restart PostgreSQL to enable logging persistently:
ALTER SYSTEM SET log_min_duration_statement = 0;
And, you can also run either of the queries below then need to restart PostgreSQL to disable logging persistently:
ALTER SYSTEM RESET log_min_duration_statement;
Or:
ALTER SYSTEM SET log_min_duration_statement = -1;
You can see my answer explaining more about how to enable and disable query logs on PostgreSQL.
Inside a transaction that have a savepoint I have to make a join with a table that is in a linked server. When I try to do it, I get the error message:
“Cannot use SAVE TRANSACTION within a distributed transaction”
The remote table data rarely changes. It is almost fixed. Is is possible to tell SqlServer to exclude this table from the transaction? I've tried a (NOLOCK) hint, but it isn't possible to use this hint for a table in a linked server.
Does anyone knows about a workaround? I'm using the ole SqlServer 2000.
One thing that you could do is to make a local copy of the remote table before you start the transaction. I know that this may sound like a lot of overhead, but remote joins are frequently a performance problem anyway and the SOP fix for that is also to make a local copy.
According to this link, the ability to use SAVEPOINTs in a Distributed transaction was dropped in SQL 7.
To allow application migration from Microsoft SQL Server 6.5 when
savepoints inside distributed transactions are in use, Microsoft SQL
Server 2000 Service Pack 1 introduces a trace flag that allows a
savepoint within a distributed transaction. The trace flag is 8599 and
can be turned on during the SQL Server startup or within an individual
session (that is, prior to enabling a distributed transaction with a
BEGIN DISTRIBUTED TRANSACTION statement) by using the DBCC TRACEON
command. When trace flag 8599 is set to ON, SQL Server allows you to
use a savepoint within a distributed transaction.
So unfortunately, you may either have to drop the bounding ACID transaction, or change the SPROC on the remote server so that it doesn't use SAVEPOINTs.
On a side note (Although I have seen that you have tagged it SQL SERVER 2000) but to make a point that SQL SERVER 2008 has remote proc trans Option for this.
In this case if the distributed table is not too large I would copy it to a temp table. If possible, include any filtering to get the number of rows to a minimum. Then you can proceed normally. Another option since the data changes rarely is copy the data to a permanant table and checking if anything has changed to prevent sending to much data over the network every time you run the transaction. You could only pull over the recent changes.
If you wish to handle transaction from UI level and you have Visual Studio 2008/.net fx 3.5 or + framework then you can wrap your logic with TransactionScope Class. If you dont have any frontends and you are working only on Sql Servers kindly ignore my answer...
I have two instances of database located in two servers. I want to create an application to insert data into the first one and then update data on the second instance, if one of these process fail then I want to rollback all operations.
The database servers do not enable DTC/MSDTC. I tired to use transaction scope but no luck. Do you guys have any idea how can I do this?
if one of these process fail then I want to rollback all operations
You are describing a distributed transaction. To use distributed transactions, you need a transaction coordinator. You can't have the cake and eat it too.
There are alternatives if you consider asynchronous application of the changes, ie. Replication. This removes the distributed transaction requirement but the changes are applied asynchronously to the second server, after they are committed on the first server.
One option would be to put compensation code into your application. For example, if your application were c# based, you could have a try...catch block. In the catch block, you could add compensation code to "undo" the changes you made to the data on the first server.
The best method however, is of course to make a case to the DBAs to enable DTC
I am a part time developer (full time student) and the company I am working for uses SQL Server 2005. The thing I find strange about SQL Server that if you do a script that involves inserting, updating etc there isn't any real way to undo it except for a rollback or using transactions.
You might say what's wrong with those 2 options? Well if for example someone does an update statement and forgets to put in a WHERE clause, you suddenly find yourself with 13k rows updated and suddenly all the clients in that table are named 'bob'. Now you have the wrath of 13k bobs to face since that "someone" forgot to use a transaction and if you do a rollback you are going to undo critical changes that were needed in other fields.
In my studies I have Oracle. In Oracle you can first run the script then commit it if you find that there isn't any mistakes. I was wondering if there was something that I missed in SQL Server since I am still relatively new in working developer world.
I don't believe you missed anything. Using transactions to prevent against these kind of errors is the best mechanism and it is the same mechanism Oracle uses to protected the end user. The difference is that Oracle implicitly begins a transaction for you whereas in SQL Server you must do it explicitly.
SET IMPLICIT_TRANSACTIONS is what you are probably looking for.
I'm no database/sql server expert and I'm not sure if this is what you're looking for, but there is the possibility to create snapshots of a database. A snapshot allows you to revert the database to that state at any time.
Check this link for more information:
http://msdn.microsoft.com/en-us/library/ms175158.aspx
I think transactions work well. You could rollback the DB (to a previous backup or point in the log), but I think transactions are much simpler.
How about this: never make changes to a production database that have not 1st been tested on your development server, and always make a backup before trying anything that is un-proven.
From what I understand, SQL Server 2008 added an Auditing feature that logs all changes made by users to the various databases and also has the option to roll them back after the fact.
Again, this is from what I've read or overheard from our DBA, but might be worth looking into.
EDIT: After looking into it, it appears to only give the ability to rollback on schema changes, not data modifications (DDL triggers).
If I am doing something with any risk in SQL Server, I write the script like this:
BEGIN TRAN
Insert .... whatever
Update .... whatever
-- COMMIT
The last line is a comment on purpose: I first run the lines before, then make sure there's no error, and then highlight just the word Commit and execute that. This works because in Management Studio you can select a part of the T-SQL and just execute the selected portion.
There are a couple of advantages: Implicit Transactions works too, but it's not the default for SQL Server so you have to remember to turn it on or set options to do that. Also, if it's on all the time, I find it's easy for people to "forget" and leave uncommitted transactions open, which can block others. That's mainly because it's not the default behavior and SQL Server folks aren't used to it.