Batching separate prepared npgsql commands - npgsql

Say you have three prepared NpgsqlCommands, because they must be executable separately. But sometimes two or all three must be executed at once (well, one after the other).
Is there a way to batch and reuse these previously prepared commands in a single roundtrip to the server? (The goal is to minimize latency.)
Currently I use a fourth command whose text is a semicolon-separated copy of the original three commands, and prepare that -- but I assume this uses more resources on the server and means more SQL parsing on the Npgsql client.

Npgsql supports batching by including several statements in your CommandText, separated by semicolons. These are executed in a single network roundtrip, and can also be prepared:
cmd.CommandText = "SELECT ...; UPDATE ...";
cmd.Prepare();
Internally, Npgsql splits such commands on semicolons and prepares each statement separately (PostgreSQL does not actually recognize batches, only individual statements). In addition, Npgsql manages prepared statements on a statement-by-statement level, and knows to reuse already-existing statements. This means that if you prepare two commands which contain the same statements, those statements will share the same server-side prepared statement resource.

Related

Is it possible to process a for loop in parallel within the same stored procedure?

I have a SQL Server stored procedure where I want to loop through a set of options that I read from a table. Say the table has 100 options. My stored procedure will loop through these options, and for each option I need to do some checks - querying a few specific tables based on the option - and flag a status related to it.
Is it possible for me to split the loop such that rows 1-50 are processed in one loop and rows 51-100 in another, and to run both of these in parallel? I see ways to run multiple stored procedures in parallel through a SQL job or other means, but I can't see how to get a loop to execute in parallel by splitting it.
Treating your question as academic, and not considering whether a set-based solution might exist, since there isn't nearly enough information to do that.
No, you can't do this in a single loop (or in two separate loops, for that matter) using standard T-SQL, because T-SQL is synchronous. Even if you "split" the loop, the second procedure call could not start until the first call finished. They would not run in parallel.
To run two loops in parallel, you would have to introduce some other language. Searching for this turns up quite a few ideas, but the first few I looked at came with plenty of warnings about pitfalls and unexpected results. It's up to you whether you want to experiment with any of them.
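For reference, the Agent-job route the question already mentions is the usual way to get asynchronous execution from pure T-SQL, since sp_start_job returns as soon as the job is queued. A minimal sketch, assuming two pre-created jobs that each wrap one half of the loop (the job names are hypothetical):
EXEC msdb.dbo.sp_start_job @job_name = N'ProcessOptions_1_to_50';   -- returns immediately
EXEC msdb.dbo.sp_start_job @job_name = N'ProcessOptions_51_to_100'; -- runs concurrently with the first
-- Completion has to be checked separately, e.g. by polling msdb.dbo.sysjobactivity.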

Is there a way to force a JDBC prepared statement to re-prepare in Postgresql?

I have reviewed the similar question "See and clear Postgres caches/buffers?", but all of the answers focus on the data buffers, and PostgreSQL has changed a lot since 2010.
Unlike the OP of that question, I am not looking for consistent behavior when measuring performance, I am looking for adaptive behavior as the database changes over time.
In my application, at the beginning of a job execution the working tables are empty. Queries run very quickly, but as time goes on performance degrades because the prepared statements are not using ideal access paths (they were prepared when the tables were empty - doh!). Since a typical execution of the job will ultimately cover a few hundred million rows, I need to minimize all of the overheads and periodically run statistics to get the best access paths.
In SQL Server, one can periodically call UPDATE STATISTICS and DBCC FREEPROCCACHE, and the prepared statements will automatically be re-prepared to use the new access paths.
Edit: FreeProcCache: in SQL Server, prepared statements are implemented as stored procedures. FreeProcCache wipes the compiled stored procedures so that they will be recompiled on the next invocation, and the new access paths come into effect immediately.
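In T-SQL terms, the SQL Server routine being described is roughly the following (dbo.WorkTable is a hypothetical name):
UPDATE STATISTICS dbo.WorkTable;  -- refresh statistics on the working table
DBCC FREEPROCCACHE;               -- clear cached plans so the next execution recompiles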
Edit: Details of PostgreSQL's management of prepared statements: PostgreSQL defers the prepare until the first call to EXECUTE, and caches the result of the prepare after the 5th execution. Once cached, the plan is fixed until the session ends or the prepared statement is freed with DEALLOCATE. Closing JDBC objects does not invoke DEALLOCATE; this is an optimization to support the open/read/close pattern that many web apps use.
Is there a way to force a JDBC prepared statement to recompile, after running ANALYZE, so it will use the latest statistics?
EDIT: I am using JDBC PreparedStatement to prepare and execute queries against the database and the Postgres JDBC driver.
The way PostgreSQL updates statistics is via ANALYZE. It is also run automatically alongside VACUUM (VACUUM frees dead row versions and truncates empty pages - I would imagine much like your FreeProcCache).
If autovacuum is enabled (the default), ANALYZE will be run automatically according to the autovacuum thresholds.
In most cases you do not need to "recompile" the prepared statement to pick up the new statistics, because it is re-planned at each EXECUTE: a parameterized prepared statement is planned from the parameter values and the statistics current at that time. EDIT: The edge case you describe is when the query planner decides to force a "generic plan", because after 5 planned executions the estimated cost of the parameter-specific plan no longer beats the cost of that generic plan.
Edit:
If you do reach this edge case, you can "drop" the prepared statement via DEALLOCATE (and then re-PREPARE it).
You may want to try ANALYZE before EXECUTE, but this will not guarantee better performance...
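At the SQL level, the cycle described in this answer looks roughly like the sketch below (work_table and get_rows are hypothetical names; with JDBC the driver issues the prepare/execute steps for you once its prepare threshold is reached):
PREPARE get_rows (int) AS
    SELECT * FROM work_table WHERE batch_id = $1;
EXECUTE get_rows (1);      -- planned with the statistics current at this point
-- ...after the 5th execution the planner may settle on a generic plan...
ANALYZE work_table;        -- refresh statistics once the table has filled up
DEALLOCATE get_rows;       -- or DEALLOCATE ALL; drops the cached plan
PREPARE get_rows (int) AS
    SELECT * FROM work_table WHERE batch_id = $1;   -- re-prepared against the new statistics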
Please make sure you really want to re-prepare statements. It might be that you just want to close the DB connection from time to time so statements get prepared "from scratch".
If you really understand what you are doing (and there can be valid reasons, like the one you describe), you can issue a DEALLOCATE ALL statement (a PostgreSQL-specific statement that deallocates all prepared statements in the session). Recent pgjdbc versions (since 9.4.1210, 2016-09-07) handle that just fine and re-prepare the statements on subsequent use.

How to run multiple stored procedures in parallel using ssis?

We have a list of stored procedures (more than 1000) in a table which need to be executed every morning.
The stored procedures do not have any dependency with each other.
We have tried a WHILE loop and a cursor, but they take a lot of time to execute.
We thought of creating a job for each stored procedure and calling them using sp_start_job (sp_start_job is called in an async manner), and we got a level of parallelism.
Problems arise when a new stored procedure is added to the list and the number of jobs becomes huge:
sometimes people forget to create the job for the new stored procedure
the DB gets bombarded with a large number of jobs (a manageability issue for the DBA)
Note: the list may be altered any day (stored procedures can be added to or removed from it).
If the SPs run for a long time, I would categorize the 1000 SPs into 5-10 groups, create one SSIS package for each group, and then an Agent job for each package. Then schedule those jobs at the same time.
There are many ways to achieve this (loops, scripting, and so on) and multiple factors involved. You can test the different approaches and go with the best one.
Note: performance of the SSIS execution depends on your memory, processor, and other hardware.
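One way to build those 5-10 groups without maintaining them by hand is to let SQL assign them from the procedure list table, so newly added procedures are picked up automatically. A sketch, assuming the list lives in a hypothetical table dbo.ProcList(ProcName):
SELECT ProcName,
       NTILE(10) OVER (ORDER BY ProcName) AS BucketNo   -- each SSIS package / Agent job processes one bucket
FROM dbo.ProcList;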
Adding to @Nick.MacDermaid - you can use the MaxConcurrentExecutables property of the package to implement custom parallelism. Of course, you would need multiple containers and corresponding stored proc groups.
Parallel Execution in SSIS
MaxConcurrentExecutables, a property of the package. It defines how many tasks (executables) can run simultaneously. It defaults to -1, which is translated to the number of processors plus 2. Please note that if your box has hyperthreading turned on, it is the logical processors rather than the physically present processors that are counted.
Hi, you can use the following piece of code to basically generate a script that runs all your stored procedures; if you add a new procedure, it will automatically be added to the list:
SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM information_schema.routines
WHERE routine_type = 'PROCEDURE'
After this, take the result set, put it into a tab-delimited text file, and save the file in a location.
Use this link to import the text into an Execute SQL Task; the first answer works well:
SSIS: How do I pull a SQL statement from a file into a string variable?
Execute the task and it should work. If you need to narrow the list of procedures, you can give them a specific name prefix and use that in the WHERE clause (see the sketch below).
It will run in serial; sorry, I don't have enough rep to comment yet.
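For what it's worth, a slightly extended version of the generator query that schema-qualifies each procedure and narrows the list by a name prefix (the 'Morning' prefix is only an example):
SELECT 'EXEC ' + QUOTENAME(SPECIFIC_SCHEMA) + '.' + QUOTENAME(SPECIFIC_NAME) + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
  AND SPECIFIC_NAME LIKE 'Morning%';   -- keep only the procedures in the morning batch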

Handling batch T-SQL statements with GO commands

I have a Node.js app that is handling database build / versioning by reading multiple .sql files and executing them as transactions.
The problem I am running into is that these database build scripts require a lot of GO statements, as you cannot execute multiple CREATEs, etc. in the same context.
However, GO is not T-SQL and errors when used outside of a Microsoft application context.
Take the following shorthand:
CREATE database foo /* ... */
use [foo]
CREATE TABLE bar /* ... */
This would error if GO statements were not injected between each line.
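With the separators added, the same shorthand becomes:
CREATE database foo /* ... */
GO
use [foo]
GO
CREATE TABLE bar /* ... */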
I would rather not break this into multiple .sql files for every separate transaction - I am building a database and that would turn into hundreds of files!
I could run String.split() on the GO statements and have Node execute each piece as a separate transaction, but that seems very hacky.
Is there any standard or best-practice solution to this type of problem?
Update
Looks like a semicolon will do the trick for everything except CREATE statements for functions, stored procedures, etc. That restriction doesn't apply to tables or databases, though, which is good.
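For the CREATE statements that a semicolon cannot fix (CREATE PROCEDURE, CREATE FUNCTION, etc. must be the first statement in their batch), one workaround - a sketch, not something from the original scripts - is to wrap the statement in dynamic SQL so it runs as its own batch:
CREATE TABLE bar (id int);
-- CREATE PROCEDURE cannot follow another statement in the same batch,
-- but EXEC() runs its string as a separate batch (usp_demo is a hypothetical name):
EXEC('CREATE PROCEDURE dbo.usp_demo AS SELECT 1;');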
I ended up:
Parsing the SQL file into an array of statements split on lines whose only content is GO statements & comments,
Creating a SQL transaction,
Executing all statements in the array in sequence,
Rolling back the entire transaction if any single statement fails,
Ending the transaction once all statements have completed in sequence.
A bit of work, but I think that was the best way to do it.

T-SQL GO in UPDATE statements

I have a single derived field that is populated by a series of UPDATE statements, each statement joining to a different table and different fields. It is important that the series of updates executes in a specific order, i.e. a join to table A may produce result X, then a join to table B produces result Y, in which case I want result Y. Normally I just create a series of UPDATE statements in the appropriate order and store them either in a single SSIS SQL container or in a single stored procedure. Is there a best practice regarding using or not using a GO command or BEGIN END between these update statements?
Why do you think consecutive statements would be executed out of order? Do you have specific locking hints on any of the statements (e.g. UPDLOCK, HOLDLOCK, etc.)? Otherwise if you have two consecutive statements, A and B, and A changes something, B will see that change. How that works in SSIS may be different if you have some branching or multi-threading capabilities, but this is not possible in a stored procedure.
Also GO is not a T-SQL command, it is a batch separator recognized by certain client tools like Management Studio. If you try to put a GO between two statements in a stored procedure, one of two things will happen:
the procedure will fail to compile (if the opening BEGIN doesn't have a matching END right before the GO).
the procedure will compile (if there is no BEGIN/END wrapper), but it will be shorter than you thought, ending at the first GO rather than where you intended.
Statements are executed in exactly the order that you write them in. You don't need GO or BEGIN...END to ensure ordering. For that reason using either of these has no effect. They also have nothing to do with transactions.
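To illustrate, a stored procedure along these lines (the table and column names are hypothetical) applies the updates strictly top to bottom, so where both joins match, the table B result Y overwrites the table A result X - no GO or extra BEGIN...END required:
CREATE PROCEDURE dbo.usp_SetDerivedField
AS
BEGIN
    -- Join to table A first
    UPDATE t
    SET t.DerivedField = a.ResultX
    FROM dbo.Target AS t
    JOIN dbo.TableA AS a ON a.KeyCol = t.KeyCol;

    -- Then to table B; for rows matched by both, this value wins
    UPDATE t
    SET t.DerivedField = b.ResultY
    FROM dbo.Target AS t
    JOIN dbo.TableB AS b ON b.KeyCol = t.KeyCol;
END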
