If I execute the following PowerShell command:
Invoke-Sqlcmd `
-ServerInstance '(local)' -Database 'TestDatabase' `
-Query "select top 1000000 * from dbo.SAMPLE_CUSTOMER" | Out-Null
I see memory usage go through the roof: the process uses about 1 GB of memory.
If I run the command again, memory grows to 1.8 GB, then drops to around 800 MB (garbage collection?) and starts growing again.
I tried to reduce the memory footprint of the PowerShell shell and plugins to 100MB by following the steps in the article http://blogs.technet.com/b/heyscriptingguy/archive/2013/07/30/learn-how-to-configure-powershell-memory.aspx, but memory still grows far above the configured 100MB.
I have some questions:
Why does PowerShell not respect the memory limitations given by the setting MaxMemoryPerShellMB?
Why does Invoke-Sqlcmd consume so much memory, and why doesn't it "forget" the records already processed in the pipeline?
Why does the PowerShell process not reclaim memory automatically when finished processing?
How can I process many SQL records without a large memory footprint?
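For the last question, one commonly suggested way to keep the footprint small is to bypass Invoke-Sqlcmd and stream the rows with a SqlDataReader, so that only the current row is held in memory. A minimal sketch - the connection string and column name are illustrative:
# Stream rows with a SqlDataReader so only the current row is held in memory
# (connection string and column name are illustrative)
$conn = New-Object System.Data.SqlClient.SqlConnection 'Server=(local);Database=TestDatabase;Integrated Security=True'
$conn.Open()
try {
    $cmd = $conn.CreateCommand()
    $cmd.CommandText = 'SELECT TOP 1000000 * FROM dbo.SAMPLE_CUSTOMER'
    $reader = $cmd.ExecuteReader()
    while ($reader.Read()) {
        # process the current row here, e.g. $reader['CustomerName']
    }
    $reader.Close()
}
finally {
    $conn.Close()
}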
Related
I have a quick and dirty PowerShell script for running SQL scripts against many servers sequentially. I have the servers stored in an array, and loop through them with ForEach. The ForEach roughly looks like this:
ForEach ($Server in $ServerList) {
Write-Host "Executing $Script against $Server..."
Invoke-SqlCmd ........
}
But the problem I have is my output looks something like this:
Executing script.sql against Server1
Executing script.sql against Server2
Executing script.sql against Server3
<Output from Server1>
<Output from Server2>
<Output from Server3>
Executing script.sql against Server4
<Output from Server4>
Executing script.sql against Server5
<Output from Server5>
...you get the idea. Is there any way to marry up the outputs so that each server's output appears under the message indicating which server is currently being executed against? It would help when using the output for debugging, etc. I'm executing on PS7, by the way.
What you're observing here is not that the next iteration of the foreach loop starts before the last one has ended - foreach (the loop statement, not the cmdlet) only invokes the loop body in series, never concurrently.
That doesn't mean the next iteration won't start before the formatting subsystem in the host application (e.g. powershell.exe, powershell_ise.exe, or pwsh.exe) has written any output from the loop body to the screen buffer.
The default host applications usually wait a few hundred milliseconds to see if there's more than one output object of the same type in the output stream - which then informs how to format the output (table vs. list view, etc.).
Write-Host on the other hand is an instruction to bypass all that and instead write a message straight to the host application.
So this difference in timing makes it look as if the Write-Host statement at the top of the loop is executed before the code from two iterations back - but what you're observing is actually an intentional decoupling of output from rendering/presentation.
As zett42 notes, you can force the host application to synchronously render the output you want displayed in order by piping it to Out-Host:
ForEach ($Server in $ServerList) {
Write-Host "Executing $Script against $Server..."
Invoke-SqlCmd ........ | Out-Host
}
Since PowerShell must now fulfill your request to render the output before it can move on with the next statement/iteration, it'll no longer delay it for formatting purposes :)
In C, I have the following code
struct _file_header* file_header = (struct _file_header*)(cfg_bytes + (cfg_size - 16));
which effectively overlays the _file_header structure on the last 16 bytes of the buffer. I want to do the same thing in PowerShell, but I am confused because [System.Text.Encoding]::ASCII.GetBytes returns bytes that don't match the data I see in cfg_bytes.
Take the following PowerShell code:
$cfg_bytes = (Get-Content $env:ProgramData'\config.bin' -Raw)
$cfg_size = [System.Text.Encoding]::ASCII.GetByteCount($cfg_bytes)
$file_header = $cfg_bytes.Substring($cfg_size - 16, 16)
When I Write-Output [System.Text.Encoding]::ASCII.GetBytes($file_header), the output is not the same as what I see in my debugger's memory viewer. How can I obtain the bytes in the same format as the C example, such that I can read the same structure in PowerShell?
As Santiago Squarzon suggests, use the following to directly get a file's raw bytes in PowerShell:
In Windows PowerShell:
$cfg_bytes = Get-Content -Encoding Byte -Raw $env:ProgramData\config.bin
In PowerShell (Core) 7+:
$cfg_bytes = Get-Content -AsByteStream -Raw $env:ProgramData\config.bin
Note: This breaking change in syntax between the two PowerShell editions is unfortunate, and arguably should never have happened - see GitHub issue #7986.
Note that adding -Raw to the Get-Content call - when combined with -Encoding Byte / -AsByteStream - efficiently returns all bytes as a single array, strongly typed as [byte[]].
Without -Raw, the bytes would be streamed - i.e., output to the pipeline one by one - which, if the output isn't accumulated, keeps memory use constant - but is much slower.
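With the raw bytes in hand, mirroring the C cast amounts to slicing the trailing 16 bytes and decoding the fields explicitly. A rough PowerShell 7+ sketch follows - the field layout shown is only an illustration, so match it to your actual struct _file_header:
# Read all bytes, then slice the trailing 16-byte header (PowerShell 7+)
$cfg_bytes   = Get-Content -AsByteStream -Raw "$env:ProgramData\config.bin"
$cfg_size    = $cfg_bytes.Length
$file_header = [byte[]] $cfg_bytes[($cfg_size - 16)..($cfg_size - 1)]

# Decode the fields; this layout (two 32-bit integers followed by one 64-bit integer)
# is purely illustrative - adjust it to the real struct _file_header definition
$field1 = [System.BitConverter]::ToUInt32($file_header, 0)
$field2 = [System.BitConverter]::ToUInt32($file_header, 4)
$field3 = [System.BitConverter]::ToUInt64($file_header, 8)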
When I import a 120 GB text file into ClickHouse (it contains about 400 million rows), the process is killed after importing more than 100 million rows.
The import statement is as follows:
clickhouse-client --user default --password xxxxx --port 9000 -hbd4 --database="dbs" --input_format_allow_errors_ratio=0.1 --query="insert into ... FORMAT CSV" < /1.csv
The error is as follows:
2021.04.29 10:20:23.135790 [ 19694 ] {} <Fatal> Application: Child process was terminated by signal 9 (KILL). If it is not done by 'forcestop' command or manually, the possible cause is OOM Killer (see 'dmesg' and look at the '/var/log/kern.log' for the details).
Is the imported file too large, exhausting memory? Should I split the file into smaller pieces?
Take a look at the system logs - they should have some clues:
As suggested in the error message, run dmesg and see if there's any mention of the OOM Killer (the kernel's self-protection mechanism that triggers on out-of-memory events). If that's the case, you're either out of memory or you've granted too much memory to ClickHouse.
See what ClickHouse's own logs say. The path to the log file is defined in clickhouse-server/config.xml, under yandex/logger/log - it's likely /var/log/clickhouse-server/clickhouse-server.log plus /var/log/clickhouse-server/clickhouse-server.err.log.
So I am trying to write a PowerShell script that creates a backup of a database, compresses the backup, and uploads it to an FTP site. Here is part of my script.
Sample Script/Code:
Write-Host "Backup of Database " $databaseName " is starting"
push-location
Invoke-Sqlcmd -Query "--SQL Script that backs up database--" -ServerInstance "$serverName"
pop-location
Write-Host "Backup of Database " + $databaseName " is complete"
#Create a Zipfile of the database
Write-Host "Compressing Database and creating Zip file...."
sz a -t7z "$zipfile" "$file"
Write-Host "Completed Compressing Database and creating Zip file!"
I want to prevent any code after the "Invoke-Sqlcmd......." part from executing until the SQL script backing up the database has completed, because the compression line fails to find the backup of the database (the backup takes a fairly long time to complete).
I am extremely new to PowerShell and didn't quite understand the solutions offered in a couple of possibly related questions I found, since I call my SQL statement a different way.
Possible Related Questions:
Get Powershell to wait for an SQL command to execute
Powershell run SQL job, delay, then run the next one
Are you sure your "...script that backs up the database" isn't just throwing an error, with PowerShell continuing anyway?
This seems to indicate that it does in fact wait on that call:
Write-Host "starting"
push-location
Invoke-Sqlcmd -Query "waitfor delay '00:00:15';" -ServerInstance "$serverName"
pop-location
Write-Host "complete"
In any case, you should guard against the file not existing, by either aborting if the file does not exist or polling until it does (I'm not 100% sure when the .bak file is written to disk).
# abort
if (!(Test-Path $file)) {
    throw "Backup file '$file' not found - aborting."
}
# or, poll
while (!(Test-Path $file)) {
    Start-Sleep -Seconds 10
}
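If you go the polling route, it may also be worth bounding the wait so the script can't hang forever. A variant with an arbitrary 10-minute timeout:
# Poll for the backup file, but give up after 10 minutes (the limit is arbitrary)
$deadline = (Get-Date).AddMinutes(10)
while (!(Test-Path $file)) {
    if ((Get-Date) -gt $deadline) {
        throw "Backup file '$file' did not appear within 10 minutes."
    }
    Start-Sleep -Seconds 10
}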
We're moving from a batch file that calls osql to a PowerShell script which uses the Invoke-Sqlcmd cmdlet.
Would anyone know the equivalent steps for redirecting the output in the latter case, matching what the -o flag does in osql? We have some post-processing steps that look at the osql output file and act accordingly (report an error if those logs are greater than X bytes). I would very much like Invoke-Sqlcmd to produce the same output information given the same SQL commands going in.
Right now in my script I'm planning to call Invoke-Sqlcmd <...> | Out-file -filepath myLog.log. Anyone know if this is ok or makes sense?
From the documentation for the cmdlet itself:
Invoke-Sqlcmd -InputFile "C:\MyFolder\TestSQLCmd.sql" | Out-File -filePath "C:\MyFolder\TestSQLCmd.rpt"
The above is an example of calling Invoke-Sqlcmd, specifying an input file and piping the output to a file. This is similar to specifying sqlcmd with the -i and -o options.
http://technet.microsoft.com/en-us/library/cc281720.aspx
I think you'll find, as I have, that it's difficult to reproduce the same behavior with Invoke-Sqlcmd.
osql and sqlcmd.exe will send T-SQL PRINT and RAISERROR messages, as well as errors, to the output file.
Using PowerShell, you can redirect the error stream to the output stream with the standard redirection technique (2>&1):
Invoke-Sqlcmd <...> 2>&1 | Out-file -filepath myLog.log
However, this still won't catch everything. For example, RAISERROR and PRINT statements only produce output in Invoke-Sqlcmd when the -Verbose parameter is used, as documented in help Invoke-Sqlcmd. In PowerShell V2 you can't redirect verbose output, although you can in PowerShell V3 using 4>.
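So, on PowerShell V3 or later, a sketch that captures regular output, errors, and the verbose stream (where PRINT and RAISERROR messages end up) in one log file might look like this - the paths reuse the earlier documentation example and $server is a placeholder:
# Merge the verbose (4) and error (2) streams into the output stream, then log everything
Invoke-Sqlcmd -InputFile "C:\MyFolder\TestSQLCmd.sql" -ServerInstance $server -Verbose 4>&1 2>&1 |
    Out-File -FilePath "C:\MyFolder\TestSQLCmd.rpt"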
For these reasons and others (like trying to recreate the many different options of sqlcmd) I switched back to using sqlcmd.exe for scheduled jobs in my environment. Since osql.exe is deprecated, I would suggest switching to sqlcmd.exe, which supports the same options as osql.
You can still call osql from PowerShell. I would continue to do just that. Invoke-SqlCmd returns objects representing each of the rows in your result set. If you aren't going to do anything with those objects, there's no reason to upgrade to Invoke-SqlCmd.
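For example - the server name and paths are placeholders, and -E uses Windows authentication:
# Call osql directly from PowerShell, keeping the familiar -o output file
& osql -E -S $serverName -i "C:\Scripts\MyQuery.sql" -o "C:\Logs\myLog.log"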