I am trying to bulk load data from Oracle to SQL Server through PowerShell and SqlBulkCopy (SqlServer module).
On small datasets everything works fine, but on big datasets, even with BatchSize and EnableStreaming set, SqlBulkCopy takes all the available memory... until an out-of-memory error.
Also, the NotifyAfter notification never seems to fire, so I guess that even with EnableStreaming = $true the process first loads everything into memory...
What did I miss?
$current = Get-Date
#copy table from Oracle table to SQL Server table
add-type -path "D:\oracle\product\12.1.0\client_1\odp.net\managed\common\Oracle.ManagedDataAccess.dll";
#define oracle connection string
$conn_str = "cstr"
# query for oracle table
$qry = "
SELECT
ID,CREATEDT,MODIFIEDDT
FROM MYTABLE source
WHERE source.ISSYNTHETIC=0 AND source.VALIDFROM >= TO_Date('2019-01-01','yyyy-mm-dd')
";
# key (on the left side) is the source column while value (on the right side) is the target column
[hashtable] $mapping = @{'ID'='ID';'CREATEDT'='CREATEDT';'MODIFIEDDT'='MODIFIEDDT'};
$adapter = new-object Oracle.ManagedDataAccess.Client.OracleDataAdapter($qry, $conn_str);
#$info = new-object Oracle.ManagedDataAccess.Client;
#Write-Host ( $info | Format-Table | Out-String)
$dtbl = new-object System.Data.DataTable('MYTABLE');
#this Fill method will populate the $dtbl with the query $qry result
$adapter.Fill($dtbl);
#define sql server target instance
$sqlconn = "cstr";
$sqlbc = new-object system.data.sqlclient.Sqlbulkcopy($sqlconn)
$sqlbc.BatchSize = 1000;
$sqlbc.EnableStreaming = $true;
$sqlbc.NotifyAfter = 1000;
$sqlbc.DestinationTableName="DWHODS.MYTABLE";
#need to tell $sqlbc the column mapping info
foreach ($k in $mapping.keys)
{
$colMapping = new-object System.Data.SqlClient.SqlBulkCopyColumnMapping($k, $mapping[$k]);
$sqlbc.ColumnMappings.Add($colMapping) | out-null
}
$sqlbc.WriteToServer($dtbl);
$sqlbc.Close();
$end= Get-Date
$diff= New-TimeSpan -Start $current -End $end
Write-Output "import needed : $diff"
Thanks to Jeroen, I changed the code as follows; it no longer eats up all the memory:
$oraConn = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($conn_str);
$oraConn.Open();
$command = $oraConn.CreateCommand();
$command.CommandText=$qry;
$reader = $command.ExecuteReader()
...
$sqlbc.WriteToServer($reader);
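For reference, a minimal end-to-end sketch of the streaming version (connection strings and the table name are the same placeholders as above); the key difference is that WriteToServer receives a data reader, so rows are streamed as they are read from Oracle instead of being buffered in a DataTable first:
$oraConn = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($conn_str)
$oraConn.Open()
$command = $oraConn.CreateCommand()
$command.CommandText = $qry
$reader = $command.ExecuteReader()
$sqlbc = New-Object System.Data.SqlClient.SqlBulkCopy($sqlconn)
$sqlbc.BatchSize = 1000
$sqlbc.EnableStreaming = $true
$sqlbc.DestinationTableName = "DWHODS.MYTABLE"
# rows flow from the Oracle reader straight into SQL Server
$sqlbc.WriteToServer($reader)
$sqlbc.Close()
$reader.Close()
$oraConn.Close()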
Output the results (XML) of a stored procedure to a file.
I have a stored procedure in SQL Server that builds an XML document. It currently displays the resulting XML, and I have to save it to a file manually.
I have tried to call the procedure from PowerShell, as in this question; that works for small files but not for large ones (>1 GB), because PowerShell tries to store the entire result in a variable and quickly runs out of memory.
I'm opening this as a new question because I think there should be a way of doing this within SQL Server (or a better way of doing it with PowerShell).
You shouldn't use a stored procedure here; just use better PowerShell. You can stream large types to and from SQL Server with SqlClient, so you just need to drop down to ADO.NET instead of using the Invoke-Sqlcmd convenience method.
EG:
$conString = "server=localhost;database=tempdb;integrated security=true"
$sql = #"
select top (1000*1000) *
from sys.messages m
cross join sys.objects o
for xml auto
"#
$fn = "c:\temp\out.xml"
$con = new-object System.Data.SqlClient.SqlConnection
$con.connectionstring = $conString
$con.Open()
$cmd = $con.createcommand()
$cmd.CommandText = $sql
$cmd.CommandTimeout = 0
$rdr = $cmd.ExecuteXmlReader()
$w = new-object System.Xml.XmlTextWriter($fn,[System.Text.Encoding]::UTF8)
$w.WriteNode($rdr,$true)
$w.Close()
$rdr.Close()
write-host "Process Memory: $( [System.GC]::GetTotalMemory($false) )"
write-host "File Size: $( (ls $fn)[0].Length )"
outputs
Process Memory: 34738200
File Size: 468194885
Another solution, if you can, is to build your XML in a temporary table line by line, then output it and read the result line by line from PowerShell (or other code):
SQL example:
/*
**
** Stored procedure
**
*/
/*** Deletion: ********************************************************
IF EXISTS ( SELECT name FROM sysobjects
WHERE type = 'P' AND name = 'procTEST' )
DROP PROCEDURE procTEST
*** Deletion: ********************************************************/
CREATE PROCEDURE procTEST
AS
CREATE TABLE #TEMP (vInfo VARCHAR(MAX), nLine int)
INSERT INTO #TEMP
SELECT 'Line 1',1
UNION ALL
SELECT 'Line 2',2
UNION ALL
SELECT 'Line 3',3
SELECT vInfo FROM #TEMP ORDER BY nLine ASC
SET NOCOUNT OFF
/*** TESTS ****************************************************************************************************************************************
sp_helptext procTEST
-- DROP PROCEDURE procTEST
EXEC procTEST
*** TESTS ****************************************************************************************************************************************/
PowerShell script:
$readconn = New-Object System.Data.OleDb.OleDbConnection
$writeconn = New-Object System.Data.OleDb.OleDbConnection
[string]$connstr="Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=TEST;Data Source=.\XXXXX;Workstation ID=OMEGA2"
$readconn.connectionstring = $connstr
$readconn.open()
$readcmd = New-Object system.Data.OleDb.OleDbCommand
$readcmd.connection=$readconn
$readcmd.commandtext='EXEC procTEST'
$reader = $readcmd.executereader()
# generate header
$hash = @{}
for ($i=0;$i -lt $reader.FieldCount;$i++){
$hash += @{$reader.getname($i)=''}
}
$dbrecords=while($reader.read()) {
for ($i=0;$i -lt $reader.FieldCount;$i++){
$hash[$reader.getname($i)] = $reader.GetValue($i)
}
New-Object PSObject -Property $hash
}
$reader.close()
$readconn.close()
$dbrecords
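If the result set is very large, a variation on the same idea is to write each line straight to a file as it is read instead of accumulating everything in $dbrecords, which keeps memory usage flat. A minimal sketch (the output path is illustrative):
$writer = New-Object System.IO.StreamWriter("C:\temp\out.xml")
while ($reader.read()) {
    # vInfo is the only column returned by procTEST, one fragment per line
    $writer.WriteLine($reader.GetValue(0))
}
$writer.Close()
$reader.close()
$readconn.close()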
I have created a PowerShell function that bulk copies data from a .csv file (the first row is the header) and inserts the data into a SQL Server database table.
See my code:
function BulkCsvImport($sqlserver, $database, $table, $csvfile, $csvdelimiter, $firstrowcolumnnames) {
Write-Host "Bulk Import Started."
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")
# 50k worked fastest and kept memory usage to a minimum
$batchsize = 50000
# Build the sqlbulkcopy connection, and set the timeout to infinite
$connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
# Wipe the bulk insert table first
Invoke-Sqlcmd -Query "TRUNCATE TABLE $table" -ServerInstance $sqlserver -Database $database
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize
# Create the datatable, and autogenerate the columns.
$datatable = New-Object System.Data.DataTable
# Open the text file from disk
$reader = New-Object System.IO.StreamReader($csvfile)
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
if ($firstrowcolumnnames -eq $true) { $null = $reader.readLine() }
foreach ($column in $columns) {
$null = $datatable.Columns.Add()
}
# Read in the data, line by line
while (($line = $reader.ReadLine()) -ne $null) {
$null = $datatable.Rows.Add($line.Split($csvdelimiter))
$i++;
if (($i % $batchsize) -eq 0) {
$bulkcopy.WriteToServer($datatable)
Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
$datatable.Clear()
}
}
# Add in all the remaining rows since the last clear
if($datatable.Rows.Count -gt 0) {
$bulkcopy.WriteToServer($datatable)
$datatable.Clear()
}
# Clean Up
$reader.Close();
$reader.Dispose()
$bulkcopy.Close();
$bulkcopy.Dispose()
$datatable.Dispose()
Write-Host "Bulk Import Completed. $i rows have been inserted into the database."
# Write-Host "Total Elapsed Time: $($elapsed.Elapsed.ToString())"
# Sometimes the Garbage Collector takes too long to clear the huge datatable.
$i = 0
[System.GC]::Collect()
}
I am looking to modify the above, though, so that the column names in the .csv file are matched up with the column names in the SQL Server table (they should be identical). At the moment the data is being imported into the wrong database columns.
Could I get some assistance with what I need to do to modify the above function to achieve this?
I would use existing open source solution:
Import-DbaCsv - dbatools.io
Import-DbaCsv.ps1
Efficiently imports very large (and small) CSV files into SQL Server.
Import-DbaCsv takes advantage of .NET's super fast SqlBulkCopy class to import CSV files into SQL Server.
Parameters:
-ColumnMap
By default, the bulk copy tries to automap columns. When it doesn't
work as desired, this parameter will help.
PS C:\> $columns = @{
>> Text = 'FirstName'
>> Number = 'PhoneNumber'
>> }
PS C:\> Import-DbaCsv -Path c:\temp\supersmall.csv
-SqlInstance sql2016 -Database tempdb -ColumnMap $columns
-BatchSize 50000 -Table table_name -Truncate
The CSV column 'Text' is inserted into SQL column 'FirstName' and CSV column 'Number' is inserted into the SQL column 'PhoneNumber'. All other columns are ignored and therefore end up with null or default values.
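dbatools is published in the PowerShell Gallery, so (assuming you are allowed to install modules on the machine) getting the cmdlet is a one-liner:
Install-Module dbatools -Scope CurrentUser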
I came across a post discussing how to use PowerShell to bulk import massive amounts of data relatively fast. I have a typical CSV file with about 5 million rows, formatted in the usual way.
I keep getting the same error messages regardless of whether I choose to import a .txt or .csv file. Playing around with the csvdelimiter/firstcolumnnames settings also created its own issues.
I've spent hours trying to figure out how to get it to work with MY CSV files, and I keep getting the same error messages no matter what I try. All fields accept NULL and are identical in every way between the table and the CSV file. I do not have a primary key for the table.
# Database variables
$sqlserver = "SERVERNAMEHERE"
$database = "autos"
$table = "AgedAutos"
# CSV variables
$csvfile = "C:\temp\aged.csv"
$csvdelimiter = "',"
$firstRowColumnNames = $true
################### No need to modify anything below ###################
Write-Host "Script started..."
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")
# 50k worked fastest and kept memory usage to a minimum
$batchsize = 50000
# Build the sqlbulkcopy connection, and set the timeout to infinite
$connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize
# Create the datatable, and autogenerate the columns.
$datatable = New-Object System.Data.DataTable
# Open the text file from disk
$reader = New-Object System.IO.StreamReader($csvfile)
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
if ($firstRowColumnNames -eq $true) { $null = $reader.readLine() }
foreach ($column in $columns) {
$null = $datatable.Columns.Add()
}
# Read in the data, line by line
while (($line = $reader.ReadLine()) -ne $null) {
$null = $datatable.Rows.Add($line.Split($csvdelimiter))
$i++; if (($i % $batchsize) -eq 1) {
$bulkcopy.WriteToServer($datatable)
Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
$datatable.Clear()
}
}
# Add in all the remaining rows since the last clear
if($datatable.Rows.Count -gt 0) {
$bulkcopy.WriteToServer($datatable)
$datatable.Clear()
}
# Clean Up
$reader.Close(); $reader.Dispose()
$bulkcopy.Close(); $bulkcopy.Dispose()
$datatable.Dispose()
Write-Host "Script complete. $i rows have been inserted into the database."
Write-Host "Total Elapsed Time: $($elapsed.Elapsed.ToString())"
# Sometimes the Garbage Collector takes too long to clear the huge datatable.
[System.GC]::Collect()
Error message listed below.
Exception calling "WriteToServer" with "1" argument(s): "The given value of type String from the data source cannot be converted to
type date of the specified target column."
At C:\powershell_scripts\batch_csv_import-code1-working-test for auto table.ps1:43 char:3
+ $bulkcopy.WriteToServer($datatable)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : InvalidOperationException
340000 rows have been inserted in 00:00:03.5156162
I have no idea what that error means since I cannot find anything useful on Google. I'm thinking one of the columns might be listed incorrectly in SQL Server, but I could be wrong.
Please help me figure out the problem. Thanks.
You are getting all the data in the first column because your value for $csvdelimiter is incorrect.
You have: $csvdelimiter = "',"
It should be: $csvdelimiter = ","
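A quick sanity check (reusing the variables from your script) is to split the header line yourself and confirm that each column name comes back as a separate element:
(Get-Content $csvfile -First 1).Split(",")
If the delimiter is wrong, the split values don't line up with the destination columns, and SqlBulkCopy ends up trying to push text into typed columns, which is the conversion error you are seeing.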
I have this PowerShell script that would work if my DEST table ONLY had the columns listed in the SELECT from my SOURCE server, but the DEST table has more. I haven't been able to find anything that gives examples of how to specify which columns of my DEST table I want to insert into. Note that SourceServer and DestServer are not linked servers.
Param (
#[parameter(Mandatory = $true)]
[string] $SrcServer = "SourceServer",
[parameter(Mandatory = $true)]
[string] $SrcDatabase = "SourceDb",
#[parameter(Mandatory = $true)]
[string] $SrcTable = "stage.InternalNotes",
#[parameter(Mandatory = $true)]
[string] $DestServer = "DestServer",
#[parameter(Mandatory = $true)]
[string] $DestDatabase = "DestDb",
[parameter(Mandatory = $true)]
[string] $DestTable = "dbo.InternalNotes"
)
Function ConnectionString([string] $ServerName, [string] $DbName)
{
"Data Source=$ServerName;Initial Catalog=$DbName;Integrated Security=True;User ID=$UID;Password=$PWD;"
}
$SrcConnStr = ConnectionString $SrcServer $SrcDatabase
$SrcConn = New-Object System.Data.SqlClient.SQLConnection($SrcConnStr)
$CmdText = "SELECT
ino.UserId
,ino.StoreId
,ino.PostedById
,ino.DatePosted
,ino.NoteSubject
,ino.NoteText
,ino.NoteType
,ino.Classify
,ino.CreatedBy
,ino.CreatedUtc
,IsReadOnly = 0
FROM
stage.InternalNotes AS ino
"
$SqlCommand = New-Object system.Data.SqlClient.SqlCommand($CmdText, $SrcConn)
$SrcConn.Open()
[System.Data.SqlClient.SqlDataReader] $SqlReader = $SqlCommand.ExecuteReader()
Try
{
$DestConnStr = ConnectionString $DestServer $DestDatabase
$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy($DestConnStr, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity)
$bulkCopy.DestinationTableName = $DestTable
$bulkCopy.WriteToServer($sqlReader)
}
Catch [System.Exception]
{
$ex = $_.Exception
Write-Host $ex.Message
}
Finally
{
Write-Host "Table $SrcTable in $SrcDatabase database on $SrcServer has been copied to table $DestTable in $DestDatabase database on $DestServer"
$SqlReader.close()
$SrcConn.Close()
$SrcConn.Dispose()
$bulkCopy.Close()
}
Essentially, I need to be able to do this:
INSERT INTO dbo.InternalNotes --DEST Server table
(
userID
,StoreID
,PostedByID
,DatePosted
,NoteSubject
,NoteText
,NoteType
,Classify
,CreatedBy
,CreatedDateUTC
,IsReadOnly
)
SELECT
ino.UserId
,ino.StoreId
,ino.PostedById
,ino.DatePosted
,ino.NoteSubject
,ino.NoteText
,ino.NoteType
,ino.Classify
,ino.CreatedBy
,ino.CreatedUtc
,IsReadOnly = 0
FROM
stage.InternalNotes AS ino --SOURCE Server table
Edits after getting everything to work based on the accepted answer:
For some reason it didn't like the line:
$bulkCopy = New-Object -TypeName Data.SqlClient.SqlBulkCopy -ArgumentList $DestSqlConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity, $DestSqlTransaction;
It gave the error:
Cannot convert argument "1", with value:
"[System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity", for
"SqlBulkCopy" to type "System.Data.SqlClient.SqlBulkCopyOptions":
"Cannot convert value
"[System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity" to type
"System.Data.SqlClient.SqlBulkCopyOptions". Error: "Unable to match
the identifier name
[System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity to a valid
enumerator name. Specify one of the following enumerator names and try
again: Default, KeepIdentity, CheckConstraints, TableLock, KeepNulls,
FireTriggers, UseInternalTransaction,
AllowEncryptedValueModifications""
So instead I changed it to this, and everything worked (with -ArgumentList, PowerShell passes the [Enum]::Value token as a literal string unless it is wrapped in parentheses; the constructor-style call below evaluates it as an expression):
$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy($DestSqlConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity,$DestSqlTransaction)
To do manual column mapping, you need to populate SqlBulkCopy.ColumnMappings. If you don't specify the mapping, then as far as I know SqlBulkCopy will assume the first column in the select list or DataRow goes into the first ordinal column of the destination table.
For example:
$bulkCopy.DestinationTableName = $DestTable;
$bulkCopy.ColumnMappings.Add('sourceColumn1','destinationColumn1');
$bulkCopy.ColumnMappings.Add('sourceColumn2','destinationColumn2');
$bulkCopy.ColumnMappings.Add('sourceColumn3','destinationColumn3');
$bulkCopy.ColumnMappings.Add('sourceColumn4','destinationColumn4');
$bulkCopy.ColumnMappings.Add('sourceColumn5','destinationColumn5');
However, there are a number of other issues with your script.
Your connection string authentication section is nonsense:
`Integrated Security=True; User ID=$UID; Password=$PWD;`
Integrated Security=True says, "Use passthrough Windows authentication with currently logged on user." User ID=$UID; Password=$PWD; says, "Use SQL authentication with the specified username and password." You can't do both.
You should specify only one or the other.
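For example, the ConnectionString function should build one or the other, not both (server, database and credential values are placeholders):
# Windows (passthrough) authentication
"Data Source=$ServerName;Initial Catalog=$DbName;Integrated Security=True;"
# SQL Server authentication
"Data Source=$ServerName;Initial Catalog=$DbName;User ID=$UID;Password=$PWD;"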
$SqlCommand = New-Object system.Data.SqlClient.SqlCommand($CmdText, $SrcConn)
[...]
$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy($DestConnStr, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity)
I may be wrong, but I'm pretty sure you're trying to pass two variables as one argument here. Just like with your ConnectionString function, I don't think you want parentheses here. In any case it's syntactically confusing. Do this instead:
$SqlCommand = New-Object -TypeName System.Data.SqlClient.SqlCommand -ArgumentList $CmdText, $SrcConn
[...]
$bulkCopy = New-Object -TypeName Data.SqlClient.SqlBulkCopy -ArgumentList $DestConnStr, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity
Speaking of that last one, I have another issue with it. SqlBulkCopy is powerful, but you really have to hold its hand. By default, SqlBulkCopy doesn't run with any transaction benefits. That means that if it errors out in the middle, well, too bad, your data has been partially updated. You can enable internal transactions, but then only the most recent batch of inserts will be rolled back. You really need to manage your own transaction to get an all-or-nothing result.
So you'll end up with something like this:
Try {
$DestConnStr = ConnectionString $DestServer $DestDatabase
# We have to open the connection before we can create the transaction
$DestSqlConnection = New-Object -TypeName System.Data.SqlClient.SqlConnection -ArgumentList $DestConnStr;
$DestSqlConnection.Open();
$DestSqlTransaction = $DestSqlConnection.BeginTransaction();
$bulkCopy = New-Object -TypeName Data.SqlClient.SqlBulkCopy -ArgumentList $DestSqlConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity, $DestSqlTransaction;
$bulkCopy.DestinationTableName = $DestTable
$bulkCopy.ColumnMappings.Add('sourceColumn1','destinationColumn1');
$bulkCopy.ColumnMappings.Add('sourceColumn2','destinationColumn2');
$bulkCopy.ColumnMappings.Add('sourceColumn3','destinationColumn3');
$bulkCopy.ColumnMappings.Add('sourceColumn4','destinationColumn4');
$bulkCopy.ColumnMappings.Add('sourceColumn5','destinationColumn5');
Try {
$bulkCopy.WriteToServer($sqlReader)
# Commit on success
$DestSqlTransaction.Commit();
}
Catch {
# Rollback on error
$DestSqlTransaction.Rollback();
# Rethrow the error to the outer catch block
throw ($_);
}
}
Catch [System.Exception] {
$ex = $_.Exception
Write-Host $ex.Message
}
Finally {
[...]
}
I'd probably rewrite the above some more because I don't like nested Try blocks, but for a quick and dirty rewrite this will work. I don't think you'll run into any distributed transaction problems doing this, but I may be wrong. I tend to use SSIS or linked servers when I need this sort of data pump.
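For what it's worth, here is one way to flatten the nested Try into a single Try/Catch/Finally. This is just a sketch built on the same variables as above; note the parentheses around the enum value so that PowerShell evaluates it as an expression:
Try {
    $DestConnStr = ConnectionString $DestServer $DestDatabase
    $DestSqlConnection = New-Object System.Data.SqlClient.SqlConnection($DestConnStr)
    $DestSqlConnection.Open()
    $DestSqlTransaction = $DestSqlConnection.BeginTransaction()
    $bulkCopy = New-Object Data.SqlClient.SqlBulkCopy($DestSqlConnection, ([System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity), $DestSqlTransaction)
    $bulkCopy.DestinationTableName = $DestTable
    $bulkCopy.WriteToServer($SqlReader)
    # Commit only if the whole copy succeeded
    $DestSqlTransaction.Commit()
}
Catch [System.Exception] {
    # Roll back whatever was written in this transaction
    if ($DestSqlTransaction) { $DestSqlTransaction.Rollback() }
    Write-Host $_.Exception.Message
}
Finally {
    $SqlReader.Close()
    $SrcConn.Close()
    if ($bulkCopy) { $bulkCopy.Close() }
    if ($DestSqlConnection) { $DestSqlConnection.Close() }
}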