PowerShell function to import a CSV file into a SQL Server database table

I have created a PowerShell function that bulk copies data from a .csv file (the first row is the header) and inserts it into a SQL Server database table.
Here is my code:
function BulkCsvImport($sqlserver, $database, $table, $csvfile, $csvdelimiter, $firstrowcolumnnames) {
    Write-Host "Bulk Import Started."
    $elapsed = [System.Diagnostics.Stopwatch]::StartNew()
    [void][Reflection.Assembly]::LoadWithPartialName("System.Data")
    [void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")

    # 50k worked fastest and kept memory usage to a minimum
    $batchsize = 50000

    # Build the SqlBulkCopy connection, and set the timeout to infinite
    $connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"

    # Wipe the bulk insert table first
    Invoke-Sqlcmd -Query "TRUNCATE TABLE $table" -ServerInstance $sqlserver -Database $database

    $bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
    $bulkcopy.DestinationTableName = $table
    $bulkcopy.bulkcopyTimeout = 0
    $bulkcopy.batchsize = $batchsize

    # Create the datatable, and autogenerate the columns.
    $datatable = New-Object System.Data.DataTable

    # Open the text file from disk
    $reader = New-Object System.IO.StreamReader($csvfile)
    $columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
    if ($firstrowcolumnnames -eq $true) { $null = $reader.ReadLine() }
    foreach ($column in $columns) {
        $null = $datatable.Columns.Add()
    }

    # Read in the data, line by line
    while (($line = $reader.ReadLine()) -ne $null) {
        $null = $datatable.Rows.Add($line.Split($csvdelimiter))
        $i++
        if (($i % $batchsize) -eq 0) {
            $bulkcopy.WriteToServer($datatable)
            Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
            $datatable.Clear()
        }
    }

    # Add in all the remaining rows since the last clear
    if ($datatable.Rows.Count -gt 0) {
        $bulkcopy.WriteToServer($datatable)
        $datatable.Clear()
    }

    # Clean up
    $reader.Close()
    $reader.Dispose()
    $bulkcopy.Close()
    $bulkcopy.Dispose()
    $datatable.Dispose()

    Write-Host "Bulk Import Completed. $i rows have been inserted into the database."
    # Write-Host "Total Elapsed Time: $($elapsed.Elapsed.ToString())"

    # Sometimes the Garbage Collector takes too long to clear the huge datatable.
    $i = 0
    [System.GC]::Collect()
}
I would like to modify the above so that the columns in the .csv file are matched to the columns with the same names in the SQL Server table (the names should be identical). At the moment the data is being imported into the wrong database columns.
Could I get some assistance as to what I need to change in the function above to achieve this?

I would use an existing open-source solution:
Import-DbaCsv - dbatools.io
Efficiently imports very large (and small) CSV files into SQL Server.
Import-DbaCsv takes advantage of .NET's super fast SqlBulkCopy class to import CSV files into SQL Server.
Parameters:
-ColumnMap
By default, the bulk copy tries to automap columns. When it doesn't
work as desired, this parameter will help.
PS C:\> $columns = @{
>>     Text   = 'FirstName'
>>     Number = 'PhoneNumber'
>> }
PS C:\> Import-DbaCsv -Path c:\temp\supersmall.csv -SqlInstance sql2016 -Database tempdb -ColumnMap $columns -BatchSize 50000 -Table table_name -Truncate
The CSV column 'Text' is inserted into the SQL column 'FirstName' and the CSV column 'Number' is inserted into the SQL column 'PhoneNumber'. All other columns are ignored and therefore get null or default values.
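If you want to stay with the plain SqlBulkCopy approach from your function, the usual fix is to give the DataTable columns the names from the CSV header and then add one ColumnMappings entry per column, so SqlBulkCopy matches columns by name instead of by ordinal position. A minimal sketch of the relevant part of the function, assuming the header names really are identical to the table's column names:
# Name the DataTable columns after the CSV header
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
foreach ($column in $columns) {
    $name = $column.Trim('"')   # strip quotes if the header values are quoted
    $null = $datatable.Columns.Add($name)
    # Map the source column to the destination column of the same name
    $null = $bulkcopy.ColumnMappings.Add($name, $name)
}
The rest of the function can stay as it is; once mappings are added, SqlBulkCopy no longer relies on column order.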

Related

SqlBulkCopy excessive memory consumption even with EnableStreaming and a low BatchSize

I am trying to bulk load data from Oracle to SQL Server through the PowerShell SqlServer module and SqlBulkCopy.
On small data sets everything works fine, but on big data sets, even with BatchSize and streaming set, SqlBulkCopy takes all the memory available... until an out-of-memory error.
Also, the notify callback never seems to fire, so I guess that even with EnableStreaming = $true the process first loads everything into memory...
What did I miss?
$current = Get-Date
#copy table from Oracle table to SQL Server table
add-type -path "D:\oracle\product\12.1.0\client_1\odp.net\managed\common\Oracle.ManagedDataAccess.dll";
#define oracle connectin string
$conn_str = "cstr"
# query for oracle table
$qry = "
SELECT
ID,CREATEDT,MODIFIEDDT
FROM MYTABLE
WHERE source.ISSYNTHETIC=0 AND source.VALIDFROM >= TO_Date('2019-01-01','yyyy-mm-dd')
";
# key (on the left side) is the source column while value (on the right side) is the target column
[hashtable] $mapping = @{'ID'='ID';'CREATEDT'='CREATEDT';'MODIFIEDDT'='MODIFIEDDT'};
$adapter = new-object Oracle.ManagedDataAccess.Client.OracleDataAdapter($qry, $conn_str);
#$info = new-object Oracle.ManagedDataAccess.Client;
#Write-Host ( $info | Format-Table | Out-String)
$dtbl = new-object System.Data.DataTable('MYTABLE');
#this Fill method will populate the $dtbl with the query $qry result
$adapter.Fill($dtbl);
#define sql server target instance
$sqlconn = "cstr";
$sqlbc = new-object system.data.sqlclient.Sqlbulkcopy($sqlconn)
$sqlbc.BatchSize = 1000;
$sqlbc.EnableStreaming = $true;
$sqlbc.NotifyAfter = 1000;
$sqlbc.DestinationTableName="DWHODS.MYTABLE";
#need to tell $sqlbc the column mapping info
foreach ($k in $mapping.Keys)
{
    $colMapping = new-object System.Data.SqlClient.SqlBulkCopyColumnMapping($k, $mapping[$k]);
    $sqlbc.ColumnMappings.Add($colMapping) | out-null
}
$sqlbc.WriteToServer($dtbl);
$sqlbc.Close();
$end= Get-Date
$diff= New-TimeSpan -Start $current -End $end
Write-Output "import needed : $diff"
Thanks to Jeroen, I changed the code like this; now it no longer consumes all the memory:
$oraConn = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($conn_str);
$oraConn.Open();
$command = $oraConn.CreateCommand();
$command.CommandText=$qry;
$reader = $command.ExecuteReader()
...
$sqlbc.WriteToServer($reader);
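Put together, the streaming variant looks roughly like the sketch below (reusing $conn_str, $qry, $sqlconn and $mapping from the question; the connection strings and table name are placeholders). The key point is that WriteToServer is given the Oracle data reader directly, so rows are streamed through in batches instead of being buffered in a DataTable first:
# Open a streaming reader on the Oracle side instead of filling a DataTable
$oraConn = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($conn_str)
$oraConn.Open()
$command = $oraConn.CreateCommand()
$command.CommandText = $qry
$reader = $command.ExecuteReader()

# SqlBulkCopy pulls rows from the reader in batches
$sqlbc = New-Object System.Data.SqlClient.SqlBulkCopy($sqlconn)
$sqlbc.BatchSize = 1000
$sqlbc.EnableStreaming = $true
$sqlbc.DestinationTableName = "DWHODS.MYTABLE"
foreach ($k in $mapping.Keys) {
    $null = $sqlbc.ColumnMappings.Add($k, $mapping[$k])
}
$sqlbc.WriteToServer($reader)

# Clean up both ends
$sqlbc.Close()
$reader.Dispose()
$oraConn.Close()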

Export the return of a SQL script to an Excel document using PowerShell

At the moment I have the following code, which grabs the returned table and outputs it to a CSV file.
Push-Location; Import-Module SQLPS -DisableNameChecking; Pop-Location
$SQLServer = "localhost"
$today = (get-date).ToString("dd-MM-yyyy")
$DBName = "ZoomBI"
$ExportFile = "\\Shared_Documents\FC Folder\Despatch\Brexit Files\DHL\DHL "+$today+".csv"
$Counter = 0
$Storedprocedure = "EXEC [dbo].[DHLDeliveries]"
while ( $true )
{
    # Remove the export file
    if (Test-Path -Path $ExportFile -PathType Leaf) {
        Remove-Item $ExportFile -Force
    }

    # Clear the buffer cache to make sure each test is done the same
    $ClearCacheSQL = "DBCC DROPCLEANBUFFERS"
    Invoke-Sqlcmd -ServerInstance $SQLServer -Query $ClearCacheSQL

    # Export the table through the pipeline and capture the run time. Only the export is included in the run time.
    $sw = [Diagnostics.Stopwatch]::StartNew()
    Invoke-Sqlcmd -ServerInstance $SQLServer -Database $DBName -Query $Storedprocedure | Export-CSV -Path $ExportFile -NoTypeInformation
    $sw.Stop()
    $sw.Elapsed
    $Milliseconds = $sw.ElapsedMilliseconds
    $Counter++
    Exit
}
However, instead of that I need to output the results to an Excel document with two sheets, putting a result set into each sheet.
# Create an Excel instance
$excel = New-Object -ComObject Excel.Application
# make Excel visible
$excel.visible = $true
# add a new blank workbook
$workbook = $excel.Workbooks.add()
# Adding sheets
foreach ($input in (gc c:\temp\input.txt)) {
    $s4 = $workbook.Sheets.add()
    $s4.name = $input
}
# The default workbook has three sheets, remove them
($s1 = $workbook.sheets | where {$_.name -eq "Sheet1"}).delete()
#Saving File
"`n"
write-Host -for Yellow "Saving file in $env:userprofile\desktop"
$workbook.SaveAs("$env:userprofile\desktop\ExcelSheet_$Today.xlsx")
Can anyone help?
I would take a look at the ImportExcel module. It took me two lines of code to create an Excel document with two sheets.
https://www.powershellgallery.com/packages/ImportExcel/5.4.2
https://www.youtube.com/watch?v=fvKKdIzJCws&list=PL5uoqS92stXioZw-u-ze_NtvSo0k0K0kq
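For example, something along these lines produces one workbook with two worksheets; the second stored procedure name is just a placeholder for whatever query fills your second sheet:
# Requires the ImportExcel module: Install-Module ImportExcel
$ExcelFile = "\\Shared_Documents\FC Folder\Despatch\Brexit Files\DHL\DHL $today.xlsx"

# Each Export-Excel call writes one result set to its own worksheet in the same workbook
Invoke-Sqlcmd -ServerInstance $SQLServer -Database $DBName -Query "EXEC [dbo].[DHLDeliveries]" |
    Export-Excel -Path $ExcelFile -WorksheetName "Deliveries"
Invoke-Sqlcmd -ServerInstance $SQLServer -Database $DBName -Query "EXEC [dbo].[DHLReturns]" |
    Export-Excel -Path $ExcelFile -WorksheetName "Returns"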

Using PowerShell to bulk import a large CSV into SQL Server

I came across a post discussing how to use PowerShell to bulk import massive amounts of data relatively fast. I have a typical CSV file with about 5 million rows formatted in the usual way.
I keep getting the same error messages regardless of whether I import a .txt or a .csv file. Playing around with the csvdelimiter/firstcolumnnames section also created its own issues.
I've spent hours trying to figure out how to get it to work with MY csv files and I keep getting the same error messages no matter what I try. All field names accept NULL and they are identical in every way between the table and the csv file. I do not have a primary key for the database.
# Database variables
$sqlserver = "SERVERNAMEHERE"
$database = "autos"
$table = "AgedAutos"
# CSV variables
$csvfile = "C:\temp\aged.csv"
$csvdelimiter = "',"
$firstRowColumnNames = $true
################### No need to modify anything below ###################
Write-Host "Script started..."
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")
# 50k worked fastest and kept memory usage to a minimum
$batchsize = 50000
# Build the sqlbulkcopy connection, and set the timeout to infinite
$connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize
# Create the datatable, and autogenerate the columns.
$datatable = New-Object System.Data.DataTable
# Open the text file from disk
$reader = New-Object System.IO.StreamReader($csvfile)
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
if ($firstRowColumnNames -eq $true) { $null = $reader.readLine() }
foreach ($column in $columns) {
    $null = $datatable.Columns.Add()
}
# Read in the data, line by line
while (($line = $reader.ReadLine()) -ne $null) {
    $null = $datatable.Rows.Add($line.Split($csvdelimiter))
    $i++
    if (($i % $batchsize) -eq 1) {
        $bulkcopy.WriteToServer($datatable)
        Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        $datatable.Clear()
    }
}
# Add in all the remaining rows since the last clear
if ($datatable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($datatable)
    $datatable.Clear()
}
# Clean Up
$reader.Close(); $reader.Dispose()
$bulkcopy.Close(); $bulkcopy.Dispose()
$datatable.Dispose()
Write-Host "Script complete. $i rows have been inserted into the database."
Write-Host "Total Elapsed Time: $($elapsed.Elapsed.ToString())"
# Sometimes the Garbage Collector takes too long to clear the huge datatable.
[System.GC]::Collect()
Error message listed below.
Exception calling "WriteToServer" with "1" argument(s): "The given value of type String from the data source cannot be converted to
type date of the specified target column."
At C:\powershell_scripts\batch_csv_import-code1-working-test for auto table.ps1:43 char:3
+ $bulkcopy.WriteToServer($datatable)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : InvalidOperationException
340000 rows have been inserted in 00:00:03.5156162
I have no idea what that error means since I cannot find anything useful on Google. I'm thinking one of the columns might be listed incorrectly in SQL Server, but I could be wrong.
Please help me figure out the problem. Thanks.
You are getting all the data in the first column because your value for $csvdelimiter is incorrect: the lines are never split, so the whole row lands in the first column, which SqlBulkCopy then cannot convert to that column's date type.
You have: $csvdelimiter = "',"
It should be: $csvdelimiter = ","
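As a quick sanity check before running the import, you could print how many columns the header splits into; with the wrong delimiter you will see a single column (file path and delimiter as in your script):
$csvdelimiter = ","
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
Write-Host "Header splits into $($columns.Count) column(s): $($columns -join ' | ')"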

Adding column to SQL query on multiple instances from powershell

I have the following PowerShell script which reads in a list of servers and runs a SQL command on each of them. The data is then exported to CSV and to Excel format.
I would like to add the targeted server name from my server list as the first column, so the columns would look like this (server name added at the front):
Server Name | Name | CollectionSet ID | Collection Mode | Retention Period | Schedule
This is the current script I have:
Param
(
[string]$fServers = 'W:\Theo\Scripts\mdw_servers.csv'
)
$query = "SELECT a.name AS 'DC Name',
collection_set_id AS 'Collection_set ID',
CASE collection_mode
WHEN 1 THEN 'non-cached'
WHEN 0 THEN 'cached'
END AS 'Collection Type' ,
days_until_expiration AS 'Retention Period' ,
b.name AS 'Schedule Name'
FROM msdb.dbo.syscollector_collection_sets a ,
msdb.dbo.sysschedules b
WHERE a.schedule_uid = b.schedule_uid
AND is_running = 1;"
$csvFilePath = "W:\Theo\Scripts\queryresults.csv"
$excelFilePath = "W:\Theo\Scripts\queryresults.xls"
# Run Query against multiple servers, combine results
$allServers = Get-Content -Path $fServers
foreach ($Server in $allServers) {
    write-host "Executing query against server: " $Server
    $results += Invoke-Sqlcmd -Query $query -ServerInstance $Server;
}
# Output to CSV
write-host "Saving Query Results in CSV format..."
$results | export-csv $csvFilePath -NoTypeInformation
# Convert CSV file to Excel
write-host "Converting CSV output to Excel..."
$excel = New-Object -ComObject excel.application
$excel.visible = $False
$excel.displayalerts=$False
$workbook = $excel.Workbooks.Open($csvFilePath)
$workSheet = $workbook.worksheets.Item(1)
$resize = $workSheet.UsedRange
$resize.EntireColumn.AutoFit() | Out-Null
$xlExcel8 = 56
$workbook.SaveAs($excelFilePath,$xlExcel8)
$workbook.Close()
$excel.quit()
$excel = $null
write-host "Results are saved in Excel file: " $excelFilePath
Any input is appreciated!
Have you tried:
SELECT @@SERVERNAME AS 'Server Name'
https://msdn.microsoft.com/en-us/library/ms187944.aspx
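Applied to the query in the question, that just means putting @@SERVERNAME first in the SELECT list, so every returned row already carries the instance it came from; the rest of the script can stay the same:
$query = "SELECT @@SERVERNAME AS 'Server Name',
       a.name AS 'DC Name',
       collection_set_id AS 'Collection_set ID',
       CASE collection_mode
            WHEN 1 THEN 'non-cached'
            WHEN 0 THEN 'cached'
       END AS 'Collection Type',
       days_until_expiration AS 'Retention Period',
       b.name AS 'Schedule Name'
FROM msdb.dbo.syscollector_collection_sets a,
     msdb.dbo.sysschedules b
WHERE a.schedule_uid = b.schedule_uid
  AND is_running = 1;"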

How to create only specific delete statements using Scripter

I am using the Scripter class to generate a script for the data in an existing database. I want to script a data set that can be inserted into a production database. We are doing this to test whether an installation of our software is correct.
Unfortunately the data set has to be removed later without any entries left behind, so that it does not interfere with our customers' data. So what I need are INSERT and DELETE statements. These are maintained manually at the moment, which is too much of a burden.
So I just ran the Scripter twice (once for the INSERTs, once for the DELETEs).
The problem is that when ScriptDrops is set to true, the output is of the form
DELETE FROM [dbo].[TableName]
What I would like is something of the form:
DELETE FROM [dbo].[TableName] WHERE ID = 'GUID'
Technically this should be possible, since there are primary keys on all the tables.
The Scripter class must also know about these relationships in some form, since it gets the order of the DELETE statements (dependencies) right via the foreign keys.
Any help on this would be appreciated.
Following are the two PowerShell scripts I am using to export the data:
ScriptRepositoryData.ps1
$scriptPath = $MyInvocation.MyCommand.Path
$scriptDirectory = Split-Path $scriptPath -Parent
. $scriptDirectory\DatabaseScripting.ps1
$filepath='c:\data.sql'
$database='ECMS_Repository'
$tablesToExclude = @(
    "SomeUnwantedTable"
)
$tablesListFromDatabase = GetTableList $database
$tablesArray = @()
$tablesListFromDatabase |% {
    if (-not $tablesToExclude.Contains($_.Name.ToString()))
    {
        $tablesArray += $_.Name
    }
}
ScriptInsert $database $tablesArray $filepath
DatabaseScripting.ps1
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | out-null
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMOExtended") | out-null
Function GetTableList ($database)
{
    Invoke-SqlCmd -Database $database -query "SELECT * FROM sys.tables"
}

Function ScriptInsert ($database, $tables, $destination)
{
    try {
        $serverMO = new-object ("Microsoft.SqlServer.Management.Smo.Server") "localhost"
        if ($serverMO.Version -eq $null) { Throw "Can't find the instance localhost" }

        $urnsToScript = New-Object Microsoft.SqlServer.Management.Smo.UrnCollection

        $databaseMO = $serverMO.Databases.Item("ECMS_Repository")
        if ($databaseMO.Name -ne $database) { Throw "Can't find the database $database" }

        $tables |% {
            $tableListMO = $databaseMO.Tables.Item($_, "dbo")
            $tableListMO |% {
                $urnsToScript.Add($_.Urn)
            }
        }

        $scripter = new-object ('Microsoft.SqlServer.Management.Smo.Scripter') $serverMO
        $scripter.Options.ScriptSchema = $False;
        $scripter.Options.ScriptData = $true;
        $scripter.Options.ScriptDrops = $true;
        $scripter.Options.ScriptAlter = $true;
        $scripter.Options.NoCommandTerminator = $true;
        $scripter.Options.Filename = $destination;
        $scripter.Options.ToFileOnly = $true
        $scripter.Options.Encoding = [System.Text.Encoding]::UTF8

        $scripter.EnumScript($urnsToScript)

        Write-Host -ForegroundColor Green "Done"
    }
    catch {
        Write-Host
        Write-Host -ForegroundColor Red "Error occurred"
        Write-Host
        Write-Host $_.Exception.ToString()
        Write-Host
    }
}
Unfortunately I did not find a way to do this using the SQL Management Objects.
Anyhow, I now take the output of the Scripter and select the IDs of each table. I then use the IDs to change every line that looks like
DELETE FROM [dbo].[tableName]
to this:
DELETE FROM [dbo].[tableName] WHERE ID IN ('guid1', 'guid2')
Here is how I did it:
$content = Get-Content $destination
Clear-Content $destination
$content |% {
    $line = $_
    $table = $line.Replace("DELETE FROM [dbo].[","").Replace("]","")
    $query = "SELECT ID, ClassID FROM [dbo].[" + $table + "]"
    $idsAsQueryResult = Invoke-SqlCmd -Database $database -query $query
    $ids = $idsAsQueryResult | Select-Object -Expand ID
    if ($ids -ne $null) {
        $joinedIDs = [string]::Join("','", $ids)
        $newLine = $line + " WHERE ID IN ('" + $joinedIDs + "')"
        Add-Content $destination $newLine
    }
}
Where $destination is the script that has been generated with the Scripter class and $database is a string containing the database name.
I had to select a second column (ClassID, which is there on all tables due to our OR mapper) because of some weird error in Select-Object that I do not fully understand.
This of course only works because all tables have primary keys, all primary keys are named ID, and none of them are composite keys.
You could of course achieve the same thing for more complicated database schemas by extracting the primary key information via SQL Server Management Objects (SMO), as sketched below.
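As a rough illustration of that idea (not something I actually ran), SMO exposes which columns belong to a table's primary key, so the WHERE clause could be built per table instead of assuming a single column named ID. A sketch, assuming the SMO assemblies and the $databaseMO object from the scripts above; the table name is a placeholder:
# Collect the primary key column names of one SMO table
function Get-PrimaryKeyColumns($tableMO) {
    $tableMO.Columns | Where-Object { $_.InPrimaryKey } | ForEach-Object { $_.Name }
}

$tableMO = $databaseMO.Tables.Item("SomeTable", "dbo")
$pkColumns = Get-PrimaryKeyColumns $tableMO
Write-Host ("Primary key of {0}: {1}" -f $tableMO.Name, ($pkColumns -join ", "))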
