Download data into SQL Server

I have a file without any extension and I need to load its data into a SQL Server table.
Here's an example of one line opened in Notepad:
7600 20160701 20160701 20160630 20160630 20160630 ZSO ### 5501 850170371
In Excel it appears as a single string without any spaces.
How can I split the string into columns when I have no comma or similar delimiter?

I recommend checking out the SQL Server Import Wizard, which is smart enough to figure out the delimiter for you. You may need to validate the column widths and types, though.
If this is a regular activity you can create an SSIS package (you can actually create one at the end of the Import/Export Wizard) and implement more advanced error handling.
If you are looking for a T-SQL-only solution, please post more data/specifications so we can work on that. It can be done in any of the following ways (a BULK INSERT sketch follows the list):
Bulk Insert
OPENQUERY
OPENDATASOURCE
OPENROWSET
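For instance, here is a minimal BULK INSERT sketch, assuming the file turns out to be single-space-delimited and loading into a staging table first (the table, column, and path names here are hypothetical):
-- Staging table sized loosely as varchar; tighten the types after validating the data
CREATE TABLE dbo.ImportStaging (
    Col1 varchar(20), Col2 varchar(20), Col3 varchar(20), Col4 varchar(20), Col5 varchar(20),
    Col6 varchar(20), Col7 varchar(20), Col8 varchar(20), Col9 varchar(20), Col10 varchar(20)
);
-- A single space as the field terminator, one record per line
BULK INSERT dbo.ImportStaging
FROM 'C:\Import\datafile'
WITH (FIELDTERMINATOR = ' ', ROWTERMINATOR = '\n');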

For one-off tasks like this with small datasets and variable data quality, my personal preference is to use Excel to create a script which can be run in SSMS.
To accomplish this I would do the following steps:
Split the data using Excel's "Text to Columns"
There are options here similar to those in the SQL Import Wizard for how columns are determined (fixed width/delimiters/...), with a fairly intuitive and immediate interface
Copy and Paste Special > Transpose to shift the data into rows
Possibly do some manual data cleanup
Add a formula to "scriptify" the values (see the sketch after this list)
Create and store script
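For the "scriptify" step, a concatenation formula along the lines of ="INSERT INTO dbo.MyTable VALUES ('"&A1&"', '"&B1&"');" (the table name and cell references are placeholders, shown for just the first two columns) turns each row into a statement, so the finished script looks something like:
INSERT INTO dbo.MyTable VALUES ('7600', '20160701');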

You can use Excel (there is a function for this; in German it is called "Text in Spalten", i.e. "Text to Columns") or LibreOffice Calc (it provides a similar wizard) for importing the data.
If it looks good in one of these programs, save it as CSV or another format that you can import into MSSQL.

Related

What is the most efficient way to remove a space from a string when importing a csv file to SQL Server using SSIS?

I will be importing records from thousands of CSV files using SSIS. These CSV files will contain a Postal Code column, which has the format A5A 5A5, where "A" is any letter and "5" is any number from 0 to 9.
There is a space between the "A5A" and "5A5" that I want to remove, so that all Postal Codes appear as "A5A5A5".
I am reviewing the documentation and see several options, and I'm trying to narrow down the best one, i.e. the one that requires the least number of steps. So far I am looking at the Derived Column transformation, but that would involve adding another column to my SQL Table.
Is there a way I can trim the space without having to add an extra column?
As @Larnu answered via the comments, a Derived Column is likely the most appropriate component to use here.
The expression you're looking for is a REPLACE. The syntax ought to be:
REPLACE([PostalCode], " ", "")
You have 10 columns from your CSV. The Derived Column can either replace an existing column or add a new column to the row buffer. I would advocate adding a new column, PostalCodeStripped or something like that. At some point, something weird is going to happen with the data and you'll get an A5A 5A5 that didn't get the space stripped. Having both the original and the parsed value available in debugging can help sort out problems (oh, this one has a non-breaking space, or a tab instead of a space, or in addition to one).
But, just because a column is in the buffer does not mean you need to create a column for that in the destination table. Just unmap the PostalCode from the row buffer and map PostalCodeStripped to the PostalCode column in the database. You'll see what I'm talking about in the destination component. By default, they'll map based on name matching but you're welcome to wire them up however you see fit.
ELT is an alternate option: bulk load the data into a staging table, then do a simple SELECT into the destination to do the transformation. I might be tempted not to use SSIS here. BCP or Import-DbaCsv (from the dbatools PowerShell module) would both be quick alternates. If you know PowerShell and want to process the files in a pipe, you can pipe the files into Import-DbaCsv. The PowerShell script can also execute Invoke-DbaQuery to run UPDATE or INSERT queries to do the transformation.
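A sketch of that staging-then-transform step in T-SQL (the staging and destination table and column names here are assumptions):
-- Strip the space in the staging table, then move the rows to the destination
UPDATE dbo.PostalStaging
SET PostalCode = REPLACE(PostalCode, ' ', '');

INSERT INTO dbo.PostalCodes (PostalCode, OtherColumn)
SELECT PostalCode, OtherColumn
FROM dbo.PostalStaging;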
SSIS can also just do the bulk load and then run the T-SQL to do the transformations. I don't like the overhead of maintaining and upgrading SSIS packages; I'd take T-SQL jobs over SSIS jobs any day. (We have about half a year of FTE time to upgrade our SSIS packages to SQL 2019. The T-SQL jobs just keep working when moved to a new version.)
Or go the ETL route and do the transformation in the SSIS data flow. A Derived Column transformation between a Flat File Source and an OLE DB Destination should do the trick.
To handle multiple files, you can use the Foreach Loop Container. There's an enumerator for files using a wildcard path. (The initial T-SQL task just truncates the table for testing.)
You'll need to parameterize the thing to get the file source to be each file.
For PowerShell, it might be something like the script below (no transformation yet).
Get-ChildItem 'C:\TestFolder\*.csv' |
Import-DbaCsv -SqlInstance 'localhost\DEV' -Database 'Test' -Schema 'dbo' -Table 'Test' -AutoCreateTable -Verbose
If you run this in the ISE, be aware of a bug where the connection might not be released after calling Import-DbaCsv, which will cause it to hang. This is not an issue from the command line as far as I can tell. (If this happens to you, you might have to kill the ISE process; closing it is not enough.)

Ignoring column from Excel file while importing to SQL Server

I have multiple Excel files that have the same format. I need to import them into SQL Server.
The issue I currently have is that there are two free-text columns that I need to ignore completely, as the character length in some rows exceeds what the server allows me to import, which results in a truncation error.
Because I don't need these columns for my analysis, the table I'm importing into doesn't include them, but for some reason the SSIS package still picks up those columns and cuts the import job off halfway through.
I tried using the maximum character length for those columns, which still results in the truncation error.
I need to create an SSIS package that ignores the two columns completely without deleting the columns from Excel.
You can specify which columns you need to ignore in the Edit Mappings dialog.
If you create the SSIS package in SSDT, the Excel file can be queried to return only the required columns. In the package, create an Excel Connection Manager pointing to the Excel file. Then, on the Control Flow of the package, add a Data Flow Task containing an Excel Source component. On this source, change the data access mode to SQL command; the file can then be queried much like a SQL table. In the following example, TabName is the name of the Excel tab containing the data that will be returned. If either the tab or any column names contain spaces, they need to be enclosed in square brackets, e.g. a tab named Tab Name would be written as [Tab Name].
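A query along those lines might look like this (the column names are placeholders; note the $ suffix on the tab name):
SELECT [Column1], [Column2] FROM [TabName$]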
Import/Export Wizard
Since you mentioned in the comments that you are using the SQL Server Import/Export Wizard: you can solve this if there is a fixed column range that you are looking to import (example: the first 10 columns).
In the Import/Export Wizard, after selecting the destination options you will be asked whether you want to read from tables or from a query:
Select the query option, then use a simple SELECT query and specify the column range after the sheet name. As an example:
SELECT * FROM [Sheet1$A:C]
The query above will read the first 3 columns of Sheet1, since A:C represents the range from the first column, A, to the third column, C.
Now you can check the columns in the Edit Mappings dialog.
SSIS
You can use the same logic within an SSIS package: just write the same SQL command in the Excel Source after changing the data access mode to SQL command.
The solution is simple. I needed to write a query that excludes the columns. So instead of selecting "Copy data from one or more tables", you select "Write a query" and exclude the columns you don't need. This one worked 100%.
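For example, assuming the sheet is Sheet1 and the two free-text columns are simply left out of the select list (all names here are placeholders), the query would name only the columns you want:
SELECT [ID], [Amount], [ImportDate] FROM [Sheet1$]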

Excel Source SSIS

I have an SSIS package with an Excel Source that reads an Excel table. I am currently using the Table or View data access mode and it is literally reading every row in the worksheet, 1,048,576, which is the maximum.
The source worksheet has an Excel table on it named PSA_DATA. Why isn't this table in the Table or View drop-down? There is an option for the worksheet followed by _FilterDatabase, but this fails when I run the package, even though it pulls the correct data when I press Preview. Wouldn't this make more sense than using the SQL Command and SELECT * FROM [fact_PSA$Ax:Bx]? The whole reason we use named ranges and tables in Excel is that they are dynamic! Now I have to hard-code the range with row numbers every time?
What am I missing here? Is there an easier way? I just want to move an Excel table into a SQL table! Why doesn't the most ubiquitous piece of software in the world easily talk to the second most ubiquitous piece of software in the world!?
If the sheet name is not shown in the Table or view combo box, it is not a bad idea to use a SQL command.
But when using a SQL command to read from Excel it is not necessary to specify a range; the OLE DB provider will take the used range by default. Just use the following command:
SELECT * FROM [fact_PSA$]
Workaround
You can try reading your Excel file from a Script Task or a Script Component; you can follow one of the following links to achieve this:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/2d45f180-9fd0-4224-a298-cb99e2b2100a/how-to-read-the-contents-of-excel-file-through-ssis-script-task-without-the-headers?forum=sqlintegrationservices
https://msdn.microsoft.com/en-us/library/ms403358.aspx
http://billfellows.blogspot.com/2013/04/ssis-excel-source-via-script.html
Side Note: there are many links you can follow to import data from excel to SQL using SSIS:
http://www.sqlshack.com/using-ssis-packages-import-ms-excel-data-database/
https://www.mssqltips.com/sqlservertip/2770/importing-data-from-excel-using-ssis--part-1/
https://www.simple-talk.com/sql/ssis/moving-data-from-excel-to-sql-server-10-steps-to-follow/
https://www.simple-talk.com/sql/ssis/importing-excel-data-into-sql-server-via-ssis-questions-you-were-too-shy-to-ask/
I appreciate the links to workarounds, but I didn't really get an answer to my question: why can't we reference an Excel TABLE (not a worksheet) from the SSIS Excel Source?
I ended up using the SQL Command data access mode with this query:
SELECT * FROM [fact_PSA$A:W]
WHERE fact_PSA_ID IS NOT NULL
Somehow, using SQL stopped it from reading every possible row in the worksheet, even though the A:W range places no limit on the rows. I guess the WHERE fact_PSA_ID IS NOT NULL clause limits the rows read before they hit the SSIS source.

Export large amounts of binary data from one SQL database and import it into another database of the same schema

I have one database with an image table that contains just over 37,000 records. Each record contains an image in the form of binary data. I need to get all of those 37,000 records into another database containing the same table and schema that has about 12,500 records. I need to insert these images into the database with an IF NOT EXISTS approach to make sure that there are no duplicates when I am done.
I tried exporting the data into Excel and formatting it into a script. (I have done this before with other tables.) The thing is, Excel does not support binary data.
I also tried the "Generate Scripts" wizard in SSMS, which did not work because the .sql file was well over 18 GB and my PC could not handle it.
Is there some other SQL tool to be able to do this? I have Googled for hours but to no avail. Thanks for your help!
I have used SQL Workbench/J for this.
You can either use WbExport and WbImport through text files (the binary data will be written as separate files and the text file contains the filename).
Or you can use WbCopy to copy the data directly without intermediate files.
To achieve your "if not exists" approach you could use the update/insert mode, although that would change existing rows.
I don't think there is an "insert only if it does not exist" mode, but you should be able to achieve this by defining a unique index and ignoring errors (that wouldn't be really fast, but should be OK for this small number of rows).
If the "exists" check is more complicated, you could copy the data into a staging table in the target database, and then use SQL to merge that into the real table.
Why don't you try the 'Export data' feature? This should work.
Right click on the source database, select 'Tasks' and then 'Export data'. Then follow the instructions. You can also save the settings and execute the task on a regular basis.
Also, the bcp.exe utility could work to read data from one database and insert it into another.
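As a rough sketch (server, database, and table names are placeholders), exporting in native format with -n keeps the binary column intact:
bcp SourceDb.dbo.Images out images.dat -S SourceServer -T -n
bcp TargetDb.dbo.Images_Staging in images.dat -S TargetServer -T -n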
However, I would recommend using the first method.
Update: In order to avoid duplicates you have to be able to compare the images. Unfortunately, you cannot compare image columns directly, but you can cast them to varbinary(max) for comparison.
So here's my advice:
1. Copy the table to the new database under the name tmp_images.
2. Use a query that inserts only the new images, casting the image column to varbinary(max) so the values can be compared:
INSERT INTO DB1.dbo.table_name
SELECT * FROM DB2.dbo.table_name
WHERE CAST(image_column AS varbinary(max)) NOT IN
(
SELECT CAST(image_column AS varbinary(max)) FROM DB1.dbo.table_name
)

Export tables from SQL Server to be imported to Oracle 10g

I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys, so this should be reasonably straightforward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax (i.e. changed nvarchar to varchar2).
Next I exported the data using SQL Server's Export Wizard, which created a CSV flat file. However, my main issue is that I can't find a way to force SQL Server to double-quote column names. One of my columns contains commas, so unless I can find a method for SQL Server to quote column names, I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm referring to quoting the column values in the CSV. For example, I have a column which contains addresses like:
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV:
"101 High Street, Sometown, Some
county, PO5TC053"
After looking at some options with SQLDeveloper, and at manually trying to export/import, I found a utility in SQL Server Management Studio that gets the desired results and is easy to use. Do the following:
Go to the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in Oracle is something you should avoid. It causes all sorts of problems.
As Robert has said, I'd strongly advise against quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure whether you are referring to the column names or to the data in these columns. (Can SQL Server really have a comma in a column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better, as the need to "escape" quotes becomes another issue). If you're exporting to CSV that should be an option, but then I'm not familiar with the export wizard.
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and its Kettle component. It offers a good range of options for moving data from one place to another. It may be a bit oversized for a simple transfer, but for a big "migration" with the corresponding volume it may be a good option.
