BCP - Include terminator in data insert - sql-server

I'm using BCP to load a JSON file into SQL Server (yes, I know there are better ways, but I need to try this).
The problem is that the JSON document ends up malformed because the terminator defined in the format file is stripped from the data, and I want it included.
bcp db.dbo.test IN G:\JSON\json.out -f G:\JSON\formatfile.out -T
format file terminator:
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="] }" COLLATION="Latin1_General_CI_AI"/>
How can I end the file without truncating the closing JSON brackets?

BCP is not designed for importing a file into a single column, so you run into these problems. To import a file as a single object, use the OPENROWSET(... SINGLE_BLOB) functionality, like this:
INSERT INTO JsonTable(jsonColumn)
SELECT BulkColumn
FROM OPENROWSET (BULK 'TextFile Path', SINGLE_BLOB) FileName

If you absolutely, positively must use BCP, there is one trick that is often used for XML files and should work for JSON files as well:
- Do not specify a row terminator.
- Make the field terminator something that cannot possibly occur in the JSON file, such as '\0~\0\0~' (a NUL character, ~, two NUL characters, ~). If that sequence could occur in the JSON, pick another value; just make sure it cannot appear in the file.
Because the terminator never matches, the entire file is imported as a single unit. This is how it behaves for XML files; it should work for JSON files as well, but I cannot guarantee that.
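Before relying on this trick, it may be worth confirming that the chosen field terminator really is absent from the file. The following is a minimal Python sketch, not part of BCP: the helper name terminator_is_safe is hypothetical, and the check works on raw bytes, so it assumes the terminator is written in the same encoding as the file.

```python
def terminator_is_safe(data: bytes, terminator: bytes) -> bool:
    """Return True if the terminator byte sequence never occurs in data."""
    return terminator not in data


# The terminator suggested above: NUL, ~, NUL, NUL, ~
TERMINATOR = b"\x00~\x00\x00~"

# Illustrative JSON content (an assumption, not the asker's real file).
sample = b'{"items": [1, 2, 3] }'

print(terminator_is_safe(sample, TERMINATOR))  # terminator from the answer
print(terminator_is_safe(sample, b"] }"))      # terminator from the question
```

For a real file, the bytes would come from something like open(path, "rb").read(), assuming the file fits in memory.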

Related

NULLs when importing flat file - SQL Server

I am trying to import some data from a .csv file in SSMS using the "Import Flat File" option. However, not all of the data is being copied.
Both columns are set to nvarchar(50).
Lines containing just a single value after the semicolon are imported. Lines with multiple values are imported as NULL. I've tried separating the values with forward slashes and commas, but the result is the same.
How can I get the values imported instead of these NULL values?

I tested this on my local machine with your example and everything worked.
Maybe there is an issue in your flat file. Have you checked in Notepad++ that the file is consistent (for example, that every line ends with <CR><LF>)?
Also, is the result shown in the "preview" window during the import correct?

Import csv to SQLServer when there are spaces after the text qualifier

I have a csv file with a column GeoCodes. This uses " as text qualifier.
I am trying to import this into SQLServer using the SQL Server Import Wizard.
The problem is that when there is no GeoCode, the csv file sometimes outputs the GeoCode as " " followed by several spaces. The import then fails because the wizard reads the data inside the text qualifier and then finds these spaces before the next comma delimiter.
An example of the data is below. The Pontypandy row is the one that errors.
Place ,Geo Codes ,Type
Northpole ,"90.0000,0.0000 ",Pole
Southpole ,"-90.0000,0.0000 ",Pole
Pyramids ,"29.9765,31.1313 ",BigTriangle
France ," ",Country
Pontypandy ," " ,City
I have to use the text qualifiers as there is a comma in the GeoCodes.
I have no say on how the data is sent to me and therefore have to deal with the data as is.
As a workaround, I do a find and replace on the data in Notepad before importing. This adds an extra step to the job that hopefully isn't needed.
Is there any way I can get around the " " spaces during the import?
As an extra note, I don't currently have access to SSIS but if it can be done in there any easier then please answer with that as it could help me justify getting SSIS (I might have to remove this comment later if I have to show it to my manager).

If your data really looks the way you show above, you can use a fixed-width format: import the data as is and remove the " afterwards. This is not the best solution.
Much better: pipe the import file through sed before importing. This is not only much faster; when the data is larger than your RAM it is also the only easy way (OK, there are some others). All you need is sed at the operating system level, and copying the executable somewhere is enough. If you want to replace "[any number of blanks], with ", the command should be:
cat myfile.txt|sed -b -e "s/\" *,/\",/">yournewfile.txt
The regex is easy once you get the idea:
- s means substitute,
- /first/second/ means look for first and replace it with second,
- \" is the escaped " (needed because the expression itself is wrapped in double quotes on the DOS command line),
- a space followed by * means any number of spaces,
- , means a literal comma.
sed is still available on a lot of systems (on Windows, for example, through Cygwin). Have fun!
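If sed is not available, the same substitution can be sketched in Python. This is a minimal illustration rather than the answerer's method; the function name strip_spaces_after_qualifier is hypothetical. Unlike the sed command as written, which changes only the first match on each line, re.sub replaces every match.

```python
import re

# Same pattern as the sed expression: a double quote, any number of
# spaces, then a comma.
TRAILING_SPACES = re.compile(r'" *,')


def strip_spaces_after_qualifier(line: str) -> str:
    """Remove spaces between a text qualifier and the following comma."""
    return TRAILING_SPACES.sub('",', line)


# The row from the question that fails to import.
print(strip_spaces_after_qualifier('Pontypandy ," " ,City'))
```

The pattern shares the limitation of the sed version: it cannot tell a closing qualifier from an opening one, so a field whose content starts with spaces and a comma would also be changed.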

Two methods of Bulk Insert
Row-based Bulk Insert
- Most useful when you have string-qualified columns in the CSV.
- First create a table with two fields, an identity column and a varchar(max) column; the identity records the row number and the varchar(max) column holds the raw row data.
- Create a view that selects only the varchar(max) field from that table.
- The Bulk Insert syntax will look something like this:
BULK INSERT AdventureWorks2012.Sales.v_SalesOrderDetail
FROM 'f:\orders\lineitem.csv'
WITH (
ROWTERMINATOR =' |\n'
);
Columnar-based Insert
- This is the most widely used method, but it is only useful and reliable when there are no string-qualified columns.
- Use the common Bulk Insert syntax with the FIELDTERMINATOR and ROWTERMINATOR options.
References:
Bulk-Insert Syntax: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql#examples
Bulk-Insert with View: https://technet.microsoft.com/en-us/library/ms179250(v=sql.105).aspx
Bulk-Insert with Table: https://technet.microsoft.com/en-us/library/ms187086(v=sql.105).aspx

Error importing data from CSV with OpenRowset in SQL Server - Mysterious value of "S7"

I have a file dump that needs to be imported into SQL Server daily, so I have created a scheduled task to do this unattended. All CSV files are delimited by ',' and are Windows CR/LF files encoded as UTF-8.
To import data from these CSV files, I mainly use OpenRowset. It worked well until I ran into a file containing the value "S7". If the file contains "S7", that column is recognized as a numeric data type during the OpenRowset import, so values with other alphabetic characters fail to import and come through as NULL.
This is what I have tried so far:
- Using IMEX=1: openrowset('Microsoft.ACE.OLEDB.15.0','text;IMEX=1;HDR=Yes;
- Using text driver: OpenRowset('MSDASQL','Driver=Microsoft Access Text Driver (*.txt, *.csv);
- Using Bulk Insert with or without a format file.
The interesting part is that if I use Bulk Insert, it gives me an "unexpected end of file" warning. To solve this, I have tried various row terminators such as '0x0a', '\n' and '\r\n', as well as not specifying one, but they all failed. I finally managed to import some of the records using a row terminator of ',\n'. However, the original file contains about 1000 records and only 100 are imported, without any errors or warnings.
Any tips or help would be much appreciated.
Edit 1:
The file ends with a newline character, as I can tell from Notepad++. For files that give the unexpected end of file error, I managed to import them by removing the last record. However, even with this method I still cannot import all the records; only some of them are imported.
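One thing that may be worth ruling out for the "unexpected end of file" warning is a mix of line endings in the dump, since a row terminator that matches only some rows would explain a partial import. The following Python sketch is only a diagnostic aid under that assumption; count_line_endings is a hypothetical helper, and the sample bytes are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF also contains one LF and one CR, so subtract them.
        "lf_only": data.count(b"\n") - crlf,
        "cr_only": data.count(b"\r") - crlf,
    }


# Illustrative content: three CRLF rows and one row ending in a bare LF.
sample = b"a,b\r\nS7,1\r\nx,2\ny,3\r\n"
print(count_line_endings(sample))
```

For a real file, the bytes would come from open(path, "rb").read(), assuming the dump is small enough to fit in memory. A non-zero count in more than one bucket would suggest that no single ROWTERMINATOR value matches every row.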

Import CSV data into SQL Server

I have data in the csv file similar to this:
Name,Age,Location,Score
"Bob, B",34,Boston,0
"Mike, M",76,Miami,678
"Rachel, R",17,Richmond,"1,234"
While trying to BULK INSERT this data into a SQL Server table, I encountered two problems:
- If I use FIELDTERMINATOR=',', it splits the first (and sometimes the last) column.
- The last column is an integer column, but it has quotes and a comma thousands separator whenever the number is greater than 1000.
Is there a way to import this data (using XML Format File or whatever) without manually parsing the csv file first?
I appreciate any help. Thanks.

You can parse the file with FileHelpers (http://filehelpers.sourceforge.net/).
With that result, use the approach described in "SQL Bulkcopy YYYYMMDD problem", or load it straight into SqlBulkCopy.

If the target is MySQL, use LOAD DATA (this statement is MySQL syntax and is not available in SQL Server):
LOAD DATA LOCAL INFILE 'path-to-/filename.csv' INTO TABLE `sql_tablename`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
IGNORE 1 LINES;
- OPTIONALLY ENCLOSED BY '\"' (an escape character followed by a double quote) keeps quoted data, such as the name in the first column, together as a single field.
- IGNORE 1 LINES skips the header row with the field names.
- The CHARACTER SET 'utf8' line is optional but worth using if names contain diacritics, as in José.
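For the SQL Server case in the question, one option is to pre-process the file so that a plain field terminator becomes safe. The following Python sketch uses the standard csv module to read the quoted fields, strips the thousands separator from the last column, and writes pipe-delimited text. The function name to_pipe_delimited and the choice of | as the delimiter are assumptions for illustration; the sketch also assumes no field contains a pipe and that every row has exactly four columns.

```python
import csv
import io


def to_pipe_delimited(text: str) -> str:
    """Rewrite quoted CSV as pipe-delimited text with plain integer scores."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    reader = csv.reader(io.StringIO(text))
    writer.writerow(next(reader))  # header row is copied unchanged
    for name, age, location, score in reader:
        # "1,234" -> "1234": drop the thousands separator
        writer.writerow([name, age, location, score.replace(",", "")])
    return out.getvalue()


# Sample rows taken from the question.
sample = '''Name,Age,Location,Score
"Bob, B",34,Boston,0
"Rachel, R",17,Richmond,"1,234"
'''
print(to_pipe_delimited(sample))
```

The output could then be loaded with BULK INSERT using FIELDTERMINATOR='|', since the commas inside names no longer collide with the delimiter.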

Commas within CSV Data

I have a CSV file which I am importing directly into a SQL Server table. In the CSV file each column is separated by a comma. My problem is that I have an "address" column, and the data in this column contains commas. As a result, some of the address data ends up in the other columns when it is imported into SQL Server.
What should I do?

The solution to this problem is very simple. In the import wizard:
first select => flat file source => browse your file => Text qualifier (it is none by default; enter a double quote, ") and then follow the wizard's instructions.
Good luck.

If a column contains a comma, that column should be surrounded by single or double quotes. If the column itself contains a single or double quote, that quote should be preceded by an escape character, usually a \ (many CSV dialects double the quote character instead).
Example CSV format:
ID - address - name
1, "Some Address, Some Street, 10452", 'David O\'Brian'

Newer versions of SQL Server (2017 and later) support the CSV format in BULK INSERT, including mixed use of " and , .
BULK INSERT Sales.Orders
FROM '\\SystemX\DiskZ\Sales\data\orders.csv'
WITH ( FORMAT='CSV');

I'd suggest either using a format other than CSV or trying other characters as the field separator and/or text delimiter. Look for a character that isn't used in your data, e.g. |, # or ^. The format of a single row would become
|foo|,|bar|,|baz, qux|
A well-behaved parser must not interpret 'baz' and 'qux' as two columns.
Alternatively, you could write your own import voodoo that fixes any problems. For the latter, you might find this Groovy skeleton useful (not sure what languages you're fluent in, though).
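To illustrate how a parser treats the pipe-qualified row format suggested above, here is a minimal Python sketch using the standard csv module. The function name parse_row is hypothetical; it is shown only to confirm that 'baz, qux' stays in one column, and it is not the Groovy skeleton the answer refers to.

```python
import csv
import io


def parse_row(line: str, qualifier: str = "|") -> list:
    """Parse one comma-separated row whose fields are wrapped in qualifier."""
    reader = csv.reader(io.StringIO(line), delimiter=",", quotechar=qualifier)
    return next(reader)


# The example row from the answer above.
print(parse_row("|foo|,|bar|,|baz, qux|"))
```

The same call with qualifier="'" would handle rows that use single quotes as the text delimiter.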

Most systems, including Excel, will allow the column data to be enclosed in single quotes:
col1,col2,col3
'test1','my test2, with comma',test3
Another alternative is to use the Macintosh version of CSV, which uses tabs as delimiters.

The best, quickest and easiest way to resolve the comma-in-data issue is to use Excel to save a comma-separated file after setting the Windows list separator to something other than a comma (such as a pipe). Excel then generates a pipe-separated (or whatever-separated) file that you can import. This is described here.

I don't think adding quotes will help. The best approach, I suggest, is to replace the commas in the content with another character, such as a space:
replace(COLUMN,',',' ') as COLUMN

Wrapping the selected column in double quotes on both sides works. You must also cast the column to NVARCHAR(MAX) to turn it into a string if the column is of type TEXT.
SQLCMD -S DB-SERVER -E -Q "set nocount on; set ansi_warnings off; SELECT '""' + cast ([Column1] as nvarchar(max)) + '""' As TextHere, [Column2] As NormalColumn FROM [Database].[dbo].[Table]" /o output.tmp /s "," -W
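If the export is produced by a script rather than by SQLCMD, the quoting can be left to a CSV library. The following Python sketch is an alternative illustration, not the method above; quote_all is a hypothetical helper name. It wraps every field in double quotes and, as the csv module does by default, doubles any quote that appears inside a field.

```python
import csv
import io


def quote_all(rows) -> str:
    """Serialize rows as CSV with every field wrapped in double quotes."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


# Illustrative row based on the example data earlier in this thread.
print(quote_all([["1", "Some Address, Some Street, 10452", "David O'Brian"]]))
```

A file written this way can be read back with a double quote as the text qualifier, so commas inside the address no longer split the column.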

Resources