I'm successfully reading numbers from a .csv file into SQL Server with the statement below, given that I've created a linked server named CSV_IMPORT.
select *
from CSV_IMPORT...Sophos#csv
However, the problem is that if a value contains a comma as a thousands separator, it shows up as NULL instead of the correct number. How can I read the "54,375" correctly into SQL Server? Thank you very much for your help.
Below is the data in the CSV file.
09/07/2017,52029,70813,10898,6691,6849,122,25,147427
09/08/2017,47165,61253,6840,5949,5517,75,2,126801
09/14/2017,"54,375","16944","15616","2592","3280",380,25,"96390"
This is the result from the statement:
2017-09-07 00:00:00.000 52029 70813 10898 6691 6849 122 25 147427
2017-09-08 00:00:00.000 47165 61253 6840 5949 5517 75 2 126801
2017-09-14 00:00:00.000 NULL 16944 15616 2592 3280 380 25 96390
One way to go would be to use a temporary table. Read all the data as text, then replace every comma in the whole table with a dot (.) if you want it as a decimal separator, or with an empty string ('') if it is a thousands separator, and then load the data into the existing table, converting everything (you don't have to do the conversion explicitly; SQL Server does it implicitly).
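A minimal T-SQL sketch of that approach, assuming the raw text has already been loaded into a staging table (for example via BULK INSERT into varchar columns); all table and column names here are hypothetical:

-- Staging table holds everything as text (hypothetical names).
CREATE TABLE #staging (
    ReportDate varchar(20),
    Endpoints  varchar(20)    -- may contain values like "54,375"
);

-- Strip the thousands separator while the value is still text.
UPDATE #staging
SET Endpoints = REPLACE(Endpoints, ',', '');

-- Load into the real table; the varchar-to-int conversion could be implicit,
-- but an explicit CONVERT makes the intent clearer.
INSERT INTO dbo.SophosCounts (ReportDate, Endpoints)
SELECT CONVERT(date, ReportDate, 101), CONVERT(int, Endpoints)
FROM #staging;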
Last year I did a project for a client which involved importing CSV files that were meant to be in the same format, but which came from different sources and hence were inconsistent (even to the point of using different separators depending on the source). I ended up writing a CLR routine which read the CSV line by line, parsed the content, and added it to a DataTable. I then inserted that DataTable into SQL Server using the SqlBulkCopy class.
The advantage of this approach was that I was totally in control of dealing with all the anomalies in the file. It was also much faster than the alternative of inserting the whole file into a temporary table of varchars and then parsing within SQL Server. Effectively I did one line-by-line parse in C# and one bulk insert of the parsed data.
I have a file dump which needs to be imported into SQL Server on a daily basis, and I have created a scheduled task to do this unattended. All the CSV files are delimited by ',', use Windows CR/LF line endings, and are encoded as UTF-8.
To import data from these CSV files I mainly use OPENROWSET. It works well, until I ran into a file containing the value "S7". If the file contains "S7", that column gets inferred as numeric during the OPENROWSET import, which causes the other alphabetic values in the column to fail to import, leaving only NULLs.
This is what I have tried so far (a fuller OPENROWSET sketch follows this list):
Using IMEX=1: openrowset('Microsoft.ACE.OLEDB.15.0','text;IMEX=1;HDR=Yes;
Using text driver: OpenRowset('MSDASQL','Driver=Microsoft Access Text Driver (*.txt, *.csv);
Using Bulk Insert with or without a format file.
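For reference, a fully spelled-out version of that OPENROWSET attempt might look like the sketch below; the folder and file name are hypothetical, and the sketch only illustrates the syntax, it does not by itself fix the type-inference problem (it also requires 'Ad Hoc Distributed Queries' to be enabled):

select *
from openrowset(
    'Microsoft.ACE.OLEDB.15.0',
    'Text;Database=C:\Import\;HDR=Yes;IMEX=1',   -- Database points at the folder, not the file
    'SELECT * FROM [daily_dump.csv]'             -- hypothetical file name
);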
The interesting part is that if I use BULK INSERT, it gives me a warning of an unexpected end of file. To solve this, I tried various row terminators such as '0x0a', '\n' and '\r\n', as well as leaving it unspecified, but they all failed. Finally I managed to import some of the records by using a row terminator of ',\n'. However, the original file contains about 1,000 records and only 100 get imported, without any errors or warnings.
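A typical shape for such a BULK INSERT attempt is sketched below; the table name and path are hypothetical, and CODEPAGE = '65001' for UTF-8 requires SQL Server 2016 or later:

BULK INSERT dbo.DailyDump
FROM 'C:\Import\daily_dump.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',      -- or '0x0a' / '\r\n', depending on the file's real line endings
    CODEPAGE        = '65001',   -- UTF-8; SQL Server 2016 and later
    FIRSTROW        = 2          -- skip the header row, if there is one
);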
Any tips or help would be much appreciated.
Edit 1:
The file ends with a newline character, as far as I can tell from Notepad++. I managed to import the files that gave me an unexpected-end-of-file error by removing the last record from those files. However, even with this method I still cannot import all the records; only a portion of them can be imported.
I am trying to load data from an Excel .csv file via a flat file format, used as a data source in a Data Services job data flow, which then transfers the data to a SQL Server (2012) database table.
I consistently lose 1 in 6 records.
I have tried various parameter values in the file format definition and settled on setting Adaptable file scheme to "Yes", file type "delimited", column delimiter "comma", row delimiter {windows new line}, text delimiter ", language eng (English), and everything else left at the defaults.
I have also set "write errors to file" to "yes", but it just creates an empty error file (I expected the 6,000-odd unloaded rows to be in there).
If we strip out the three columns containing special characters (visible in Excel), it loads perfectly, so I think these characters are the problem.
The thing is, we need the data in those columns. Unfortunately, this .csv file is as good a data source as we are likely to get, it will always be likely to contain special characters in these three columns, and so we need to be able to read it in if possible.
Should I try to specifically strip the columns in the Query source component of the dataflow? Am I missing a data-cleansing trick in the query or file format definition?
OK, so I didn't get the answer I was looking for, but I did get it to work by setting the "Row within Text String" parameter to "Row delimiter".
I have data in the csv file similar to this:
Name,Age,Location,Score
"Bob, B",34,Boston,0
"Mike, M",76,Miami,678
"Rachel, R",17,Richmond,"1,234"
While trying to BULK INSERT this data into a SQL Server table, I encountered two problems.
If I use FIELDTERMINATOR=',' then it splits the first (and sometimes the last) column
The last column is an integer column, but it is quoted and contains a comma thousands separator whenever the number is greater than 1,000
Is there a way to import this data (using XML Format File or whatever) without manually parsing the csv file first?
I appreciate any help. Thanks.
You can parse the file with http://filehelpers.sourceforge.net/
And with that result, use the approach described in SQL Bulkcopy YYYYMMDD problem, or go straight to SqlBulkCopy.
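If a newer SQL Server is available, another option is BULK INSERT's CSV mode, which understands quoted fields. A rough sketch with hypothetical table and path names; note that the "1,234"-style thousands separator would still need to be staged in a varchar column and stripped with REPLACE before converting to int:

BULK INSERT dbo.ScoresStaging        -- hypothetical staging table with a varchar Score column
FROM 'C:\Import\scores.csv'
WITH (
    FORMAT     = 'CSV',              -- SQL Server 2017 and later; handles quoted fields
    FIELDQUOTE = '"',
    FIRSTROW   = 2                   -- skip the Name,Age,Location,Score header
);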
Use MySQL's LOAD DATA:
LOAD DATA LOCAL INFILE 'path-to-/filename.csv' INTO TABLE `sql_tablename`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
IGNORE 1 LINES;
The OPTIONALLY ENCLOSED BY '\"' part (the escape character plus the quote) is what keeps a quoted value such as the first field together as a single column.
IGNORE 1 LINES skips the header row containing the field names.
The CHARACTER SET 'utf8' line is optional, but it is good to use if names contain diacritics, as in José.
Introduction: I have multiple Excel files which loop through a Foreach Loop Container in an SSIS package.
The first Excel file Excel1.xlsx contains the old data (for example, a column named EffectiveDate populated with dates from 2001-01-01 to 2013-04-01).
The second Excel file Excel2.xlsx contains the new entries with EffectiveDate from 2013-05-01 onward, and also contains some of the old data from Excel1.xlsx.
These two files loop through the Foreach Loop Container.
Problem: Once the first Excel file Excel1.xlsx is loaded, I want to compare it with the second Excel file Excel2.xlsx and update the EffectiveDate of the old data in Excel2.xlsx with the EffectiveDate of the matching rows in Excel1.xlsx,
and set all other rows (the new entries) of Excel2.xlsx to GETDATE().
Is it possible to get it done in a single Data Flow Task?
And also, how do I compare two Excel files in a single container?
You can have two Excel sources within one Data Flow Task. You could use a Merge Join to compare the values and feed that to an Excel output.
If you want to loop through 10 Excel files, comparing one to another, I would suggest that your Merge Join output be the second Excel source, and that you map your container variable to the first Excel source. That way, everything from Excel file 1 will be put into the output file, and then for each subsequent file only the entries not already listed in the output file will be added.
If you get hung up on any of the individual steps, I'm sure I or others can help you push through the sticking points.
I have created a very simple Data Flow in SSIS that is run inside a loop.
IMAGE 1 http://img407.imageshack.us/img407/1553/step1f.jpg
I have a simple OLE DB Source component which connects to a SQL Server and runs quite a complex query to split the daily data into 30-minute intervals, as shown below.
IMAGE 2 http://img168.imageshack.us/img168/857/step2vs.jpg
I then have a Flat File Destination component which takes the output from the OLE DB Source and saves it as a comma-delimited CSV file. As you can see above, the numbers are decimals with two decimal places, but in the CSV file below they show up as ones and zeros.
IMAGE 3 http://img341.imageshack.us/img341/5494/step3w.jpg
What can I do to get the CSV output to match the query results? I have tried converting the numbers to varchar in the query, but I got the same result. I also tried changing the column types in the Connection Manager, but got the same result.
I managed to resolve this issue by changing the DataType properties for each column of data I was importing. I had to change them to 'double-precision float [DT_R8]', and then it saved the CSV with the proper decimal values.
Very annoying; I hope that helps someone.
IMAGE 4 http://img687.imageshack.us/img687/3749/step4dp.jpg