Retaining header names while importing a flat file into SQL Server

I am trying to import a flat file into SQL Server. My headers look like this in Notepad:
SCSItem.[Item],SCSItem.[PhaseOutItemType]
But when I import this into SQL Server using "Import Data", the periods and brackets are stripped from the column names.
Is there a way to retain the header info?

Both periods (.) and square brackets ([]) have special meaning in SQL Server identifiers. If you want the field name to be:
SCSItem.[Item]
then you need to use the other, ANSI-standard identifier quote, which is the double quote (").
For example:
CREATE TABLE has_brackets
("SCSItem.[Item]" nvarchar(100)
,"SCSItem.[PhaseOutItemType]" nvarchar(100)
);
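To reference such a column later, quote it the same way, or use bracket quoting with the embedded closing bracket doubled. A quick sketch against the table above (this assumes QUOTED_IDENTIFIER is ON, which is the default in SSMS):
SELECT "SCSItem.[Item]"                  -- ANSI-quoted identifier
     , [SCSItem.[PhaseOutItemType]]]     -- bracket-quoted; ]] escapes the embedded ]
FROM has_brackets;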

Related

Postgres table import gives "Invalid input syntax for double precision"

I exported a postgres table as CSV:
"id","notify_me","score","active","is_moderator","is_owner","is_creator","show_marks","course_id","user_id"
8,False,36,"A",False,True,True,True,2,8
29,False,0,"A",False,False,False,True,2,36
30,False,25,"A",False,False,False,True,2,37
33,False,2,"A",False,False,False,False,2,40
Then I tried to import it using pgAdmin, but I ended up getting an "invalid input syntax for type double precision" error. I checked the values of the score column, and it doesn't contain the value "A" anywhere. What's going wrong here?
PS:
Earlier the CSV also had a grade column with all NULL values, but the import was failing on it. I got the same error even using \copy:
db=# \copy courseware_coursehistory FROM '/root/db_scripts/data/couse_cpp.csv' WITH (FORMAT csv)
ERROR: value too long for type character varying(2)
CONTEXT: COPY courseware_coursehistory, line 1, column grade: "NULL"
I assumed the import utility would respect the column order given in the header of the CSV, especially since there is a header switch in the UI. It seems it doesn't, and the switch only decides whether to start reading from the first row or the second.
This is your content, with an "A" as the fourth value:
8,False,36,"A",False,True,True,True,2,8
And in your table, the column "score" is in the fourth position, with type double precision.
The error message makes sense to me: an "A" is not a valid double precision value.
The order of columns matters in the kind of import you are doing. If you need a more flexible way to import CSV files, you can use a Python script that actually takes your header into account; then column order is irrelevant as long as the names, types, and NOT NULL constraints are correct (for existing tables).
Like this:
import pandas as pd
from sqlalchemy import create_engine

# note the '@' (not '#') between the password and the host
engine = create_engine('postgresql://user:password@ip_host:5432/database_name')
# header=0 takes the column names from the first row of the CSV
data_df = pd.read_csv('course_cpp_courseid22.csv', sep=',', header=0)
# to_sql matches on column names, so the CSV column order doesn't matter
data_df.to_sql('courseware_coursehistory', engine, schema='public', if_exists='append', index=False)
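If you would rather stay in psql, note that COPY and \copy also accept an explicit column list, so the CSV column order does not have to match the table definition; the HEADER option only skips the first row, it does not match by name. A sketch, with the column names taken from the CSV header above (the file path is illustrative):
db=# \copy courseware_coursehistory (id, notify_me, score, active, is_moderator, is_owner, is_creator, show_marks, course_id, user_id) FROM '/root/db_scripts/data/course_cpp.csv' WITH (FORMAT csv, HEADER)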
I ended up copying this CSV (also shown in the postscript of the original question; it also contains the grade column and has no header row) using the \copy command at the psql prompt.
Start the psql prompt:
root@50ec9abb3214:~# psql -U user_role db_name
Then copy from the CSV:
db_name=# \copy db_table FROM '/root/db_scripts/data/course_cpp2.csv' delimiter ',' NULL AS 'NULL' csv

Importing data from Excel to SAS

This is an interview question I couldn't answer; please help me solve it.
In the Excel file, a variable name has a space in it (e.g. Shop Name). If we bring the Excel data into SAS, how do we bring it in as-is, given that spaces are not allowed in SAS dataset variable names?
Code:
proc import datafile='/home/roshnigupta16020/test (2).xlsx' out=testexcel dbms=xlsx replace; getnames=yes; run;
Your PROC IMPORT syntax is good.
proc import datafile='/home/roshnigupta16020/test (2).xlsx'
out=testexcel dbms=xlsx replace
;
getnames=yes;
run;
Depending on the setting of the VALIDVARNAME option, PROC IMPORT will create different names for the variables.
With VALIDVARNAME=ANY, the names will include the spaces, which means that to use such a name in your SAS code you will need a name literal, like 'Column 1'n.
With other settings, like VALIDVARNAME=V7, PROC IMPORT will replace invalid characters, like spaces, with underscores. The name will then not exactly match the column header in the spreadsheet, but a name like Column_1 is easier to use in your SAS code.

How to import a .csv file with double quotes in column values to SQL table

I am trying to import the data from a .csv file into a SQL table using an SSIS data flow task. One row in my .csv file looks like this:
Col1,Col2,Col3
1200,"ABC","Value is \"greater\" than expected"
While creating the flat file connection, I set comma as the delimiter and " as the text qualifier, and created a derived column (REPLACE(Col3,"\"","")) as a second step to remove \" from Col3.
But as soon as I start running the package, I get an error in the flat file source itself: "Column delimiter for col3 was not found".
Can someone please guide me in solving this issue?
You may need to escape the backslash too; try this and let us know:
(REPLACE(Col3,"\\\"",""))

Import csv file via SSMS where text fields contain extra quotes

I'm trying to import a customer CSV file via the SSMS Import Wizard. The file contains 1 million rows, and I'm having trouble importing where a field has extra quotes. The file has been populated freehand, so it could contain anything, e.g.:
Name, Address
"John","Liverpool"
"Paul",""New York"""
"Ringo","London|,"
"George","India"""
Before I press on looking into SSMS: should SSMS 2016 handle this now, or do I have to do it in SSIS? It is a one-off load to check something.
In SSMS Import/Export Wizard, when configuring the Flat File Source you have to set:
Text Qualifier = "
Column Delimiter = ,
This will import the file as the following:
Name     Address
John     Liverpool
Paul     "New York""
Ringo    London|,
George   India
The remaining double quotes must be removed after the import is done, using SQL, or you have to create an SSIS package manually in Visual Studio and add some transformations to clean the data.
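For the SQL cleanup, a minimal sketch, assuming the rows landed in a table called dbo.Customers (table and column names are illustrative):
-- strip any double quotes that survived the import
UPDATE dbo.Customers
SET Name = REPLACE(Name, '"', ''),
    Address = REPLACE(Address, '"', '');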

Import CSV to SQL Server when there are spaces after the text qualifier

I have a csv file with a column GeoCodes. This uses " as text qualifier.
I am trying to import this into SQLServer using the SQL Server Import Wizard.
The problem is that when there is no geocode, the CSV file will sometimes output the GeoCodes field as " " followed by several spaces. This errors during import, because the wizard picks up the data within the text qualifier and then hits the stray spaces before the next comma delimiter.
An example of the data below. The Pontypandy row is the row that errors.
Place ,Geo Codes ,Type
Northpole ,"90.0000,0.0000 ",Pole
Southpole ,"-90.0000,0.0000 ",Pole
Pyramids ,"29.9765,31.1313 ",BigTriangle
France ," ",Country
Pontypandy ," " ,City
I have to use the text qualifiers as there is a comma in the GeoCodes.
I have no say on how the data is sent to me and therefore have to deal with the data as is.
As a workaround, I currently do a find-and-replace on the data in Notepad before importing. This adds an extra step to the job that hopefully isn't needed.
Is there any way I can get around the " " spaces during the import?
As an extra note, I don't currently have access to SSIS but if it can be done in there any easier then please answer with that as it could help me justify getting SSIS (I might have to remove this comment later if I have to show it to my manager).
If your data really is the way you show above, you can use fixed-width format: import the data as-is and replace the " characters afterwards. This is not the best solution, though.
Much better: pipe the import file through sed before importing. This is not only much faster, it is also just about the only easy way when the data is larger than your RAM (OK, there are some others). All you need is sed at the operating-system level; if you can copy the executable somewhere, that's all you need. If you want to replace "[any number of blanks], with ",, the command should be:
sed -b -e "s/\" *,/\",/" myfile.txt > yournewfile.txt
The regex is easy once you get the idea:
- s means substitute,
- /first/second/ means look for first and replace it with second,
- \" is the escaped " (escaped because the whole sed script is wrapped in double quotes),
- the space followed by * matches any number of spaces,
- , matches a literal comma.
sed is still available on a lot of systems (e.g. via Cygwin on Windows). Have fun!
Two methods of Bulk Insert
Row-based Bulk Insert
Most useful when you have string-qualified columns in the CSV.
You will need to first create a table with two fields, an identity and a varchar(max): the identity signifies the row count, and the varchar(max) holds your raw row data.
Then create a view that selects only the varchar(max) field from the table above (a sketch of this setup follows the BULK INSERT statement below).
The BULK INSERT syntax will look something like this:
BULK INSERT AdventureWorks2012.Sales.v_SalesOrderDetail
FROM 'f:\orders\lineitem.csv'
WITH (
ROWTERMINATOR =' |\n'
);
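Putting the row-based pieces together, a minimal sketch of the staging table and view mentioned above (object names are illustrative; BULK INSERT targets the view, so only the raw-line column is loaded and the identity column fills itself):
CREATE TABLE Sales.SalesOrderDetail_staging (
    row_id   int IDENTITY(1,1) PRIMARY KEY,  -- row count / ordering
    row_data varchar(max)                    -- the entire raw CSV line
);
GO
CREATE VIEW Sales.v_SalesOrderDetail AS
SELECT row_data FROM Sales.SalesOrderDetail_staging;
GO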
Columnar-based Bulk Insert
Most people use this widely, but it is only useful and reliable when there are no string-qualified columns.
Use the common BULK INSERT syntax with the FIELDTERMINATOR and ROWTERMINATOR options.
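For example, a columnar load of a plain comma-delimited file with no text qualifiers (table and file names are illustrative):
BULK INSERT dbo.SalesOrderDetail
FROM 'f:\orders\lineitem.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);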
References:
Bulk-Insert Syntax: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql#examples
Bulk-Insert with View: https://technet.microsoft.com/en-us/library/ms179250(v=sql.105).aspx
Bulk-Insert with Table: https://technet.microsoft.com/en-us/library/ms187086(v=sql.105).aspx
