I'm trying to export data to SQL Server with Sqoop. I'm running:
sqoop export --connect "jdbc:sqlserver://<server name>;username=<user>;
password=<pass>;database=<db>" --table test_out --input-fields-terminated-by ~
--export-dir /user/test.out
but I get an error when a row has a blank string in test.out:
1~a
<nul>~b
<blank>~c
In this example, the third line returns an error:
Failed map tasks=1
Any ideas?
I would suggest looking at the log of the failed map task, as it usually contains quite detailed information about the failure.
Sqoop always expects the number of columns on every line of the exported data to equal the number of columns in the target table. When exporting from a text file, the number of columns is determined by the number of separators present on the line. Based on the provided example, it seems that the target table has 2 columns, whereas the third line has zero separators and is therefore treated as a single column. This discrepancy would cause Sqoop to fail.
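To check an export file for this kind of mismatch before running Sqoop, a small script can count the columns on each line. This is only a sketch: the function name `find_bad_lines` is made up for illustration, the `~` delimiter comes from the question, and the expected column count of 2 is the answer's guess about the target table.

```python
def find_bad_lines(lines, delimiter="~", expected_columns=2):
    """Return (line_number, column_count) for lines with an unexpected column count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # The column count is the number of separators plus one.
        columns = line.rstrip("\n").split(delimiter)
        if len(columns) != expected_columns:
            bad.append((number, len(columns)))
    return bad


# Hypothetical sample: the third line has no separator at all.
sample = ["1~a\n", "~b\n", "c\n"]
print(find_bad_lines(sample))
```

A line such as `~b` still counts as two columns (the first one is empty), so only lines that are really missing a separator are reported.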
I exported a postgres table as CSV:
"id","notify_me","score","active","is_moderator","is_owner","is_creator","show_marks","course_id","user_id"
8,False,36,"A",False,True,True,True,2,8
29,False,0,"A",False,False,False,True,2,36
30,False,25,"A",False,False,False,True,2,37
33,False,2,"A",False,False,False,False,2,40
Then I tried to import it using pgadmin:
But I ended up getting the following error:
I checked the values of the score column, but it doesn't contain the value "A":
This is the existing data in the coursehistory table (for schema details):
What's going wrong here?
PS:
Earlier there was grade column with all NULL values:
But it was giving me the following error:
I got the same error even using \copy
db=# \copy courseware_coursehistory FROM '/root/db_scripts/data/couse_cpp.csv' WITH (FORMAT csv)
ERROR: value too long for type character varying(2)
CONTEXT: COPY courseware_coursehistory, line 1, column grade: "NULL"
I assumed that the import utility would respect the order of columns in the header of the CSV, especially since there is a header switch in the UI. It seems that it doesn't; the switch only decides whether to start from the first row or the second.
This is your content, with an "A" as the fourth value:
8,False,36,"A",False,True,True,True,2,8
And this is your table course_history, with the column "score" in fourth position, typed as double precision.
The error message makes sense to me: an A is not a valid double precision value.
The order of columns is relevant in the kind of import you are doing. If you need a more flexible way to import CSV files, you could use a Python script that takes your header into account; column order is then not relevant as long as the names, types and null handling are correct (for existing tables).
Like this:
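import pandas as pd
from sqlalchemy import create_engine

# Connection URL format: postgresql://user:password@host:port/database
engine = create_engine('postgresql://user:password@ip_host:5432/database_name')
# header=0 uses the first CSV row as column names
data_df = pd.read_csv('course_cpp_courseid22.csv', sep=',', header=0)
# The insert names its columns from the DataFrame, so CSV column order does not matter
data_df.to_sql('courseware_coursehistory', engine, schema='public', if_exists='append', index=False)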
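If pandas is not available, another option is to rewrite the CSV so that its columns follow the table's order, and then import it with pgAdmin or \copy as before. The following is a rough sketch using only the standard library. The function name `reorder_csv` is hypothetical, and the column list in the usage example is invented, since the real column order of the table is not shown here.

```python
import csv
import io


def reorder_csv(text, table_columns):
    """Rewrite CSV text so its columns follow table_columns, matched by header name."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" drops CSV columns that the table does not have.
    writer = csv.DictWriter(out, fieldnames=table_columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical example: the table expects "active" before "score".
source = 'id,score,active\n8,36,A\n29,0,A\n'
print(reorder_csv(source, ["id", "active", "score"]))
```

Columns that exist in the table list but not in the CSV are written as empty strings, which may or may not be what the target column accepts.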
I ended up copying this CSV (also shown in postscript of original question; this also contains grade column and has no header row):
using \copy command in psql prompt.
Start psql prompt:
root@50ec9abb3214:~# psql -U user_role db_name
Copy from csv as explained here:
db_name=# \copy db_table FROM '/root/db_scripts/data/course_cpp2.csv' delimiter ',' NULL AS 'NULL' csv
Whenever I try to import a CSV file with more than one column into SQL Server, nothing is imported. I know the file is terminated correctly because the import works with 1 column if I modify the file and table. I am limiting the rows so it never gets to the end of the file, and the line terminator is correct and valid (as also shown by the import working with only 1 column).
All I get is this and no errors
0 rows affected
I've also checked the various other questions like this one, and they all point to a bad end of file or line terminator, but all is well here...
I have tried quotes and no quotes. For example, I have a table with 2 columns of varchar(max).
I run:
bulk insert mytable from 'file.csv' WITH (FIRSTROW=2,lastrow=4,rowterminator='\n')
My sample file is:
name,status
TEST00040697,OK
TEST00042142,OK
TEST00042782,OK
TEST00043431,BT
If I drop a column then delete the second column in the csv ensuring it has the same line terminator \n, it works just fine.
I have also tried specifying the 'errorfile' parameter but it never seems to write anything or even create the file.
Well, that was embarrassing.
SQL Server in its wisdom is using \t as the default field terminator for a CSV file, but I guess when the documentation says 'FORMAT = 'CSV'' it's an example and not the default.
If only it produced actual proper and useful error messages...
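To illustrate why nothing was imported: with a tab as the field terminator, each line of the sample file is a single field, so it can never fill a two-column table. The sketch below only mimics the splitting in Python and is not how SQL Server parses files internally. The `statement` string is one assumed way to set the terminator explicitly, based on the command in the question plus the FIELDTERMINATOR option.

```python
def split_fields(row, terminator="\t"):
    """Split one row the way a loader with the given field terminator would."""
    return row.split(terminator)


row = "TEST00040697,OK"
print(split_fields(row))       # one field: there is no tab in the row
print(split_fields(row, ","))  # two fields once the comma is the terminator

# The question's command with the field terminator spelled out (assumed fix).
statement = (
    "BULK INSERT mytable FROM 'file.csv' "
    "WITH (FIRSTROW=2, LASTROW=4, FIELDTERMINATOR=',', ROWTERMINATOR='\\n')"
)
print(statement)
```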
I have data in the csv file similar to this:
Name,Age,Location,Score
"Bob, B",34,Boston,0
"Mike, M",76,Miami,678
"Rachel, R",17,Richmond,"1,234"
While trying to BULK INSERT this data into a SQL Server table, I encountered two problems.
If I use FIELDTERMINATOR=',' then it splits the first (and sometimes the last) column
The last column is an integer column but it has quotes and comma thousand separator whenever the number is greater than 1000
Is there a way to import this data (using XML Format File or whatever) without manually parsing the csv file first?
I appreciate any help. Thanks.
You can parse the file with http://filehelpers.sourceforge.net/
And with that result, use the approach here: SQL Bulkcopy YYYYMMDD problem or straight into SqlBulkCopy
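If pre-parsing outside the database is acceptable, the same idea can be sketched in Python with the standard csv module: read the quoted fields properly, strip the thousands separator from the last column, and write the rows with a delimiter that does not occur in the data. The function name `normalize` and the choice of `|` as output delimiter are assumptions for illustration only.

```python
import csv
import io


def normalize(text, out_delimiter="|"):
    """Re-emit quoted CSV with a different delimiter and plain integer scores."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=out_delimiter, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for name, age, location, score in reader:
        # "1,234" becomes "1234" so it can be loaded into an integer column.
        writer.writerow([name, age, location, score.replace(",", "")])
    return out.getvalue()


sample = (
    'Name,Age,Location,Score\n'
    '"Bob, B",34,Boston,0\n'
    '"Rachel, R",17,Richmond,"1,234"\n'
)
print(normalize(sample))
```

The result could then be loaded with FIELDTERMINATOR='|'. If a field ever contains the output delimiter, the writer will quote that field again, so the delimiter should be picked with the actual data in mind.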
Use MySQL load data:
LOAD DATA LOCAL INFILE 'path-to-/filename.csv' INTO TABLE `sql_tablename`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
IGNORE 1 LINES;
The part optionally enclosed by '\"', or escape character and quote, will keep the data in the first column together for the first field.
IGNORE 1 LINES will leave the field name row out.
UTF8 line is optional but good to use if names have diacritics, like in José.
I am trying to import some data into SQL Server using the Bulk insert command--
This is the error I am getting--
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 6 (NClaims).
Now, I created a test file with only one row of data which I was able to import successfully--
00000005^^18360810^408^30^0
However when I added 2 more rows of data (which are very similar to the row above) I got the error message that I have given above. These are the 2 additional rows of data--
00000003^^18360801^142^42^0
00000004^^18360000^142^10^0
As you can see there does not seem to be any difference (in terms of data length or data types for the 2 rows above compared with the single row given previously)... So why am I getting this error? How do I fix it?
EDIT--
This is the command I am executing--
BULK INSERT GooglePatentsIndividualDec2012.dbo.patent
FROM 'C:\Arvind Google Patents Data\patents\1patents_test.csv'
WITH ( FIELDTERMINATOR = '^', ROWTERMINATOR='\n');
Be patient and experiment, ruling out one thing at a time. For example:
1. Remove the third row and check whether everything is OK.
2. If yes, put the row back but change 10^0 to 42^0, then check again.
3. Repeat step 2 for the rest of the data, changing each value to the one in row 2, which is OK.
You will find the piece of data that causes the error.
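The manual elimination above can be partly automated by printing the raw value of the failing column for every row. This is a sketch under assumptions: the helper name `find_suspect_rows` is made up, column 6 (index 5) is taken from the error message, and the carriage return in the example is just one hypothetical kind of invisible character, not a confirmed cause.

```python
def find_suspect_rows(lines, terminator="^", int_column=5):
    """Return (row_number, raw_value) for rows whose integer column is not plain digits."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(terminator)
        value = fields[int_column] if len(fields) > int_column else None
        # Anything other than ASCII digits (including hidden characters) is flagged.
        if value is None or not (value.isascii() and value.isdigit()):
            suspects.append((number, value))
    return suspects


# Hypothetical data: the second row ends with "\r\n" instead of "\n".
rows = ["00000005^^18360810^408^30^0\n", "00000003^^18360801^142^42^0\r\n"]
for number, value in find_suspect_rows(rows):
    print(number, repr(value))
```

When reading the rows from a real file, opening it with `newline=""` keeps the original line endings visible instead of letting Python translate them.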
Summary : Is there a limit to the number of columns which can be Imported/Loaded from a CSV file? If yes, what is the workaround? Thanks
I am very new to DB2, and I am supposed to import a | (pipe) delimited csv file which contains 532 columns into a DB2 table which also has 532 columns in exact positions as the csv. I also have a smaller file with only 27 columns in both csv and table. I am using the following command:
IMPORT FROM "C:\myfile.csv" OF DEL MODIFIED BY COLDEL| METHOD P (1, 2,....27) MESSAGES "C:\messages.txt" INSERT INTO PRE_SUBS_GPRS2_1010 (col1,col2,....col27);
This works fine.
But in the second file, which is like:
IMPORT FROM "C:\myfile.csv" OF DEL MODIFIED BY COLDEL| METHOD P (1, 2,....532) MESSAGES "C:\messages.txt" INSERT INTO PRE_SUBS_GPRS_1010 (col1,col2,....col532);
It does not work. It gives me an error that says:
SQL3037N An SQL error "-206" occurred during Import processing.
Explanation:
An SQL error occurred during processing of the Action String (for
example, "REPLACE into ...") parameter.
The command cannot be processed.
User Response:
Look at the SQLCODE (message number) in the message for more
information. Make changes and resubmit the command.
I am using the Control Center to run the query, not command prompt.
The problem was that one of the column names in the column list of the INSERT statement was more than 30 characters long. It was getting truncated and was not recognized.
Hope this helps others in future. Please let me know if you need further details.
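With 532 columns, finding the offending name by eye is tedious, so a quick check of the column list may help. This is a minimal sketch: the function name `long_names` and the sample column names are invented, and the 30-character limit is simply the length mentioned above.

```python
def long_names(columns, limit=30):
    """Return the column names that are longer than limit characters."""
    return [name for name in columns if len(name) > limit]


# Hypothetical column list; the second name is 35 characters long.
columns = ["col1", "subscriber_total_gprs_usage_in_bytes", "col3"]
for name in long_names(columns):
    print(len(name), name)
```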
The specific error code is SQL0206 and the documentation about this error is here
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.messages.sql.doc/doc/msql00206n.html
As for the limits, I think the maximum number of columns in an import should be the maximum number permitted for a table. Take a look in the Information Center:
Database fundamentals > SQL > SQL and XML limits
Maximum number of columns in a table: 1012
Try to import just one row. If you have problems, they are probably due to incompatible types, column order, or rows that duplicate ones already present in the table.