Loading 532 columns from a CSV file into a DB2 table

Summary: Is there a limit to the number of columns that can be imported/loaded from a CSV file? If yes, what is the workaround? Thanks.
I am very new to DB2, and I am supposed to import a pipe (|) delimited CSV file containing 532 columns into a DB2 table that also has 532 columns in exactly the same positions as the CSV. I also have a smaller file with only 27 columns in both the CSV and the table. I am using the following command:
IMPORT FROM "C:\myfile.csv" OF DEL MODIFIED BY COLDEL| METHOD P (1, 2,....27) MESSAGES "C:\messages.txt" INSERT INTO PRE_SUBS_GPRS2_1010 (col1,col2,....col27);
This works fine.
But for the second file, the command is:
IMPORT FROM "C:\myfile.csv" OF DEL MODIFIED BY COLDEL| METHOD P (1, 2,....532) MESSAGES "C:\messages.txt" INSERT INTO PRE_SUBS_GPRS_1010 (col1,col2,....col532);
It does not work. It gives me an error that says:
SQL3037N An SQL error "-206" occurred during Import processing.
Explanation:
An SQL error occurred during processing of the Action String (for
example, "REPLACE into ...") parameter.
The command cannot be processed.
User Response:
Look at the SQLCODE (message number) in the message for more
information. Make changes and resubmit the command.
I am using the Control Center to run the command, not the command prompt.

The problem was that one of the column names in the column list of the INSERT statement was more than 30 characters long. It was getting truncated and was not recognized.
Hope this helps others in the future. Please let me know if you need further details.
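If you need to track down which name is the culprit before pasting a 532-name list into the IMPORT command, a quick length check helps. This is only a rough sketch, and it assumes the column names have been saved one per line in a hypothetical columns.txt file:
with open('columns.txt') as f:
    for name in (line.strip() for line in f):
        if len(name) > 30:
            # names longer than 30 characters are the ones that were getting truncated
            print(len(name), name)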

The specific error code is SQL0206 and the documentation about this error is here
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.messages.sql.doc/doc/msql00206n.html
As for the limits, I think the maximum number of columns in an import should be the maximum permitted for a table. Take a look in the Information Center under
Database fundamentals > SQL > SQL and XML limits
Maximum number of columns in a table: 1012
Try importing just one row. If you have problems, it is probably due to incompatible types, wrong column order, or rows that duplicate ones already present in the table.
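One way to do that without editing the 532-column file by hand is to copy just its first line into a small test file and point the same IMPORT command at it. A minimal sketch, assuming the file path from the question (C:\myfile_1row.csv is just an illustrative name for the sample):
# copy only the first row of the big pipe-delimited file into a small test file
with open(r'C:\myfile.csv') as src, open(r'C:\myfile_1row.csv', 'w') as dst:
    dst.write(src.readline())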

Related

Postgres table import gives "Invalid input syntax for double precision"

I exported a postgres table as CSV:
"id","notify_me","score","active","is_moderator","is_owner","is_creator","show_marks","course_id","user_id"
8,False,36,"A",False,True,True,True,2,8
29,False,0,"A",False,False,False,True,2,36
30,False,25,"A",False,False,False,True,2,37
33,False,2,"A",False,False,False,False,2,40
Then I tried to import it using pgAdmin:
But I ended up getting the following error:
I checked the values of the score column, but it doesn't contain the value "A":
This is the existing data in the coursehistory table (for schema details):
What's going wrong here?
PS:
Earlier there was a grade column with all NULL values:
But it was giving me the following error:
I got the same error even when using \copy:
db=# \copy courseware_coursehistory FROM '/root/db_scripts/data/couse_cpp.csv' WITH (FORMAT csv)
ERROR: value too long for type character varying(2)
CONTEXT: COPY courseware_coursehistory, line 1, column grade: "NULL"
I assumed the import utility would respect the order of columns in the header of the CSV, especially since there is a header switch in the UI. It seems that it doesn't, and the switch just decides whether to start from the first row or the second.
This is your content, with an "A" as the fourth value:
8,False,36,"A",False,True,True,True,2,8
And this is your table course_history, with the column "score" in the fourth position, declared as double precision.
The error message makes sense to me: an "A" is not a valid double precision value.
The order of columns is relevant in the kind of import you are doing. If you need a more flexible way to import CSV files, you could use a Python script that actually takes your header into account; then column order does not matter as long as the names, types, and nullability are correct (for existing tables).
Like this:
import pandas as pd
from sqlalchemy import create_engine
# note the '@' between the credentials and the host in the connection URL
engine = create_engine('postgresql://user:password@ip_host:5432/database_name')
# header=0 takes the column names from the first row of the CSV
data_df = pd.read_csv('course_cpp_courseid22.csv', sep=',', header=0)
# append the rows to the existing table, matching columns by name
data_df.to_sql('courseware_coursehistory', engine, schema='public', if_exists='append', index=False)
I ended up copying this CSV (also shown in the postscript of the original question; it also contains the grade column and has no header row) using the \copy command at the psql prompt.
Start the psql prompt:
root@50ec9abb3214:~# psql -U user_role db_name
Copy from the CSV as explained here:
db_name=# \copy db_table FROM '/root/db_scripts/data/course_cpp2.csv' delimiter ',' NULL AS 'NULL' csv

SQL Import Wizard - Error

I have a CSV file that I'm trying to import into SQL Management Server Studio.
In Excel, the column giving me trouble looks like this:
Tasks > import data > Flat Source File > select file
I set the data type for this column to DT_NUMERIC and adjust the DataScale to 2 in order to get 2 decimal places, but when I click over to Preview, I see that it's clearly not recognizing the numbers correctly:
The column mapping for this column is set to type = decimal; precision 18; scale 2.
Error message: Data Flow Task 1: Data conversion failed. The data conversion for column "Amount" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
(SQL Server Import and Export Wizard)
Can someone identify where I'm going wrong here? Thanks!
I believe I figured it out: the CSV Amount column was formatted such that the numbers still contained commas as thousands separators. I adjusted XX,XXX.XX to XXXXX.XX and it seems to have worked.
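An alternative to hand-editing the numbers is to strip the thousands separators before the wizard ever sees the file. A hedged sketch using pandas, with amounts.csv and amounts_clean.csv as placeholder file names:
import pandas as pd
# thousands=',' makes pandas parse '12,345.67' as the number 12345.67
df = pd.read_csv('amounts.csv', thousands=',')
# write a cleaned copy without the commas for the Import Wizard to read
df.to_csv('amounts_clean.csv', index=False)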

How to reference sub-objects in talend schema

So I have the following SOQL query that includes the ActivityHistories relationship of the Account object:
SELECT Id, Name, ParentId, (SELECT Description FROM ActivityHistories)
FROM Account
WHERE Name = '<some client>'
This query works just fine in SOQLXplorer and returns 5 nested rows under the ActivityHistories key. In Talend, I am following the instructions from this page to access the sub-objects (although the example uses the query "up" syntax, not the query "down" syntax). My schema mapping is as follows:
The query returns the parent Account rows but not the ActivityHistory rows that are in the subquery:
Starting job GetActivities at 15:43 22/06/2016.
[statistics] connecting to socket on port XXXX
[statistics] connected
0X16000X00fQd61AAC|REI||
[statistics] disconnected
Job GetActivities ended at 15:43 22/06/2016. [exit code=0]
Is it possible to reference the subrows using Talend? If so, what is the schema syntax to do so? If not, how can I unpack this data in some way to get to the Description fields for each Account? Any help is much appreciated.
Update: I have written a small Python script to extract the ActivityHistory records and dump them in a file, then used a tFileInput to ingest the CSV and continue through my process. But this seems very kludgey. Any better options out there?
I've done some debugging from the code perspective, and it seems that if you specify the correct column name, you will get the correct response. For your example, it should be: Account_ActivityHistories_records_Description
The output from tLogRow will be similar to:
00124000009gSHvAAM|Account1|tests;Lalalala
As you can see, the Description values from all child elements are stored as one string, delimited by semicolons. You can change the delimiter in the Advanced settings view of the SalesforceInput component.
I have written a small Python script (source gist here) to extract the ActivityHistory records and dump them into a file (given as a command-line argument), then used a tFileInput to ingest the CSV and continue through my process.
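The gist itself isn't reproduced here, but a minimal sketch of that kind of extraction script, assuming the simple_salesforce library and placeholder credentials, might look like this:
import csv
import sys
from simple_salesforce import Salesforce
# placeholder credentials; replace with your own org login details
sf = Salesforce(username='user@example.com', password='password', security_token='token')
soql = ("SELECT Id, Name, ParentId, (SELECT Description FROM ActivityHistories) "
        "FROM Account WHERE Name = 'some client'")
result = sf.query(soql)
# write one pipe-delimited row per ActivityHistory record, keyed by the parent Account
with open(sys.argv[1], 'w', newline='') as out:
    writer = csv.writer(out, delimiter='|')
    for account in result['records']:
        histories = account.get('ActivityHistories') or {'records': []}
        for history in histories['records']:
            writer.writerow([account['Id'], account['Name'], history['Description']])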

Sqoop to SQL Server. Blank string error

I'm trying to load data into SQL Server using Sqoop. I'm running:
sqoop export --connect "jdbc:sqlserver://<server name>;username=<user>;
password=<pass>;database=<db>" --table test_out --input-fields-terminated-by ~
--export-dir /user/test.out
but I get an error when a row has a blank string in test.out:
1~a
<nul>~b
<blank>~c
In this example, the third line returns an error:
Failed map tasks=1
Any ideas?
I would suggest taking a look at the failed map task log, as it usually contains quite detailed information about the failure.
Sqoop always expects the number of columns on every line of the exported data to equal the number of columns in the target table. When exporting from a text file, the number of columns is determined by the number of separators present on the line. Based on the provided example, it seems that the target table has 2 columns, whereas the third line has zero separators and is therefore assumed to be one single column. This discrepancy would cause Sqoop to fail.
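If the log alone doesn't pinpoint the row, a quick check of the separator counts will. A rough sketch, assuming a local copy of test.out:
from collections import Counter
# count the '~' separators on each line of the exported file
with open('test.out') as f:
    counts = [line.rstrip('\n').count('~') for line in f]
# most lines should have the same count; report the 1-based line numbers that differ
expected = Counter(counts).most_common(1)[0][0]
print('Suspect lines:', [i + 1 for i, c in enumerate(counts) if c != expected])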

"Data conversion error" with BULK INSERT

I am trying to import some data into SQL Server using the BULK INSERT command.
This is the error I am getting:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 6 (NClaims).
Now, I created a test file with only one row of data, which I was able to import successfully:
00000005^^18360810^408^30^0
However, when I added 2 more rows of data (which are very similar to the row above), I got the error message given above. These are the 2 additional rows of data:
00000003^^18360801^142^42^0
00000004^^18360000^142^10^0
As you can see, there does not seem to be any difference (in terms of data length or data types) between the 2 rows above and the single row given previously... So why am I getting this error? How do I fix it?
EDIT:
This is the command I am executing:
BULK INSERT GooglePatentsIndividualDec2012.dbo.patent
FROM 'C:\Arvind Google Patents Data\patents\1patents_test.csv'
WITH ( FIELDTERMINATOR = '^', ROWTERMINATOR='\n');
Be patient and experiment, excluding one thing at a time. For example:
1. Remove the third row and check whether everything is OK.
2. If yes, put the row back but change 10^0 to 42^0 and check again.
3. Repeat step 2 for the rest of the data, changing it to the corresponding values from row 2, which is OK.
You will find the piece of data that causes the error.
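As a faster alternative to bisecting by hand, a small script can check every field in the file at once. This is only a sketch; it assumes the file path from the question and that the columns are meant to hold plain integers (column 6 is NClaims in the error message):
path = r'C:\Arvind Google Patents Data\patents\1patents_test.csv'
with open(path) as f:
    for lineno, line in enumerate(f, start=1):
        # only the newline is stripped, so a stray carriage return would show up in the output below
        fields = line.rstrip('\n').split('^')
        if len(fields) != 6:
            print(lineno, 'unexpected field count:', len(fields))
            continue
        for col, value in enumerate(fields, start=1):
            # flag anything that is neither empty nor a plain unsigned integer
            if value != '' and not value.isdigit():
                print(lineno, 'column', col, 'suspect value:', repr(value))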
