I exported a postgres table as CSV:
"id","notify_me","score","active","is_moderator","is_owner","is_creator","show_marks","course_id","user_id"
8,False,36,"A",False,True,True,True,2,8
29,False,0,"A",False,False,False,True,2,36
30,False,25,"A",False,False,False,True,2,37
33,False,2,"A",False,False,False,False,2,40
Then I tried to import it using pgadmin:
But I ended up getting the following error:
I checked the values of the score column, but it doesn't contain the value "A":
This is the existing data in the coursehistory table (for schema details):
What's going wrong here?
PS:
Earlier there was a grade column with all NULL values:
But it was giving me the following error:
I got the same error even using \copy:
db=# \copy courseware_coursehistory FROM '/root/db_scripts/data/couse_cpp.csv' WITH (FORMAT csv)
ERROR: value too long for type character varying(2)
CONTEXT: COPY courseware_coursehistory, line 1, column grade: "NULL"
I assumed the import utility would respect the order of the columns in the header of the CSV, especially since there is a Header switch in the UI. It seems that it doesn't, and that switch just decides whether to start reading from the first row or the second.
This is your content, with an "A" as the fourth value:
8,False,36,"A",False,True,True,True,2,8
And your table course_history has the column "score" in the fourth position, using a double precision type.
The error message makes sense to me: an "A" is not a valid double precision value.
The order of columns is relevant in the kind of import you are doing. If you need a more flexible way to import CSV files, you could use a Python script that actually takes your header into account; then the column order doesn't matter, as long as the names, types, and NOT NULL constraints are correct (for existing tables).
Like this:
import pandas as pd
from sqlalchemy import create_engine

# Connection string format: postgresql://user:password@host:port/database_name
engine = create_engine('postgresql://user:password@ip_host:5432/database_name')

# Read the CSV using its header row, then append to the existing table,
# matching columns by name rather than by position.
data_df = pd.read_csv('course_cpp_courseid22.csv', sep=',', header=0)
data_df.to_sql('courseware_coursehistory', engine, schema='public', if_exists='append', index=False)
I ended up copying this CSV (also shown in the postscript of the original question; it also contains the grade column and has no header row):
using the \copy command at the psql prompt.
Start the psql prompt:
root@50ec9abb3214:~# psql -U user_role db_name
Copy from the CSV as explained here (the NULL AS 'NULL' option tells COPY to treat the literal string NULL as a SQL NULL, which avoids the "value too long for type character varying(2)" error from the PS):
db_name=# \copy db_table FROM '/root/db_scripts/data/course_cpp2.csv' delimiter ',' NULL AS 'NULL' csv
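On the header/column-order question: if you would rather keep the header row and not depend on the table's physical column order, COPY lets you name the target columns explicitly. A minimal sketch using psycopg2 (the connection parameters and file path are placeholders; the column list is taken from the CSV header shown in the question):

import psycopg2

# Placeholder connection parameters; adjust to the real database.
conn = psycopg2.connect(dbname='db_name', user='user_role', password='secret', host='localhost')

copy_sql = """
    COPY courseware_coursehistory
        (id, notify_me, score, active, is_moderator, is_owner, is_creator, show_marks, course_id, user_id)
    FROM STDIN WITH (FORMAT csv, HEADER true, NULL 'NULL')
"""

# copy_expert streams the file through COPY; HEADER true skips the header row,
# and the explicit column list removes any dependence on the table's column order.
with open('/path/to/course_export.csv') as f:
    with conn.cursor() as cur:
        cur.copy_expert(copy_sql, f)
conn.commit()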
I have a CSV file that I'm trying to import into SQL Server Management Studio.
In Excel, the column giving me trouble looks like this:
Tasks > import data > Flat Source File > select file
I set the data type for this column to DT_NUMERIC and adjust the DataScale to 2 in order to get 2 decimal places, but when I click over to Preview, I see that it's clearly not recognizing the numbers correctly:
The column mapping for this column is set to type = decimal; precision 18; scale 2.
Error message: Data Flow Task 1: Data conversion failed. The data conversion for column "Amount" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
(SQL Server Import and Export Wizard)
Can someone identify where I'm going wrong here? Thanks!
I believe I figured it out... the CSV Amount column was formatted such that the numbers still contained commas as thousands separators. I adjusted XX,XXX.XX to XXXXX.XX and it seems to have worked.
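For a file too large to fix by hand, a small pandas pass can strip the thousands separators before running the wizard; a rough sketch (file names are placeholders):

import pandas as pd

# thousands=',' makes pandas parse "12,345.67" as the number 12345.67.
df = pd.read_csv('amounts.csv', thousands=',')

# Re-save without thousands separators so the import wizard sees plain numerics.
df.to_csv('amounts_clean.csv', index=False)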
So I have the following SOQL query that includes the ActivityHistories relationship of the Account object:
SELECT Id, Name, ParentId, (SELECT Description FROM ActivityHistories)
FROM Account
WHERE Name = '<some client>'
This query works just fine in SOQLXplorer and returns 5 nested rows under the ActivityHistories key. In Talend, I am following the instructions from this page to access the sub-objects (although the example uses the query "up" syntax, not the query "down" syntax). My schema mapping is as follows:
The query returns the parent Account rows but not the ActivityHistory rows that are in the subquery:
Starting job GetActivities at 15:43 22/06/2016.
[statistics] connecting to socket on port XXXX
[statistics] connected
0X16000X00fQd61AAC|REI||
[statistics] disconnected
Job GetActivities ended at 15:43 22/06/2016. [exit code=0]
Is it possible to reference the subrows using Talend? If so, what is the syntax for the schema to do so? If not, how can I unpack this data in some way to get to the Description fields for each Account? Any help is much appreciated.
Update: I have written a small Python script to extract the ActivityHistory records and dump them to a file, then used a tFileInput to ingest the CSV and continue through my process. But this seems very kludgey. Any better options out there?
I've done some debugging from the code perspective, and it seems that if you specify the correct column name, you will get the correct response. For your example, it should be: Account_ActivityHistories_records_Description
The output from tLogRow will be similar to:
00124000009gSHvAAM|Account1|tests;Lalalala
As you can see, the Description values from all child elements are stored as one string, delimited by a semicolon. You can change the delimiter in the Advanced Settings view of the SalesforceInput component.
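If you need one row per Description downstream, you can normalize that delimited column (inside Talend, a tNormalize on the column with ";" as the item separator should do it). Outside Talend, a rough Python sketch of the same split, assuming the default ";" delimiter and the pipe-separated tLogRow output shown above:

# Split the semicolon-delimited Description field from a tLogRow-style line.
line = "00124000009gSHvAAM|Account1|tests;Lalalala"
account_id, name, descriptions = line.split("|", 2)
for description in descriptions.split(";"):
    print(account_id, name, description)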
I have written a small Python script (source gist here) to extract the ActivityHistory records and dump them to a file (given as a command-line argument), then used a tFileInput to ingest the CSV and continue through my process.
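For reference, the general shape of such an extraction script might look like the following simple_salesforce sketch (this is an approximation, not the actual gist; the credentials, query filter, and output path are placeholders):

import csv
import sys
from simple_salesforce import Salesforce

# Placeholder credentials.
sf = Salesforce(username='user@example.com', password='password', security_token='token')

soql = ("SELECT Id, Name, ParentId, (SELECT Description FROM ActivityHistories) "
        "FROM Account WHERE Name = 'Some Client'")

# Output CSV path is taken from the command line, as described above.
with open(sys.argv[1], 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['AccountId', 'AccountName', 'Description'])
    for account in sf.query_all(soql)['records']:
        histories = account.get('ActivityHistories') or {'records': []}
        for activity in histories['records']:
            writer.writerow([account['Id'], account['Name'], activity['Description']])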
I'm trying to export data to SQL Server with Sqoop. I'm running:
sqoop export --connect "jdbc:sqlserver://<server name>;username=<user>;
password=<pass>;database=<db>" --table test_out --input-fields-terminated-by ~
--export-dir /user/test.out
but I get an error when a row has a blank string in test.out:
1~a
<nul>~b
<blank>~c
In this example, the third line returns an error:
Failed map tasks=1
Any ideas?
I would suggest taking a look at the failed map task log, as it usually contains quite detailed information about the failure.
Sqoop always expects the number of columns on every line of the exported data to be equal to the number of columns in the target table. When exporting from a text file, the number of columns is determined by the number of separators present on the line. Based on the provided example, it seems that the target table has 2 columns, whereas the third line has zero separators, so Sqoop assumes it is one single column. This discrepancy would cause Sqoop to fail.
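If the log points at a parsing problem, a quick pre-check of the export file can locate the offending lines by counting separators; a small sketch, assuming the '~' delimiter, a 2-column target table as in the example, and a local copy of test.out:

# Flag lines whose field count does not match the 2-column target table.
EXPECTED_FIELDS = 2

with open('test.out') as f:
    for lineno, line in enumerate(f, start=1):
        fields = line.rstrip('\n').split('~')
        if len(fields) != EXPECTED_FIELDS:
            print("line %d: expected %d fields, got %d: %r" % (lineno, EXPECTED_FIELDS, len(fields), line))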
I am trying to import some data into SQL Server using the BULK INSERT command--
This is the error I am getting--
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 6 (NClaims).
Now, I created a test file with only one row of data which I was able to import successfully--
00000005^^18360810^408^30^0
However, when I added 2 more rows of data (which are very similar to the row above), I got the error message given above. These are the 2 additional rows of data--
00000003^^18360801^142^42^0
00000004^^18360000^142^10^0
As you can see, there does not seem to be any difference (in terms of data length or data types) between the 2 rows above and the single row given previously... So why am I getting this error? How do I fix it?
EDIT--
This is the command I am executing--
BULK INSERT GooglePatentsIndividualDec2012.dbo.patent
FROM 'C:\Arvind Google Patents Data\patents\1patents_test.csv'
WITH ( FIELDTERMINATOR = '^', ROWTERMINATOR='\n');
Be patient and experiment, excluding one thing at a time. For example:
1. Remove the third row and check if everything is OK.
2. If it is, put the row back but change 10^0 to 42^0 and check again.
3. Repeat step 2 for the rest of the data, changing it to the values in row 2 (which is OK).
You will find the piece of data that causes the error (a short script, sketched below, can help automate this kind of check).
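A rough sketch of such a check in Python, assuming the '^' delimiter and that columns 3 through 6 are supposed to be integers (adjust the positions to the real table definition; the file path is the one from the BULK INSERT statement):

# Report fields in the CSV that do not parse as integers where integers are expected.
NUMERIC_COLUMNS = [2, 3, 4, 5]  # zero-based positions assumed to be numeric

with open(r'C:\Arvind Google Patents Data\patents\1patents_test.csv') as f:
    for lineno, line in enumerate(f, start=1):
        fields = line.rstrip('\r\n').split('^')
        for pos in NUMERIC_COLUMNS:
            try:
                int(fields[pos])
            except (IndexError, ValueError):
                print("row %d, column %d: bad value %r" % (lineno, pos + 1, fields[pos:pos + 1]))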