Pentaho: Cannot import Boolean Value to table with PostgreSQL Bulk Loader?

I am currently trying to import some data (from a CSV file) into a PostgreSQL database. For this, I am using the CSV file input step to read the CSV file into Kettle. Second, I am using the Modified Java Script Value step to alter some values, and I am also adding a new column named VALID. This column should always be true. I added the column VALID to the fields in the lower half of the step window. My step looks like the following:
To import the data from Kettle into the PostgreSQL database table, I am using the PostgreSQL Bulk Loader (as there are millions of rows to import). This step looks like the following:
As you can see in this image, the table column name is valid and the stream field is VALID (which comes from the Modified Java Script Value step). Both are boolean, so it should work. Instead, I get the following error message when I run the transformation:
2018/02/12 14:52:50 - PostgreSQL Bulk Loader.0 - Caused by:
org.postgresql.util.PSQLException: ERROR: invalid input syntax for type boolean: "1.0"
Where: COPY adac_test, line 1, column valid: "1.0"
Any suggestions on how to fix this?

cast "valid" as string. In postgres the boolean values are "kind of" strings, not really but boolean values on postgres can be inserted as string:
https://www.postgresql.org/docs/9.6/static/datatype-boolean.html
based on this documentation , this should work:
var VALID = 't';
and select type string instead boolean below the code window
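For reference, Postgres only accepts a fixed set of string literals for boolean input, which is why "1.0" is rejected while 't' is fine. A quick psql check (independent of the transformation) illustrates this:
SELECT 't'::boolean, 'true'::boolean, '1'::boolean, 'f'::boolean;  -- all accepted
SELECT '1.0'::boolean;  -- fails: invalid input syntax for type boolean: "1.0"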

Related

Postgres table import gives "Invalid input syntax for double precision"

I exported a postgres table as CSV:
"id","notify_me","score","active","is_moderator","is_owner","is_creator","show_marks","course_id","user_id"
8,False,36,"A",False,True,True,True,2,8
29,False,0,"A",False,False,False,True,2,36
30,False,25,"A",False,False,False,True,2,37
33,False,2,"A",False,False,False,False,2,40
Then I tried to import it using pgAdmin:
But I ended up getting the following error:
I checked the values of the score column, but it doesn't contain the value "A":
This is the existing data in the coursehistory table (for schema details):
What's going wrong here?
PS:
Earlier there was a grade column with all NULL values:
But it was giving me the following error:
I got the same error even when using \copy:
db=# \copy courseware_coursehistory FROM '/root/db_scripts/data/couse_cpp.csv' WITH (FORMAT csv)
ERROR: value too long for type character varying(2)
CONTEXT: COPY courseware_coursehistory, line 1, column grade: "NULL"
I assumed the import utility would respect the order of the columns in the CSV header, especially since there is a header switch in the UI. It seems that it doesn't, and the switch just decides whether to start from the first row or the second.
This is your content, with an "A" as the fourth value:
8,False,36,"A",False,True,True,True,2,8
And in your table course_history, the column "score" is in the fourth position and is a double precision.
The error message makes sense to me: an "A" is not a valid double precision value.
The order of columns matters in the kind of import you are doing. If you need a more flexible way to import CSV files, you could use a Python script that actually takes your header into account; then column order is irrelevant as long as the names, types, and NOT NULL constraints line up with the existing table.
Like this:
import pandas as pd
from sqlalchemy import create_engine

# Connection URL: postgresql://user:password@host:port/database
engine = create_engine('postgresql://user:password@ip_host:5432/database_name')

# Read the CSV using its header row, then append to the existing table;
# pandas matches columns by name, so their order in the file does not matter.
data_df = pd.read_csv('course_cpp_courseid22.csv', sep=',', header=0)
data_df.to_sql('courseware_coursehistory', engine, schema='public', if_exists='append', index=False)
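If you prefer to stay in psql, another option (a sketch; the file name is reused from the Python example and the column names from the CSV header, assuming they match the table's columns) is to give \copy an explicit column list in the same order as the file, so the difference between file order and table order no longer matters:
\copy courseware_coursehistory (id, notify_me, score, active, is_moderator, is_owner, is_creator, show_marks, course_id, user_id) FROM 'course_cpp_courseid22.csv' WITH (FORMAT csv, HEADER true)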
I ended up copying this CSV (also shown in the postscript of the original question; it also contains the grade column and has no header row):
using the \copy command at the psql prompt.
Start the psql prompt:
root@50ec9abb3214:~# psql -U user_role db_name
Copy from csv as explained here:
db_name=# \copy db_table FROM '/root/db_scripts/data/course_cpp2.csv' delimiter ',' NULL AS 'NULL' csv

Converting data in column in SSIS

I'm writing an SSIS package to load data from a .csv into a db.
There's a column in the csv file that is supposed to have a count, but the records sometimes have text, so I can't just load the data in as an integer. It looks something like this:
I want the data to land in the db destination as an integer instead of a string. I want the transformation to change any text to a 1, any blank value to a 1, and leave all the other numbers as-is.
So far I have tried the Derived Column functionality, but I couldn't seem to get the right expression(s), and I tried creating a temp table to run a SQL query over the data, which kept breaking my data flow.
There are three approaches you can follow.
(1) Using a derived column
You should add a derived column with the following expression to check if the values are numeric or not:
(DT_I4)[count] == (DT_I4)[count] ? [count] : 1
Then in the derived column editor, go to the error output configuration and set the error handling event to Ignore failure.
Now add another derived column to replace null values with 1:
REPLACENULL([count_derivedcolumn],1)
You can refer to the following article for a step-by-step guide:
Validate Numeric or Non-Numeric Data in SQL Server Integration Services without the Script Task
(2) Using a script component
If you know C# or Visual Basic.NET, you can add a script component that checks whether the value is numeric and replaces nulls and string values with 1.
(3) Update data in SQL
You can stage data in its initial form into the SQL database and use an update query to replace nulls and string values with 1 as follows:
UPDATE [staging_table]
SET [count] = 1
WHERE [count] IS NULL or ISNUMERIC([count]) = 0
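Once the staged values are cleaned up, moving them into the typed destination is a plain INSERT ... SELECT (a sketch; the destination table name and any other columns are assumed):
INSERT INTO [destination_table] ([count])
SELECT CAST([count] AS int)
FROM [staging_table];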

Scientific Notation Issue while loading data from Excel (xlsx) file to SQL Tables via SSIS

I'm loading data from an Excel file (.xlsx) into a SQL table using an SSIS package. For one column, scientific notation shows up in the data; it is already displayed that way in the Excel file. But the actual value is not being loaded into the SQL table. I tried multiple options with derived columns, expressions etc., but I couldn't get the proper value.
This column has a mix of numeric and nvarchar values. Below is an example of the column.
ApplicationNumber
1.43E+15
923576663
25388447
TXY020732087
18794588
TXAP0000140343
Actual Values:
ApplicationNumber
1425600000000000
923576663
25388447
TXY020732087
18794588
TXAP0000140343
There is no issue with the data coming from the business side into Excel. But how can we handle this scenario in SSIS?
I also tried (DT_I8)ApplicationNumber == (DT_I8)ApplicationNumber, but for the value above it gives
1.43E+15 -> 1.430000000000000 and not 1425600000000000
One thing you can do is set the output column, in the Advanced Editor of the Excel source, to decimal with a large precision, 20 digits for example:
UPDATE
To also handle the strings in the same column, you may need to redirect the error output, as these will throw a conversion error:
In the Advanced Editor:
Default output:
Error output:
Then you can update your database from both the default and the error output.
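The reason the wide decimal helps can be seen directly in T-SQL: converting the value through float keeps the scientific form, while a decimal(20, 0) keeps every digit (a standalone illustration, not part of the package):
SELECT CONVERT(varchar(30), CAST(1.43E+15 AS float));           -- 1.43e+015
SELECT CONVERT(varchar(30), CAST(1.43E+15 AS decimal(20, 0)));  -- 1430000000000000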
I faced this problem recently using SSIS too.
1- Change the column type in Excel to "Number"
2- Remove the decimal positions.
3- Upload the file using SSIS

Import CSV to Microsoft SQL Server 2014 Wizard

I have a very simple (but big) CSV file and I want to import it to my database in Microsoft SQL Server 2014 (Database/Tasks/Import Data). But I receive the following error:
The conversion returned status value 2 and status text "The value could not be converted because of a potential loss of data".
Here is a sample of my CSV file (containing ~ 9 million rows):
1393013,297884,'20150414 15:46:25'
1393010,301242,'20150414 15:46:58'
Ideally my first and second columns are bigint and the third is datetime. In the wizard, I choose 'unsigned 8 byte integer' for the first two and 'timestamp' for the third, and I receive the error. Even when I try to use string as the data type for all three columns, I still receive the same error.
I also tried using the bcp command on the command line. It reports no errors and inserts nothing! Using the "bulk insert" command also gives me this error:
the column is too long! verify your terminators
But the terminators are set correctly!
I appreciate any idea you have as a solution to this simple-looking problem.
You are trying to change the input types: unsigned 8 byte integer is a setting on the source.
You don't need to change the source settings at all. 'string [DT_STR]' and the default length of 50 will work.
'timestamp' is a binary type. I believe the type you are after is datetime, but that is set on the destination, not the source. The source is still a string regardless.
You still will not be able to import your date value as a datetime data type.
This would work though (added dashes) -> 2015-04-14 15:46:25. Import what you have as string and fix it after import unless you can get your text file changed.
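A sketch of that fix-after-import step, with assumed table and column names: load the third field as a string, then strip the quotes, add the dashes, and convert.
-- Hypothetical staging table holding the raw text value in RawTimestamp
ALTER TABLE dbo.ImportedRows ADD EventTime datetime NULL;
GO
UPDATE dbo.ImportedRows
SET EventTime = CONVERT(datetime,
    STUFF(STUFF(REPLACE(RawTimestamp, '''', ''), 5, 0, '-'), 8, 0, '-'),
    120);   -- '20150414 15:46:25' -> 2015-04-14 15:46:25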

SSIS Execute a Stored Procedure with the parameters from .CSV file SQL Server 2005

I'm learning SSIS and this seems like an easy task but I'm stuck.
I have a CSV file Orders.csv with this data:
ProductId,Quantity,CustomerId
1,1,104
2,1,105
3,2,106
I also have a stored procedure ssis_createorder that takes as input parameters:
@productid int
@quantity int
@customerid int
What I want to do is create an SSIS package that takes the .csv file as input and calls ssis_createorder once for each data row in the .csv file (three times in total; the first row contains column names).
Here is what I have done so far.
I have created an SSIS package (Visual Studio 2005 & SQL Server 2005).
In Control Flow I have a Data Flow Task.
The Data Flow has a Flat File source for my .csv file. All of the columns are mapped.
I have created a variable named orders of type Object. I also have variables CustomerId, ProductId, & Quantity of type int32.
Next I have a Recordset Destination that is assigning the contents of the .csv file to the variable orders. I'm not sure about how to use this tool. I'm setting the VariableName (under Custom Properties) to User::orders. I think that orders now holds an ADO recordset made up of the contents of the original .csv file.
Next I'm adding a ForEach Loop Container on the Control Flow tag and linking it to the Data Flow Task.
Inside of the ForEach Loop Container I'm setting the Enumerator to "ForEach ADO Enumerator". I'm setting "ADO object source variable" to User::orders. For Enumeration mode I'm selecting "Rows in the first table".
In the Variable Mapping tab I have User::ProductId index 0, User::Quantity index 1, User::CustomerId index 2. I'm not sure if this is correct.
Next I have a Script Task inside of the ForEach Loop Container.
I have ReadOnlyVariables set to ProductId.
In the Main method this is what I'm doing:
Dim sProductId As String = Dts.Variables("ProductId").Value.ToString
MsgBox("sProductId")
When I run the package my ForEach Loop Container turns Bright Red and I get the following error messages
Error: 0xC001F009 at MasterTest: The type of the value being assigned to variable "User::ProductId" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.
Error: 0xC001C012 at Foreach Loop Container: ForEach Variable Mapping number 1 to variable "User::ProductId" cannot be applied.
Error: 0xC001F009 at MasterTest: The type of the value being assigned to variable "User::Quantity" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.
Error: 0xC001C012 at Foreach Loop Container: ForEach Variable Mapping number 2 to variable "User::Quantity" cannot be applied.
Error: 0xC001F009 at MasterTest: The type of the value being assigned to variable "User::CustomerId" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.
Error: 0xC001C012 at Foreach Loop Container: ForEach Variable Mapping number 3 to variable "User::CustomerId" cannot be applied.
Warning: 0x80019002 at MasterTest: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (12) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS package "Package.dtsx" finished: Failure.
Dts.TaskResult = Dts.Results.Success
Any help would be appreciated
One of my coworkers just gave me the answer.
You don't need the ForEach Loop Container or the Recordset Destination.
All you need is the Flat File Source and an OLE DB Command. Connect to your database and inside the OLE DB Command select the appropriate connection.
In the Component Properties enter the following SQLCommand:
exec ssis_createorder ?, ?, ?
The "?" are place holders for the parameters.
Next under the Column Mappings tab map the .csv file columns to the stored procedure parameters.
You are finished, go ahead and run the package.
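In case it helps to picture what that command calls, here is a minimal sketch of what ssis_createorder might look like (the Orders table and its columns are assumed, since the procedure body is not shown in the question):
CREATE PROCEDURE ssis_createorder
    @productid  int,
    @quantity   int,
    @customerid int
AS
BEGIN
    -- Assumed destination table; substitute the real insert logic.
    INSERT INTO dbo.Orders (ProductId, Quantity, CustomerId)
    VALUES (@productid, @quantity, @customerid);
END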
Thanks Gary, if you were on StackOverflow I would give you an upvote and accept your answer.
If I understand correctly, what you want to do is execute a stored procedure 3 times for each row in the data source.
What if you just create a data flow with a flat file data source and pipe the data through 3 execute sql command tasks? Just map the columns in the data to the input params of your stored procedure.
Maybe I'm not seeing it correctly in your question and I'm thinking too simple, but in my experience you need to avoid using the foreach task in SSIS as much as possible.
I suspect that you need to look at your Data Flow task. It's likely that the values from the source CSV file are being interpreted as string values. You will probably need a Derived Column component or a Data Conversion component to convert your input values to the desired data type.
And, I think @StephaneT's solution would be good for executing the SP.
I'm not sure if this answers your question, but I was looking to do this and I achieved it using the BULK INSERT command. I created a staging table with all of the columns in the csv file, and instead of a stored procedure I used an INSTEAD OF INSERT trigger to handle the logic of inserting it into many tables.
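A rough sketch of that approach, with assumed table names, an assumed file path, and a single destination table standing in for the "many tables" logic:
CREATE TABLE dbo.OrdersStaging (ProductId int, Quantity int, CustomerId int);
GO

CREATE TRIGGER dbo.trg_OrdersStaging_Insert
ON dbo.OrdersStaging
INSTEAD OF INSERT
AS
BEGIN
    -- Route the incoming rows to their real destination instead of storing them.
    INSERT INTO dbo.Orders (ProductId, Quantity, CustomerId)
    SELECT ProductId, Quantity, CustomerId FROM inserted;
END
GO

BULK INSERT dbo.OrdersStaging
FROM 'C:\data\Orders.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);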
