Loop inside Pentaho Data Integration Transformation

I have a transformation in which a text file feeds a Read Company step.
For each t_cmp in the text file (the number of t_cmp values is not known in advance), I want to execute Read Company, but it gives an error.
Can anyone please tell me where I am going wrong?

You need to pass 3 rows, each with 1 field, instead of a single row with 3 fields.
The number of fields must match the number of parameters of your query.
So, in short, transpose your data. Either:
read the line as a single field and then use the Split field to rows step, or
read it as you do now and use the Row Normalizer step.
Both approaches should work.

Related

How to transform a list of numbers into separate elements in Talend tMap

I have a list of codes for each state stored as input in a table.
What I want as output, using tMap transformations, is each code as a separate element (one per row).
This is the job I made, but it does not work the way I want. The output should have 1000 rows.
Does anybody know how to solve this?
One way to do this is to convert your list to a string of comma-separated values (using tConvertType for instance) and then use tNormalize to split this string into individual rows.

Handling truncation error in derived column in data flow task

I have a data flow task which contains a derived column. The derived column transforms a CSV file column, let's say column A, which is an order number, to a char data type with length 10.
This works perfectly fine when the text file value is 10 characters or fewer. Of course, it throws an error when the column A order number is more than 10 characters.
Sample values of column A (the error-prone ones included):
12PR567890
254W895X98
ABC 56987K5239
485P971259 SPTGER
459745WERT
I would like to catch the error-prone records and extract only the order number.
I can already configure the error output of the derived column, but that just ignores the error records and processes the others.
The expected output would process the order numbers ABC 56987K5239 and 485P971259 SPTGER as 56987K5239 and 485P971259, respectively. How the unexpected characters are removed is not important; the question is how to achieve this at run time of the derived column (stripping and processing the data when an error occurs).
If a valid order number always starts with a digit and is exactly 10 characters long, you could use a Script Component (Transformation) together with a regular expression to transform the source data:
Drag and drop the Script Component as a Transformation.
Connect the source to the Script Component.
In the Script Component's Edit window, check the Order column under Input Columns and set its Usage Type to ReadWrite.
In the script, add: using System.Text.RegularExpressions;
Then add the following code to the input-processing method:
string pattern = "[0-9].{9}";
// Take the full match: a digit followed by any 9 characters (10 characters in total).
Row.Order = Regex.Match(Row.Order, pattern).Value;
The output going to the destination will then be the matched 10 characters, starting with a digit.
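For reference, here is a minimal sketch of the whole method, assuming the default Input0Buffer/Input0_ProcessInputRow generated by the Script Component and the Order column marked as ReadWrite (adjust the names to your own metadata):

using System.Text.RegularExpressions;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Skip NULLs so Regex.Match is not called on a null value.
    if (!Row.Order_IsNull)
    {
        // A digit followed by any 9 characters: the 10-character order number.
        Match m = Regex.Match(Row.Order, "[0-9].{9}");
        if (m.Success)
        {
            Row.Order = m.Value;
        }
    }
}

Values that are already a valid 10-character order number match as-is, so they pass through unchanged.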

SSIS Script Component - get raw row data in data flow

I am processing a flat file in SSIS and one of the requirements is that if a given row contains an incorrect number of delimiters, fail the row but continue processing the file.
My plan is to load the rows into a single column in SQL server, but during the load, I’d like to test each row during the data flow to see if it has the right number of delimiters, and add a derived column value to store the result of that comparison.
I'm thinking I could do that with a Script Component, but I'm wondering if anyone has done this before and what the best method would be. If a Script Component is the way to go, how do I access the raw row, with its delimiters, inside the script?
SOLUTION:
I ended up going with a modified version of Holder's answer, as I found that TOKENCOUNT() will not count empty values between delimiters (per this SO answer). When two delimiters are not separated by a value, it results in an incorrect count (at least for my purposes).
I used the following expression instead:
LEN(EntireRow) - LEN(REPLACE(EntireRow, "|", ""))
This results in the correct count of delimiters in the row, regardless of whether there's a value in a given field or not.
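For anyone who would rather do the check inside a Script Component (as I originally asked), a minimal sketch might look like this; it assumes the file is read into a single column named EntireRow and that an output column named DelimiterCount has been added to the component (that column name is my own choice):

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Count the pipe delimiters in the raw line and expose the result downstream.
    int count = 0;
    if (!Row.EntireRow_IsNull)
    {
        foreach (char c in Row.EntireRow)
        {
            if (c == '|') count++;
        }
    }
    Row.DelimiterCount = count;
}

A Conditional Split after the component can then compare DelimiterCount with the expected number of delimiters and route failing rows away from the insert.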
My suggestion is to use a Derived Column to do your test,
and then add a Conditional Split to decide whether or not to insert each row.
Something like this:
Use the TOKENCOUNT function in the Derived Column to get the number of tokens (i.e., columns), like this: TOKENCOUNT(EntireRow, "|")

SQL Server - Text column contains number between

Can someone tell me how to write SQL Server code that looks in a varchar text column to see whether it contains a number within a given range?
For example, I'm looking for rows whose column contains any number between 100000 and 999999. The column may have a value like
this field contains a number `567391`
so I want to select that one, but not if it had
this field contains a number `5391`
For your given example, you can check the digits:
where col like '%[^0-9][1-9][0-9][0-9][0-9][0-9][0-9][^0-9]%'
This is not a generic solution, but it works for your example. (Note that the pattern requires a non-digit character on both sides of the number, so a number at the very start or end of the value is only found if the column is padded, for example by concatenating a space on each side.) In general, parsing strings in SQL Server is difficult. It is better to extract the values you are interested in when the data is loaded, so the relevant values end up correctly in their own columns.
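To illustrate that last point, if the load goes through .NET code (an SSIS Script Component, for instance), the number could be pulled out up front. A rough sketch, where the helper name and the exact regular expression are my own choices:

using System.Text.RegularExpressions;

// Returns the first standalone 6-digit number (100000-999999) found in the text,
// or null if there is none; the lookarounds keep it from matching part of a longer number.
static string ExtractSixDigitNumber(string text)
{
    if (string.IsNullOrEmpty(text)) return null;
    Match m = Regex.Match(text, @"(?<!\d)[1-9]\d{5}(?!\d)");
    return m.Success ? m.Value : null;
}

The extracted value can then go into its own numeric column, and the range check becomes an ordinary WHERE clause.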

Is there a way to specify a variable number of columns for a CSV uploader in Solr?

I'm using the CSVUpdateHandler to index CSV files into Solr. My CSV files have a variable number of fields on each line (e.g., 4 fields on line 1, 6 on line 2, and so on):
line1:field1,field2,field3,field4
line2:field1,field2,field3,field4,field5,field6
line3:field1,field2,field3,field4
So, is there a way to specify a variable number of field names? What I want it to do is index 4 columns if there are four fields and 6 if there are six. Any alternative way to achieve this is appreciated too. Thanks!
UPDATE:
Let me describe the situation. I have a file with CSV data like the one shown above. I use the fieldnames parameter to specify the field names that Solr has to use. Since the lines in my file do not all have the same number of CSV values, I cannot define a standard header for the file without padding some lines with null values. For example, when I upload the above file with 6 header fields defined, lines 1 and 3 throw an error, and with 4 header fields defined, line 2 throws an error. I want to know whether there is a way to specify the header fields such that this works, or whether I have to transform my file into equal-length lines padded with dummy values.
What do you want columns 5 and 6 to map to? One way or another you need to let Solr know; in that case, you just leave empty commas for the missing items.
On the other hand, if you are trying to provide multiple values for a single field, then maybe you should set the field separator to something else and use commas as the value separator.
Try thinking about what you want Solr to see and work backwards from that.
Solved this: specify custom fields with default values (the default attribute) in schema.xml to account for the extra two fields in some of the lines. The schema.xml provided with Solr has plenty of examples.
ALTERNATE: you can also define a custom UpdateRequestProcessor that adds fields based on conditions in Java, and specify this processor as part of the update processor chain in your request handler.
