How to concatenate 2 columns that contain multiple values per row - concatenation

I work with Talend 6.3. I want to concatenate 2 columns in tMap. But some rows contain multiple values, and I want to match them pairwise: the first value in one column with the first value of the same row in the other column.
Example :
2 columns : Name and Surname
In Name, I have : Kevin,Zoe,Alan
In Surname, I have : Monta,Rey,Zom
I want a result that concatenates Kevin with Monta, Zoe with Rey, and Alan with Zom.
How can I do that with Talend? With a classic concatenation in tMap, I get only a single successful concatenation.
I don't know if I explained that correctly, but tell me if you need more information.
Thanks in advance
The job:
Add other data -> Login is the ID

So we have a flow with an ID and 2 columns (Name and Surname), each containing n elements.
n can vary from row to row.
The goal is to have a final flow with the concatenation of all Name + Surname with the ID intact.
It's not a super Talend-y way, but you can use tFlowToIterate to access each row individually and do the pairing of Name + Surname.
Afterwards, we access the resulting list and use tNormalize to split it:
Code for the tJava component:
// Split the comma-separated values of the current row (row5 comes from tFlowToIterate)
List<String> nameList = Arrays.asList(((String) globalMap.get("row5.Name")).split("\\s*,\\s*"));
List<String> surnameList = Arrays.asList(((String) globalMap.get("row5.Surname")).split("\\s*,\\s*"));
// Pair each name with the surname at the same position and store "id;name;surname"
for (int index = 0; index < nameList.size(); index++) {
    ((ArrayList<String>) context.concat).add(((Integer) globalMap.get("row5.id")) + ";" + nameList.get(index) + ";" + surnameList.get(index));
}
Code for the tFixedFlowInput "Use context (list)":
StringHandling.EREPLACE(StringHandling.EREPLACE(context.concat.toString(), "\\[", ""), "\\]", "")
Result:

Sorry, like this:
The concatenation works when I have only one value in the column: only Kevin and only Monta in the cells.
Maybe it's because of my tDenormalize in the schema.

That is the code to set the global variable with the list.
That is the code to get my list.
And I have this result (the first row has organisation empty; the first and the second should both have santeffi. Don't look at the first column, the second one is the correct one).
That is the data before tJavaRow (code and libelle are in different columns).


Not able to match string id in cypher

I loaded my dataset from a CSV file named moviestoactors.csv with this particular command:
LOAD CSV FROM 'file:///desktop-csv-import/moviestoactors.csv' AS row
WITH row[0] AS movieId, row[1] AS actorId, row[2] AS as_character, row[3] AS leading
MERGE (m:moviestoactors {movieId:movieId })
SET m.actorId = actorId, m.as_character = as_character, m.leading = leading
RETURN count(m)
The first problem I faced was that I was not able to load actorId as toInteger(row[1]) AS actorId, since it raised an error that the value was out of bounds, too large for an integer to handle. I did not find any solution for that, so I decided to load the actor id as a string.
Now when I want to match that id against any sort of id, I am not able to. I don't know why, but it is not working properly, even though I checked my CSV file and it contains that id 9 times.
For instance, for actorId 244663, this is my query which is not working; what should work then?
MATCH (ma:moviestoactors)
WITH ma.movieId AS movieId WHERE ma.actorId = '244663'
RETURN movieId
To summarize the whole thing: the first problem is how to get my id into integer form, and the second is, if it stays a string, why it is not even matching in string form.
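One hedged observation about the query above: after WITH ma.movieId AS movieId, the variable ma is no longer in scope, so the WHERE clause that references ma.actorId cannot work; filtering in a WHERE attached to the MATCH avoids that. A minimal sketch, which also trims the value in case the CSV left stray whitespace around the ids (an assumption):
// Filter before projecting, so ma is still in scope
MATCH (ma:moviestoactors)
WHERE trim(ma.actorId) = '244663'
RETURN ma.movieId AS movieId
If the ids fit into Cypher's 64-bit integers, loading with toInteger(trim(row[1])) would likewise let you match on integer values.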

AT NEW with substring access?

I have a solution that includes a LOOP which I would like to spare. So I wonder, whether you know a better way to do this.
My goal is to loop through an internal, alphabetically sorted standard table. This table has two columns: a name and a table, let's call it subtable. For every subtable I want to do some stuff (open an xml page in my xml framework).
Now, every subtable has a corresponding name. I want to group the subtables according to the first letter of this name (meaning, put the pages of these subtables on one main page -one main page for every character-). By grouping of subtables I mean, while looping through the table, I want to deal with the subtables differently according to the first letter of their name.
So far I came up with the following solution:
TYPES: BEGIN OF l_str_tables_extra,
         first_letter(1) TYPE c,
         name TYPE string,
         subtable TYPE REF TO if_table,
       END OF l_str_tables_extra.

DATA: ls_tables_extra TYPE l_str_tables_extra.
DATA: lt_tables_extra TYPE TABLE OF l_str_tables_extra.

FIELD-SYMBOLS: <ls_tables> TYPE str_table. "Like LINE OF lt_tables.
FIELD-SYMBOLS: <ls_tables_extra> TYPE l_str_tables_extra.
*"--- PROCESSING LOGIC ------------------------------------------------
SORT lt_tables ASCENDING BY name.

"Add first letter column in order to use 'at new' later on
"This is the loop I would like to spare
LOOP AT lt_tables ASSIGNING <ls_tables>.
  ls_tables_extra-first_letter = <ls_tables>-name+0(1). "new column
  ls_tables_extra-name = <ls_tables>-name.
  ls_tables_extra-subtable = <ls_tables>-subtable.
  APPEND ls_tables_extra TO lt_tables_extra.
ENDLOOP.

LOOP AT lt_tables_extra ASSIGNING <ls_tables_extra>.
  AT NEW first_letter.
    "Do something with subtables with same first_letter.
  ENDAT.
ENDLOOP.
I wish I could use AT NEW name+0(1) instead of AT NEW first_letter, but offsets and lengths are not allowed there.
You see, I have to include this first loop to add another column to my table, which is kind of unnecessary because no new information is gained.
In addition, I am interested in other solutions because I get into trouble with the framework later on for different reasons. A different way to do this might help me out there, too.
I am happy to hear any thoughts about this! I could not find anything related to this here on stackoverflow, but I might have used not optimal search terms ;)
Maybe the GROUP BY addition on LOOP could help you in this case:
LOOP AT i_tables
     INTO DATA(wa_line)
     " group lines by condition
     GROUP BY (
       " substring() because a normal offset would be evaluated immediately
       name = substring( val = wa_line-name len = 1 )
     ) INTO DATA(o_group).

  " begin of loop over all tables starting with o_group-name(1)
  " loop over the group object, which contains the group's lines
  LOOP AT GROUP o_group
       ASSIGNING FIELD-SYMBOL(<fs_table>).
    " <fs_table> contains your table
  ENDLOOP.
  " end of loop

ENDLOOP.
Why not use an IF comparison?

DATA: lf_prev_first_letter(1) TYPE c.

LOOP AT lt_table ASSIGNING <ls_table>.
  IF <ls_table>-name(1) <> lf_prev_first_letter. "= AT NEW
    "do something
    lf_prev_first_letter = <ls_table>-name(1).
  ENDIF.
ENDLOOP.

D3.js unknown number of columns and rows

I'm currently creating a chart (data from an external CSV file), but I don't know the number of columns and rows beforehand. Could you point me in the right direction as to where I could find some help (or some examples) with this issue?
Thank you
d3.csv can help you here:
d3.csv('myCSVFile.csv', function(data) {
  // the 'data' argument will be an array of objects, one object for each row, so...
  var numberOfRows = data.length,        // we can easily get the number of rows (excluding the title row)
      columns = Object.keys(data[0]),    // then taking the first row object and getting an array of its keys
      numberOfColumns = columns.length;  // allows us to get the number of columns
});
Note that this method assumes that the first row (and only the first row) of your spreadsheet is column titles.
In addition to Tom P's advice, it's worth noting that version 4 of D3 introduced a columns property, which you can use to create an array of column headers (i.e. the dataset's 'keys').
This is useful because (a) it's simpler code and (b) the headers in the array are in the same order that they appear in the dataset.
So, for the above dataset:
headers = data.columns
... creates the same array as:
headers = Object.keys(data[0])
... but the array of column names is in a predictable order.
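A minimal sketch of the v4 approach, assuming a file named data.csv with a header row (the file name and logging are illustrative):
d3.csv('data.csv', function(error, data) {
  if (error) throw error;
  var headers = data.columns;  // column names, in the order they appear in the file
  console.log(headers.length + ' columns, ' + data.length + ' rows');
});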

How can I add fields to a clientdataset at runtime?

I have a TClientDataSet, which is provided by a TTable's dataset.
The dataset has two fields: postalcode (string, 5) and street (string, 20).
At runtime I want to display a third field (string, 20). The routine for this field takes the postalcode as a parameter and gives back the city that belongs to this postalcode.
The problem is only about adding a calculated field to the already existing ones. Filling the data itself is not the problem.
I tried:
cds.SetProvider(Table1);
cds.FieldDefs.Add('city', ftString, 20);
cds.Open;
cds.Edit;
cds.FieldByName('city').AsString := 'Test'; // --> errormessage (field not found)
cds.Post;
cds is my ClientDataSet and Table1 is a Paradox table, but the problem is the same with other databases.
Thanks in advance
If you want to add additional fields other than those that exist in the underlying data, you also need to add the existing fields manually. The dataset needs to be closed when you're adding fields, but you can get the necessary metadata with FieldDefs.Update if you don't want to track all field details manually. Basically something like this:
var
  i: Integer;
  Field: TField;
begin
  cds.SetProvider(Table1);
  // add existing fields
  cds.FieldDefs.Update;
  for i := 0 to cds.FieldDefs.Count - 1 do
    cds.FieldDefs[i].CreateField(cds);
  // add calculated field
  Field := TStringField.Create(cds);
  Field.FieldName := 'city';
  Field.Calculated := True;
  Field.DataSet := cds;
  cds.Open;
end;
Also see this excellent article by Cary Jensen.
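For completeness, a calculated field gets its value in the dataset's OnCalcFields event. A minimal sketch, where LookupCity is a hypothetical helper that maps a postal code to a city name:
procedure TForm1.cdsCalcFields(DataSet: TDataSet);
begin
  // LookupCity is an assumed helper, not a VCL routine
  DataSet.FieldByName('city').AsString :=
    LookupCity(DataSet.FieldByName('postalcode').AsString);
end;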
Well, I found a simpler solution. As I have 24 fields in my SQL, I didn't want to add them all manually, so I added a dummy field to the SQL statement instead, like:
select ' ' as city, the rest of the fields ...
which I can modify in my program's OnAfterOpen event.
I did have to define in the SQL how long that field should be, by leaving enough empty spaces (for instance 5 empty spaces for 5 characters), so I must know how long the city name could be.
You should use CreateDataSet after adding the field:
cds.SetProvider(Table1);
cds.FieldDefs.Add('city', ftString, 20);
cds.CreateDataset;
cds.Open;
cds.Edit;
cds.FieldByName('city').AsString := 'Test';
cds.Post;
I would like to share a more accurate query for non-existing fields. I bet it's better to use a cast, not spaces!
select E.NAME, E.SURNAME, cast(null as varchar(20)) as CITY
from EMPLOYEE E
e.g. | Marc'O | Polo | <NULL> |
It's more accurate, you can definitely see the field size, and it's understandable, easy, and safe!
If you want to combine already existing "dynamic" data fields (from the provider side) with additional client-side persistent fields (calculated, lookup, internalcalc, aggregate), you should subclass the CDS. Just introduce an extra boolean property CombineFields and either override BindFields (in newer Delphi versions) or the entire InternalOpen (as I did in D2006/D2007) with the following line:
if DefaultFields or CombineFields then CreateFields; { TODO -ovavan -cSIC : if CombineFields is true then persistent fields will coexist with Default ones }
That will allow you to avoid all that runtime mess with FieldDefs/CreateField.

Best database design (model) for user tables

I'm developing a web application using Google App Engine and Django, but I think my problem is more general.
Users have the possibility to create tables. Note: tables are not represented as TABLES in the database. I'll give you an example:
First form:
Name of the table: __________
First column name: __________
Second column name: _________
...
The number of columns is not fixed, but there is a maximum (100 for example). The type of every column is the same.
Second form (after choosing a particular table the user can fill the table):
column_name1: _____________
column_name2: _____________
....
I'm using this solution, but it's wrong:
class Table(db.Model):
    name = db.StringProperty(required=True)

class Column(db.Model):
    name = db.StringProperty(required=True)
    number = db.IntegerProperty()
    table = db.ReferenceProperty(Table, collection_name="columns")

class Value(db.Model):
    time = db.TimeProperty()
    column = db.ReferenceProperty(Column, collection_name="values")
When I want to list a table, I take its columns, and from every column I take its values:
data = []
for column in table.columns:
    column_data = []
    for value in column.values:
        column_data.append(value.time)
    data.append(column_data)
data = zip(*data)
I think the problem is the order of the values, because it is not guaranteed that the order of one column matches the order of the others. I'm waiting for this bug (but until now I have never seen it):
Table as I want:     As I will get:
a z c a e c
d e f d h f
g h i g z i
Better solutions? Maybe using ListProperty?
Here's a data model that might do the trick for you:
class Table(db.Model):
    name = db.StringProperty(required=True)
    owner = db.UserProperty()
    column_names = db.StringListProperty()

class Row(db.Model):
    values = db.ListProperty(yourtype)
    table = db.ReferenceProperty(Table, collection_name='rows')
My reasoning:
You don't really need a separate entity to store column names. Since all columns are of the same data type, you only need to store the name, and the fact that they are stored in a list gives you an implicit order number.
By storing the values in a list in the Row entity, you can use an index into the column_names property to find the matching value in the values property (see the sketch after this list).
By storing all of the values for a row together in a single entity, there is no possibility of values appearing out of their correct order.
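To make the index pairing concrete, a minimal sketch (the cell helper is hypothetical, assuming the model above):
def cell(table, row, column_name):
    # The position of the name in column_names doubles as the index into row.values
    return row.values[table.column_names.index(column_name)]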
Caveat emptor:
This model will not work well if the table can have columns added to it after it has been populated with data. To make that possible, every time a column is added, every existing row belonging to that table would have to have a value appended to its values list. If it were possible to efficiently store dictionaries in the datastore, this would not be a problem, but lists can really only be appended to.
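A hedged sketch of that backfill, assuming the model above (add_column is a hypothetical helper):
def add_column(table, column_name, default_value):
    # Record the new name; its list position is the implicit column number
    table.column_names.append(column_name)
    table.put()
    # Append a value to every existing row so values stays aligned with column_names
    for row in table.rows:
        row.values.append(default_value)
        row.put()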
Alternatively, you could use Expando...
Another possibility is that you could define the Row model as an Expando, which allows you to dynamically create properties on an entity. You could set column values only for the columns that have values in them, and that you could also add columns to the table after it has data in it and not break anything:
class Row(db.Expando):
    table = db.ReferenceProperty(Table, collection_name='rows')

    @staticmethod
    def __name_for_column_index(index):
        return "column_%d" % index

    def __getitem__(self, key):
        # Allows one to get at the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   col1 = first_row[1]
        #   col12 = first_row[12]
        value = None
        try:
            value = self.__dict__[Row.__name_for_column_index(key)]
        except KeyError:
            # The given column is not defined for this Row
            pass
        return value

    def __setitem__(self, key, value):
        # Allows one to set the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   first_row[5] = "New value for column 5"
        self.__dict__[Row.__name_for_column_index(key)] = value
        # In order to allow efficient multiple column changes,
        # the put() can go somewhere else.
        self.put()
Why don't you add an IntegerProperty to Value for rowNumber and increment it every time you add a new row of values? Then you can reconstruct the table by sorting by rowNumber.
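A sketch of that suggestion against the original model (the row_number field name is illustrative):
class Value(db.Model):
    time = db.TimeProperty()
    row_number = db.IntegerProperty()  # incremented once per inserted row
    column = db.ReferenceProperty(Column, collection_name='values')

# Reconstruct one column's cells in row order
ordered_values = column.values.order('row_number')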
You're going to make life very hard for yourself unless your user's 'tables' are actually stored as real tables in a relational database. Find some way of actually creating tables and use the power of an RDBMS, or you're reinventing a very complex and sophisticated wheel.
This is the conceptual idea I would use:
I would create two classes for the data-store:
table: this would serve as a dictionary, storing the structure of the pseudo-tables your app would create. It would have three fields: table_name, column_name, column_order, where column_order gives the position of the column within the table.
data: this would store the actual data in the pseudo-tables. It would have four fields: row_id, table_name, column_name, column_data. row_id would be the same for data pertaining to the same row and would be unique across the various pseudo-tables.
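A hedged sketch of those two kinds as App Engine models (class names and property types are assumptions):
class TableDef(db.Model):  # the 'table' dictionary kind
    table_name = db.StringProperty(required=True)
    column_name = db.StringProperty(required=True)
    column_order = db.IntegerProperty()

class Data(db.Model):  # the 'data' kind
    row_id = db.StringProperty(required=True)
    table_name = db.StringProperty(required=True)
    column_name = db.StringProperty(required=True)
    column_data = db.StringProperty()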
Put the data in a LongBlob.
The power of a database is being able to search and organise data so that you can get only the part you want, for performance and simplicity reasons: you don't want the whole database, you just want a part of it, and you want it fast. But from what I understand, when you retrieve a user's data, you retrieve it all and display it. So you don't need to store the data in the normal "database" way.
What I would suggest is to simply format and store the whole data from a single user in a single column with a suitable type (LongBlob, for example). The format would be an object with a list of columns and rows of the chosen type. And you define the object in whatever language you use to communicate with the database.
The columns in your (real) database would be: User int, TableNo int, Table LongBlob.
If user8 has 3 tables, you will have the following rows:
8, 1, objectcontainingtable1;
8, 2, objectcontainingtable2;
8, 3, objectcontainingtable3;
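A minimal sketch of that idea with JSON as the serialization format (the structure is illustrative, and any serializer would do):
import json

# One user table, serialized as a single value for the Table column
table_obj = {"columns": ["name", "time"],
             "rows": [["a", "10:00"], ["b", "10:05"]]}
blob = json.dumps(table_obj)   # store this string in the LongBlob column
restored = json.loads(blob)    # read it back and render it whole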
