We are importing data on Salesforce through Talend and we have multiple items with the same internal id.
Such import fails with error "Duplicate external id specified" because of how upsert works in Salesforce. At the moment, we worked that around by using the commit size of the tSalesforceOutput to 1, but that works only for small amount of data or it would exhaust Salesforce API Limits.
Is there a known approach to it in Talend? For example, to ensure items that have same external ID ends up in different "commits" of tSalesforceOutput?
Here is the design for the solution I wish to propose:
tSetGlobalVar is here to initialize the variable "finish" to false.
tLoop starts a while loop with (Boolean)globalMap.get("finish") == false as an end condition.
tFileCopy is used to copy the initial file (A for example) to a new one (B).
tFileInputDelimited reads file B.
tUniqRow eliminates duplicates. Uniques records go to tLogRow you have to replace by tSalesforceOutput. Duplicates records if any go to tFileOutputDelimited called A (same name as the original file) with the option "Throw an error if the file already exist" unchecked.
OnComponent OK after tUniqRow activates the tJava which set the new value for the global finish with the following code:
if (((Integer)globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0) globalMap.put("finish", true);
Explaination with the following sample data:
line 1
line 2
line 3
line 2
line 4
line 2
line 5
line 3
On the 1st iteration, 5 uniques records are pushed into tLogRow, 3 duplicates are pushed into file A and "finish" is not changed as there is duplicates.
On the 2nd iteration, operations are repeated for 2 uniques records and 1 duplicate.
On the 3rd iteration, operations are repeated for 1 unique and as there not anymore duplicate, "finish" is set to true and the loop automatically finishes.
Here is the final result:
You can also decide to use an other global variable to set the salesforce commit level (using the syntax (Integer)globalMap.get("commitLevel")). This variable will be set to 200 by default and to 1 in the tJava if any duplicates. At the same time, set "finish" to true (without testing the number of duplicates) and you'll have a commit level to 200 for the 1st iteration and to 1 for the 2nd (and no need more than 2 iterations).
You'll decide the better choice depending on the number of potential duplicates, but you can notice that you can do it whitout any change to the job design.
I think it should solve your problem. Let me know.
Regards,
TRF
Do you mean you have the same record (the same account for example) twice or more in the input?If so, can't you try to eliminate the duplicates and keep only the record you need to push to Salesforce?Else, if each record has specific informations (so you need all the input records to have a complete one in Salesforce), consider to merge the records before to push the result into Salesforce.
And finally, if you can't do that, push the doublons in a temporary space, push the records but the doublons into Salesforce and iterate other this process until there is no more doublons.Personally, if you can't just eliminate the doublons, I prefer the 2nd approach as it's the solution to have less Salesforce API calls.
Hope this helps.
TRF
Related
I'm developing an app which needs to record a list of a users recent video uploads. Importantly it needs to only remember the last two videos associated with the user so I'm trying to find a way to just keep the last two records in a database.
What I've got so far is the below, which creates a new record correctly, however I then want to delete all records that are older than the previous 2, so I've got the below.
The problem is that this seems to delete ALL records even though, by my understanding, the skip should miss out the two most recent records,
private function saveVideoToUserProfile($userId, $thumb ...)
{
RecentVideos::create([
'user_id'=>$userId,
'thumbnail'=>$thumb,
...
]);
RecentVideos::select('id')->where('user_id', $userId)->orderBy('created_at')->skip(2)->delete();
}
Can anyone see what I'm doing wrong?
Limit and offset do not work with delete, so you can do something like this:
$ids = RecentVideos::select('id')->where('user_id', $userId)->orderByDesc('created_at')->skip(2)->take(10000)->pluck('id');
RecentVideos::whereIn('id', $ids)->delete();
First off, skip() does not skip the x number of recent records, but rather the x number of records from the beginning of the result set. So in order to get your desired result, you need to sort the data in the correct order. orderBy() defaults to ordering ascending, but it accepts a second direction argument. Try orderBy('created_at', 'DESC'). (See the docs on orderBy().)
This is how I would recommend writing the query.
RecentVideos::where('user_id', $userId)->orderBy('created_at', 'DESC')->skip(2)->delete();
When we fetch data from db with skip and limit then it's very high chance that data may be redundant.
Let me explain you with an example
suppose you are fetching those student records which belongs to some state x and you already fetched 10 student records. Between the time of first and second request one more student record is inserted, deleted or updated then in next query either one data row will come again or inserted data row will be skiped.
How to solve such case?
Method - 1
It can be resolved by using sending 'Created_by' and 'Updated_by' time from UI and filter the data according to it and send the data.
Method - 2
In the second fetch request and there after skip one less then you want and increase limit by 1(Yes, its correct, think about it), and pass those data to UI now UI check the there current list data's last item should match the first item of fetched request's response(just compare by ids), then it mean that no single data added or removed after first query. If it's not match then fetch complete data from first to last in a single query.
If you use individual method then its not rock solid(Yes, some corner case will miss in each case) but if you combine both methods then it will be rock solid.
First of all I'm new to power query, so I'm taking the first steps. But I need to try to deliver sometime at work so I can gain some breathing time to learn.
I have the following table (example):
Orig_Item Alt_Item
5.7 5.10
79.19 79.60
79.60 79.86
10.10
And I need to create a column that will loop the table and display the final Alt_Item. So the result would be the following:
Orig_Item Alt_Item Final_Item
5.7 5.10 5.10
79.19 79.60 79.86
79.60 79.86 79.86
10.10
Many thanks
Actually, this is far too complicated for a first Power Query experience.
If that's what you've got to do, then so be it, but you should be aware that you are starting with a quite difficult task.
Small detail: I would expect the last Final_Item to be 10.10. According to the example, the Final_Item will be null if Alt_Item is null. If that is not correct, well that would be a nice first step for you to adjust the code below accordingly.
You can create a new blank query, copy and paste this code in the Advanced Editor (replacing the default code) and adjust the Source to your table name.
let
Source = Table.Buffer(Table1),
AddedFinal_Item =
Table.AddColumn(
Source,
"Final_Item",
each if [Alt_Item] = null
then null
else List.Last(
List.Generate(
() => [Final_Item = [Alt_Item], Continue = true],
each [Continue],
each [Final_Item =
Table.First(
Table.SelectRows(
Source,
(x) => x[Orig_Item] = [Final_Item]),
[Alt_Item = "not found"]
)[Alt_Item],
Continue = Final_Item <> "not found"],
each [Final_Item])))
in
AddedFinal_Item
This code uses function List.Generate to perform the looping.
For performance reasons, the table should always be buffered in memory (Table.Buffer), before invoking List.Generate.
List.Generate is one of the most complex Power Query functions.
It requires 4 arguments, each of which is a function in itself.
In this case the first argument starts with () and the other 3 with each (it should be clear from the outline above: they are aligned).
Argument 1 defines the initial values: a record with fields Final_Item and Continue.
Argument 2 is the condition to continue: if an item is found.
Argument 3 is the actual transformation in each iteration: the Source table is searched (with Table.SelectRows) for an Orig_Item equal to Alt_Item. This is wrapped in Table.First, which returns the first record (if any found) and accepts a default value if nothing found, in this case a record with field Alt_Item with value "not found", From this result the value of record field [Alt_Item] is returned, which is either the value of the first record, or "not found" from the default value.
If the value is "not found", then Continue becomes false and the iterations will stop.
Argument 4 is the value that will be returned: Final_Item.
List.Generate returns a list of all values from each iteration. Only the last value is required, so List.Generate is wrapped in List.Last.
Final remark: actual looping is rarely required in Power Query and I think it should be avoided as much as possible. In this case, however, it is a feasible solution as you don't know in advance how many Alt_Items will be encountered.
An alternative for List.Generate is using a resursive function.
Also List.Accumulate is close to looping, but that has a fixed number of iterations.
This can be solved simply with a self-join, the open question is how many layers of indirection you'll be expected to support.
Assuming just one level of indirection, no duplicates on Orig_Item, the solution is:
let
Source = #"Input Table",
SelfJoin1 = Table.NestedJoin( Source, {"Alt_Item"}, Source, {"Orig_Item"}, "_tmp_" ),
Expand1 = ExpandTableColumn( SelfJoin1, "_tmp_", {"Alt_Item"}, {"_lkp_"} ),
ChkJoin1 = Table.AddColumn( Expand1, "Final_Item", each (if [_lkp_] = null then [Alt_Item] else [_lkp_]), type number)
in
ChkJoin1
This is doable with the regular UI, using Merge Queries, then Expand Column and adding a custom column.
If yo want to support more than one level of indirection, turn it into a function to be called X times. For data-driven levels of indirection, you wrap the calls in a list.generate that drop the intermediate tables in a structured column, though that's a much more advanced level of PQ.
I have a question in regards to preparing my dataset for research.
I have a dataset in SPSS 20 in long format as I am researching on individual level over multiple years. However some individuals were added twice to my dataset because there were differences in some variables matched to those individuals (5000 individuals with 25 variables per individual). I would like to merge those duplicates so that I can run my analysis over time. For those variables that differ between the duplicates I would like spss to make additional variables when all the duplicates are merged.
Is this at all possible and if yes HOW?
I suggest following steps>
create auxiliary variable "PrimaryLast" with procedure Data->Identify Duplicate Cases by... , set "Define matching cases by" to your case ID
create 2 new auxiliary datasets with Data->Select Cases with condition "PrimaryLast = 0" and "PrimaryLast = 1" and selection "Copy selected cases to new dataset"
merge both auxiliary datasets with procedure Data -> Merge Files-> Add Variables, rename duplicated variable names in left box and move them in right box and select your case ID as key
don't forget to control if you made "full outer join", in case you lost non-duplicated cases and have only duplicated cases in your dataset, just merge datasets from step 2. in different order in step 3.
Try this:
sort cases by caseID otherVar.
compute ind=1.
if $casenum>1 and caseID=lag(caseID) ind=lag(ind)+1.
casestovars /id=caseID /index=ind.
If a caseID is repeated more then once, after restructure there will be only one line for that case, while all the variables will be repeated with indexes.
If the order of the caseID repeats, replace the otherVar in the sort command with the corresponding variable (e.g. date). This way your new variables will also be indexed accordingly.
I have a student who has gathered data in a survey online whereby each response was given a variable, rather than the variable having whatever the response was. We need a scoring algorithm which reads the statements and integrates. I can do this with IF statements per item, e.g.,
if Q1_1=1 var1=1.
if Q1_2=1 var1=2.
if Q1_3=1 var1=3.
if Q1_4=1 var1=4.
Doing this for a 200 item survey (now more like 1000) will be a drag and subject to many typos unless automated. I have no experience of vectors and loops in SPSS, but some reading suggests this is the way to approach the problem.
I would like to run if statements as something like (pseudocode):
for items=1 1 to 30
for responses=1 to 4
if Q1_2_1=1 a=1.
if Q1_2=1 a=2.
if Q1_3=1 a=3.
if Q1_4=1 a=4.
compute newitem(items)=a.
next response.
next item.
Which I would hope would produce a new variable (newitem1 to 30) which has one of the 4 responses for it's original corresponding 4 variable information.
Never written serious spss code before: please advise!
This will do the Job:
* creating some sample data.
data list free (",")/Item1_1 to Item1_4 Item2_1 to Item2_4 Item3_1 to Item3_4.
begin data
1,,,,,1,,,,,1,,
,1,,,1,,,,1,,,,
,,,1,,,1,,,,,1,
end data.
* now looping over the items and constructing the "NewItems".
do repeat Item1=Item1_1 to Item1_4
/Item2=Item2_1 to Item2_4
/Item3=Item3_1 to Item3_4
/Val=1 to 4.
if Item1=1 NewItem1=Val.
if Item2=1 NewItem2=Val.
if Item3=1 NewItem3=Val.
end repeat.
execute.
In this way you run all you loops simultaneously.
Note that "ItemX_1 to ItemX_4" will only work if these four variables are consecutive in the dataset. If they aren't, you have to name each of them separately - "ItemX_1 ItemX_2 ItemX_3 itemX_4".
Now if you have many such item sets, all named regularly as in the example, the following macro can shorten the process:
define !DoItems (ItemList=!cmdend)
!do !Item !in (!ItemList)
do repeat !Item=!concat(!Item,"_1") !concat(!Item,"_2") !concat(!Item,"_3") !concat(!Item,"_4")/Val=1 2 3 4.
if !item=1 !concat("New",!Item)=Val.
end repeat.
!doend
execute.
!enddefine.
* now you just have to call the macro and list all your Item names:
!DoItems ItemList=Item1 Item2 Item3.
The macro will work with any item name, as long as the variables are named ItemName_1, ItemName_2 etc'.