Power Query M loop table / lookup via a self-join - loops

First of all, I'm new to Power Query, so I'm taking the first steps. But I need to try to deliver something at work so I can gain some breathing room to learn.
I have the following table (example):
Orig_Item    Alt_Item
5.7          5.10
79.19        79.60
79.60        79.86
10.10
And I need to create a column that will loop the table and display the final Alt_Item. So the result would be the following:
Orig_Item    Alt_Item    Final_Item
5.7          5.10        5.10
79.19        79.60       79.86
79.60        79.86       79.86
10.10
Many thanks

Actually, this is far too complicated for a first Power Query experience.
If that's what you've got to do, then so be it, but you should be aware that you are starting with a quite difficult task.
Small detail: I would expect the last Final_Item to be 10.10, but according to the example, Final_Item will be null if Alt_Item is null. If that is not correct, adjusting the code below accordingly (e.g. returning [Orig_Item] instead of null in the null branch) would be a nice first step for you.
You can create a new blank query, copy and paste this code in the Advanced Editor (replacing the default code) and adjust the Source to your table name.
let
    Source = Table.Buffer(Table1),
    AddedFinal_Item =
        Table.AddColumn(
            Source,
            "Final_Item",
            each if [Alt_Item] = null
                 then null
                 else List.Last(
                          List.Generate(
                              () => [Final_Item = [Alt_Item], Continue = true],
                              each [Continue],
                              each [Final_Item =
                                        Table.First(
                                            Table.SelectRows(
                                                Source,
                                                (x) => x[Orig_Item] = [Final_Item]),
                                            [Alt_Item = "not found"]
                                        )[Alt_Item],
                                    Continue = Final_Item <> "not found"],
                              each [Final_Item])))
in
    AddedFinal_Item
This code uses function List.Generate to perform the looping.
For performance reasons, the table should always be buffered in memory (Table.Buffer), before invoking List.Generate.
List.Generate is one of the most complex Power Query functions.
It requires 4 arguments, each of which is a function in itself.
In this case the first argument starts with () and the other 3 with each (it should be clear from the outline above: they are aligned).
Argument 1 defines the initial values: a record with fields Final_Item and Continue.
Argument 2 is the condition to continue: if an item is found.
Argument 3 is the actual transformation in each iteration: the Source table is searched (with Table.SelectRows) for an Orig_Item equal to Alt_Item. This is wrapped in Table.First, which returns the first record (if any is found) and accepts a default value if nothing is found; in this case the default is a record with field Alt_Item set to "not found". From this result the value of record field [Alt_Item] is returned, which is either the value from the first record found, or "not found" from the default value.
If the value is "not found", then Continue becomes false and the iterations will stop.
Argument 4 is the value that will be returned: Final_Item.
List.Generate returns a list of all values from each iteration. Only the last value is required, so List.Generate is wrapped in List.Last.
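To get a feel for the 4-argument shape, here is a minimal, self-contained List.Generate sketch (unrelated to the table above; the values are made up for illustration):
List.Generate(
    () => [Value = 1],            // argument 1: initial state
    each [Value] <= 100,          // argument 2: condition to continue
    each [Value = [Value] * 2],   // argument 3: next state
    each [Value]                  // argument 4: value returned per iteration
)
// returns {1, 2, 4, 8, 16, 32, 64}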
Final remark: actual looping is rarely required in Power Query and I think it should be avoided as much as possible. In this case, however, it is a feasible solution as you don't know in advance how many Alt_Items will be encountered.
An alternative to List.Generate is using a recursive function.
List.Accumulate also comes close to looping, but it has a fixed number of iterations; a small sketch follows.
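A minimal List.Accumulate sketch (again with made-up values), showing the fixed iteration count of one step per list element:
List.Accumulate({1, 2, 3, 4}, 0, (state, current) => state + current)
// runs exactly 4 iterations and returns 10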

This can be solved simply with a self-join; the open question is how many layers of indirection you'll be expected to support.
Assuming just one level of indirection and no duplicates on Orig_Item, the solution is:
let
    Source = #"Input Table",
    SelfJoin1 = Table.NestedJoin( Source, {"Alt_Item"}, Source, {"Orig_Item"}, "_tmp_" ),
    Expand1 = Table.ExpandTableColumn( SelfJoin1, "_tmp_", {"Alt_Item"}, {"_lkp_"} ),
    ChkJoin1 = Table.AddColumn( Expand1, "Final_Item", each (if [_lkp_] = null then [Alt_Item] else [_lkp_]), type number )
in
    ChkJoin1
This is doable with the regular UI, using Merge Queries, then Expand Column and adding a custom column.
If you want to support more than one level of indirection, turn it into a function to be called X times (see the sketch below). For data-driven levels of indirection, you would wrap the calls in a List.Generate that drops the intermediate tables in a structured column, though that's a much more advanced level of PQ.
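A hedged sketch of the function approach, assuming the column names used above (ResolveOnce is a hypothetical name; each call resolves one additional level):
let
    // hypothetical helper: resolve one more level of indirection per call
    ResolveOnce = (t as table) as table =>
        let
            Joined   = Table.NestedJoin( t, {"Final_Item"}, t, {"Orig_Item"}, "_tmp_" ),
            Expanded = Table.ExpandTableColumn( Joined, "_tmp_", {"Alt_Item"}, {"_lkp_"} ),
            Next     = Table.AddColumn( Expanded, "_next_", each if [_lkp_] = null then [Final_Item] else [_lkp_] ),
            Cleaned  = Table.RemoveColumns( Next, {"Final_Item", "_lkp_"} ),
            Renamed  = Table.RenameColumns( Cleaned, {{"_next_", "Final_Item"}} )
        in
            Renamed,
    // two fixed extra levels; drop the helper column from the first pass before reusing it
    TwoLevels = ResolveOnce( ResolveOnce( Table.RemoveColumns( ChkJoin1, {"_lkp_"} ) ) )
in
    TwoLevels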

Related

MS Access, use query name as field default value

My department uses a software tool that can use a custom component library sourced from Tables or Queries in an MS Access database.
Table: Components
ID: AutoNumber
Type: String
Mfg: String
P/N: String
...
Query: Resistors
SELECT Components.*
FROM Components
WHERE Components.Type = "Resistors"
Query: Capacitors
SELECT Components.*
FROM Components
WHERE Components.Type = "Capacitors"
These queries work fine for SELECT. But when users add a row to the query, how can I ensure the correct value is saved to the Type field?
Edit #2:
Nope, can't be done. Sorry.
Edit #1:
As was pointed out, I may have misunderstood the question. It's not a wonky question after all, but perhaps an easy one?
If you're asking how to add records to your table while making sure that, for example, "the record shows up in a Resistors query if it's a Resistor", then it's a regular append query that specifies Resistors as your Type.
For example:
INSERT INTO Components ( ID, Type, Mfg )
VALUES ( 123, 'Resistors', 'Company XYZ' )
If you've already tried that and are having problems, it could be because you are using a Reserved Word as a field name which, although it may work sometimes, can cause problems in unexpected ways.
Type is a word that Access, SQL and VBA all use for a specific purpose. It's the same idea as if you used SELECT and FROM as field or table names. (SELECT SELECT FROM FROM).
Here is a list of reserved words that should generally be avoided. (I realize it's labelled Access 2007, but the list is very similar, and it's surprisingly difficult to find a recent 'official' list for Excel VBA.)
Original Answer:
That's kind of a wonky way to do things. The point of databases is to organize in such a way as to prevent duplication, not only of data but of queries and code as well.
I made up a programming rule for my own use: "If you're doing anything more than once, you're doing it wrong." (That's not true in all cases, but it's a general rule of thumb nonetheless.)
Are the only options "Resistors" and "Capacitors"? (...I hope you're not tracking the inventory of an electronics supply store...) If there are many options, that's even more reason to find an alternative method.
To answer your question, in the Query Design window, it is not possible to return the name of the open query.
Some alternative options:
As @Erik suggested, constrain to a control on a form. Perhaps have a drop-down or option buttons from which the user can select the relevant type. Then your query would look like:
SELECT * FROM Components WHERE Type = Forms![YourFormName]![NameOfYourControl]
In VBA, have the query refer to the value of a variable, for example:
Dim rs As DAO.Recordset
Dim TypeToDel As String
TypeToDel = "Resistors"
' DoCmd.RunSQL only runs action queries, so open a recordset for a SELECT:
Set rs = CurrentDb.OpenRecordset("SELECT * FROM Components WHERE Type = '" & TypeToDel & "'")
Not recommended, but you could have the user manually enter the criteria. If your query is like this:
SELECT * FROM Components WHERE Type = [Enter the component type]
...then each time the query is run, it will prompt the user for a value.
Similarly, you could have the query prompt for an option, perhaps a single digit or a code, and have an IIf statement in the query criteria choose the appropriate value:
SELECT *
FROM Components
WHERE Type = IIf([Enter 1 for Resistors, 2 for Capacitors, 3 for sharks with frickin' laser beams attached to their heads]=1,'Resistors',IIf([Enter 1 for Resistors, 2 for Capacitors, 3 for sharks with frickin' laser beams attached to their heads]=2,'Capacitors','LaserSharks'));
Note that if you're going to have more than 2 options, you'll need to repeat the parameter prompt more than once, and the prompts must be spelled identically.
Lastly, if you're still going to take the route of a separate query for each component type, as long as you're making separate queries anyway, why not just put a static value in each one (just like your example):
SELECT * FROM Components WHERE Type = 'Resistors'
There's another wonky answer here but that's just creating even more duplicate information (and more future mistakes).
Side note: Type is a reserved word in Access & VBA; you might be best to choose another. (I usually prefix with a related letter like cType.)
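If renaming isn't practical, Access also lets you escape a reserved word by wrapping it in square brackets. A sketch against the schema above (the bracketing is the only change):
SELECT Components.[Type], Components.Mfg
FROM Components
WHERE Components.[Type] = 'Resistors';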
More Information:
Use parameters in queries, forms, and reports
Use parameters to ask for input when running a query
Microsoft Access Tips & Tricks: Parameter Queries

Talend avoid duplicate external ID with Salesforce Output

We are importing data on Salesforce through Talend and we have multiple items with the same internal id.
Such an import fails with the error "Duplicate external id specified" because of how upsert works in Salesforce. At the moment, we have worked around it by setting the commit size of tSalesforceOutput to 1, but that only works for small amounts of data; otherwise it would exhaust the Salesforce API limits.
Is there a known approach to this in Talend? For example, to ensure that items with the same external ID end up in different "commits" of tSalesforceOutput?
Here is the design for the solution I wish to propose:
tSetGlobalVar is here to initialize the variable "finish" to false.
tLoop starts a while loop with (Boolean)globalMap.get("finish") == false as an end condition.
tFileCopy is used to copy the initial file (A for example) to a new one (B).
tFileInputDelimited reads file B.
tUniqRow eliminates duplicates. Unique records go to tLogRow, which you have to replace with tSalesforceOutput. Duplicate records, if any, go to a tFileOutputDelimited called A (same name as the original file) with the option "Throw an error if the file already exist" unchecked.
An OnComponentOk trigger after tUniqRow activates the tJava, which sets the new value of the global "finish" variable with the following code:
if (((Integer)globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0) globalMap.put("finish", true);
Explanation with the following sample data:
line 1
line 2
line 3
line 2
line 4
line 2
line 5
line 3
On the 1st iteration, 5 unique records are pushed into tLogRow, 3 duplicates are pushed into file A, and "finish" is not changed as there are duplicates.
On the 2nd iteration, the operations are repeated for 2 unique records and 1 duplicate.
On the 3rd iteration, the operations are repeated for 1 unique record, and as there are no more duplicates, "finish" is set to true and the loop automatically finishes.
You can also decide to use another global variable to set the Salesforce commit level (using the syntax (Integer)globalMap.get("commitLevel")). This variable would be set to 200 by default, and to 1 in the tJava if there are any duplicates. At the same time, set "finish" to true (without testing the number of duplicates) and you'll have a commit level of 200 for the 1st iteration and 1 for the 2nd (and no more than 2 iterations are needed); see the sketch below.
You can decide which is the better choice depending on the number of potential duplicates, but note that you can do it without any change to the job design.
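A minimal sketch of that tJava body, assuming the variable names used above ("commitLevel", "finish") and tUniqRow_1 as the component name:
// tJava body (sketch of the commit-level variant described above)
Integer duplicates = (Integer) globalMap.get("tUniqRow_1_NB_DUPLICATES");
if (duplicates != null && duplicates > 0) {
    globalMap.put("commitLevel", 1); // the 2nd pass will commit row by row
}
globalMap.put("finish", true);       // end the loop without testing the duplicate count
The commit size of tSalesforceOutput would then be set to (Integer)globalMap.get("commitLevel").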
I think it should solve your problem. Let me know.
Regards,
TRF
Do you mean you have the same record (the same account, for example) twice or more in the input? If so, can't you try to eliminate the duplicates and keep only the record you need to push to Salesforce? Else, if each record carries specific information (so you need all the input records to build a complete one in Salesforce), consider merging the records before pushing the result into Salesforce.
And finally, if you can't do that, push the duplicates into a temporary space, push the records minus the duplicates into Salesforce, and iterate over this process until there are no more duplicates. Personally, if you can't simply eliminate the duplicates, I prefer the 2nd approach, as it's the solution with fewer Salesforce API calls.
Hope this helps.
TRF

Reduced Survey Frequency - Salesforce Workflow

Hoping you can help me review the logic below for errors. I am looking to create a workflow that will send a survey out to end users on a reduced frequency. Basically, it will check the Account object of the Case for a field, 'Reduced Survey Frequency', which contains a # and will not send a survey until that # of days has passed since the last date set on the Contact field 'Last Survey Date'. Please review the code and let me know any recommended changes!
AND(
    OR(ISPICKVAL(Status, "Closed"), ISPICKVAL(Status, "PM Sent")),
    OR(CONTAINS(RecordType.Name, "Portal Case"),
       CONTAINS(RecordType.Name, "Standard Case"),
       CONTAINS(RecordType.Name, "Portal Closed"),
       CONTAINS(RecordType.Name, "Standard Closed")),
    NOT( Don_t_sent_survey__c ),
    OR((TODAY() - Contact.Last_Survey_Date__c) >= Account.Reduced_Survey_Frequency__c,
       Account.Reduced_Survey_Frequency__c == 0,
       ISBLANK(Account.Reduced_Survey_Frequency__c),
       ISBLANK(Contact.Last_Survey_Date__c)
    )
)
Thanks,
Brian H.
Personally I prefer the syntax where && and || are used instead of the AND() and OR() functions. It just reads a bit nicer to me: no need to trace so many commas or keep track of indentation in the more complex logic... But if you're more used to this Excel-like flow, go for it. In the end it has to be readable for YOU.
I'd also consider reordering this a bit: simple checks that are most likely to fail first.
The first part - irrelevant to your question
Don't use RecordType.Name because these Names can be translated to say French and it will screw your logic up for users who will select non-English as their preferred language. Use RecordType.DeveloperName, it's safer.
CONTAINS - do you really have so many record types that share this part in their name? What's wrong with a normal = comparison? You could check whether the formula would be more readable with a CASE() statement. Or maybe flip the logic if there are, say, 6 record types and you've explicitly listed 4 (this might have to be reviewed, though, when you add a new record type). If you find yourself copy-pasting this block of 4 checks frequently, consider making a helper formula field with it...
The second part
The ISBLANK checks could be skipped if you properly use the "treat nulls as blanks / as zeroes" setting at the bottom of the formula editor, because you're making a check like
OR(...,
Account.Reduced_Survey_Frequency__c==0,
ISBLANK(Account.Reduced_Survey_Frequency__c),
...
)
which is essentially what this setting was designed for. I'd flip it to "treat nulls as zeroes" (but that means the ISBLANK check will never "fire"). If you're not comfortable with that, you can also "safely compare or subtract" by using
BLANKVALUE(Account.Reduced_Survey_Frequency__c,0)
This will have a similar "treat null as zero" effect, but only in this one place.
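For instance, a hedged rewrite of just the date check (the 1900-01-01 sentinel date is an arbitrary assumption, chosen so a blank Last Survey Date always passes the check):
(TODAY() - BLANKVALUE(Contact.Last_Survey_Date__c, DATE(1900, 1, 1))) >= BLANKVALUE(Account.Reduced_Survey_Frequency__c, 0)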
So... I'd end up with something like this:
(ISPICKVAL(Status,'Closed') || ISPICKVAL(Status, 'PM Sent')) &&
(RecordType.DeveloperName = 'Portal_Case' ||
RecordType.DeveloperName = 'Standard_Case' ||
RecordType.DeveloperName = 'Portal_Closed' ||
RecordType.DeveloperName = 'Standard_Closed'
) &&
NOT(Don_t_sent_survey__c) &&
(Contact.Last_Survey_Date__c + Account.Reduced_Survey_Frequency__c < TODAY())
No promises though ;)
You can easily test them by enabling debug logs. You'll see there the workflow formula together with values that are used to evaluate it.
Another option is to make a temporary formula field with the same logic and observe (in a report?) where it comes out true/false for a mass spot check.

Entity Framework: Max. number of "subqueries"?

My data model has an entity Person with 3 related (1:N) entities Jobs, Tasks and Dates.
My query looks like
var persons = (from x in context.Persons
               select new {
                   PersonId = x.Id,
                   JobNames = x.Jobs.Select(y => y.Name),
                   TaskDates = x.Tasks.Select(y => y.Date),
                   DateInfos = x.Dates.Select(y => y.Info)
               }).ToList();
Everything seems to work fine, but the lists JobNames, TaskDates and DateInfos are not all filled.
For example, TaskDates and DateInfos have the correct values, but JobNames stays empty. But when I remove TaskDates from the query, then JobNames is correctly filled.
So it seems that EF can only handle a limited number of these "subqueries"? Is this correct? If so, what is the maximum number of these "subqueries" for a single statement? Is there a way to work around this issue without having to make more than one call to the database?
(ps: I'm not entirely sure, but I seem to remember that this query worked in LINQ2SQL - could it be?)
UPDATE
I'm going crazy over this. I tried to repro the issue from the ground up using a fresh, simple project (to post the entire piece of code here, not only an oversimplified example) - and I found I wasn't able to repro it. It still happens within our existing code base (apparently there's more behind this problem, but I cannot share this closed code base, unfortunately).
After hours and hours of playing around I found the weirdest behavior:
It works great when I don't SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; before calling the LINQ statement
It also works great (independent of the above) when I don't use a .Take() to only get the first X rows
It also works great when I add additional .Where() statements to cut the number of rows returned from SQL Server
I didn't find any comprehensible reason why I see this behavior, but I started to look at the SQL: although EF generates the exact same SQL, the execution plan is different when I use READ UNCOMMITTED. It returns more rows on a specific index in the middle of the execution plan, which curiously ends in fewer rows returned for the entire SQL statement - which in turn results in the missing data that is the reason for my question to begin with.
This sounds very confusing and unbelievable, I know, but this is the behavior I see. I don't know what else to do, I don't even know what to google for at this point ;-).
I can fix my problem (just don't use READ UNCOMMITTED), but I have no idea why it occurs and if it is a bug or something I don't know about SQL Server. Maybe there's some "magic max number of allowed results in sub-queries" in SQL Server? At least: As far as I can see, it's not an issue with EF itself.
A little late, but does calling ToList() on each subquery produce the required effect?
var persons = (from x in context.Persons
               select new {
                   PersonId = x.Id,
                   JobNames = x.Jobs.Select(y => y.Name).ToList(),
                   TaskDates = x.Tasks.Select(y => y.Date).ToList(),
                   DateInfos = x.Dates.Select(y => y.Info).ToList()
               }).ToList();

Lua string library choices for finding and replacing text

I'm new to Lua programming, having come over from Python to basically make a small addon for World of Warcraft for a friend. I'm looking into various ways of finding a section of text within a rather large string of plain text. I need to extract the information I need from the text and then process it in the usual way.
The surrounding string could contain almost anything; the below is what we are looking to extract and process:
-- GSL --
items = ["itemid":"qty" ,"itemid":"qty" ,"itemid":"qty" ,]
-- ENDGSL --
We want to strip the whole block of text from a potentially large block of text surrounding it, then remove the -- GSL -- and -- ENDGSL -- to be left with:
items = ["itemid":"qty" …
I've looked into various methods, and can't seem to get my head around any of them.
Anyone have any suggestions on the best method to tackle this problem?
EDIT: Additional problem.
Based on the accepted answer I've changed the code slightly to the following.
function GuildShoppingList:GUILDBANKFRAME_OPENED()
    -- Actions to be taken when guild bank frame is opened.
    if debug == "True" then self:Print("Debug mode on, guild bank frame opened") end
    gslBankTab = GetCurrentGuildBankTab()
    gslBankInfo = GetGuildBankText(gslBankTab)
    p1 = gslBankInfo:match('%-%- GSL %-%-%s+(.*)%s+%-%- ENDGSL %-%-')
    self:Print(p1)
end
The string has now changed slightly; the information we are parsing is
{itemid:qty, itemid:qty, itemid:qty, itemid:qty}
Now, this is the string held in p1. I need to update the match to strip the { } as well, and iterate over each item and its key (separated by commas), so I'm left with
itemid:qty
itemid:qty
itemid:qty
itemid:qty
Then I can identify each line individually and place it where it needs to go.
try
s=[[-- GSL --
items = ["itemid":"qty" ,"itemid":"qty" ,"itemid":"qty" ,]
-- ENDGSL --]]
print(s:match('%-%- GSL %-%-%s+(.*)%s+%-%- ENDGSL %-%-'))
The key probably is that - is a magic character in patterns and needs escaping with % if you want a literal hyphen. More info on patterns in the Lua Reference Manual, chapter 5.4.1.
Edit:
To the additional problem of looping through keys of what is almost an array, you could do 2 things:
Either loop over it as a string, assuming both key and quantity are integers:
p = "{1:10, 2:20, 3:30}"
for id, qty in p:gmatch('(%d+):(%d+)') do
    -- do something with those keys:
    print(id, qty)
end
Or slightly change the string and evaluate it as a Lua table:
p = "{1:10, 2:20, 3:30}"
p = p:gsub('(%d+):', '[%1]=')    -- replace : by = and enclose keys with []
t = loadstring('return ' .. p)() -- at this point, the anonymous function
                                 -- returned by loadstring gets executed,
                                 -- returning the wanted table
for k, v in pairs(t) do
    print(k, v)
end
If the format of the keys or quantities is not simply integer, changing the patterns should be trivial.
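For example, a hypothetical variant where the item ids may contain dots (this data format is made up for illustration):
p = "{10.5:3, 20.1:7}"
for id, qty in p:gmatch('([%d%.]+):(%d+)') do
    print(id, qty) -- prints 10.5 3, then 20.1 7
end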
