C: Regular expression optimization | Saving state

Let's suppose I'd like to search through "AAAAAAAABBBBBBB" and "AAAAAAAACCCCCCCCCCC".
I search for the pattern "(AB|AC)".
Is there a way to save the state of the search after scanning the "AAAAAAAA" part and then continue with the [B..] and [C..] parts separately, so that I need to search in [A..] only once?
Here is a short pseudocode example to make this clearer.
step one:
pattern = "(AB|AC)"
match("AAAAAAAA", pattern)
save_state()
step two:
match("BBBBBBB", pattern)
should then find the match "AB"
step three:
restore_state()
match("CCCCCCCCCCC", pattern)
should find the match "AC"

If you use a regex flavor that takes a (real) NFA/DFA approach (like RE2), you don't have to store anything, since every input character is (or should be) examined only once; it cannot get better than that.
If your flavor uses a backtracking algorithm, you might be in luck. Some of these engines let you mark part of the pattern as non-backtracking, either with an atomic group (?>x) or with a possessive quantifier such as x{1}+ (where {1} could be any quantifier).
So in your case, it could be (if your engine allows it)
(?>A)(B|C)
or
A{1}+(B|C)
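To make the DFA point concrete, here is a minimal sketch in Python (not C, and not how RE2 is implemented internally). It hand-builds a three-state automaton for an unanchored search of (AB|AC); the names START, SEEN_A, MATCHED, step and feed are made up for this illustration. The "saved state" is nothing more than the current state number, so the shared prefix is scanned only once.

```python
# Hand-built DFA for an unanchored search of (AB|AC).
# All names here are hypothetical, chosen only for this illustration.
START, SEEN_A, MATCHED = 0, 1, 2

def step(state, ch):
    """Advance the automaton by one input character."""
    if state == MATCHED:          # once matched, stay matched
        return MATCHED
    if ch == "A":                 # an 'A' always (re)starts a candidate match
        return SEEN_A
    if state == SEEN_A and ch in ("B", "C"):
        return MATCHED
    return START

def feed(state, text):
    """Run the automaton over text, starting from the given state."""
    for ch in text:
        state = step(state, ch)
    return state

saved = feed(START, "AAAAAAAA")                    # scan the shared prefix once
found_ab = feed(saved, "BBBBBBB") == MATCHED       # continue with the B part
found_ac = feed(saved, "CCCCCCCCCCC") == MATCHED   # reuse the saved state for the C part
```

Because the state is a plain integer, "restoring" it is just passing `saved` to `feed` again.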


Can't get Logic App Contains to work with array or comma separated string

I'm trying to look for specific keywords inside text that comes from a For each loop.
var text = "The lazy fox jumped over the brown dog.";
var keywords = "fox,dog,sun";
If any keyword is found, I want to do something with the text. If none is found, I want to ignore the text.
Does anyone know how to use an Array filter, Function, Select, Condition or inline code to check for this? If so, specific examples would be great.
By the way, I have a C# function that handles this extremely well in an ASP.net Core app.
UPDATE 1:
This doesn't work.
UPDATE 2:
The Condition is always false after the for each loop even after changing the settings and parallelism to 1.
Azure Logic App Condition does not work in loop if based on changing values
Thanks in advance!

There are so many ways to achieve what you need. Here are the 3 options that came to my mind within a minute.
The first one does use a For each loop, but I wouldn't recommend using it as it's not very efficient.
The For each parameter looks like this:
The Condition parameter looks like this:
The second option is much easier - no need for a loop, just filter the array straight away, then you can check whether it's empty or it has some items:
The Filter array parameters look as follows.
The split function is identical to the one used in option 1.
The third option: if you know JavaScript, you might decide to use regular expressions in inline code instead, e.g.:
Then you'd just need to check the output of the inline code. JavaScript code used in the example above:
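// Outputs of the two Compose actions that hold the text and the keyword list
var text = workflowContext.actions.Compose_text.outputs;
var keywords = workflowContext.actions.Compose_keywords.outputs;
// Build /(fox|dog|sun)/gi; match() returns the array of hits, or null if none are found
return text.match(new RegExp("(" + keywords.split(",").join("|") + ")", "gi"));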
My personal preference is option 2. However, please note that all 3 options above would find "sun" in text "The weather was sunny" even though there's no word "sun" in the text. If you do need "sun" to match only word "sun" - not "sunny", "asunder" or "unsung" - then go for option 3, just use a different, more complex regular expression.
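As a rough illustration of what such a "more complex" expression could look like, here is a sketch in Python rather than in Logic Apps inline JavaScript. The helper name find_keywords is hypothetical. The pattern wraps the alternation in \b word boundaries, which JavaScript's RegExp also supports, and escapes each keyword so that characters such as "." are matched literally.

```python
import re

def find_keywords(text, keywords):
    """Return whole-word, case-insensitive hits for a comma-separated keyword list."""
    words = [re.escape(k.strip()) for k in keywords.split(",") if k.strip()]
    if not words:
        return []
    # \b on both sides keeps "sun" from matching inside "sunny" or "unsung"
    pattern = re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.findall(text)

hits = find_keywords("The lazy fox jumped over the brown dog.", "fox,dog,sun")
no_hits = find_keywords("The weather was sunny", "fox,dog,sun")
```

An empty result list plays the same role as the null returned by match() in the JavaScript version.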

One workaround would be to use the Condition connector. I initialized the sentence in a string variable and then used a Condition connector to check the conditions.
Finally, in the true branch you can add further connectors as needed.

Placing a Compose behind the for each loop and referencing the Output in the Condition is what finally worked for me. I used the toLower() function in my Compose. The Compose looks like this.
toLower(items('For_each_2')?['day']?['longPhrase'])

How do I accommodate for change of "targets" in a GOAP AI model?

Say the agent is looking to perform a series of actions requiring different "targets" (picking up an item, eating a food, etc.). The way we chose to implement this is for each agent to store its current target as a field which can then be represented as key-value state (along with the other state) to be fed to the GOAP planner.
The problem arises if a series of actions requires the agent to let's say first eat a mushroom m and then go pick up a sword s. Ideally, the planner might find an action path similar to this:
locate m -> go to m -> pick up m -> eat m -> locate s -> go to s -> pick up s
Of course, we would like to generalize our actions as much as possible, so our current design has actions like goTo, pickUp, eat, etc. generalized to simply trust the preceding "locate x" action to have located a valid target.
In other words, locate x will have a promise state of target == x whereas an action like goTo will have the required state of hasTarget == true and a promise state of isNearTarget == true. A similar "generalized" set of requirements and goals is present for pickUp. The eat action will then have a requirement akin to holdingTarget == true and target == Food, while also setting target to null after the food has been consumed.
The big problem, then, is what happens once m is eaten. How can the planner know that the next thing to locate is a sword and not something else? How can this be represented in GOAP states in a way that ensures that the following actions will behave as expected?
One idea that came up was to divide actions into 3 categories:
Designating - Actions that promise to set the target to a thing (e.g. locateFood)
Intermediary - Actions that make generalized target promises (e.g. goTo)
Terminal - Actions that "consume" the target, nulling it (e.g. eat)
This approach then comes with the question of knowing what actions are terminal and which aren't, which seems like a nasty problem on its own.
I'm sorry if this is too abstracted and hard to understand - I'm trying to generalize the problem away from our specific code since I don't think it's something specific to our implementation, but likely a misunderstanding on our part of how state is supposed to be represented in GOAP. I can provide code as well as any clarification if needed.

First of all, it would be good if you showed some code.
Second, I'm hoping you've already looked at this goap demo
This should answer your question. Preconditions must be met before an action can be selected. So for example, if you require the AI to eat the mushroom m before picking up a sword, I would do something like this:
Eat mushroom action:
    effect: "mushroomEaten" == true
Pick up sword action:
    precondition: "mushroomEaten" == true
    effect: "goPickUpSword"
I have to generalize since your question is also general with no specific code example given. Look at the link provided and you will understand how actions can be chained together to accomplish a goal.
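The chaining described above can be sketched as a tiny planner in Python. This is only an illustration under stated assumptions: the action names and facts (pickUpMushroom, holdingMushroom, hasSword and so on) are invented, facts are plain booleans, and it uses a breadth-first forward search instead of the A* search that GOAP planners typically use.

```python
from collections import deque

# Hypothetical actions: (name, preconditions, effects) over boolean facts.
ACTIONS = [
    ("pickUpMushroom", {"holdingMushroom": False}, {"holdingMushroom": True}),
    ("eatMushroom", {"holdingMushroom": True},
     {"mushroomEaten": True, "holdingMushroom": False}),
    ("pickUpSword", {"mushroomEaten": True}, {"hasSword": True}),
]

def satisfies(state, conditions):
    """True if every required fact holds; missing facts count as False."""
    return all(state.get(key, False) == value for key, value in conditions.items())

def plan(start, goal):
    """Breadth-first search for the shortest action sequence that reaches goal."""
    queue = deque([(start, [])])
    seen = {frozenset(start.items())}
    while queue:
        state, path = queue.popleft()
        if satisfies(state, goal):
            return path
        for name, preconditions, effects in ACTIONS:
            if satisfies(state, preconditions):
                next_state = {**state, **effects}
                key = frozenset(next_state.items())
                if key not in seen:
                    seen.add(key)
                    queue.append((next_state, path + [name]))
    return None  # no plan reaches the goal

steps = plan({}, {"hasSword": True})
```

The planner is never told to eat the mushroom first; that ordering falls out of the precondition on pickUpSword.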

Is there YAML syntax for sharing part of a list or map?

So, I know I can do something like this:
sitelist: &sites
- www.foo.com
- www.bar.com
anotherlist: *sites
And have sitelist and anotherlist both contain www.foo.com and www.bar.com. However, what I really want is for anotherlist to also contain www.baz.com, without having to repeat www.foo.com and www.bar.com.
Doing this gives me a syntax error in the YAML parser:
sitelist: &sites
- www.foo.com
- www.bar.com
anotherlist: *sites
- www.baz.com
Just using anchors and aliases it doesn't seem possible to do what I want without adding another level of substructure, such as:
sitelist: &sites
- www.foo.com
- www.bar.com
anotherlist:
- *sites
- www.baz.com
Which means the consumer of this YAML file has to be aware of it.
Is there a pure YAML way of doing something like this? Or will I have to use some post-YAML processing, such as implementing variable substitution or auto-lifting of certain kinds of substructure? I'm already doing that kind of post-processing to handle a couple of other use-cases, so I'm not totally averse to it. But my YAML files are going to be written by humans, not machine generated, so I would like to minimise the number of rules that need to be memorised by my users on top of standard YAML syntax.
I'd also like to be able to do the analogous thing with maps:
namedsites: &sites
  Foo: www.foo.com
  Bar: www.bar.com
moresites: *sites
  Baz: www.baz.com
I've had a search through the YAML spec, and couldn't find anything, so I suspect the answer is just "no you can't do this". But if anyone has any ideas that would be great.
EDIT: Since there have been no answers, I'm presuming that no one has spotted anything I haven't in the YAML spec and that this can't be done at the YAML layer. So I'm opening up the question to ideas for post-processing the YAML to help with this, in case anyone finds this question in future.

The merge key type is probably what you want. It uses a special << mapping key to indicate merges, allowing an alias to a mapping (or a sequence of such aliases) to be used as an initializer to merge into a single mapping. Additionally, you can still explicitly override values, or add more that weren't present in the merge list.
It's important to note that it works with mappings, not sequences as in your first example. This makes sense when you think about it, and your example looks like it probably doesn't need to be sequential anyway. Simply changing your sequence values to mapping keys should do the trick, as in the following (untested) example:
sitelist: &sites
  ? www.foo.com # "www.foo.com" is the key, the value is null
  ? www.bar.com
anotherlist:
  << : *sites # merge *sites into this mapping
  ? www.baz.com # add extra stuff
A few things to note. Firstly, since << is a key, it can only be specified once per mapping. Secondly, when a sequence of aliases is used as the value of <<, the order is significant: keys from mappings earlier in the sequence take precedence over those from later ones. This doesn't matter in the example here, since there are no associated values, but it's worth being aware of.

As the previous answers have pointed out, there is no built-in support for extending lists in YAML. I am offering yet another way to implement it yourself. Consider this:
defaults: &defaults
  sites:
    - www.foo.com
    - www.bar.com
setup1:
  <<: *defaults
  sites+:
    - www.baz.com
This will be processed into:
defaults:
  sites:
    - www.foo.com
    - www.bar.com
setup1:
  sites:
    - www.foo.com
    - www.bar.com
    - www.baz.com
The idea is to merge the contents of a key ending with a '+' to the corresponding key without a '+'. I implemented this in Python and published here.
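A rough Python sketch of that idea follows. It is not the published implementation mentioned above, and merge_plus_keys is a hypothetical name. It works on the native objects a YAML parser returns once << has been resolved, and assumes the values under both "name" and "name+" are lists.

```python
def merge_plus_keys(node):
    """Recursively fold every 'name+' list into the matching 'name' list."""
    if isinstance(node, list):
        return [merge_plus_keys(item) for item in node]
    if not isinstance(node, dict):
        return node

    def is_plus(key):
        return isinstance(key, str) and key.endswith("+")

    # First copy all ordinary keys, then append the contents of the '+' keys.
    merged = {k: merge_plus_keys(v) for k, v in node.items() if not is_plus(k)}
    for key, value in node.items():
        if is_plus(key):
            base = key[:-1]
            merged[base] = list(merged.get(base, [])) + merge_plus_keys(value)
    return merged

# What a parser would hand over for the example above, after resolving <<.
parsed = {
    "defaults": {"sites": ["www.foo.com", "www.bar.com"]},
    "setup1": {
        "sites": ["www.foo.com", "www.bar.com"],
        "sites+": ["www.baz.com"],
    },
}
processed = merge_plus_keys(parsed)
```

A "name+" key with no matching "name" simply becomes "name" in this sketch; whether that should be an error instead is a design choice.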

(Answering my own question in case the solution I'm using is useful for anyone who searches for this in future)
With no pure-YAML way to do this, I'm going to implement this as a "syntax transformation" sitting between the YAML parser and the code that actually uses the configuration file. So my core application doesn't have to worry at all about any human-friendly redundancy-avoidance measures, and can just act directly on the resulting structures.
The structure I'm going to use looks like this:
foo:
  MERGE:
    - - a
      - b
      - c
    - - 1
      - 2
      - 3
Which would be transformed to the equivalent of:
foo:
  - a
  - b
  - c
  - 1
  - 2
  - 3
Or, with maps:
foo:
  MERGE:
    - fork: a
      spoon: b
      knife: c
    - cup: 1
      mug: 2
      glass: 3
Would be transformed to:
foo:
  fork: a
  spoon: b
  knife: c
  cup: 1
  mug: 2
  glass: 3
More formally, after calling the YAML parser to get native objects from a config file, but before passing the objects to the rest of the application, my application will walk the object graph looking for mappings containing the single key MERGE. The value associated with MERGE must be either a list of lists, or a list of maps; any other substructure is an error.
In the list-of-lists case, the entire map containing MERGE will be replaced by the child lists concatenated together in the order they appeared.
In the list-of-maps case, the entire map containing MERGE will be replaced by a single map containing all of the key/value pairs in the child maps. Where there is overlap in the keys, the value from the child map occurring last in the MERGE list will be used.
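The transformation just described can be sketched in Python as below. This is an illustration of the rules as stated, not the code actually used in the application; expand_merge is a hypothetical name, and the input is assumed to be the plain dicts and lists a YAML parser returns.

```python
def expand_merge(node):
    """Replace every mapping whose only key is MERGE with its merged children."""
    if isinstance(node, list):
        return [expand_merge(item) for item in node]
    if not isinstance(node, dict):
        return node
    if set(node) == {"MERGE"}:
        children = node["MERGE"]
        if not isinstance(children, list):
            raise ValueError("MERGE expects a list of lists or a list of maps")
        children = [expand_merge(child) for child in children]
        if all(isinstance(child, list) for child in children):
            # List of lists: concatenate in the order given.
            return [item for child in children for item in child]
        if all(isinstance(child, dict) for child in children):
            # List of maps: later maps win where keys overlap.
            result = {}
            for child in children:
                result.update(child)
            return result
        raise ValueError("MERGE expects a list of lists or a list of maps")
    return {key: expand_merge(value) for key, value in node.items()}

lists_merged = expand_merge({"foo": {"MERGE": [["a", "b", "c"], [1, 2, 3]]}})
maps_merged = expand_merge(
    {"foo": {"MERGE": [{"fork": "a", "spoon": "b"}, {"spoon": "x", "cup": 1}]}}
)
```

Children are expanded before they are merged, so nested MERGE mappings work as well.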
The examples given above are not that useful, since you could have just written the structure you wanted directly. It's more likely to appear as:
foo:
  MERGE:
    - *salt
    - *pepper
This lets you create a list or map containing everything in the nodes anchored as salt and pepper, which are defined and used elsewhere.
(I keep giving that foo: outer map to show that MERGE must be the only key in its mapping, which means that MERGE cannot appear as a top-level name unless there are no other top level names)

To clarify something from the two answers here, this is not supported directly in YAML for lists (but it is supported for dictionaries, see Kittemon's answer).

To piggyback off of Kittemon's answer, note that you can create mappings with null values using the alternative syntax
foo:
  << : *myanchor
  bar:
  baz:
instead of the suggested syntax
foo:
  << : *myanchor
  ? bar
  ? baz
Like Kittemon's suggestion, this will allow you to use references to anchors within the mapping and avoid the sequence issue. I found myself needing to do this after discovering that the Symfony Yaml component v2.4.4 doesn't recognize the ? bar syntax.

Calling functions from plain text descriptions

I have an app which has common maths functions behind the scenes:
add(x, y)
multiply(x, y)
square(x)
The interface is a simple Google-style text field. I want the user to be able to enter a plain text description -
'2*3'
'2 times 3'
'multiply 2 and 3'
'take the product of 2 and 3'
and get a mathematical answer.
The question is, how should I map the text descriptions to the functions? I'm guessing I need to
tokenise the text
identify key tokens (function names, arguments)
try and map token combinations to function signatures
However, I'm guessing this is already a 'solved problem' in the machine learning space. Should I be using Natural Language Processing? Plain text search? Something else?
All ideas gratefully received, plus implementation suggestions [I'm using Python/AppEngine; I know about NLTK and Whoosh]
[PS I understand Google does this already, at least for the first two queries on the list. I'm guessing they also do it statistically, having a very large amount of search data. I don't have a large amount of data available, so will need an alternative approach].

After you tokenise the text, you need parsing to get a syntax tree of your natural language phrase. Once you have this, you can map the parse tree to a mathematical expression, and then evaluate the expression. I do not think this is a solved problem. I would start with several templates, say the first two, and experiment. The larger the domain of possible descriptions, the harder the task is.
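As a starting point for the "several templates" experiment, here is a minimal Python sketch. It skips real parsing and simply tries a handful of regular-expression templates in order; the template list, the NUM pattern and the evaluate helper are all assumptions made for this illustration, and only add, multiply and square come from the question.

```python
import re

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

def square(x):
    return x * x

NUM = r"(-?\d+(?:\.\d+)?)"  # one captured integer or decimal number

# (template, function) pairs; each captured number becomes one argument.
TEMPLATES = [
    (re.compile(rf"^{NUM}\s*\*\s*{NUM}$"), multiply),
    (re.compile(rf"^{NUM}\s+times\s+{NUM}$"), multiply),
    (re.compile(rf"^multiply\s+{NUM}\s+and\s+{NUM}$"), multiply),
    (re.compile(rf"^(?:take\s+)?the\s+product\s+of\s+{NUM}\s+and\s+{NUM}$"), multiply),
    (re.compile(rf"^{NUM}\s*\+\s*{NUM}$"), add),
    (re.compile(rf"^(?:the\s+)?square\s+of\s+{NUM}$"), square),
]

def evaluate(query):
    """Return the result for the first matching template, or None if none match."""
    text = query.strip().lower()
    for pattern, func in TEMPLATES:
        match = pattern.match(text)
        if match:
            return func(*(float(group) for group in match.groups()))
    return None

results = [evaluate(q) for q in
           ["2*3", "2 times 3", "multiply 2 and 3", "take the product of 2 and 3"]]
```

Every new phrasing needs a new template, which is exactly where the approach stops scaling and a proper grammar or parser becomes worthwhile.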

I would recommend a tool for defining grammars/patterns over text, like SimpleParse for Python: http://www.ibm.com/developerworks/linux/library/l-simple.html. As a Java programmer, I would prefer GATE or graph-expression.

MD5 code kata and BDD

I was thinking of implementing MD5 as a code kata and wanted to use BDD to drive the design (I am a BDD newb).
However, the only test I can think of starting with is to pass in an empty string, and the simplest thing that will work is embedding the hash in my program and returning that.
The logical extension of this is that I end up embedding the hash in my solution for every test and switching on the input to decide what to return. Which of course will not result in a working MD5 program.
One of my difficulties is that there should only be one public function:
public static string MD5(byte[] input)
And I don't see how to test the internals.
Is my approach completely flawed or is MD5 unsuitable for BDD?

I believe you chose a pretty hard exercise for a BDD code-kata. The thing about code-kata, or what I've understood about it so far, is that you somehow have to see the problem in small incremental steps, so that you can perform these steps in red, green, refactor iterations.
For example, an exercise of finding an element position inside an array, might be like this:
If array is empty, then position is 0, no matter the needle element
Write test. Implementation. Refactor
If array is not empty, and element does not exist, position is -1
Write test. Implementation. Refactor
If array is not empty, and element is the first in list, position is 1
Write test. Implementation. Refactor
I don't really see how to break the MD5 algorithm into steps like that. But that may be because I'm not really an algorithm guy. If you better understand the steps involved in the MD5 algorithm, then you may have better chances.

It depends on what you mean by unsuitable... :-) It is suitable if you want to document a few examples that describe your implementation. It should also be possible to have the algorithm emerge from your specification if you add one more character for each test.
By just adding a switch statement you're just trying to "cheat the system". Using BDD/TDD does not mean you have to implement stupid things. Also, the fact that you have hardcoded hash values as well as a switch statement in your code are clear code smells that should be refactored and removed. That is how your algorithm should emerge: when you see the hardcoded values you first remove them (by calculating the value), and then you see that the branches are all the same, so you remove the switch statement.
Also, if your question is about finding good katas, I would recommend looking in the Kata catalogue.
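To illustrate "documenting a few examples", here is a small sketch in Python rather than the C#-style signature from the question. The expected digests are test vectors from RFC 1321; md5_hex is a hypothetical stand-in for the kata's own implementation and merely delegates to hashlib so that the sketch runs.

```python
import hashlib

# Examples of increasing input length, taken from the RFC 1321 test suite.
EXAMPLES = [
    (b"", "d41d8cd98f00b204e9800998ecf8427e"),
    (b"a", "0cc175b9c0f1b6a831c399e269772661"),
    (b"abc", "900150983cd24fb0d6963f7d28e17f72"),
]

def md5_hex(data):
    """Stand-in for the function under test; a kata would compute this by hand."""
    return hashlib.md5(data).hexdigest()

def failing_examples(func):
    """Return the inputs for which func disagrees with the documented digest."""
    return [data for data, expected in EXAMPLES if func(data) != expected]

failures = failing_examples(md5_hex)
```

A hardcoded switch would pass these three examples too, which is why the refactoring step described above matters more than the examples themselves.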
