I was wondering how inserting an expression such as 5 * 4 + 2 / 3 into a binary tree would work. I have tried doing this on my own, but I can only make the tree grow to one side.
I am not an expert in the field, but I have written basic expression parsers in the past.
You will need to tokenize your expression. Transform it from a string of characters to a list of understandable chunks.
You may want to create two structures, one for operators and one for operands. Hint: operators have a priority associated with them.
You can then apply an algorithm to transform your operators/operands into an abstract syntax tree (AST). It's basically just a set of rules, generally applied with a queue and a stack.
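The steps above can be sketched as follows: a minimal tokenizer plus a shunting-yard-style builder that uses operator priority to produce a correctly shaped binary expression tree instead of one that grows to a single side. This is only an illustrative sketch (the `Node` and `buildTree` names are my own, and parentheses are deliberately not handled):

```cpp
#include <cctype>
#include <memory>
#include <stack>
#include <string>
#include <vector>

// A node is either an operator with two children, or a number leaf.
struct Node {
    std::string token;
    std::unique_ptr<Node> left, right;
};

int precedence(const std::string& op) {
    return (op == "*" || op == "/") ? 2 : 1;  // +,- bind less tightly
}

// Tokenize "5 * 4 + 2 / 3" into {"5","*","4","+","2","/","3"}.
std::vector<std::string> tokenize(const std::string& s) {
    std::vector<std::string> out;
    for (size_t i = 0; i < s.size(); ) {
        if (std::isspace((unsigned char)s[i])) { ++i; continue; }
        if (std::isdigit((unsigned char)s[i])) {
            size_t j = i;
            while (j < s.size() && std::isdigit((unsigned char)s[j])) ++j;
            out.push_back(s.substr(i, j - i));
            i = j;
        } else {
            out.push_back(std::string(1, s[i++]));  // single-char operator
        }
    }
    return out;
}

// Shunting-yard, but instead of emitting postfix we attach tree nodes.
std::unique_ptr<Node> buildTree(const std::string& expr) {
    std::stack<std::unique_ptr<Node>> operands;
    std::stack<std::string> operators;
    auto reduce = [&] {   // pop one operator, attach its two operands
        auto node = std::make_unique<Node>();
        node->token = operators.top(); operators.pop();
        node->right = std::move(operands.top()); operands.pop();
        node->left  = std::move(operands.top()); operands.pop();
        operands.push(std::move(node));
    };
    for (const auto& tok : tokenize(expr)) {
        if (std::isdigit((unsigned char)tok[0])) {
            auto leaf = std::make_unique<Node>();
            leaf->token = tok;
            operands.push(std::move(leaf));
        } else {
            while (!operators.empty() &&
                   precedence(operators.top()) >= precedence(tok))
                reduce();
            operators.push(tok);
        }
    }
    while (!operators.empty()) reduce();
    auto root = std::move(operands.top()); operands.pop();
    return root;
}
```

For 5 * 4 + 2 / 3 this yields + at the root with * and / as its children, so the tree no longer grows to one side.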
For my university task I must design a Deterministic Finite Automata which recognises basic arithmetic. We're basically building a very basic lexical analyzer.
The DFA uses the operators "+,-,*,/".
The DFA accepts only positive numbers, so expressions like "-1+1" and "+1+1" aren't accepted.
It can accept decimals, but only when they start with 0: so "0.3415" is accepted while "1.3415" is not.
Finally it can accept just a "0" by itself.
I'm confused about the best way to approach this. I have a basic foundation of DFAs and NFAs so can someone please just give me some hints as to how I should start?
My current approach is to draw some small DFAs: one for decimal numbers, one for whole numbers, one for operators, and one that's just a 0. Then I want to concatenate them, take the union of the smaller DFAs to create one big NFA, and finish by converting the NFA back to a DFA.
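As a sketch of one of those small DFAs, here is a transition-function encoding of the number rules stated above (lone 0, whole numbers, and decimals that must start with 0). The state names are my own; the operator DFA and the concatenation/union steps would be combined with it as you describe:

```cpp
#include <string>

// States for a DFA accepting: 0 | [1-9][0-9]* | 0 . [0-9]+
enum State { START, ZERO, INT, DOT, FRAC, DEAD };

State step(State s, char c) {
    switch (s) {
        case START:
            if (c == '0') return ZERO;
            if (c >= '1' && c <= '9') return INT;
            return DEAD;
        case ZERO:                      // lone "0"; a '.' may follow
            return (c == '.') ? DOT : DEAD;
        case INT:                       // whole numbers: no decimal point
            return (c >= '0' && c <= '9') ? INT : DEAD;
        case DOT:                       // need at least one digit after '.'
            return (c >= '0' && c <= '9') ? FRAC : DEAD;
        case FRAC:
            return (c >= '0' && c <= '9') ? FRAC : DEAD;
        default:
            return DEAD;                // sink state
    }
}

bool acceptsNumber(const std::string& input) {
    State s = START;
    for (char c : input) s = step(s, c);
    return s == ZERO || s == INT || s == FRAC;  // accepting states
}
```

Note that "1.3415" dies in state INT on the '.', exactly as the assignment requires.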
PREMISE
So lately I have been thinking about a problem that is common to databases: optimizing insertion, search, deletion and update of data.
Most databases nowadays use a B-tree or B+-tree to solve this problem, but those are usually used to store data on disk, and I wanted to work with in-memory data, so I thought about using an AVL tree (the difference should be minimal, because the purpose of B-trees is roughly the same as that of the AVL tree, but the implementation is different and so are the effects).
Before continuing with the reasoning behind this, I would like to go a level deeper into what I am trying to solve.
In a modern database, data is stored in a table with a PRIMARY KEY, which tends to be INDEXED (I am not very experienced with indexing, so what follows is just the basic reasoning I put into this problem); usually the PRIMARY KEY is an increasing number starting from 1 (even though nowadays that is considered bad practice).
Normally, an AVL tree should be more than enough to solve the problem, because this particular tree is always balanced and offers O(log2(n)) operations, BUT I wanted to take this to a deeper level and try to optimize it even more than needed.
THEORY
So, as the title of the question suggests, I am trying to optimize the AVL tree by merging it with a B-tree.
Basically, every node of this new tree is, let's say, an array of ten elements; every node also has its corresponding height in the tree, and every element of the array is ordered ascending.
INSERTION
The insertion initially fills the array of the root node; when the root node is full, it generates the left and right children, which also contain an array of 10 elements each.
Whenever a new node is added, the tree rebalances itself based on the first key of the vectors of the left and right child, also using their heights (note that this is actually how the AVL tree behaves, but an AVL tree node only has 2 children and no vector, just the values).
SEARCH
Searching for an element works this way: starting from the root, we compare the value K we are searching for with the first and last key of the array of the current node. If the value is in between, we know it will surely be in the array of the current node, so we can run a binary search with O(log2(n)) complexity on this array of ten elements; otherwise we go left if the key we are searching for is smaller than the first key, or right if it is bigger.
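A minimal sketch of this search procedure, assuming a hypothetical node type (my own naming) that holds a sorted vector of up to ten keys plus two children:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node of the AVL/B-tree hybrid described above:
// a sorted vector of up to 10 keys and two children.
struct HNode {
    std::vector<int> keys;       // sorted ascending, size <= 10
    HNode* left = nullptr;       // all keys strictly smaller than keys.front()
    HNode* right = nullptr;      // all keys strictly larger than keys.back()
};

bool search(const HNode* node, int k) {
    while (node && !node->keys.empty()) {
        if (k < node->keys.front()) {        // smaller than every key here
            node = node->left;
        } else if (k > node->keys.back()) {  // larger than every key here
            node = node->right;
        } else {
            // k falls inside [first, last]: binary-search the small array
            return std::binary_search(node->keys.begin(), node->keys.end(), k);
        }
    }
    return false;
}
```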
DELETION
The same as searching, but we delete the value.
UPDATE
The same as searching, but we update the value.
CONCLUSION
If I am not wrong, this should have a complexity of O(log10(log2(n))), which is always logarithmic, so we shouldn't care about this optimization; but in my opinion this could make the height of the tree much smaller while also providing quick search times.
B-trees and B+-trees are indeed used for disk storage because of their block design, but there is no reason why they could not also be used as in-memory data structures.
The advantages of a B tree include its use of arrays inside a single node. Look-up in a limited vector of maybe 10 entries can be very fast.
Your idea of a compromise between B tree and AVL would certainly work, but be aware that:
You need to perform tree rotations like in AVL in order to keep the tree balanced. In B trees you work with redistributions, merges and splits, but no rotations.
Like with AVL, the tree will not always be perfectly balanced.
You need to describe what will be done when a vector is full and a value needs to be added to it: the node will have to split, and one half will have to be reinjected as a leaf.
You need to describe what will be done when a vector gets a very low fill-factor (due to deletions). If you leave it like that, the tree could degenerate into an AVL tree where every vector only has 1 value, and then the additional vector overhead will make it less efficient than a genuine AVL tree. To keep the fill-factor of a vector above a minimum you cannot easily apply the redistribution mechanism with a sibling node, as would be done in B-trees. It would work with leaf nodes, but not with internal nodes. So this needs to be clarified...
You need to describe what will be done when a value in a vector is updated. Of course, you would insert it in its sorted position: but if it becomes the first or last value in that vector, this may violate the order with regards to left and right children, and so also there you may need to define more precisely the algorithm.
Binary search in a vector of 10 may be overkill: a simple left-to-right scan may be faster, as CPUs are optimised to read consecutive memory. This does not impact the time complexity, since the vector size is capped at 10. So we are talking about either at most 4 comparisons (3-4 on average, depending on the binary search implementation) or at most 10 comparisons (5 on average).
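To make the two options being compared concrete, here is what each looks like over one node-sized sorted array (both are constant-time in practice, since the array is capped at 10 entries; the function names are my own):

```cpp
#include <algorithm>
#include <array>

// Left-to-right scan: branch-friendly, reads consecutive memory,
// and can stop early because the array is sorted.
bool linearContains(const std::array<int, 10>& a, int k) {
    for (int v : a) {
        if (v == k) return true;
        if (v > k)  return false;   // passed the insertion point: not present
    }
    return false;
}

// Binary search: at most ~4 comparisons for 10 elements.
bool binaryContains(const std::array<int, 10>& a, int k) {
    return std::binary_search(a.begin(), a.end(), k);
}
```

Which one wins in wall-clock time is best settled by measuring on the target hardware; the asymptotic analysis is unaffected either way.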
If I am not wrong this should have a complexity of O(log10(log2(n))) which is always logarithmic
Actually, if that were true, it would be sub-logarithmic, i.e. O(loglogn). But there is a mistake here. The binary search in a vector is not related to n, but to 10. Also, this work comes in addition to finding the node with that vector. So it is not a logarithm of a logarithm, but a sum of logarithms:
O(log10(n) + log2(10)) = O(log n)
Therefore the time complexity is no different than the one for AVL or B-tree -- provided that the algorithm is completed with the missing details, keeping within the logarithmic complexity.
You should maybe also consider implementing a pure B-tree or B+-tree: that way you also benefit from some advantages that neither the AVL tree nor the in-between structure has:
The leaves of the tree are all at the same level
No rotations are needed
The tree height only changes at one spot: the root.
B+ trees provide a very fast means for iterating over all values in order.
Good morning, I'm trying to perform a 2D FFT as two 1-dimensional FFTs.
The problem setup is the following:
There's a matrix of complex numbers generated by an inverse FFT on an array of real numbers; let's call it arr[-nx..+nx][-nz..+nz].
Now, since the original array was made up of real numbers, I exploit the symmetry and reduce my array to be arr[0..nx][-nz..+nz].
My problem starts here, with arr[0..nx][-nz..nz] provided.
Now I should come back in the domain of real numbers.
The question is: what kind of transformation should I use in the two directions?
In x I use fftw_plan_r2r_1d( .., .., .., FFTW_HC2R, ..), called the halfcomplex-to-real transformation, because in that direction I've exploited the symmetry, and that's OK I think.
But in the z direction I can't figure out whether I should use the same transformation or the complex-to-complex (C2C) transformation.
Which is the correct one, and why?
In case it's needed: here, at page 11, the HC2R transformation is briefly described.
Thank you
"To easily retrieve a result comparable to that of fftw_plan_dft_r2c_2d(), you can chain a call to fftw_plan_dft_r2c_1d() and a call to the complex-to-complex dft fftw_plan_many_dft(). The arguments howmany and istride can easily be tuned to match the pattern of the output of fftw_plan_dft_r2c_1d(). Contrary to fftw_plan_dft_r2c_1d(), the r2r_1d(...FFTW_HR2C...) separates the real and complex component of each frequency. A second FFTW_HR2C can be applied and would be comparable to fftw_plan_dft_r2c_2d() but not exactly similar.
As quoted on the page 11 of the documentation that you judiciously linked,
'Half of these column transforms, however, are of imaginary parts, and should therefore be multiplied by I and combined with the r2hc transforms of the real columns to produce the 2d DFT amplitudes; ... Thus, ... we recommend using the ordinary r2c/c2r interface.'
Since you have an array of complex numbers, you can either use c2r transforms or unfold real/imaginary parts and try to use HC2R transforms. The former option seems the most practical.Which one might solve your issue?"
— Francis
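For anyone who wants to check the symmetry argument without FFTW, the following naive 2D DFT (illustrative only, O(n^4), with my own helper names) demonstrates the Hermitian symmetry X[kx][kz] = conj(X[(nx-kx)%nx][(nz-kz)%nz]) of a real input's transform. The symmetry couples both indices at once, which is why it can be exploited along one direction only; the remaining z-direction data is genuinely complex and calls for a complex transform:

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Naive 2D DFT of a real matrix, to inspect the symmetry of the output.
std::vector<std::vector<cd>> dft2d(const std::vector<std::vector<double>>& in) {
    const double PI = std::acos(-1.0);
    const int nx = (int)in.size(), nz = (int)in[0].size();
    std::vector<std::vector<cd>> out(nx, std::vector<cd>(nz));
    for (int kx = 0; kx < nx; ++kx)
        for (int kz = 0; kz < nz; ++kz) {
            cd sum(0.0, 0.0);
            for (int x = 0; x < nx; ++x)
                for (int z = 0; z < nz; ++z) {
                    double ang = -2.0 * PI *
                        ((double)kx * x / nx + (double)kz * z / nz);
                    sum += in[x][z] * cd(std::cos(ang), std::sin(ang));
                }
            out[kx][kz] = sum;   // X[kx][kz]
        }
    return out;
}
```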
I have an array of elements whose key is a regex.
I would like to come up with a fast algorithm that given a string (not a regex) will return in less than O(N) time what are the matching array values based on execution of the key regex.
Currently I do a linear scan of the array, for each element I execute the respective regex with posix regexec API, but this means that to find the matching elements I have to search across the whole array.
I understand that if the array were composed of only simple strings as keys, I could have kept it ordered and used a bsearch-style API, but with regexes it looks like it's not so easy.
Am I missing something here?
Example follows
// this is mainly to be considered
// as pseudocode
typedef struct {
    regex_t r;      /* compiled beforehand with regcomp(..., REG_EXTENDED) */
    ... some other data
} value;

const char *key = "some/key";
value my_array[1024];
bool my_matches[1024];

/* linear scan: run every compiled regex against the key */
for (int i = 0; i < 1024; ++i) {
    if (!regexec(&my_array[i].r, key, 0, NULL, 0))
        my_matches[i] = 1;
    else
        my_matches[i] = 0;
}
But the above, as you can see, is linear.
Thanks
Addendum:
I've put together a simple executable which runs the above algorithm as well as the approach proposed in the answer below, where from a large regex it builds a binary tree of sub-regexes and navigates it to find all the matches.
Source code is here (GPLv3): http://qpsnr.youlink.org/data/regex_search.cpp
Compile with: g++ -O3 -o regex_search ./regex_search.cpp -lrt
And run with: ./regex_search "a/b" (or use --help flag for options)
Interestingly (and, I would say, as expected), when searching in the tree it executes fewer regexes, but each one is far more complex to run, so eventually the time it takes balances out with the linear scan of the vectors. Results are printed on std::cerr so you can see that they are the same.
When running with long strings and/or many tokens, watch out for memory usage; be ready to hit Ctrl-C to stop it before it brings your system down.
This is possible but I think you would need to write your own regex library to achieve it.
Since you're using posix regexen, I'm going to assume that you intend to actually use regular expressions, as opposed to the random collection of computational features which modern regex libraries tend to implement. Regular expressions are closed under union (and many other operations), so you can construct a single regular expression from your array of regular expressions.
Every regular expression can be recognized by a DFA (deterministic finite-state automaton), and a DFA -- regardless of how complex -- recognizes (or fails to recognize) a string in time linear to the length of the string. Given a set of DFAs, you can construct a union DFA which recognizes the languages of all DFAs, and furthermore (with a slight modification of what it means for a DFA to accept a string), you can recover the information about which subset of the DFAs matched the string.
I'm going to try to use the same terminology as the Wikipedia article on DFAs. Let's suppose we have a set of DFAs M = {M1...Mn} which share a single alphabet Σ. So we have:
Mi = (Qi, Σ, δi, qi0, Fi) where Qi = {qij} for 0 ≤ j < |Qi|, and Fi ⊆ Qi.
We construct the union-DFA M⋃ = (Q⋃, Σ, δ⋃, q⋃0) (yes, no F; I'll get to that) as follows:
q⋃0 = <q10,...,qn0>
δ⋃(<q1j1,...,qnjn>, α) = <δ1(q1j1, α),... , δn(qnjn, α)> for each α ∈ Σ
Q⋃ consists of all states reachable through δ⋃ starting from q⋃0.
We can compute this using a standard closure algorithm in time proportional to the product of the sizes of the δi transition functions.
Now to do a union match on a string α1...αm, we run the union DFA in the usual fashion, starting with its start symbol and applying its transition function to each α in turn. Once we've read the last symbol in the string, the DFA will be in some state <q1j1,...,qnjn>. From that state, we can extract the set of Mi which would have matched the string as: {Mi | qiji ∈ Fi}.
In order for this to work, we need the individual DFAs to be complete (i.e., they have a transition from every state on every symbol). Some DFA construction algorithms produce DFAs which are lacking transitions on some symbols (indicating that no string with that prefix is in the language); such DFAs must be augmented with a non-accepting "sink" state which has a transition to itself on every symbol.
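As a sketch of the union-match idea, here is the product construction evaluated lazily: rather than materialising the product DFA up front, we run complete transition-table DFAs in lockstep and read off which ones accept. The `Dfa` struct and the toy machines are my own; a real implementation would precompute the product states as described above:

```cpp
#include <array>
#include <string>
#include <vector>

// A complete DFA over the alphabet {a, b}, as a transition table.
struct Dfa {
    std::vector<std::array<int, 2>> delta;  // state x symbol -> state; 0='a', 1='b'
    std::vector<bool> accept;               // accept[s]: is s in F?
    int start = 0;
};

// Run all DFAs in lockstep over the string; the vector of current states
// is exactly one state <q1,...,qn> of the union DFA. Returns, per machine,
// whether it accepted the input.
std::vector<bool> unionMatch(const std::vector<Dfa>& machines,
                             const std::string& s) {
    std::vector<int> state;
    for (const auto& m : machines) state.push_back(m.start);
    for (char c : s) {
        int sym = (c == 'a') ? 0 : 1;
        for (size_t i = 0; i < machines.size(); ++i)
            state[i] = machines[i].delta[state[i]][sym];
    }
    std::vector<bool> result;
    for (size_t i = 0; i < machines.size(); ++i)
        result.push_back(machines[i].accept[state[i]]);
    return result;
}
```

For example, with M1 = "strings ending in a" and M2 = "strings of even length", one pass over the input recovers the matching subset {Mi | qi ∈ Fi}.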
I don't know of any regex library which exposes its DFAs sufficiently to implement the above algorithm, but it's not too much work to write a simple regex library which does not attempt to implement any non-regular features. You might also be able to find a DFA library.
Constructing a DFA from a regular expression is potentially exponential in the size of the expression, although such cases are rare. (The non-deterministic FA can be constructed in linear time, but in some cases, the powerset construction on the NFA will require exponential time and space. See the Wikipedia article.) Once the DFAs are constructed, however, the union FA can be constructed in time proportional to the product of the sizes of the DFAs.
So it should be easy enough to allow dynamic modification to the set of regular expressions, by compiling each regex to a DFA once, and maintaining the set of DFAs. When the set of regular expressions changes, it is only necessary to regenerate the union DFA.
Hope that all helps.
I'm having issues 'describing each step' when creating an NFA from a regular expression. The question is as follows:
Convert the following regular expression to a non-deterministic finite-state automaton (NFA), clearly describing the steps of the algorithm that you use:
(b|a)*b(a|b)
I've made a simple 3-state machine but it's very much from intuition.
This is a question from a past exam written by my lecturer, who also wrote the following explanation of Thompson's algorithm: http://www.cs.may.ie/staff/jpower/Courses/Previous/parsing/node5.html
Can anyone clear up how to 'describe each step clearly'? It just seems like a set of basic rules rather than an algorithm with steps to follow.
Maybe there's an algorithm I've glossed over somewhere but so far I've just created them with an educated guess.
Short version for general approach.
There's an algo out there called the Thompson-McNaughton-Yamada Construction Algorithm or sometimes just "Thompson Construction." One builds intermediate NFAs, filling in the pieces along the way, while respecting operator precedence: first parentheses, then Kleene Star (e.g., a*), then concatenation (e.g., ab), followed by alternation (e.g., a|b).
Here's an in-depth walkthrough for building (b|a)*b(a|b)'s NFA
Building the top level
Handle parentheses. Note: in an actual implementation, it can make sense to handle parentheses via a recursive call on their contents. For the sake of clarity, I'll defer evaluation of anything inside of parens.
Kleene Stars: only one * there, so we build a placeholder Kleene Star machine called P (which will later contain b|a).
Intermediate result:
Concatenation: Attach P to b, and attach b to a placeholder machine called Q (which will contain (a|b)). Intermediate result:
There's no alternation outside of parentheses, so we skip it.
Now we're sitting on a P*bQ machine. (Note that our placeholders P and Q are just concatenation machines.) We replace the P edge with the NFA for b|a, and replace the Q edge with the NFA for a|b via recursive application of the above steps.
Building P
Skip. No parens.
Skip. No Kleene stars.
Skip. No concatenation.
Build the alternation machine for b|a. Intermediate result:
Integrating P
Next, we go back to that P*bQ machine and we tear out the P edge. We have the source of the P edge serve as the starting state for the P machine, and the destination of the P edge serve as the destination state for the P machine. We also make that state reject (take away its property of being an accept state). The result looks like this:
Building Q
Skip. No parens.
Skip. No Kleene stars.
Skip. No concatenation.
Build the alternation machine for a|b. Incidentally, alternation is commutative, so a|b is logically equivalent to b|a. (Read: skipping this minor footnote diagram out of laziness.)
Integrating Q
We do what we did with P above, except replacing the Q edge with the intermediate a|b machine we constructed. This is the result:
Tada! Er, I mean, QED.
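To make the result concrete, here is the ε-NFA that the construction above produces for (b|a)*b(a|b), written out by hand together with a standard subset simulation (the state numbering is my own choice; label 0 marks an ε-edge):

```cpp
#include <set>
#include <string>
#include <vector>

// Edges of the Thompson-style epsilon-NFA for (b|a)*b(a|b).
struct Edge { int from; char label; int to; };  // label 0 = epsilon

static const std::vector<Edge> edges = {
    {0, 0, 1},  {0, 0, 7},            // star: enter (b|a) or skip it
    {1, 0, 2},  {2, 'b', 3},          // alternation branch: b
    {1, 0, 4},  {4, 'a', 5},          // alternation branch: a
    {3, 0, 6},  {5, 0, 6},            // branches rejoin
    {6, 0, 1},  {6, 0, 7},            // star: loop back or exit
    {7, 'b', 8},                      // concatenated literal b
    {8, 0, 9},  {9, 'a', 10},         // final alternation: a
    {8, 0, 11}, {11, 'b', 12},        // final alternation: b
    {10, 0, 13}, {12, 0, 13},         // rejoin; 13 is the accept state
};
static const int START = 0, ACCEPT = 13;

// Epsilon-closure: add every state reachable through epsilon edges.
std::set<int> closure(std::set<int> states) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Edge& e : edges)
            if (e.label == 0 && states.count(e.from) && !states.count(e.to)) {
                states.insert(e.to);
                changed = true;
            }
    }
    return states;
}

bool accepts(const std::string& s) {
    std::set<int> current = closure({START});
    for (char c : s) {
        std::set<int> next;
        for (const Edge& e : edges)
            if (e.label == c && current.count(e.from)) next.insert(e.to);
        current = closure(next);
    }
    return current.count(ACCEPT) != 0;
}
```

The machine accepts "ba" and "abb" but rejects "b" and "aab", matching the language "the second-to-last symbol is b".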
Want to know more?
All the images above were generated using an online tool for automatically converting regular expressions to non-deterministic finite automata. You can find its source code for the Thompson-McNaughton-Yamada Construction algorithm online.
The algorithm is also addressed in Aho's Compilers: Principles, Techniques, and Tools, though its explanation is sparse on implementation details. You can also learn from an implementation of the Thompson Construction in C by the excellent Russ Cox, who described it in some detail in a popular article about regular expression matching.
In the GitHub repository below, you can find a Java implementation of Thompson's construction where first an NFA is being created from the regex and then an input string is being matched against that NFA:
https://github.com/meghdadFar/regex
https://github.com/White-White/RegSwift
No more tedious words. Check out this repo: it translates your regular expression to an NFA and visually shows you the state transitions of the NFA.