This is just to work out a problem that looks pretty interesting. I tried to think it over, but couldn't find a way to solve it in efficient time. Maybe my concepts are still building up... anyway, the question is as follows.
I want to find all possible permutations of a given string. Also, please share any possible variations of this problem.
I found a solution on the net that uses recursion, but it doesn't satisfy me, as it looks a bit erroneous.
The program is as follows:
void permute(char s[], int d)
{
    int i;
    if (d == strlen(s))
        printf("%s", s);
    else
    {
        for (i = d; i < strlen(s); i++)
        {
            swap(s[d], s[i]);
            permute(s, d + 1);
            swap(s[d], s[i]);
        }
    }
}
If this program looks good (it gives an error when I run it), then please provide a small example to help me understand it, as I am still developing my recursion concepts.
Any other efficient algorithm, if one exists, can also be discussed.
And please, this is not homework.
Thanks.
The code looks correct, though you only have the core of the algorithm, not a complete program. You'll have to provide the missing bits: headers, a main function, and a swap macro (you could make swap a function by calling it as swap(s, d, i)).
To understand the algorithm, it would be instructive to add some tracing output, say printf("permute(%s, %d)", s, d) at the beginning of the permute function, and run the program with a 3- or 4-character string.
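For example, a minimal complete program along those lines might look like the sketch below, with swap written as a function that takes indices (as suggested above) and a tracing printf added; try it on a 3- or 4-character string:

#include <stdio.h>
#include <string.h>

/* Sketch of a complete program: swap takes the array and two indices. */
static void swap(char s[], int d, int i)
{
    char tmp = s[d];
    s[d] = s[i];
    s[i] = tmp;
}

static void permute(char s[], int d)
{
    printf("permute(%s, %d)\n", s, d);        /* tracing output */
    if (d == (int)strlen(s)) {
        printf("-> %s\n", s);                 /* a finished permutation */
    } else {
        for (int i = d; i < (int)strlen(s); i++) {
            swap(s, d, i);
            permute(s, d + 1);
            swap(s, d, i);
        }
    }
}

int main(void)
{
    char s[] = "abc";
    permute(s, 0);
    return 0;
}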
The basic principle is that each recursive call to permute successively places each remaining element at position d; the element that was at position d is saved by putting it where the aforementioned remaining element was (i.e. the elements are swapped). For each placement, permute is called recursively to generate all desired substrings after the position d. So the top-level call (d=0) to permute successively tries all elements in position 0, second-level calls (d=1) try all elements in position 1 except for the one that's already in position 0, etc. The next-to-deepest calls (d=n-1) have a single element to try in the last position, and the deepest calls (d=n) print the resulting permutation.
The core algorithm requires Θ(n·n!) running time, which is the best possible since that's the size of the output. However, this implementation is less efficient than it could be because it recomputes strlen(s) at every iteration, for a Θ(n²·n!) running time; the simple fix of precomputing the length would yield Θ(n·n!). The implementation requires Θ(n) memory, which is the best possible since that's the size of the input.
For an explanation of the recursion, see Gilles's answer.
Your code has some problems. First, it will be hard to implement the required swap as a function in C, since C lacks the concept of call by reference. You could try to do this with a macro, but then you'd either have to use the exclusive-or trick to swap values in place, or use a temporary variable.
Then your repeated use of strlen blows up the complexity of the program. As written, it is called at every iteration of every recursion level. Since your string even changes (because of the swaps), the compiler can't notice that the result is always the same, so it can't optimize any of those calls away. Searching for the terminating '\0' in the string would dominate all other work by far if you implement it like that.
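For instance, a hedged sketch addressing both points, with a temporary-variable swap macro (chosen because the exclusive-or trick misbehaves when both arguments name the same element, which happens here whenever i == d) and the length computed exactly once:

#include <stdio.h>
#include <string.h>

/* Temporary-variable swap; safe even when a and b name the same object. */
#define SWAP(a, b) do { char tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

static void permute(char s[], size_t d, size_t len)
{
    if (d == len) {
        printf("%s\n", s);
        return;
    }
    for (size_t i = d; i < len; i++) {
        SWAP(s[d], s[i]);
        permute(s, d + 1, len);
        SWAP(s[d], s[i]);
    }
}

int main(void)
{
    char s[] = "abcd";
    permute(s, 0, strlen(s));   /* strlen evaluated once, not per iteration */
    return 0;
}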
My qsort using a random pivot is too slow, so it can't pass all the tests. Qsort with the middle element as the pivot is also too slow (because of a special test). How can I improve my qsort? I don't really know what's wrong with it.
Here are some suggestions:
only call srand(time(NULL)); once in the main() function, not in the randompartition function.
use the same type int32_t for the array and the input/output routines. This will let you use a single call to fread and fwrite to load and store the data. Note, however, that this approach is non-portable: it will fail if the file was produced with a different endianness.
the random pivot is not fully random: the last element will never be chosen, because right is the offset of the last element, not the offset of the element past the end of the array. This convention is error-prone and leads to confusing +1/-1 adjustments and off-by-one errors such as this one.
The pathological case for this approach is an array with all identical elements. You might want to handle this case by computing the slice of identical elements around p and only recursing on the smaller slices to the left and right of those elements, as sketched below.
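A hedged sketch of that idea, using a three-way ("fat pivot") partition; the names and layout are illustrative, not taken from your code:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Partition into <, ==, > regions so a run of identical elements is consumed
   in one pass instead of degrading the sort to quadratic time. */
static void quicksort3(int32_t *a, int lo, int hi)   /* hi is one past the end */
{
    while (hi - lo > 1) {
        int32_t pivot = a[lo + rand() % (hi - lo)];
        int lt = lo, i = lo, gt = hi;   /* a[lo..lt) < p, a[lt..i) == p, a[gt..hi) > p */
        while (i < gt) {
            if (a[i] < pivot) {
                int32_t t = a[lt]; a[lt] = a[i]; a[i] = t;
                lt++; i++;
            } else if (a[i] > pivot) {
                gt--;
                int32_t t = a[gt]; a[gt] = a[i]; a[i] = t;
            } else {
                i++;
            }
        }
        /* Recurse on the smaller side, loop on the larger to bound stack depth. */
        if (lt - lo < hi - gt) {
            quicksort3(a, lo, lt);
            lo = gt;
        } else {
            quicksort3(a, gt, hi);
            hi = lt;
        }
    }
}

int main(void)
{
    int32_t a[] = {5, 3, 5, 5, 1, 5, 2, 5, 5, 4};
    srand((unsigned)time(NULL));                     /* seeded once, here only */
    quicksort3(a, 0, (int)(sizeof a / sizeof a[0]));
    for (size_t i = 0; i < sizeof a / sizeof a[0]; i++)
        printf("%d ", (int)a[i]);
    putchar('\n');
    return 0;
}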
I am actually writing an algorithm that takes as input a file containing tetriminoes (figures from Tetris) and arranges them in the smallest possible square.
Still, I encounter a weird problem:
The algorithm works for fewer than 10 tetriminoes (every time) but starts crashing with 11, 12... (I concluded that it depends on how complicated the solution is, as it does find some solutions for 14 and 15 pieces).
But the thing is, if I add an optimisation flag like -Ofast (the program is written in C), it works for every input I give it, no matter how much time it takes (sometimes more than an hour).
At first I had a lot of leaks (I was using a doubly linked list), so I switched to an int array; no more leaks, but the same problem.
I tried using the debugger, but it makes no sense (see picture):
The debugger says my variables do not exist anymore, but all I do is increment or decrement them.
For this example it just stopped while everything was fine (the values of the variables are correct).
Here is the link to my main function (the one that does the backtracking):
https://github.com/Caribou123/fillitAG/blob/master/19canplace/solve.c
The rest of the program (same repository) consists of functions to put tetriminoes in my array, remove them from it, or print the result.
Basically I try placing a tetrimino; if I have enough space, I place the next one, otherwise I remove the last one and place it at the next available position, and so on.
Also, I first thought that I was trying to place something outside of the array, so now my array is way bigger than it should be, filled with -1 for invalid cells (so in the worst case I just overwrite a -1), 0 for free ones, and integer values from 1 to 26 for figures.
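To make the description concrete, here is a minimal self-contained schematic of what the placement/backtracking loop does. This is not my repository code: the board size, the piece representation and the helper names are simplified stand-ins, and it uses bounds checks instead of the -1 border:

#include <stdio.h>

#define N 6                                 /* hypothetical board size for the sketch */

/* A piece is four (x, y) cell offsets relative to its top-left corner. */
typedef struct { int dx[4]; int dy[4]; } t_piece;

static int can_place(int grid[N][N], const t_piece *p, int x, int y)
{
    for (int i = 0; i < 4; i++) {
        int cx = x + p->dx[i], cy = y + p->dy[i];
        if (cx < 0 || cx >= N || cy < 0 || cy >= N || grid[cy][cx] != 0)
            return 0;
    }
    return 1;
}

static void set_cells(int grid[N][N], const t_piece *p, int x, int y, int v)
{
    for (int i = 0; i < 4; i++)
        grid[y + p->dy[i]][x + p->dx[i]] = v;
}

/* Try every position for piece `index`; undo and move on when the recursion fails. */
static int solve(int grid[N][N], const t_piece *pieces, int index, int count)
{
    if (index == count)
        return 1;                           /* every piece placed */
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            if (can_place(grid, &pieces[index], x, y)) {
                set_cells(grid, &pieces[index], x, y, index + 1);
                if (solve(grid, pieces, index + 1, count))
                    return 1;
                set_cells(grid, &pieces[index], x, y, 0);   /* backtrack */
            }
    return 0;
}

int main(void)
{
    int grid[N][N] = {{0}};
    t_piece pieces[] = {
        { {0, 1, 2, 3}, {0, 0, 0, 0} },     /* I piece */
        { {0, 1, 0, 1}, {0, 0, 1, 1} },     /* O piece */
    };

    if (solve(grid, pieces, 0, 2))
        for (int y = 0; y < N; y++, putchar('\n'))
            for (int x = 0; x < N; x++)
                putchar(grid[y][x] ? 'A' + grid[y][x] - 1 : '.');
    return 0;
}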
The fact that the program works with the -Ofast flag really troubles me, since the algorithm seems to work perfectly. What could cause my program to crash?
Here is how I tracked the number of recursive calls, by adding two static variables.
And here is the output
(In case you want to test it yourself, use the 19canplace folder and compile with: gcc *.c libft/libft.a)
Thanks in advance for your time,
Artiom
Edit: If you fundamentally disagree with the Fedora guide here, please explain in an objective way why this approach would be worse than classic loops. As far as I know, even the CERT standard doesn't make any statement on using index variables over pointers.
I'm currently reading the Fedora Defensive Coding Guide and it suggests the following:
Always keep track of the size of the array you are working with.
Often, code is more obviously correct when you keep a pointer past the
last element of the array, and calculate the number of remaining
elements by subtracting the current position from that pointer. The
alternative, updating a separate variable every time when the position
is advanced, is usually less obviously correct.
This means for a given array
int numbers[] = {1, 2, 3, 4, 5};
I should not use the classic
size_t length = 5;
for (size_t i = 0; i < length; ++i) {
    printf("%d ", numbers[i]);
}
but instead this:
int *end = numbers + 5;
for (int *start = numbers; start < end; ++start) {
    printf("%d ", *start);
}
or this:
int *start = numbers;
int *end = numbers + 5;
while (start < end) {
    printf("%d ", *start++);
}
Is my understanding of the recommendation correct?
Is my implementation correct?
Which of the last 2 is safer?
Your understanding of what the text recommends is correct, as is your implementation. But regarding the basis of the recommendation, I think you are confusing safe with correct.
It's not that using a pointer is safer than using an index. The argument is that, in reasoning about the code, it is easier to decide that the logic is correct when using pointers. Safety is about failure modes: what happens if the code is incorrect (references a location outside the array). Correctness is more fundamental: that the algorithm provably does what it sets out to do. We might say that correct code doesn't need safety.
The recommendation might have been influenced by Andrew Koenig's series in Dr. Dobb's a couple of years ago, How C Makes It Hard To Check Array Bounds. Koenig says:
In addition to being faster in many cases, pointers have another big advantage over arrays: A pointer to an array element is a single value that is enough to identify that element uniquely. [...] Without pointers, we need three parameters to identify the range: the array and two indices. By using pointers, we can get by with only two parameters.
In C, referencing a location outside the array, whether via pointer or index, is equally unsafe. The compiler will not catch you out (absent use of extensions to the standard). Koenig is arguing that with fewer balls in the air, you have a better shot at getting the logic right.
The more complicated the construction, the more obvious it is that he's right. If you want a better illustration of the difference, write strcat(3) both ways. Using indexes, you have two names and two indexes inside the loop. It's possible to use the index for one with the name for the other. Using pointers, that's impossible. All you have are two pointers.
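For illustration, here is a hedged sketch of both versions (not the C library's implementation; the names are made up):

#include <stddef.h>

/* Index version: two names (dst, src) and two indexes (i, j) to keep paired up. */
char *strcat_index(char *dst, const char *src)
{
    size_t i = 0, j = 0;
    while (dst[i] != '\0')
        i++;
    while (src[j] != '\0')
        dst[i++] = src[j++];
    dst[i] = '\0';
    return dst;
}

/* Pointer version: only two pointers, so there is nothing to mismatch. */
char *strcat_pointer(char *dst, const char *src)
{
    char *d = dst;
    while (*d != '\0')
        d++;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}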
Is my understanding of the recommendation correct?
Is my implementation correct?
Yes, so it seems.
The method for (type_t *start = array; start != end; start++) is sometimes used when you have arrays of more complex items. It is mostly a matter of style.
This style is sometimes used when you already have the start and end pointers available for some reason. Or in cases where you aren't really interested in the size, but just want to repeatedly compare against the end of the array. For example, suppose you have a ring buffer ADT with a start pointer and an end pointer and want to iterate through all items.
This way of doing loops is actually the very reason why C explicitly allows a pointer to point one item out of bounds of an array: you can set an end pointer to one item past the array without invoking undefined behavior (as long as that item isn't dereferenced).
(It is the very same method as used by STL iterators in C++, although there's more of a rationale in C++, since it has operator overloading. For example, iterator++ in C++ doesn't necessarily give an item allocated adjacently in the next memory cell. Iterators could, for instance, be used for iterating through a linked list ADT, where the ++ would translate to node->next behind the scenes.)
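For instance, when a function already receives the range as a pair of pointers, no separate length variable is needed at all; a minimal sketch (sum_range is a made-up name), where the one-past-the-end pointer is formed but never dereferenced:

#include <stdio.h>

/* Sum the half-open range [first, last) of ints. */
static int sum_range(const int *first, const int *last)
{
    int total = 0;
    while (first != last)
        total += *first++;
    return total;
}

int main(void)
{
    int numbers[] = {1, 2, 3, 4, 5};
    /* numbers + 5 points one past the last element: valid to form, not to dereference. */
    printf("%d\n", sum_range(numbers, numbers + 5));
    return 0;
}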
However, to claim that this form is always the preferred one is just subjective nonsense, particularly when you have an array of integers and know the size. Your first example is the most readable form of a loop in C and therefore preferred whenever possible.
On some compilers/systems, the first form could also give faster code than the second form. Pointer arithmetic might give slower code on some systems. (And I suppose that the first form might give faster data cache access on some systems, though I'd have to verify that assumption with some compiler guru.)
Which of the last 2 is safer?
Neither form is safer than the other. To claim otherwise would be a subjective opinion. The statement "...is usually less obviously correct" is nonsense.
Which style to pick varies on a case-by-case basis.
Overall, those "Fedora" guidelines you link to seem to contain lots of questionable code, questionable rules and blatant opinions. It seems more like someone wanted to show off various C tricks than a serious attempt to write a coding standard. It smells like the "Linux kernel guidelines", which I would not recommend reading either.
If you want a serious coding standard for/by professionals, use CERT-C or MISRA-C.
I have to write a simple program in C that prints to the standard output a triangle with two equal edges for a given number n, meaning that for n=3 the output would be:
x
xx
xxx
Now I'm supposed to do two versions of this program:
1. Memory conservative.
2. Time conservative.
Now I'm not entirely sure, but I think that the first version would just print the x's one at a time, and the second would expand the char table one row at a time and then print it.
But is printing a char* faster than printing multiple single chars?
You may not be able to observe it, but building the entire string in memory and then printing it at once is definitely faster in theory. The reason is that you will be making fewer calls to the printf function. Each time you call a function, several things happen in the background, like pushing the current local variables and return address onto the stack and popping them back off after returning.
However, as I mentioned, you may not be able to observe this difference for smaller inputs, because the time needed for each of these operations is tiny unless you use a computer from the 1960s.
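As a hedged illustration of the two readings of the exercise (the function names are made up), one version prints a character at a time and the other builds each line before printing it in a single call:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Memory-conservative sketch: O(1) extra memory, one putchar per character. */
static void triangle_chars(int n)
{
    for (int row = 1; row <= n; row++) {
        for (int col = 0; col < row; col++)
            putchar('x');
        putchar('\n');
    }
}

/* Time-conservative sketch: build each line once, print it with one call.
   Uses O(n) extra memory for the line buffer. */
static void triangle_buffer(int n)
{
    char *line = malloc((size_t)n + 2);
    if (line == NULL)
        return;
    for (int row = 1; row <= n; row++) {
        memset(line, 'x', (size_t)row);
        line[row] = '\n';
        line[row + 1] = '\0';
        fputs(line, stdout);
    }
    free(line);
}

int main(void)
{
    triangle_chars(3);
    triangle_buffer(3);
    return 0;
}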
Edit: WHOOPS! Big admission, I screwed up the definition of the ? in fnmatch pattern syntax and seem to have proposed (and possibly solved) a much harder problem where it behaves like .? in regular expressions. Of course it actually is supposed to behave like . in regular expressions (matching exactly one character, not zero or one). Which in turn means my initial problem-reduction work was sufficient to solve the (now rather boring) original problem. Solving the harder problem is rather interesting still though; I might write it up sometime.
On the plus side, this means there's a much greater chance that something like 2way/SMOA needle factorization might be applicable to these patterns, which in turn could yield the better-than-originally-desired O(n) or even O(n/m) performance.
In the question title, let m be the length of the pattern/needle and n be the length of the string being matched against it.
This question is of interest to me because all the algorithms I've seen/used have either pathologically bad performance and possible stack overflow exploits due to backtracking, or require dynamic memory allocation (e.g. for a DFA approach or just avoiding doing backtracking on the call stack) and thus have failure cases that could also be dangerous if a program is using fnmatch to grant/deny access rights of some sort.
I'm willing to believe that no such algorithm exists for regular expression matching, but the filename pattern language is much simpler than regular expressions. I've already simplified the problem to the point where one can assume the pattern does not use the * character, and in this modified problem you're not matching the whole string but searching for an occurrence of the pattern in the string (like the substring match problem). If you further simplify the language and remove the ? character, the language is just composed of concatenations of fixed strings and bracket expressions, and this can easily be matched in O(mn) time and O(1) space, which perhaps can be improved to O(n) if the needle factorization techniques used in 2way and SMOA substring search can be extended to such bracket patterns. However, naively each ? requires trials with or without the ? consuming a character, bringing in a time factor of 2^q where q is the number of ? characters in the pattern.
Anyone know if this problem has already been solved, or have ideas for solving it?
Note: In defining O(1) space, I'm using the transdichotomous model.
Note 2: This site has details on the 2way and SMOA algorithms I referenced: http://www-igm.univ-mlv.fr/~lecroq/string/index.html
Have you looked into the re2 regular expression engine by Russ Cox (of Google)?
It's a regular expression matching engine based on deterministic finite automata, which is different than the usual implementations (Perl, PCRE) using backtracking to simulate a non-deterministic finite automaton. One of the specific design goals was to eliminate the catastrophic backtracking behaviour you mention.
It disallows some of the Perl extensions like backreferences in the search pattern, but you don't need that for glob matching.
I'm not sure if it guarantees O(mn) time and O(1) memory constraints specifically, but it was good enough to run the Google Code Search service while it existed.
At the very least it should be cool to look inside and see how it works. Russ Cox has written three articles about re2 - one, two, three - and the re2 code is open source.
Edit: WHOOPS! Big admission, I screwed up the definition of the ? in fnmatch pattern syntax and seem to have solved a much harder problem where it behaves like .? in regular expressions. Of course it actually is supposed to behave like . in regular expressions (matching exactly one character, not zero or one). Which in turn means my initial problem-reduction work was sufficient to solve the (now rather boring) original problem. Solving the harder problem is rather interesting still though; I might write it up sometime.
Possible solution to the harder problem follows below.
I have worked out what seems to be a solution in O(log q) space (where q is the number of question marks in the pattern, and thus q < m) and uncertain but seemingly better-than-exponential time.
First of all, a quick explanation of the problem reduction. First break the pattern at each *; it decomposes as a (possibly zero length) initial and final component, and a number of internal components flanked on both sides by a *. This means once we've determined if the initial/final components match up, we can apply the following algorithm for internal matches: Starting with the last component, search for the match in the string that starts at the latest offset. This leaves the most possible "haystack" characters free to match earlier components; if they're not all needed, it's no problem, because the fact that a * intervenes allows us to later throw away as many as needed, so it's not beneficial to try "using more ? marks" of the last component or finding an earlier occurrence of it. This procedure can then be repeated for every component. Note that here I'm strongly taking advantage of the fact that the only "repetition operator" in the fnmatch expression is the * that matches zero or more occurrences of any character. The same reduction would not work with regular expressions.
With that out of the way, I began looking for how to match a single component efficiently. I'm allowing a time factor of n, so that means it's okay to start trying at every possible position in the string, and give up and move to the next position if we fail. This is the general procedure we'll take (no Boyer-Moore-like tricks yet; perhaps they can be brought in later).
For a given component (which contains no *, only literal characters, brackets that match exactly one character from a given set, and ?), it has a minimum and maximum length string it could match. The minimum is the length if you omit all ? characters and count bracket expressions as one character, and the maximum is the length if you include ? characters. At each position, we will try each possible length the pattern component could match. This means we perform q+1 trials. For the following explanation, assume the length remains fixed (it's the outermost loop, outside the recursion that's about to be introduced). This also fixes a length (in characters) from the string that we will be comparing to the pattern at this point.
Now here's the fun part. I don't want to iterate over all possible combinations of which ? characters do/don't get used. The iterator is too big to store. So I cheat. I break the pattern component into two "halves", L and R, where each contains half of the ? characters. Then I simply iterate over all the possibilities of how many ? characters are used in L (from 0 to the total number that will be used based on the length that was fixed above) and then the number of ? characters used in R is determined as well. This also partitions the string we're trying to match into part that will be matched against pattern L and pattern R.
Now we've reduced the problem of checking if a pattern component with q ? characters matches a particular fixed-length string to two instances of checking if a pattern component with q/2 ? characters matches a particular smaller fixed-length string. Apply recursion. And since each step halves the number of ? characters involved, the number of levels of recursion is bounded by log q.
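Here is a rough, hypothetical sketch of that halving recursion, restricted to components made of literal characters and ? only (bracket expressions omitted), where ? may consume zero or one character; match_comp and count_q are made-up names:

#include <stdio.h>
#include <string.h>

static int count_q(const char *pat, size_t plen)
{
    int q = 0;
    for (size_t i = 0; i < plen; i++)
        if (pat[i] == '?')
            q++;
    return q;
}

/* Does pat[0..plen) match s[0..slen) exactly, where '?' consumes 0 or 1 chars? */
static int match_comp(const char *pat, size_t plen, const char *s, size_t slen)
{
    int q = count_q(pat, plen);
    size_t lits = plen - (size_t)q;           /* literals always consume one char */
    if (slen < lits || slen > plen)
        return 0;
    size_t used = slen - lits;                /* how many '?' must consume a char */

    if (q == 0)
        return memcmp(pat, s, plen) == 0;
    if (q == 1) {                             /* base case: one '?', used is 0 or 1 */
        size_t p = 0;
        while (pat[p] != '?')
            p++;
        return memcmp(pat, s, p) == 0 &&
               memcmp(pat + p + 1, s + p + used, plen - p - 1) == 0;
    }

    /* Split the component after half of its '?' characters, then try every way
       of dividing `used` between the halves; recursion depth is O(log q). */
    int half = q / 2, seen = 0;
    size_t mid = 0;
    while (seen < half)
        if (pat[mid++] == '?')
            seen++;
    size_t llits = mid - (size_t)half;        /* literal characters in the left half */
    for (size_t ul = 0; ul <= (size_t)half && ul <= used; ul++) {
        size_t llen = llits + ul;             /* string length consumed by the left half */
        if (llen > slen)
            break;
        if (match_comp(pat, mid, s, llen) &&
            match_comp(pat + mid, plen - mid, s + llen, slen - llen))
            return 1;
    }
    return 0;
}

int main(void)
{
    const char *pat = "a?b?c";
    const char *s = "abc";                    /* matches with both '?' consuming nothing */
    printf("%d\n", match_comp(pat, strlen(pat), s, strlen(s)));
    return 0;
}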
You can create a hash of both strings and then compare these. The hash computation will be done in O(m), while the search will be O(m + n).
You can use something like this for calculating the hash of the string, where s[i] is a character:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
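A minimal sketch of computing such a hash in C, using Horner's rule and unsigned arithmetic so overflow simply wraps (hash_str is a made-up name):

#include <stdio.h>

static unsigned long hash_str(const char *s)
{
    unsigned long h = 0;
    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;   /* h = s[0]*31^(n-1) + ... + s[n-1] */
    return h;
}

int main(void)
{
    printf("%lu\n", hash_str("fnmatch"));
    return 0;
}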
As you said this is for file-name matching and you can't use this where you have wildcards in the strings. Good luck!
My feeling is that this is not possible.
Though I can't provide a bullet-proof argument, my intuition is that you will always be able to construct patterns containing q = Θ(m) ? characters where it will be necessary for the algorithm to, in some sense, account for all 2^q possibilities. This will then require O(q) = O(m) space to keep track of which of the possibilities you're currently looking at. For example, the NFA algorithm uses this space to keep track of the set of states it's currently in; the brute-force backtracking approach uses the space as a stack (and to add insult to injury, it uses O(2^q) time in addition to the O(q) space).
OK, here's how I solved the problem.
Attempt to match the initial part of the pattern, up to the first *, against the string. If this fails, bail out. If it succeeds, throw away this initial part of both the pattern and the string; we're done with them. (And if we hit the end of the pattern before hitting a *, we have a match iff we also reached the end of the string.)
Skip all the way to the end of the pattern (everything after the last *, which might be a zero-length pattern if the pattern ends with a *). Count the number of characters needed to match it, and examine that many characters from the end of the string. If they fail to match, we're done. If they match, throw away this component of the pattern and string.
Now, we're left with a (possibly empty) sequence of subpatterns, all of which are flanked on both sides by *'s. We try searching for them sequentially in what remains of the string, taking the first match for each and discarding the beginning of the string up through the match. If we find a match for each component in this manner, we have a match for the whole pattern. If any component search fails, the whole pattern fails to match.
This algorithm has no recursion and only stores a finite number of offsets into the string/pattern, so in the transdichotomous model it's O(1) space. Step 1 is O(m) in time, step 2 is O(n+m) in time (or O(m) if we assume the input string length is already known, but I'm assuming a C string), and step 3 is (using a naive search algorithm) O(nm). Thus the algorithm overall is O(nm) in time. It may be possible to improve step 3 to O(n), but I haven't yet tried.
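For concreteness, here is a hedged sketch of the three steps, simplified to patterns built from literal characters, ? (matching exactly one character) and *; bracket expressions are omitted and the function names (seg_match, glob_match) are made up:

#include <stdio.h>
#include <string.h>

/* Does a *-free segment p match s exactly over len characters?
   '?' matches any single character; bracket expressions are not handled here. */
static int seg_match(const char *p, const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != '?' && p[i] != s[i])
            return 0;
    return 1;
}

int glob_match(const char *pat, const char *s)
{
    size_t n = strlen(s);

    /* Step 1: the part before the first '*' must match the start of the string. */
    const char *star = strchr(pat, '*');
    size_t head = star ? (size_t)(star - pat) : strlen(pat);
    if (head > n || !seg_match(pat, s, head))
        return 0;
    if (!star)                              /* no '*': the whole string must be consumed */
        return head == n;
    pat += head + 1; s += head; n -= head;

    /* Step 2: the part after the last '*' must match the end of the string. */
    const char *last = strrchr(pat, '*');
    const char *tail = last ? last + 1 : pat;
    size_t tlen = strlen(tail);
    if (tlen > n || !seg_match(tail, s + n - tlen, tlen))
        return 0;
    n -= tlen;

    /* Step 3: find each internal segment in order, taking the first occurrence
       and discarding everything up through it. */
    while (pat != tail) {
        size_t clen = strcspn(pat, "*");    /* length of this segment */
        size_t i = 0;
        while (i + clen <= n && !seg_match(pat, s + i, clen))
            i++;
        if (i + clen > n)
            return 0;
        s += i + clen; n -= i + clen;
        pat += clen + 1;                    /* skip the segment and the '*' after it */
    }
    return 1;
}

int main(void)
{
    printf("%d %d\n", glob_match("a*b?d*", "axxbcdyy"), glob_match("a*c", "abd"));
    return 0;
}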
Finally, note that the original harder problem is perhaps still useful to solve. That's because I didn't account for multi-character collating elements, which most people implementing regex and such tend to ignore because they're ugly to get right and there's no standard API to interface with the system locale and obtain the necessary info to get them. But with that said, here's an example: Suppose ch is a multi-character collating element. Then [c[.ch.]] could consume either 1 or 2 characters. And we're back to needing the more advanced algorithm I described in my original answer, which I think needs O(log m) space and perhaps somewhat more than O(nm) time (I'm guessing O(n²m) at best). At the moment I have no interest in implementing multi-character collating element support, but it does leave a nice open problem...