Eiffel algorithm: First Repeated Character

I'm programming in Eiffel for a school Lab, and one of the tasks is to find a bug in a given algorithm. The algorithm returns the first repeated character.
The algorithm works as follows:
word: STRING

first_repeated_character: CHARACTER
    local
        i: INTEGER
        ch: CHARACTER
        stop: BOOLEAN
    do
        from
            i := 1
            Result := '%U'
        until
            i > word.count or stop
        loop
            ch := word[i]
            if ch = word[i + 1] then
                Result := ch
                stop := true
            end
            i := i + 1
        end
    end
I spent the last couple of hours trying to find the bug in this, but it always passes all tests.
Any help would be much appreciated. Thanks.

I think the bug is that the code forgets to check if i + 1 is a valid index for word.
What about this code?
first_repeated_character (word: STRING): CHARACTER
        -- First repeated character in `word'.
        -- If none, returns the null character '%U'.
    local
        i, n: INTEGER
        ch: CHARACTER
    do
        from
            i := 1
            n := word.count
        until
            Result /= '%U' or i > n
        loop
            ch := word [i]
            if i + 1 <= n and then ch = word [i + 1] then
                Result := ch
            end
            i := i + 1
        end
    end

Whilst searching for a failing test is a legitimate way to find a bug (and EiffelStudio includes AutoTest, which does this systematically for you), it is a brute-force approach.
Here is a more intelligent algorithm which will find the bug in this case:
Examine every feature call in the routine under consideration. List all of its preconditions (plus the class invariants). Consider whether there are any conditions under which this routine will fail to meet any of those preconditions or invariants.
If you find such a set of conditions, then you have probably found a bug (if the preconditions are overly strict, the routine might work anyway, but in that case you have found another bug). You can then write a test case for these particular conditions to demonstrate the bug (and validate your reasoning).
Using this algorithm, you will locate the bug in this case (just like Jocelyn did).
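As a rough illustration of that procedure, here is a Python sketch (Python rather than Eiffel, and not part of the lab). The helper `item` is a hypothetical stand-in for indexing an Eiffel string, with the precondition assumed to be a valid 1-based index; `violates_precondition` is likewise only an illustrative name. The routine itself is transcribed from the question unchanged.

```python
def item(word: str, i: int) -> str:
    """Hypothetical stand-in for Eiffel's `word[i]` with a 1-based index.

    The assertion plays the role of the precondition (assumed here to be
    `1 <= i <= word.count`).
    """
    assert 1 <= i <= len(word), f"precondition violated: index {i}"
    return word[i - 1]


def first_repeated_character(word: str) -> str:
    """Line-by-line transcription of the routine from the question."""
    result = "\0"  # counterpart of '%U'
    stop = False
    i = 1
    while not (i > len(word) or stop):
        ch = item(word, i)
        if ch == item(word, i + 1):  # i + 1 is never checked against word.count
            result = ch
            stop = True
        i += 1
    return result


def violates_precondition(word: str) -> bool:
    """True if running the routine on `word` trips the index precondition."""
    try:
        first_repeated_character(word)
    except AssertionError:
        return True
    return False
```

Under these assumptions, any non-empty word without two equal adjacent characters (for example "abc") lets the loop reach the last index and then ask for the character after it, which is exactly the condition a test case should target.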

My Python code passes simple tests but not extended ones due to performance constraints

I'm teaching myself to code with Python and I've been completing katas (mainly 7 kyu for now) on Codewars for a while. While trying to level up, I stumbled on the following 6 kyu kata, which looked achievable with my knowledge:
Complete the function that takes a string as an input, and return a list of all the unpaired characters (i.e. they show up an odd number of times in the string), in the order they were encountered as an array.
In case of multiple appearances to choose from, take the last occurrence of the unpaired character.
Notes:
A wide range of characters is used, and some of them may not render properly in your browser.
Your solution should be linear in terms of string length to pass the last test.
Examples:
"Hello World" --> ["H", "e", " ", "W", "r", "l", "d"]
"Codewars" --> ["C", "o", "d", "e", "w", "a", "r", "s"]
"woowee" --> []
"wwoooowweeee" --> []
"racecar" --> ["e"]
"Mamma" --> ["M"]
"Mama" --> ["M", "m"]
The following Python 3.6.0 code passes the sample tests:
def function_name(s):
    result = []
    lengthS = len(s)
    dictResult = {}
    for item in s[-1:(-1 * lengthS - 1):-1]:
        numLetters = 0
        for number in range(lengthS):
            if item == s[number]:
                numLetters = numLetters + 1
        if numLetters % 2 != 0:
            dictResult[item] = numLetters
    for key in list(dictResult.keys())[-1:(-1 * (len(list(dictResult))) - 1):-1]:
        result.append(key)
    return result
Running the full round of tests fails with:
Execution Timed Out (12000 ms)
Commenting out some parts of the code makes the full round of tests fail more quickly, which let me see that the tests use very long strings.
It is clear that my code is not optimised for performance. I read some articles on algorithms, and I suspect my code is slow because of the for loop within another for loop, which is an O(n²) operation. This is where I've been stuck for a while. I'm not sure whether this is the only piece of code that needs to change, and if it is, I just cannot work out how it could be done.
Note. Please do not post answers with complete solution to this problem. I'd rather wish to figure the final code myself. So I would be glad if you could point to sections of my code that requires attention with suggestions, example code or further reading.
As per your note:
Note. Please do not post answers with complete solution to this
problem. I'd rather wish to figure the final code myself. So I would
be glad if you could point to sections of my code that requires
attention with suggestions, example code or further reading.
I will be posting only the bottlenecks, rest is up to you to figure out.
Your code has
    for number in range(lengthS):
        if item == s[number]:
            numLetters = numLetters + 1
which means that, for every character item, you scan the entire string to count how often it occurs. This makes your code quadratic, O(n^2): each of the n characters triggers a full scan of n characters.
To overcome this, you can collect the counts while iterating over the string itself, with the help of the dictionary methods. So, your code can change from
    for item in s[-1:(-1 * lengthS - 1):-1]:
        numLetters = 0
        for number in range(lengthS):
            if item == s[number]:
                numLetters = numLetters + 1
        if numLetters % 2 != 0:
            dictResult[item] = numLetters
to
    for item in s[-1:(-1 * lengthS - 1):-1]:
        dictResult[item] = dictResult.get(item,0) + 1
This collects the counts of all characters in a single pass, making it O(n).
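To make the difference concrete, here is a small sketch that only does the counting (not the full kata). The function names are made up for illustration; both functions return the same tallies, but the first rescans the string for every character while the second touches each character once.

```python
def count_by_rescanning(s: str) -> dict:
    """Quadratic: one full scan of `s` for every character of `s`."""
    counts = {}
    for item in s:
        num_letters = 0
        for other in s:
            if item == other:
                num_letters += 1
        counts[item] = num_letters
    return counts


def count_in_one_pass(s: str) -> dict:
    """Linear: each character is visited once and its tally updated."""
    counts = {}
    for item in s:
        counts[item] = counts.get(item, 0) + 1
    return counts
```

On a string of a few hundred thousand characters the first version performs tens of billions of comparisons, which is the kind of input that triggers the timeout.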
The second observation is that there is no point (or advantage) in traversing the string in reverse, like below:
    for item in s[-1:(-1 * lengthS - 1):-1]:
You can instead simply loop over the characters one by one, like below:
    for item in s:
        dictResult[item] = dictResult.get(item,0) + 1
The third is to consider whether you really need an item in dictResult if its count is even. If a character's count becomes even at any point in the iteration, you can simply remove it from dictResult. This way you only ever have keys with odd counts in dictResult. This also makes your code more space efficient (sometimes significantly).
Note that the keys with odd counts in dictResult will not necessarily be in the order in which the output is expected. I leave this as an exercise for you to figure out.
Update:
From Python 3.7 on, the keys of a dictionary are kept in the order in which they were inserted, so it is relatively easy to match the order of the output.
Correct Snippet:
def odd_one_out(s):
    result = []
    dictResult = {}
    for item in s:
        dictResult[item] = dictResult.get(item,0) + 1
        if dictResult[item] % 2 == 0:
            dictResult.pop(item)
    return list(dictResult.keys())
You are correct in figuring out that it's O(n^2) which is causing the time limit exceeded error.
Instead of using two for loops to check occurrences for each character, can you think of a way to do it in one simple pass through the string and store it?
That would significantly reduce the execution time.
You are correct in terms of space complexity.
One thing you could try is to use a dict to count the number of occurrences of each letter. This way you will only iterate once through your string.
Sample example:
    my_counter = {}
    for letter in s:
        my_counter[letter] = my_counter.get(letter, 0) + 1
Because you don't want the complete solution, I'll let you figure out the rest. Hope it helps.
doc of get() :
https://docs.python.org/3/library/stdtypes.html?highlight=get#dict.get
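Here is a compact solution using a little knowledge of Python and its standard library.
This version works for Python >= 3.7, because it relies on dicts preserving insertion order, which was not guaranteed before:
from collections import Counter

def function_name(s):
    count = Counter(s)
    # dict.fromkeys(reversed(s)) keeps each character once, ordered by its last occurrence (back to front)
    answer = [c for c in dict.fromkeys(reversed(s)) if count[c] % 2]
    return answer[::-1]
For other versions, a small change that takes the characters in order from the string instead:
from collections import Counter

def function_name(s):
    count = Counter(s)
    seen = set()
    answer = []
    for c in reversed(s):  # back to front, so the first hit is the last occurrence
        if count[c] % 2 and c not in seen:
            seen.add(c)
            answer.append(c)
    return answer[::-1]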

Why does my while loop not terminate in this functional language?

I'm trying to build a Standard ML program that uses both imperative and functional concepts and writes its output to a file. But my while loop doesn't seem to terminate and instead prints the same value continuously.
fun writeImage image filename =
  let val ap = TextIO.openOut filename
      val (w,h) = size image
      val row = ref 0
      val stringToWrite = "";
  in
    while !row < h do
      TextIO.output(ap,"A");
      row := !row + 1;
    TextIO.closeOut ap
  end;
If I remove the first line after the while loop, the loop terminates. But if I include TextIO.output(ap,"A");, it doesn't. Why is this the case?
Let's write your program with correct indentation, and then it becomes clear where the mistake is:
...
while !row < h do
    TextIO.output(ap,"A");
row := !row + 1;
TextIO.closeOut ap
...
You loop forever because the increment is outside of the body of the loop.
You intended to write this:
...
while !row < h do (
    TextIO.output(ap,"A");
    row := !row + 1
);
TextIO.closeOut ap
...
right?
I have long made a study of how people come to make mistakes when programming. I am curious to know how you came to make this mistake. If you believed that ; binds stronger than while then why did you believe that the TextIO.closeOut ap was after the loop? Surely if your belief was that the ; binds the increment to the loop then the ; should bind it to the loop as well. Did you perhaps think that ML is a language like Python, where the looping constructs use the whitespace as a guide to the extent of the body?
What are your beliefs about ; in ML more generally? Do you think of ; as a statement terminator, as it is in C-like languages? Or do you think of it as an infix sequencing operation on side-effecting expressions?
What was your thought process here, and how could the tooling have made it easier for you to solve your problem without having to ask for help?

Looping through all character combinations with increasing number of elements

What I want to achieve:
I have a function where I want to loop through all possible combinations of printable ascii-characters, starting with a single character, then two characters, then three etc.
The part that makes this difficult for me is that I want this to work for as many characters as I can (leave it overnight).
For the record: I know that abc really is 97 98 99, so a numeric representation is fine if that's easier.
This works for few characters:
I could create a list of all possible combinations for n characters, and just loop through it, but that would require a huge amount of memory already when n = 4. This approach is literally impossible for n > 5 (at least on a normal desktop computer).
In the script below, all I do is increment a counter for each combination. My real function does more advanced stuff.
If I had unlimited memory I could do (thanks to Luis Mendo):
counter = 0;
some_function = @(x) 1;
number_of_characters = 1;
max_time = 60;
max_number_of_characters = 8;
tic;
while toc < max_time && number_of_characters < max_number_of_characters
    number_of_characters = number_of_characters + 1;
    vectors = [repmat({' ':'~'}, 1, number_of_characters)];
    n = numel(vectors);
    combs = cell(1,n);
    [combs{end:-1:1}] = ndgrid(vectors{end:-1:1});
    combs = cat(n+1, combs{:});
    combs = reshape(combs, [], n);
    for ii = 1:size(combs, 1)
        counter = counter + some_function(combs(ii, :));
    end
end
Now, I want to loop through as many combinations as possible in a certain amount of time, 5 seconds, 10 seconds, 2 minutes, 30 minutes, so I'm hoping to create a function that's only limited by the available time, and uses only some reasonable amount of memory.
Attempts I've made (and failed at) for more characters:
I've considered pre-computing the combinations for two or three letters using one of the approaches above, and use a loop only for the last characters. This would not require much memory, since it's only one (relatively small) array, plus one or more additional characters that gets looped through.
I manage to scale this up to 4 characters, but beyond that I start getting into trouble.
I've tried to use an iterator that just counts upwards. Every time I hit any(mod(number_of_ascii .^ 1:n, iterator) == 0) I increment the m'th character by one. So, the last character just repeats the cycle !"# ... ~, and every time it hits tilde, the second character increments. Every time the second character hits tilde, the third character increments etc.
Do you have any suggestions for how I can solve this?
It looks like you're basically trying to count in base 26 (or base 52 if you need capitals). Each number in that base corresponds to a specific string of characters. For example,
0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,10,11,12,...
Here, the capitals A through P are just the digit symbols for the values 10 through 25 in the base-26 system. The sequence above represents this sequence of strings:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,ba,bb,bc,...
Then, you can simply do this:
symbols = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E',...
           'F','G','H','I','J','K','L','M','N','O','P']
characters = ['a','b','c','d','e','f','g','h','i','j','k','l',...
              'm','n','o','p','q','r','s','t','u','v','w','x','y','z']
count=0;
while(true)
    str_base26 = dec2base(count,26)
    % char-by-char lookup: map each base-26 digit symbol to its letter
    [~, idx] = ismember(str_base26, symbols);
    actual_str = characters(idx)
    count=count+1;
end
Of course, this never produces strings that begin with a (the symbol for 0), such as aa or ab, because leading zeros are dropped. But that should be pretty simple to handle.
You were not far with your idea of just getting an iterator that just counts upward.
What you need with this idea is a map from the integers to ASCII characters. As StewieGriffin suggested, you'd just need to work in base 95 (94 characters plus whitespace).
Why whitespace : You need something that will be mapped to 0 and be equivalent to it. Whitespace is the perfect candidate. You'd then just skip the strings containing any whitespace. If you don't do that and start directly at !, you'll not be able to represent strings like !! or !ab.
First let's define a function that will map (1:1) integers to string :
function [outstring,toskip]=dec2ASCII(m)
    out=[];
    while m~=0
        out=[mod(m,95) out];
        m=(m-out(1))/95;
    end
    if any(out==0)
        toskip=1;
    else
        toskip=0;
    end
    outstring=char(out+32);
end
And then in your main script :
counter=1;
some_function = @(x) 1;
max_time = 60;
max_number_of_characters = 8;
currString='';
tic;
while numel(currString)<=max_number_of_characters&&toc<max_time
    [currString,toskip]=dec2ASCII(counter);
    if ~toskip
        some_function(currString);
    end
    counter=counter+1;
end
Some random outputs of the dec2ASCII function :
dec2ASCII(47)
ans =
O
dec2ASCII(145273)
ans =
0)2
In terms of performance I can't really elaborate as I don't know what you want to do with your some_function. The only thing I can say is that the running time of dec2ASCII is around 2*10^(-5) s
Side note : iterating like this will be very limited in terms of speed. With the function some_function doing nothing, you'd just be able to cycle through 4 characters in around 40 minutes, and 5 characters would already take up to 64 hours. Maybe you'd want to reduce the amount of stuff you want to pass through the function you iterate on.
This code, though, is easily parallelizable, so if you want to check more combinations, I'd suggest trying to do it in a parallel manner.
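For comparison, here is a sketch of the same time-limited enumeration outside MATLAB, using a different technique from the base-95 counter above: Python's itertools.product, which yields one combination at a time and so keeps memory use constant. The function name iterate_combinations and its alphabet parameter are made up for this illustration; some_function, max_time and max_number_of_characters follow the names in the question.

```python
import itertools
import time


def iterate_combinations(some_function, max_time, max_number_of_characters,
                         alphabet=None):
    """Feed every string of length 1, 2, ... to `some_function` until time runs out.

    Returns the number of strings processed. Only one string exists in
    memory at a time, because itertools.product yields them lazily.
    """
    if alphabet is None:
        # printable ASCII from ' ' (32) to '~' (126), as in the question
        alphabet = [chr(code) for code in range(32, 127)]
    deadline = time.monotonic() + max_time
    processed = 0
    for length in range(1, max_number_of_characters + 1):
        for combo in itertools.product(alphabet, repeat=length):
            if time.monotonic() > deadline:
                return processed
            some_function("".join(combo))
            processed += 1
    return processed
```

Checking the clock on every iteration is the simplest choice but adds overhead; checking only every few thousand iterations would be a reasonable refinement if some_function is cheap.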

How do I access a character of a string stored in an array in Pascal?

I have a program that reads an array of lines (each can be any text). Then I need to compare the 1st and the 4th character of each line, but the program doesn't allow me to do this. How can I access those characters in order to compare them?
Program acmp387;
uses crt;
var
  n, i, answer : integer;
  letters : array[1..1000] of string;
Begin
  read(n);
  for i:=1 to n do
  begin
    read(letters[i]);
    if ord(letters[i][1]) = ord(letters[i][4])
      then answer := answer + 1;
  end;
  writeln(answer);
  readkey;
End.
I'm interested in this line:
if ord(letters[i][1]) = ord(letters[i][4])
Your access is OK (provided every string has at least four characters; for strings with 0 to 3 characters you may get an error). Maybe the problem is rather that your program runs but does not behave as expected.
Your program will work as expected if you replace the read statements with readln. A read statement makes sense only in limited situations; in interactive programs you will almost always use readln. With these changes and the input
5
abcdef
abcabc
0101010101010101
10011001
123456
you will get the result 2 displayed (the lines/strings abcabc and 10011001 meet the criterion and increment answer).
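The expected result for that input can be cross-checked with a short sketch in Python (not Pascal; the function name is made up for illustration). It also guards against lines shorter than four characters, which the Pascal program does not.

```python
def count_matching_lines(lines):
    """Count the lines whose 1st and 4th characters are equal.

    Lines shorter than four characters are skipped instead of raising an error.
    """
    answer = 0
    for line in lines:
        if len(line) >= 4 and line[0] == line[3]:
            answer += 1
    return answer


sample = ["abcdef", "abcabc", "0101010101010101", "10011001", "123456"]
print(count_matching_lines(sample))  # prints 2
```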

Find longest suffix of string in given array

Given a string and array of strings find the longest suffix of string in array.
for example
string = google.com.tr
array = tr, nic.tr, gov.nic.tr, org.tr, com.tr
returns com.tr
I have tried to use binary search with specific comparator, but failed.
C-code would be welcome.
Edit:
I should have said that I'm looking for a solution where I can do as much work as possible in a preparation step (when I only have the array of suffixes, and I can sort it any way I like, build any data structure around it, etc.), and then, for a given string, find its suffix in this array as fast as possible. I also know that I can build a trie out of this array, and that this will probably give me the best possible performance, BUT I'm very lazy, and maintaining a trie in raw C in a huge piece of tangled enterprise code is no fun at all. So some binsearch-like approach would be very welcome.
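Assuming constant-time addressing of characters within strings, this problem is isomorphic to finding the longest prefix (reverse the string and every array element).
1. Let i = 0.
2. Let S = null.
3. Let c = prefix[i].
4. Remove every string a from A that is shorter than i + 1 or has a[i] != c; if A is now empty (or prefix has no more characters), stop and return S. Replace S with a if a.Length == i + 1.
5. Increment i.
6. Go to step 3.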
Is that what you're looking for?
Example:
prefix = rt.moc.elgoog
array = rt.moc, rt.gro, rt.cin.vog, rt.cin, rt
Pass 0: prefix[0] is 'r' and array[j][0] == 'r' for all j so nothing is removed from the array. i + 1 -> 0 + 1 -> 1 is our target length, but none of the strings have a length of 1, so S remains null.
Pass 1: prefix[1] is 't' and array[j][1] == 't' for all j so nothing is removed from the array. However there is a string that has length 2, so S becomes rt.
Pass 2: prefix[2] is '.' and array[j][2] == '.' for the remaining strings so nothing changes.
Pass 3: prefix[3] is 'm' and array[j][3] != 'm' for rt.gro, rt.cin.vog, and rt.cin so those strings are removed.
etc.
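The passes above can be sketched as follows. This is Python rather than the C the question asks for, and only one possible reading of the steps: the function name longest_suffix is made up, and the handling of candidates that run out of characters is an assumption.

```python
def longest_suffix(string, suffixes):
    """Return the longest element of `suffixes` that `string` ends with, or None."""
    prefix = string[::-1]                     # reverse so suffixes become prefixes
    candidates = [s[::-1] for s in suffixes]  # reversed candidates still in play
    best = None
    for i, c in enumerate(prefix):
        # keep only candidates that have a character at position i and match it
        candidates = [a for a in candidates if len(a) > i and a[i] == c]
        if not candidates:
            break
        for a in candidates:
            if len(a) == i + 1:               # fully matched: longest so far
                best = a
    return best[::-1] if best is not None else None
```

With the data from the question, longest_suffix("google.com.tr", ["tr", "nic.tr", "gov.nic.tr", "org.tr", "com.tr"]) first records tr and later replaces it with com.tr. The filtering is linear in the number of candidates per pass, so it does not use the preparation step the question hopes for.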
Another naïve, pseudo-answer.
Set boolean "found" to false. While "found" is false, iterate over the array comparing the source string to the strings in the array. If there's a match, set "found" to true and break. If there's no match, use something like strchr() to get to the segment of the string following the first period. Iterate over the array again. Continue until there's a match, or until the last segment of the source string has been compared to all the strings in the array and failed to match.
Not very efficient....
Naive, pseudo-answer:
Sort array of suffixes by length (yes, there may be strings of same length, which is a problem with the question you are asking I think)
Iterate over array and see if suffix is in given string
If it is, exit the loop because you are done! If not, continue.
Alternatively, you could skip the sorting and just iterate, assigning the biggestString if the currentString is bigger than the biggestString that has matched.
Edit 0:
Maybe you could improve this by looking at your array before hand and considering "minimal" elements that need to be checked.
For instance, if .com appears in 20 members you could just check .com against the given string to potentially eliminate 20 candidates.
Edit 1:
On second thought, in order to compare elements in the array you will need to use a string comparison. My feeling is that any gain you get out of an attempt at optimizing the list of strings for comparison might be negated by the expense of comparing them before doing so, if that makes sense. Would appreciate if a CS type could correct me here...
If your array of strings is something along the following:
char string[STRINGS][MAX_STRING_LENGTH] = {
    "google.com.tr",
    "nic.tr",
    /* etc. */
};
then you can simply do this (needs <string.h>):
/* find the index of the longest string */
int x, longest = 0;
for (x = 1; x < STRINGS; x++) {
    if (strlen(string[x]) > strlen(string[longest])) {
        longest = x;
    }
}
/* copy everything after the first '.' of that string */
char output[MAX_STRING_LENGTH] = "";
const char *dot = strchr(string[longest], '.');
if (dot != NULL) {
    strcpy(output, dot + 1);
}
(The above code may not actually work (errors, etc.), but you should get the general idea.)
Why don't you use suffix arrays? They work well when you have a large number of suffixes.
Complexity: O(n (log n)^2); there are O(n log n) versions too.
Implementation in C here. You can also try googling suffix arrays.
