longest common subsequence: why is this wrong? - c

int lcs(char * A, char * B)
{
int m = strlen(A);
int n = strlen(B);
int *X = malloc(m * sizeof(int));
int *Y = malloc(n * sizeof(int));
int i;
int j;
for (i = m; i >= 0; i--)
{
for (j = n; j >= 0; j--)
{
if (A[i] == '\0' || B[j] == '\0')
X[j] = 0;
else if (A[i] == B[j])
X[j] = 1 + Y[j+1];
else
X[j] = max(Y[j], X[j+1]);
}
Y = X;
}
return X[0];
}
This works, but valgrind complains loudly about invalid reads. How was I messing up the memory? Sorry, I always fail at C memory allocation.

The issue here is with the size of your table. Note that you're allocating space as
int *X = malloc(m * sizeof(int));
int *Y = malloc(n * sizeof(int));
However, both arrays are indexed by j, which runs over 0 ... n, so each of them needs n + 1 slots. X is sized by m instead, so it is also too small whenever m <= n.
Try changing this to read
int *X = malloc((n + 1) * sizeof(int));
int *Y = malloc((n + 1) * sizeof(int));
Hope this helps!

Series of issues. First, as templatetypedef says, you're under-allocated.
Then, as paddy says, you're not freeing up your malloc'd memory. If you need the Y=X line, you'll need to store the original malloc'd space addresses in another set of variables so you can call free on them.
...mallocs...
int * original_y = Y;
int * original_x = X;
...body of code...
int result = X[0]; // read the result before the memory is freed
free(original_y);
free(original_x);
return result;
But this doesn't address your new question, which is why doesn't the code actually work?
I admit I can't follow your code (without a lot more study), but I can propose an algorithm that will work and be far more understandable. This may be somewhat pseudocode and not particularly efficient, but getting it correct is the first step. I've listed some optimizations later.
int lcs(char * A, char * B)
{
    int length_a = strlen(A);
    int length_b = strlen(B); // not used yet; one of the optimizations below needs it
    // length of the longest common substring found so far
    int longest_found_length = 0;
    // go through each substring of one of the strings (doesn't matter which, you could pick the shorter one if you want)
    char * candidate_substring = malloc(length_a + 1);
    if (candidate_substring == NULL) {
        return -1; // allocation failed
    }
    for (int start_position = 0; start_position < length_a; start_position++) {
        for (int end_position = start_position; end_position < length_a; end_position++) {
            int substring_length = end_position - start_position + 1;
            // make a null-terminated copy of the substring to look for in the other string
            strncpy(candidate_substring, &(A[start_position]), substring_length);
            candidate_substring[substring_length] = '\0'; // strncpy does not add the terminator here
            // only record a match if it beats the best length so far
            if (substring_length > longest_found_length && strstr(B, candidate_substring) != NULL) {
                longest_found_length = substring_length;
            }
        }
    }
    free(candidate_substring);
    return longest_found_length;
}
Some different optimizations you could do:
// if this can't be longer, then don't bother checking it. You can play games with the for loop to not have this happen, but it's more complicated.
if (substring_length <= longest_found_length) {
continue;
}
and
// there are more optimizations you could do to this, but don't check
// the substring if it's longer than b, since b can't contain it.
if (substring_length > length_b) {
continue;
}
and
if (strstr(B, candidate_substring) != NULL) {
    // keep the longest length seen across all start positions
    if (substring_length > longest_found_length) {
        longest_found_length = substring_length;
    }
} else {
    // if nothing contains the shorter string, then nothing can contain the longer one, so skip checking longer strings with the same starting character
    break; // skip out of inner loop to next iteration of start_position
}
Instead of copying each candidate substring to a new string, you could do a character swap with the end_position + 1 and a NUL character. Then, after looking for that substring in b, swap the original character at end_position+1 back in. This would be much faster, but complicates the implementation a little.
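A minimal sketch of that NUL-swap idea, combined with the early break from the previous snippet. The function name longest_common_substring is made up for illustration, and it assumes A points to writable memory (not a string literal), since A is modified temporarily and restored after every check:

```c
#include <string.h>

/* Length of the longest common substring of A and B.
   A must be writable: one character is swapped for NUL during each check. */
int longest_common_substring(char *A, const char *B)
{
    size_t length_a = strlen(A);
    size_t best = 0;

    for (size_t start = 0; start < length_a; start++) {
        for (size_t end = start; end < length_a; end++) {
            /* Temporarily terminate A right after end_position. */
            char saved = A[end + 1];
            A[end + 1] = '\0';
            int found = strstr(B, A + start) != NULL;
            A[end + 1] = saved; /* swap the original character back in */

            if (!found)
                break; /* a longer candidate from this start cannot match either */
            if (end - start + 1 > best)
                best = end - start + 1;
        }
    }
    return (int)best;
}
```

Because A is restored after every strstr call, the caller sees it unchanged once the function returns.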

NOTE: I don't normally write two answers, and if you feel that it is tacky, feel free to comment on this one and not vote it up. This answer is a more optimized solution, but I wanted to give the most straightforward one I could think of first and then put this in another answer to not confuse the two. Basically they are for different audiences.
The key to solving this problem efficiently is to not throw away information you have about shorter common substrings when looking for longer ones. Naively, you check each substring against the other one, but if you know that "AB" matches in "ABC", and your next character is C, don't check to see if "ABC" is in "ABC", just check that the spot after "AB" is a "C".
For each character in A, you have to check up to all the letters in B, but because we stop looking through B once a longer substring is no longer possible, it greatly limits the number of checks. Each time you get a longer match up front, you eliminate checks on the back-end, because it will no longer be a longer substring.
For example, if A and B are both long, but contain no common letters, each letter in A will be compared against each letter in B for a runtime of A*B.
For a sequence where there are a lot of matches, but the match length isn't a large fraction of the length of the shorter string, you have A * B combinations to check against the shorter of the two strings (A or B), leading to either A*B*A or A*B*B, which is basically O(n^3) time for similar-length strings. I expected the optimizations in this solution to beat n^3 despite the triple-nested for loops, but as best I can tell they do not.
I'm thinking about this some more, though. Either the substrings being found are NOT a significant fraction of the length of the strings, in which case the optimizations don't do much, but the comparisons for each combination of A*B don't scale with A or B and drop out to be constants -- OR -- they are a significant fraction of A and B and it directly divides against the A*B combinations that have to be compared.
I just may ask this in a question.
int lcs(char * A, char * B)
{
    int length_a = strlen(A);
    int length_b = strlen(B);
    // length of the longest common substring found so far
    int longest_length_found = 0;
    // for each character in one string (doesn't matter which), look for incrementally larger strings in the other
    for (int a_index = 0; a_index < length_a - longest_length_found; a_index++) {
        for (int b_index = 0; b_index < length_b - longest_length_found; b_index++) {
            // offset into each string until end of string or non-matching character is found
            for (int offset = 0; A[a_index+offset] != '\0' && B[b_index+offset] != '\0' && A[a_index+offset] == B[b_index+offset]; offset++) {
                // offset is zero-based, so offset + 1 characters have matched so far
                longest_length_found = longest_length_found > offset + 1 ? longest_length_found : offset + 1;
            }
        }
    }
    return longest_length_found;
}

In addition to what templatetypedef said, some things to think about:
Why aren't X and Y the same size?
Why are you doing Y = X? That's an assignment of pointers. Did you perhaps mean memcpy(Y, X, (n+1)*sizeof(int))?
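Putting these points together, here is one way the two-row version could look. This is a sketch, not the asker's final code: the name lcs_length, the calloc-based initialization and the -1 error return are my own choices. Both rows get n + 1 slots, and the rows are swapped rather than aliased, so each buffer is freed exactly once:

```c
#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence of A and B,
   or -1 if memory allocation fails. */
int lcs_length(const char *A, const char *B)
{
    size_t m = strlen(A);
    size_t n = strlen(B);
    /* Both rows are indexed by j = 0..n, so each needs n + 1 slots.
       calloc zero-fills them, which is the row for i == m. */
    int *cur = calloc(n + 1, sizeof *cur);
    int *next = calloc(n + 1, sizeof *next);
    int result = -1;

    if (cur != NULL && next != NULL) {
        for (size_t i = m; i-- > 0; ) {
            cur[n] = 0; /* column for the end of B */
            for (size_t j = n; j-- > 0; ) {
                if (A[i] == B[j])
                    cur[j] = 1 + next[j + 1];
                else
                    cur[j] = cur[j + 1] > next[j] ? cur[j + 1] : next[j];
            }
            /* Swap the rows instead of assigning one pointer to the other. */
            int *tmp = cur;
            cur = next;
            next = tmp;
        }
        result = next[0]; /* after the swap, next holds the row for i == 0 */
    }
    free(cur);
    free(next);
    return result;
}
```

Swapping keeps two distinct buffers alive, which is what the Y = X assignment in the question loses after the first outer iteration.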

Related

decreasing time it takes to run my program in c

I was writing a program that reads from a file and stores the data in two arrays that live inside an array of structs. I am growing the arrays with realloc, and the program takes ~0.7 s to run.
Can I somehow decrease this time?
typedef struct {
int *node;
int l;
int *waga;
} przejscie_t;
void czytaj(przejscie_t **graf, int vp, int vk, int waga) {
(*graf)[vp].node[(*graf)[vp].l - 1] = vk;
(*graf)[vp].waga[(*graf)[vp].l - 1] = waga;
(*graf)[vp].l++;
}
void wypisz(przejscie_t *graf, int i) {
printf("i=%d l=%d ", i, graf[i].l);
for (int j = 0; j < (graf[i].l - 1); j++) {
printf("vk=%d waga=%d ", graf[i].node[j], graf[i].waga[j]);
}
printf("\n");
}
void init(przejscie_t **graf, int vp, int n) {
*graf = realloc(*graf, (vp + 1) * sizeof(przejscie_t));
if (n == vp || n == -1){
(*graf)[vp].l = 1;
(*graf)[vp].node = malloc((*graf)[vp].l * sizeof(int));
(*graf)[vp].waga = malloc((*graf)[vp].l * sizeof(int));
}
else {
for (int i = n; i <= vp; i++) {
(*graf)[i].l = 1;
(*graf)[i].node = malloc((*graf)[i].l * sizeof(int));
(*graf)[i].waga = malloc((*graf)[i].l * sizeof(int));
}
}
}
Here are some suggestions:
I think you should pre-calculate the required size of your *graf memory instead of reallocating it again and again, for example with a prealloc_graf function.
You will get a good time improvement, since reallocating is time-consuming, especially when it actually has to move the memory.
This matters most if you are working with big files.
And since you're reading from files, pre-calculating should be easy to do.
If your input files range from small to large, you have two choices:
Accept your fate and allow your code to be a little less optimized on small files.
Create two init functions: one optimized for small files and one for bigger files. You will have to run some benchmarks to determine which algorithm is best for each case before implementing this. You could even automate that if you have the time and the will to do so.
It is important to check for successful memory allocation before trying to use the said memory because allocation function can fail.
Finally, some changes for the init function :
void init(przejscie_t **graf, int vp, int n) {
*graf = realloc(*graf, (vp + 1) * sizeof(przejscie_t));
// The `if` statement was redundant.
// Added a ternary operator for ``n == -1``.
// Alternatively, you could use ``n = (n == -1 ? vp : n)`` right before the loop.
for (int i = (n == -1 ? vp : n); i <= vp; i++) {
(*graf)[i].l = 1;
// (*graf)[X].l is always 1.
// There is no reason to use (*graf)[X].l * sizeof(int) for malloc.
(*graf)[i].node = malloc(sizeof(int));
(*graf)[i].waga = malloc(sizeof(int));
}
}
I've commented everything that I've changed, but here is a summary:
The if statement was redundant. The for loop covers all cases, with a ternary operator for n equal to -1. The code should be easier to understand this way.
The node and waga arrays were not being initialized "properly". Since l is always equal to 1, there was no need for an additional operation. This doesn't really change execution time, though, since it's constant.
I would also suggest that your "functions running allocation functions" should return a boolean saying if the function succeeded. In the case the allocation failed you can return false to say that your function failed.
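As a rough illustration of both ideas (fewer reallocations and a boolean result), here is a sketch that grows each per-vertex array geometrically. The edge_list_t type, its cap field and the edge_list_push function are hypothetical names, not part of the question's code; the question's przejscie_t would need an extra capacity field to work this way:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical variant of przejscie_t with an explicit capacity. */
typedef struct {
    int *node;
    int *waga;
    size_t len; /* number of edges stored */
    size_t cap; /* number of edges allocated */
} edge_list_t;

/* Appends one edge. Returns false if allocation fails;
   the edges stored so far stay valid in that case. */
bool edge_list_push(edge_list_t *list, int vk, int waga)
{
    if (list->len == list->cap) {
        /* Double the capacity so n pushes cost O(log n) reallocations. */
        size_t new_cap = list->cap ? list->cap * 2 : 4;

        int *new_node = realloc(list->node, new_cap * sizeof *new_node);
        if (new_node == NULL)
            return false;
        list->node = new_node;

        int *new_waga = realloc(list->waga, new_cap * sizeof *new_waga);
        if (new_waga == NULL)
            return false;
        list->waga = new_waga;

        list->cap = new_cap;
    }
    list->node[list->len] = vk;
    list->waga[list->len] = waga;
    list->len++;
    return true;
}
```

A zero-initialized edge_list_t is a valid empty list, because realloc on a NULL pointer behaves like malloc.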

copy two arrays of int to one char* in C

I have two arrays of int, for example arr1={0,1,1,0,0}, arr2={1,0,1,1,1}, and I need to return one char* created by malloc that looks like this: "01100,10111".
When I do it with a for loop it doesn't work. How can I do it?
char* ans = (char*)malloc((size * 2+1) * sizeof(int));
for (int i = 0; i < size; i++)
ans[i] = first[i];
ans[size] = ",";
for (int i = size+1; i < 2*size+1; i++)
ans[i] = second[i];
Among the multitude of problems:
Your allocation size is wrong. It should include space for the separating comma and the terminating nullchar. sizeof(int) is wrong regardless, it should be sizeof(char) and as-such can be omitted (sizeof(char) is always 1).
Your storage is wrong. You want to store characters, and your values should be adjusted relative to '0'.
Your indexing of the second loop is wrong.
In reality, you don't need the second loop in the first place:
char* ans = malloc(size * 2 + 2);
for (int i = 0; i < size; i++)
{
ans[i] = '0' + first[i];
ans[size+1+i] = '0' + second[i];
}
ans[size] = ',';
ans[2*size+1] = 0;
That's it.
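Wrapped into a complete function, the same approach might look like the sketch below. The name join_digits and the NULL return on allocation failure are my additions, and it assumes every array element is a single digit 0..9:

```c
#include <stdlib.h>

/* Builds a malloc'd string such as "01100,10111" from two int arrays of
   equal length. Returns NULL if allocation fails; the caller frees the result.
   Assumes every element is in the range 0..9. */
char *join_digits(const int *first, const int *second, size_t size)
{
    /* size digits + ',' + size digits + terminating NUL */
    char *ans = malloc(2 * size + 2);
    if (ans == NULL)
        return NULL;

    for (size_t i = 0; i < size; i++) {
        ans[i] = (char)('0' + first[i]);
        ans[size + 1 + i] = (char)('0' + second[i]);
    }
    ans[size] = ',';
    ans[2 * size + 1] = '\0';
    return ans;
}
```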
1.
char* ans = (char*)malloc((size * 2+1) * sizeof(int));
What is size here? It is not defined and declared in the provided code.
You do not need to cast the return value of malloc() to char *. In C, a void * converts implicitly to any object pointer type, so the cast is never required; it is a habit from the early C days.
Why exactly do you need a char pointer here at all? If you only want to print 01100,10111, there is no need for a char buffer; you can print the integer values directly.
2.
for (int i = 0; i < size; i++)
ans[i] = first[i];
Again what is size here?
What is first here? If it isn't a pointer (or an array), this statement is invalid.
3.
ans[size] = ",";
This operation is invalid. You are trying to assign a string literal (a pointer to char) to a single char; the character constant ',' is what is needed.
By the way, I don't know what you are trying to do with this statement. You can include the comma separator in the output 01100,10111 without storing it in the memory of the int arrays themselves.
4.
for (int i = size+1; i < 2*size+1; i++)
ans[i] = second[i];
Same as above: what are the value and the type of size?
What is second? If it isn't a pointer (or an array), this statement is invalid.
5.
To answer to the question title:
(How to) Copy two arrays of int to one char* in C
Not literally: a char pointer does not hold array data itself. You can only write a textual representation of the two arrays into the buffer that the pointer refers to.
There are at least four issues with your code.
You malloc the wrong size, you want to use sizeof(char).
You need to zero terminate it, so you need to add extra room for the terminating zero
char* ans = (char*)malloc((size * 2+2) * sizeof(char));
ans[size * 2+1] = 0;
Also the indexing of the second loop is wrong. You are accessing second array out of bounds. Make the loop more like the first.
We also need to convert the integer value to a char in the loops.
for (int i = 0; i < size; i++)
ans[size+i+1] = second[i] + '0';

Swapping elements of char array in C

I have this code:
char *sort(char *string){ //shell-sort
int lnght = length(string) - 1; // length is my own function
int gap = lnght / 2;
while (gap > 0)
{
for (int i = 0; i < lnght; i++)
{
int j = i + gap;
int tmp =(int)string[j];
while (j >= gap && tmp > (int)string[j - gap])
{
string[j] = string[j - gap]; // code fails here
j -= gap;
}
string[j] = (char)tmp; // and here as well
}
if (gap == 2){
gap = 1;
}
else{
gap /= 2.2;
}
}
return string;
}
The code should sort (shell-sort) the characters in the string, given the ordinal value (ASCII value). Even though the code is pretty simple, it still fails at lines I've commented - segmentation fault. I've spent plenty of time with this code and still can't find the problem.
As you say in a comment, you call your function like this:
char *str = "test string";
sort(str);
A string literal is stored in read-only memory and str is just a pointer to it, so it cannot be modified, yet your function modifies it. That can result in a segmentation fault.
Declare it like this instead:
char str[] = "test string";
In situations like this look at your statements not so much as executable code, but as mathematical boundary conditions. I've replaced the monstrous name lnght with length for readability purposes.
Here are the relevant conditions that affect the value of j when entering the while loop, relative to the length.
i < length;
gap = length / 2;
j = i + gap;
Now we plug in a value. Consider the case where length == 10. Then presumably the maximum index in your array is 9 which is also the highest value that i can take on.
Then we also have that gap == 5 and so after entering the while loop j == i + gap == 9 + 5. Clearly 9 + 5 > 10. The rest is left as an exercise to the programmer.
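For comparison, here is a sketch of a shell sort whose indices stay in bounds. It is not the asker's exact algorithm: the gap is simply halved each round (instead of dividing by 2.2), the outer index starts at gap so that j - gap never goes negative, and the name shell_sort_desc is made up. Like the original comparison, it sorts in descending order, and it must be given a writable array:

```c
#include <string.h>

/* Sorts the characters of string in descending order, in place.
   string must be writable (an array, not a string literal). */
char *shell_sort_desc(char *string)
{
    size_t length = strlen(string);

    for (size_t gap = length / 2; gap > 0; gap /= 2) {
        /* Start at gap so that i - gap is always a valid index,
           and stop before length so that i never runs past the end. */
        for (size_t i = gap; i < length; i++) {
            char tmp = string[i];
            size_t j = i;
            while (j >= gap && tmp > string[j - gap]) {
                string[j] = string[j - gap];
                j -= gap;
            }
            string[j] = tmp;
        }
    }
    return string;
}
```

With i running from gap to length - 1, the largest index ever touched is length - 1, which avoids the i + gap overrun described above.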
How do you test your function? With a string literal (i.e. char *buffer = "test string";)?
Because on the first loop at least, j and j-gap should be inside the string boundaries. So if you get a segfault, I guess it is because of a bad string (string literals can't be modified).
Replacing length() with strlen() and calling it with a properly created test string led me to a valid result:
"adgfbce" → "gfedcba"

In-place run length decoding?

Given a run length encoded string, say "A3B1C2D1E1", decode the string in-place.
The answer for the encoded string is "AAABCCDE". Assume that the encoded array is large enough to accommodate the decoded string, i.e. you may assume that the array size = MAX[length(encodedstring),length(decodedstring)].
This does not seem trivial, since merely decoding A3 as 'AAA' will lead to over-writing 'B' of the original string.
Also, one cannot assume that the decoded string is always larger than the encoded string.
Eg: Encoded string - 'A1B1', Decoded string is 'AB'. Any thoughts?
And it will always be a letter-digit pair, i.e. you will not be asked to convert 0515 to 0000055555.
If we don't already know, we should scan through first, adding up the digits, in order to calculate the length of the decoded string.
It will always be a letter-digit pair, hence you can delete the 1s from the string without any confusion.
A3B1C2D1E1
becomes
A3BC2DE
Here is some code, in C++, to remove the 1s from the string (O(n) complexity).
// remove 1s
int i = 0; // read from here
int j = 0; // write to here
while(i < str.length()) {
assert(j <= i); // optional check
if(str[i] != '1') {
str[j] = str[i];
++ j;
}
++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string
Now, this string is guaranteed to be shorter than, or the same length as, the final decoded string. We can't make that claim about the original string, but we can make it about this modified string.
(An optional, trivial, step now is to replace every 2 with the previous letter. A3BCCDE, but we don't need to do that).
Now we can start working from the end. We have already calculated the length of the decoded string, and hence we know exactly where the final character will be. We can simply copy the characters from the end of our short string to their final location.
During this copy process from right-to-left, if we come across a digit, we must make multiple copies of the letter that is just to the left of the digit. You might be worried that this might risk overwriting too much data. But we proved earlier that our encoded string, or any substring thereof, will never be longer than its corresponding decoded string; this means that there will always be enough space.
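The whole procedure can be sketched in C roughly as follows. This is my own illustration of the steps above, not tested interview code: the name rle_decode_in_place is made up, and it assumes well-formed input made of letter-digit pairs with single-digit counts from 1 to 9, in a buffer large enough for the decoded string plus its terminator:

```c
#include <ctype.h>
#include <stddef.h>

/* Decodes e.g. "A3B1C2D1E1" to "AAABCCDE" inside the same buffer.
   Assumes letter-digit pairs with counts 1..9 and a big enough buffer. */
void rle_decode_in_place(char *buf)
{
    /* Step 1: add up the digits to get the decoded length. */
    size_t decoded_len = 0;
    for (size_t i = 0; buf[i] != '\0'; i += 2)
        decoded_len += (size_t)(buf[i + 1] - '0');

    /* Step 2: drop every '1'; a letter without a digit now means "once". */
    size_t compact_len = 0;
    for (size_t i = 0; buf[i] != '\0'; i++)
        if (buf[i] != '1')
            buf[compact_len++] = buf[i];

    /* Step 3: expand from right to left. The compact string is never longer
       than its decoded form, so writes never overtake unread characters. */
    size_t write = decoded_len;
    size_t read = compact_len;
    buf[write] = '\0';
    while (read > 0) {
        char letter = buf[read - 1];
        unsigned count = 1;
        if (isdigit((unsigned char)letter)) {
            count = (unsigned)(letter - '0');
            letter = buf[read - 2]; /* the letter sits just left of the digit */
            read -= 2;
        } else {
            read -= 1;
        }
        while (count-- > 0)
            buf[--write] = letter;
    }
}
```

Calling it on a buffer such as char buf[16] = "A3B1C2D1E1"; leaves the decoded text at the start of buf, so no final shift is needed with this approach.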
The following solution is O(n) and in-place. The algorithm should never read or write memory outside the buffer. I did some debugging, and it appears correct on the sample tests I fed it.
High level overview:
Determine the encoded length.
Determine the decoded length by reading all the numbers and summing them up.
End of buffer is MAX(decoded length, encoded length).
Decode the string by starting from the end of the string. Write from the end of the buffer.
Since the decoded length might be greater than the encoded length, the decoded string might not start at the start of the buffer. If needed, correct for this by shifting the string over to the start.
int isDigit (char c) {
return '0' <= c && c <= '9';
}
unsigned int toDigit (char c) {
return c - '0';
}
unsigned int intLen (char * str) {
unsigned int n = 0;
while (isDigit(*str++)) {
++n;
}
return n;
}
unsigned int forwardParseInt (char ** pStr) {
unsigned int n = 0;
char * pChar = *pStr;
while (isDigit(*pChar)) {
n = 10 * n + toDigit(*pChar);
++pChar;
}
*pStr = pChar;
return n;
}
unsigned int backwardParseInt (char ** pStr, char * beginStr) {
unsigned int len, n;
char * pChar = *pStr;
while (pChar != beginStr && isDigit(*pChar)) {
--pChar;
}
++pChar;
len = intLen(pChar);
n = forwardParseInt(&pChar);
*pStr = pChar - 1 - len;
return n;
}
unsigned int encodedSize (char * encoded) {
int encodedLen = 0;
while (*encoded++ != '\0') {
++encodedLen;
}
return encodedLen;
}
unsigned int decodedSize (char * encoded) {
int decodedLen = 0;
while (*encoded++ != '\0') {
decodedLen += forwardParseInt(&encoded);
}
return decodedLen;
}
void shift (char * str, int n) {
do {
str[n] = *str;
} while (*str++ != '\0');
}
unsigned int max (unsigned int x, unsigned int y) {
return x > y ? x : y;
}
void decode (char * encodedBegin) {
int shiftAmount;
unsigned int eSize = encodedSize(encodedBegin);
unsigned int dSize = decodedSize(encodedBegin);
int writeOverflowed = 0;
char * read = encodedBegin + eSize - 1;
char * write = encodedBegin + max(eSize, dSize);
*write-- = '\0';
while (read != encodedBegin) {
unsigned int i;
unsigned int n = backwardParseInt(&read, encodedBegin);
char c = *read;
for (i = 0; i < n; ++i) {
*write = c;
if (write != encodedBegin) {
write--;
}
else {
writeOverflowed = 1;
}
}
if (read != encodedBegin) {
read--;
}
}
if (!writeOverflowed) {
write++;
}
shiftAmount = encodedBegin - write;
if (write != encodedBegin) {
shift(write, shiftAmount);
}
return;
}
int main (int argc, char ** argv) {
//char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
//char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char * str = buff + 3;
//char buff[256] = { "A1B1" };
//char * str = buff;
decode(str);
return 0;
}
This is a very vague question, though it's not particularly difficult if you think about it. As you say, decoding A3 as AAA and just writing it in place will overwrite the chars B and 1, so why not just move those farther along the array first?
For instance, once you've read A3, you know that you need to make space for one extra character; if it was A4 you'd need two, and so on. To achieve this you'd find the end of the string in the array (do this upfront and store its index).
Then loop though, moving the characters to their new slots:
To start: A|3|B|1|C|2|||||||
Have a variable called end storing the index 5, i.e. the last, non-blank, entry.
You'd read in the first pair, using a variable called cursor to store your current position - so after reading in the A and the 3 it would be set to 1 (the slot with the 3).
Pseudocode for the move:
var n = (array[cursor] - '0') - 2; // n = 1: the 3 from A3, minus 2 for the slots the pair already occupies.
for(i = end; i > cursor; i--) // walk backwards so nothing is overwritten before it has been moved
{
array[i + n] = array[i];
}
end += n; // the last used slot has moved as well
This would leave you with:
A|3|B|B|1|C|2||||||
The A is there once already and the slot holding the 3 gets overwritten, so now you want to write n + 1 A's starting at the index stored in cursor:
for(i = cursor; i < cursor + n + 1; i++)
{
array[i] = array[cursor - 1];
}
// increment the cursor afterwards!
cursor += n + 1;
Giving:
A|A|A|B|1|C|2||||||
Then you're pointing at the start of the next pair of values, ready to go again. I realise there are some holes in this answer, though that is intentional as it's an interview question! For instance, in the edge cases you specified A1B1, you'll need a different loop to move subsequent characters backwards rather than forwards.
Another O(n^2) solution follows.
Given that there is no limit on the complexity of the answer, this simple solution seems to work perfectly.
while ( there is an expandable element ):
expand that element
adjust (shift) all of the elements on the right side of the expanded element
Where:
Free space size is the number of empty elements left in the array.
An expandable element is an element that:
expanded size - encoded size <= free space size
The point is that in the process of getting from the run-length code to the expanded string, at each step, there is at least one element that can be expanded (easy to prove).

subtract 2 numbers using char arrays

I wanted to subtract two char arrays which hold numeric values. I am doing it because I want to subtract big numbers. When I compile this program, it does not show any errors, but it crashes during execution.
I tried to do as following pseudo code
foreach character(right2left)
difference=n1[i]-n2[i]//here suppose they are integers
if(difference<0)
{
n1[i-1]--;
difference+=10;
}
result[i]=diff;
I wrote pseudo code for clarity.
int subtract(char *n1,char *n2,int n1Len,int n2Len){
int diff;
int max=n1Len;
char* res = (char*)malloc (max+2);
memset(res, '0', max +1);
res[max] = '\0';
int i=n1Len - 1, j = n2Len - 1, k = max;
for (; i >= 0 && j >=0; --i, --j, --k) {
if(i >= 0 && j>=0)
{
diff=(n1[i]-'0') - (n2[i]-'0') ;
if(diff<0)
{
int temp=n1[i-1]-'0';
temp=temp-1;
n1[i-1]=temp+'0';
diff+=10;
}
res[i]=diff+'0';
}
else
res[i]=n1[i];
}
return atoi(res);
}
int main(void) {
int t=subtract("55","38",2,2);
printf("%d\n", t);
}
There are a few visible mistakes. Hopefully these will provide you with some pointers:
You are passing string literals to the function & trying to modify them in the function. That is not valid and will most likely cause segmentation fault. Instead of int t=subtract("55","38",2,2); Maybe you can try:
char a[] = "55";
char b[] = "38";
int t=subtract(a,b,strlen(a), strlen(b));
max should be n1Len+1 to accommodate the terminating NUL character in the res char array. You can fill it with 0 rather than '0' when initializing. res[max] = '\0'; invokes undefined behavior because it accesses an out-of-bounds element, so get rid of it and use memset(res,0,max) instead. Or use calloc instead of malloc+memset, as suggested by @pmg.
Don't cast the return value of malloc or calloc when coding in C.
for (; i >= 0 || j >=0; --i, --j, --k) should actually be for (; i >= 0 && j >=0; --i, --j, --k), as neither i nor j may go negative while indexing. You need to work on the function logic for the case where i!=j.
diff=n1[i]-'0'+n2[i]-'0' should be diff=(n1[i]-'0') - (n2[i]-'0') as you are subtracting and not adding the digits
res[i]=diff is incorrect as you are setting the integer result as character value. Change it to res[i]=diff+'0' to set the character value
Hopefully this will get you started.
char* res = (char*)malloc (max);
memset(res, '0', max-1); // set the result to all zeros
res[max] = '\0';
Let's say max is 3.
You set res[0] and res[1] to '0'. Then you set the nonexistent res[3] to 0.
res[2] is still uninitialized.
Try calloc instead, and don't forget space for the zero string terminator :)
Also, casting the return value from malloc (or calloc) is, at best, redundant and may hide an error the compiler would have caught if the cast wasn't there.
char *res = calloc(max + 1, 1); // allocate and initialize to 0
This
diff=n1[i]-'0'+n2[i]-'0';
should be the difference
diff = (n1[i] - '0') - (n2[j] - '0');
(besides subtracting and not adding, the index for n2 ought to be j, I think). With adding, you can get non-digit characters in the result, and atoi() stops at the first of them, if that's the very first, it returns 0.
Also, you should check that n2 is indeed not longer than n1, or you'll write out of bounds.
diff=n1[i]-'0'+n2[i]-'0';
This does not give the difference. It should be
diff = (n1[i] - '0') - (n2[j] - '0');
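Pulling the fixes from these answers together, a version that never modifies its inputs and returns the result as a string (so it also works for numbers too big for int) might look like this sketch. The name subtract_decimal and the explicit borrow variable are my own; it assumes both arguments are unsigned decimal strings and that n1 is numerically at least n2:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd decimal string holding n1 - n2, or NULL on failure.
   Assumes unsigned decimal digits only and n1 >= n2 numerically. */
char *subtract_decimal(const char *n1, const char *n2)
{
    size_t len1 = strlen(n1);
    size_t len2 = strlen(n2);
    if (len2 > len1)
        return NULL; /* n2 has more digits, so it cannot be the smaller number */

    char *res = malloc(len1 + 1);
    if (res == NULL)
        return NULL;

    int borrow = 0;
    for (size_t k = 0; k < len1; k++) {
        size_t i = len1 - 1 - k; /* walk n1 right to left */
        int diff = (n1[i] - '0') - borrow;
        if (k < len2)
            diff -= n2[len2 - 1 - k] - '0'; /* n2 uses its own index */
        if (diff < 0) {
            diff += 10;
            borrow = 1; /* carry the borrow instead of editing n1 */
        } else {
            borrow = 0;
        }
        res[i] = (char)('0' + diff);
    }
    res[len1] = '\0';

    /* Strip leading zeros, but keep at least one digit. */
    size_t lead = 0;
    while (lead + 1 < len1 && res[lead] == '0')
        lead++;
    memmove(res, res + lead, len1 - lead + 1);
    return res;
}
```

Because the borrow lives in a local variable, the function can safely be called with string literals such as subtract_decimal("55", "38"); the caller is responsible for freeing the returned string.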
