Why am I getting strange characters leading my substring? - c

I am trying to create a function in C that returns a substring of any given char array. It is giving me most of the correct output, but I am also getting some strange characters at the start of the string. For instance, when given the input "abcdefg" and asked for chars 3 to 5, it gives me "ÀvWdef". So it appears I am getting the g cut off properly, but it is just doing something strange to the first 3 chars instead of ignoring them. My code is included below:
typedef char * String;
String subStr(String src, int start, int end){
if ((start > (strLen(src)-1)) || start < 0 || end < start ||
(end > (strLen(src) - 1)) ){
return NULL;
}
int s = start, e = end;
String r = (String)malloc((e - s + 1) * sizeof(char));
while(s <= e){
r[s] = src[s];
s++;
}
r[s] = '\0';
return r;
}

You have a logic error in:
while(s <= e){
r[s] = src[s];
s++;
}
The index for src is correct. But the index for r is not. You need another index.
int i = 0;
while(s <= e){
r[i] = src[s];
s++;
i++;
}
After the loop, instead of:
r[s] = '\0';
use:
r[i] = '\0';
You also have a logic error in the number of characters you allocate with malloc.
Given s = 3 and e = 5, you are returning a string that consists of src[3], src[4], src[5]. That is 3 characters. When you add the terminating null character, you need 4 characters. (e - s + 1) gives you only 3. Use (e - s + 2) instead.
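Putting both fixes together, here is a minimal sketch of a corrected function. The name sub_str, the plain char * types (instead of the String typedef), and the use of the standard strlen (instead of the asker's strLen) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of src[start..end] (inclusive), or NULL
 * on invalid bounds or allocation failure. The caller frees the result. */
char *sub_str(const char *src, int start, int end)
{
    assert(src != NULL);
    int len = (int)strlen(src);
    if (start < 0 || end < start || end > len - 1) {
        return NULL;
    }
    int count = end - start + 1;          /* characters to copy */
    char *r = malloc((size_t)count + 1);  /* +1 for the terminating '\0' */
    if (r == NULL) {
        return NULL;
    }
    for (int i = 0; i < count; i++) {
        r[i] = src[start + i];            /* r is indexed from 0, src from start */
    }
    r[count] = '\0';
    return r;
}
```

With the input from the question, sub_str("abcdefg", 3, 5) yields "def", and the caller is responsible for calling free on the result.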

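Memory returned by malloc is uninitialized and can contain anything, so you may want to zero it first. This assumes r was allocated with e - s + 2 bytes (the characters plus the terminating null); memset is the standard replacement for the legacy bzero:
memset(r, 0, e - s + 2);
Furthermore, you want to start copying into r at index 0, not index s. You should not use s as the index for both arrays.
Edit: corrected < to <=.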
for(i =0; i <= e-s;i++)
r[i] = src[s + i];
r[i] = '\0';

I think the problem might be here:
r[s] = src[s];
Change it to
r[i++] = src[s];
where i keeps track of index of string r, and is initialised to 0.

Related

C-Lang: Segmentation Fault when working with string in for-loop

Quite recently, at the university, we began to study strings in the C programming language, and as a homework, I was given the task of writing a program to remove extra words.
While writing a program, I faced an issue with iteration through a string that I could solve in a hacky way. However, I would like to deal with the problem with your help, since I cannot find the error myself.
The problem is that when I use strlen(buffer) in the for-loop condition, the code compiles and runs without errors, but when I use the __act_buffer_len variable, which holds the value of strlen(buffer), I get a segmentation fault at runtime.
I tried many more ways to solve this problem, but the only one, which I already described, worked for me.
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
int __act_buffer_len = strlen(buffer);
// for debugging purposes
printf("__actbuff: %d\n", __act_buffer_len);
printf("sizeof: %d\n", sizeof(buffer));
printf("strlen: %d\n", strlen(buffer));
char* _newbuff = malloc(__act_buffer_len + 1); // <- new buffer without words with less than 2 unique words
char* _tempbuff; // <- used to store current word
int beg_point = 0;
int curr_wlen = 0;
for (int i = 0; i < strlen(buffer); i++) // no errors at runtime, app runs well
// for (int i = 0; i < __act_buffer_len; i++) // <- segmentation fault when loop is reaching a space character
// for (int i = 0; buffer[i] != '\0'; i++) // <- also segmentation fault at the same spot
// for (size_t i = 0; i < strlen(buffer); i++) // <- even this gives a segmentation fault which is totally confusing for me
{
printf("strlen in loop %d\n", i);
if (buffer[i] == delim)
{
char* __cpy;
memcpy(__cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
// this may be commented for testing purposes
__uint32_t __letters = __get_letters(__cpy, curr_wlen); // <- will return number of unique letters in word
if (__letters > 2) // <- will remove all the words with less than 2 unique letters
{
strcat(_newbuff, __cpy);
strcat(_newbuff, " ");
}
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
}
else curr_wlen++;
}
return _newbuff;
}
In short, the code above just finds delimiter character in string and counts the number of unique letters of the word before this delimiter.
My mistake was not initializing the __cpy pointer: memcpy was writing through an uninitialized pointer, which is undefined behavior.
Also, as @n.1.8e9-where's-my-sharem. stated, I shouldn't name variables with a leading double underscore.
The final code:
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
size_t _act_buffer_len = strlen(buffer);
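char* _newbuff = malloc(_act_buffer_len + 1); // <- new buffer without words with less than 2 unique words (+1 for the terminating null)
_newbuff[0] = '\0'; // <- strcat needs a valid (empty) string to append to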
int beg_point = 0;
int curr_wlen = 0;
for (size_t i = 0; i < _act_buffer_len; i++)
{
if (buffer[i] == delim)
{
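char* _cpy = malloc(curr_wlen + 1); // <- +1 for the terminating null
memcpy(_cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
_cpy[curr_wlen] = '\0'; // <- memcpy does not terminate the string, but strcat requires it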
// this may be commented for testing purposes
__uint32_t _letters = _get_letters(_cpy, curr_wlen); // <- will return number of unique letters in word
if (_letters > 2) // <- will remove all the words with less than 2 unique letters
strcat(_newbuff, _cpy);
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
free(_cpy);
}
else curr_wlen++;
}
return _newbuff;
}
Thanks for helping me
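The underlying rule is that a pointer must refer to real storage before anything is written through it, and a buffer filled with memcpy needs its terminator added by hand before it is passed to string functions. A minimal sketch, assuming a hypothetical helper named copy_word that is not part of the original code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len bytes starting at buf[start] into a
 * freshly allocated, null-terminated string. The caller frees the result. */
char *copy_word(const char *buf, size_t start, size_t len)
{
    assert(buf != NULL);
    assert(start + len <= strlen(buf));
    char *word = malloc(len + 1);   /* the pointer now refers to real storage */
    if (word == NULL) {
        return NULL;
    }
    memcpy(word, buf + start, len);
    word[len] = '\0';               /* memcpy does not add a terminator */
    return word;
}
```

A string built this way is safe to hand to strcat or strlen, which both rely on the terminating null.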

Why will my function not loop as expected?

Below is a code sample for a function that removes similar adjacent characters in a string. For a given argument like remove_adjacent("azyyzb"), the expected output should be ab, since it should first remove adjacent characters yy to get substring azzb. It should then remove adjacent characters zz in the substring, to finally get substring ab. However, my function only handles the first adjacent characters, and fails to handle the second part. It could be an indication that it is not looping as expected. Could someone point out what the problem might be?
public static void remove_adjacent(String str1)
{
int i = 0;
do
{
int j = i + 1;
if (str1.charAt(i) == str1.charAt(j))
str1 = str1.substring(0, i) + str1.substring(j + 1, str1.length());
i++;
}
while (i < str1.length() - 1);
System.out.println(str1);
}
Because your i is already pointing at index 2 when you remove yy. But your first z is at index 1, which won't be checked anymore.
When you remove a part of the string, you will need to go one position backwards in your string to check, if a new pair of equal characters was created. And you must only step forward, if nothing was removed in the current iteration.
public static void remove_adjacent(String str1)
{
int i = 0;
do
{
int j = i + 1;
if (str1.charAt(i) == str1.charAt(j)) {
str1 = str1.substring(0, i) + str1.substring(j + 1, str1.length());
if (i > 0) i--;
} else {
i++;
}
}
while (i < str1.length() - 1);
System.out.println(str1);
}
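The step-back idea is not specific to Java. For comparison with the other threads here, this is a rough sketch of the same technique in C, assuming the string lives in a writable char array and is edited in place (the in-place approach is an assumption; the Java version builds new strings instead).

```c
#include <assert.h>
#include <string.h>

/* Removes pairs of equal adjacent characters in place, stepping back one
 * position after each removal so newly formed pairs are also caught. */
void remove_adjacent(char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0' && s[i + 1] != '\0') {
        if (s[i] == s[i + 1]) {
            /* Shift the tail (including '\0') left over the pair. */
            memmove(s + i, s + i + 2, strlen(s + i + 2) + 1);
            if (i > 0) {
                i--;    /* re-check the character before the removed pair */
            }
        } else {
            i++;        /* only advance when nothing was removed */
        }
    }
}
```

Checking both s[i] and s[i + 1] in the loop condition also covers empty and one-character strings, which the do-while form would index out of bounds.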

Count the occurrences of every word in C

I want to count the occurrences of every word of this small text " A broken heart of a broken mind ."
Every word of this text is stored in a 2D array[100][20], where 100 is max_words and 20 is max_word_length. I also have a pointers array[100] in which every pointer points to a word. I can't find a clever way to count the repeated words,
for example
a: 2 times
broken: 2 times
heart: 1 time
mind: 1 time
. : 1 time
These would be the pointers and the words array:
POINTERS ARRAY              WORDS ARRAY
point0 (points "a")         a
point1 (points "broken")    broken
point2 (points "heart")     heart
point3 (points "of")        of
point4 (points "a")         mind
point5 (points "broken")    .
point6 (points "mind")      \0\0\0\0\0
point7 (points ".")         \0\0\0\0\0
NULL                        ..
NULL
..
NULL                        \0\0\0\0\0
Side note: Every word is lowercase.
void frequence_word(char *pointers[], int frequence_array[]) {
int word = 0;
int i;
int count = 1;
int check[MAX_WORDS];
for (word = 0; word < MAX_WORDS; word++) {
check[word] = -1;
}
for (word = 0; word < MAX_WORDS; word++) {
count = 1;
for (i = word + 1; i < MAX_WORDS; i++) {
if (pointers[word + 1] != NULL
&& strcmp(pointers[word], pointers[i]) == 0) {
count++;
check[i] = 0;
}
}
if (check[word] != 0) {
check[word] = count;
}
}
}
Any ideas please?
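This seems like a use case for strstr. You can call strstr repeatedly, each time resuming the search just past the previous match, until it returns NULL. Here substring is the text to look for and total is the string being searched.
const char substring[] = "A broken heart of a broken mind";
const char* total = ...;
const char* result;
long count = 0;
while ((result = strstr(total, substring)) != NULL) {
count++;
total = result + (sizeof(substring) - 1);
}
I think this is mostly self-explanatory, but I will explain this line:
total = result + (sizeof(substring) - 1);
It moves total to just past the match that was found, so the next call to strstr continues from there. It takes advantage of the fact that sizeof applied to a char array (not a pointer) yields the array's size in bytes, which for an array initialized from a string literal is the number of characters plus the null terminator. We subtract one to ignore the null terminator.
Note that strstr matches anywhere in the text, so searching for "a" would also count the a inside heart.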
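Since the question already has the words split out behind an array of pointers, comparing whole words with strcmp may fit better than substring search. The following is a sketch only: the function name count_words, its parameters, and the convention of storing 0 for duplicates are assumptions, not the asker's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* For each first occurrence of a word, stores its total count in freq[i];
 * later duplicates get 0. Scanning stops at the first NULL pointer.
 * Returns the number of words scanned. */
size_t count_words(char *const words[], size_t max_words, int freq[])
{
    assert(words != NULL && freq != NULL);
    size_t n = 0;
    while (n < max_words && words[n] != NULL) {
        freq[n] = -1;                 /* -1 marks "not counted yet" */
        n++;
    }
    for (size_t i = 0; i < n; i++) {
        if (freq[i] == 0) {
            continue;                 /* already counted as a duplicate */
        }
        int count = 1;
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(words[i], words[j]) == 0) {
                count++;
                freq[j] = 0;
            }
        }
        freq[i] = count;
    }
    return n;
}
```

Finding the number of non-NULL entries first means strcmp is never called on a NULL pointer, which the loop in the question does not guarantee.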

Trying to delete a specific character from a string in C?

I'm trying to delete a specific character (?) from the end of a string and return a pointer to a string, but it's not removing it at all at the moment. What am I doing wrong? Is there a better way to go about it?
char * word_copy = malloc(strlen(word)+1);
strcpy(word_copy, word);
int length = strlen(word_copy);
int i = 0;
int j = 0;
for (i = 0; word_copy[i] != '\0'; i++) {
if (word_copy[length - 1] == '?' && i == length - 1){
break;
}
}
for (int j = i; word_copy[j] != '\0'; j++) {
word_copy[j] = word_copy[j+1];
}
word = strdup(word_copy);
I'm immediately seeing a couple of problems.
The first for loop does no real work. The check on the last character doesn't depend on i, so the loop could be replaced with a single if statement.
if (word_copy[length - 1] == '?') {
i = length - 1;
} else {
i = length;
}
The second for loop also acts as an if statement since it starts at the end of the string and can only ever run 0 or 1 times.
You could instead do something like this to remove the ?. This code will return a new malloced string with the last character removed if it is a ?.
char *remove_question_mark(char *word) {
unsigned int length = strlen(word);
if (length == 0) {
return calloc(1, 1);
}
if (word[length - 1] == '?') {
char *word_copy = malloc(length);
// Copy up to '?' and put null terminator
memcpy(word_copy, word, length - 1);
word_copy[length - 1] = 0;
return word_copy;
}
char *word_copy = malloc(length + 1);
memcpy(word_copy, word, length + 1);
return word_copy;
}
Or, if you are feeling lazy, you could just overwrite the trailing ? with a null terminator in place. This leaves one byte of the buffer unused, but that may be an acceptable loss. It should also be a fair bit faster since it doesn't need to allocate any new memory or copy the previous string. It does modify word, so word must point to writable memory, not a string literal.
unsigned int length = strlen(word);
if (length > 0 && word[length - 1] == '?') {
word[length - 1] = 0;
}
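Wrapped up as a function, the in-place variant might look like the sketch below. The name strip_trailing_qmark and the int return value are assumptions added for illustration.

```c
#include <assert.h>
#include <string.h>

/* Removes one trailing '?' from word in place. Returns 1 if a character
 * was removed and 0 otherwise. word must point to writable memory. */
int strip_trailing_qmark(char *word)
{
    assert(word != NULL);
    size_t length = strlen(word);
    if (length > 0 && word[length - 1] == '?') {
        word[length - 1] = '\0';   /* overwrite the '?', not the old terminator */
        return 1;
    }
    return 0;
}
```

Because the function changes the caller's buffer directly, there is no need to reassign a pointer afterwards, which sidesteps the issue of word = strdup(word_copy) only updating a local variable.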

Swapping elements of char array in C

I have this code:
char *sort(char *string){ //shell-sort
int lnght = length(string) - 1; // length is my own function
int gap = lnght / 2;
while (gap > 0)
{
for (int i = 0; i < lnght; i++)
{
int j = i + gap;
int tmp =(int)string[j];
while (j >= gap && tmp > (int)string[j - gap])
{
string[j] = string[j - gap]; // code fails here
j -= gap;
}
string[j] = (char)tmp; // and here as well
}
if (gap == 2){
gap = 1;
}
else{
gap /= 2.2;
}
}
return string;
}
The code should sort (shell-sort) the characters in the string, given the ordinal value (ASCII value). Even though the code is pretty simple, it still fails at lines I've commented - segmentation fault. I've spent plenty of time with this code and still can't find the problem.
As you say in the comments, you call your function like this:
char *str = "test string";
sort(str);
Here str is a pointer to a string literal. Modifying a string literal is undefined behavior, and literals are commonly stored in read-only memory, so the writes in your function can result in a segmentation fault.
Declare it as an array instead:
char str[] = "test string";
In situations like this look at your statements not so much as executable code, but as mathematical boundary conditions. I've replaced the monstrous name lnght with length for readability purposes.
Here are the relevant conditions that affect the value of j when entering the while loop, relative to the length.
i < length;
gap = length / 2;
j = i + gap;
Now we plug in a value. Consider the case where length == 10. Then presumably the maximum index in your array is 9 which is also the highest value that i can take on.
Then we also have that gap == 5 and so after entering the while loop j == i + gap == 9 + 5. Clearly 9 + 5 > 10. The rest is left as an exercise to the programmer.
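For reference, one way to keep every index inside the string is to start the inner pass at i = gap and compare backwards, so the highest index touched is i itself. This is a sketch under assumptions: it uses strlen rather than the asker's length function, a plain gap /= 2 sequence rather than the 2.2 divisor, and the name sort_desc is made up. It keeps the descending order implied by the original comparison.

```c
#include <assert.h>
#include <string.h>

/* Shell sort of the characters of s in descending order, in place. */
char *sort_desc(char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        for (size_t i = gap; i < n; i++) {      /* i never exceeds n - 1 */
            char tmp = s[i];
            size_t j = i;
            while (j >= gap && tmp > s[j - gap]) {
                s[j] = s[j - gap];              /* shift smaller chars right */
                j -= gap;
            }
            s[j] = tmp;
        }
    }
    return s;
}
```

It has to be called on a writable array, for example char str[] = "adgfbce"; followed by sort_desc(str);.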
How do you test your function? With a string literal (i.e. char *buffer = "test string";)?
Because on the first loop at least, j and j-gap should be inside the string boundaries. So if you get a segfault, I guess it is because of an unmodifiable string (string literals can't be modified).
Replacing length() with strlen() and calling it with a properly created test string led me to a valid result:
"adgfbce" → "gfedcba"
