Array of separators for strtok() function - c

I want to split my text into words. A separator is any character other than a Latin letter.
Here is the loop that fills my separators array:
for(i = 0; i <= 127; i ++) {
    if(!isalpha(i)) {
        separators = (char*) realloc(separators, (length + 1) * sizeof(char));
        separators[length] = i;
        length ++;
    }
}
Then I use it here:
char text[] = "hello world!";
char** words = NULL;
char* p = strtok(text, separators);
int cnt = 0;
while(p != NULL) {
    words = (char**) realloc(words, (cnt + 1) * sizeof(char*));
    words[cnt] = strdup(p);
    cnt ++;
    p = strtok(NULL, separators);
}
for(i = 0; i < cnt; i ++) {
    printf(" - %d %s\n", i + 1, words[i]);
}
As a result I have:
 - 1 hello world!
If the separators array is replaced by " " it works well.
What's the problem with the array?

The first value of i in your loop, 0, is not alphabetic, so a 0 is stored as the very first byte of the separator array.
strtok() expects the separator list as a string, and strings in C are terminated by a zero byte. So strtok() receives a sequence that begins with a terminator and treats it as an empty list, with no separators at all.
You can start the loop from 1 to get rid of that interfering zero:
for (i = 1; i <= 127; i ++) {
    if(!isalpha(i)) {
        // reserve room for this byte plus the terminator written after the loop
        separators = (char*) realloc(separators, (length + 2) * sizeof(char));
        separators[length] = i;
        length ++;
    }
}
// then you also need to terminate it, otherwise strtok() will continue reading
// past the end of the array, with unpredictable (but very likely undesirable) results.
separators[length] = 0x0;
Alternatively, you can allocate the string only once (you waste some space, but save some time):
#define MAX_SEPARATORS 128
separators = (char*) malloc(MAX_SEPARATORS * sizeof(char));
for (i = 1; i < MAX_SEPARATORS; i++) {
    if (!isalpha(i)) {
        separators[length++] = i;
    }
}
separators[length] = 0x0;

Remember that strtok wants the separators as a string, complete with a string terminator character ('\0'). Your separators "string" has no such terminator, so strtok will read beyond what you have allocated, leading to undefined behavior.
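Putting both answers together, a minimal sketch of the whole split might look like the following. The helper names build_separators and split_words are hypothetical, and a fixed 128-byte buffer stands in for the realloc loop. It assumes 7-bit ASCII input and the default "C" locale for isalpha().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Fills `out` (at least 128 bytes) with every ASCII code from 1 to 127 that
 * is not a letter, then terminates it so strtok() can treat it as a string. */
void build_separators(char *out)
{
    size_t n = 0;
    for (int c = 1; c <= 127; c++) {   /* start at 1: a 0 would end the string */
        if (!isalpha(c)) {
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
}

/* Splits `text` in place and stores up to `max_words` pointers in `words`.
 * Returns the number of words stored. The pointers refer into `text`. */
size_t split_words(char *text, char **words, size_t max_words)
{
    char separators[128];
    size_t count = 0;

    build_separators(separators);
    for (char *p = strtok(text, separators);
         p != NULL && count < max_words;
         p = strtok(NULL, separators)) {
        words[count++] = p;
    }
    return count;
}
```

Because strtok() writes terminators into the buffer it scans, text has to be a writable array such as char text[] = "hello world!"; and not a string literal.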

Related

Why are there extra characters in my char[]?

I'm trying to shorten a char[] to a specified number of characters, and for some reason I get extra characters in my new char[]. Can you help me fix this?
When I tried with 1 or 2 letters, the result is this:
(d, n, k, a are the first letters of each line, reversed)
#▬w #▬n #▬k #▬a
(di, an, ok, la are the first two letters of each line, reversed)
#id #an #ok #la
With 3 letters, it works perfectly:
nid ran rok mla
But same problem with more than 3:
qp░nnid qp░aran qp░trok qp░amla
And with more letters than the longest line, it also works perfectly:
eynnid scnaran etrok amla
<--- These are my words backwards --->
char **read(FILE *file, int lineLength, int *pLines)
{
    size_t total = 0;
    size_t allocated = START;
    int line = 0;
    char buffer[MAX_LENGTH];
    char shortened[lineLength];
    /////////
    //printf("%d", sizeof(shortened));
    char **lines = (char **)malloc(allocated * sizeof(char *));
    while (fgets(buffer, MAX_LENGTH, file) != NULL)
    {
        for (int i = 0; i < lineLength; i++)
        {
            shortened[i] = buffer[i];
        }
        int length = strlen(shortened);
        if (shortened[length - 1] == '\n')
        {
            shortened[length - 1] = '\0';
        }
        if (line == allocated)
        {
            allocated *= 2;
            lines = realloc(lines, allocated * sizeof(char *));
        }
        lines[line] = (char *)malloc(lineLength);
        strcpy(lines[line], shortened);
        line++;
    }
    *pLines = line;
    return lines;
}
One major problem is this loop:
for (int i = 0; i < lineLength; i++)
{
    shortened[i] = buffer[i];
}
If lineLength > strlen(buffer) then you will copy the null-terminator (and beyond, including data that isn't initialized by the fgets call).
But if strlen(buffer) >= lineLength you will not copy the null-terminator. Then you use the strlen function on shortened which will then go beyond the end of shortened and you will have undefined behavior.
For a better way to remove the newline (which you need to do on buffer, not on shortened), see "Removing trailing newline character from fgets() input".
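As a rough illustration of the points above, the copy can be bounded by the actual line length and terminated explicitly. The function name shorten_line is hypothetical, and the sketch assumes the destination has room for max_chars + 1 bytes (so the question's malloc(lineLength) would need to become malloc(lineLength + 1)).

```c
#include <stddef.h>
#include <string.h>

/* Copies at most `max_chars` characters of `line` into `dest`, ignoring a
 * trailing newline. `dest` must have room for max_chars + 1 bytes. */
void shorten_line(char *dest, const char *line, size_t max_chars)
{
    size_t len = strcspn(line, "\n");   /* length up to, not including, '\n' */
    if (len > max_chars) {
        len = max_chars;
    }
    memcpy(dest, line, len);
    dest[len] = '\0';                   /* always terminate explicitly */
}
```

Since the terminator is written at a known index, strlen is never called on an unterminated buffer, whatever the relation between the line length and max_chars.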

How to copy a malloced string into another string in C

I have this function that randomly generates a string
char *generate_random_string(int seed) {
    if (seed != 0) {
        srand(seed);
    }
    char *alpha_num_str =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";
    char *random_str = malloc(RAND_STR_LEN);
    for (int i = 0; i < RAND_STR_LEN; i++) {
        random_str[i] = alpha_num_str[rand() % (strlen(alpha_num_str) - 1)];
    }
    return random_str;
}
I want to copy the return value of this function if I give it a seed of '1' into a string called initialisation_vector and this is currently how I am doing it:
char initialisation_vector[RAND_STR_LEN + 1] = {0};
strcpy(initialisation_vector, generate_random_string(1));
However, when I run the code I get a malloc buffer overflow error from the strcpy line. What am I doing wrong, how do I allocate enough memory for this?
The main issue:
Strings in C are expected to be null/zero terminated. This means that a string of n characters needs n+1 bytes, and the last byte must be '\0'.
The problem in your code is that random_str is allocated to hold exactly RAND_STR_LEN characters and therefore has no null terminator.
You should change:
char *random_str = malloc(RAND_STR_LEN);
for (int i = 0; i < RAND_STR_LEN; i++) {
    random_str[i] = alpha_num_str[rand() % (strlen(alpha_num_str) - 1)];
}
To:
/*-------------------------------------vvv--*/
char *random_str = malloc(RAND_STR_LEN + 1);
for (int i = 0; i < RAND_STR_LEN; i++) {
    random_str[i] = alpha_num_str[rand() % (strlen(alpha_num_str) - 1)];
}
random_str[RAND_STR_LEN] = '\0'; /* add zero termination */
Note:
If you want to select random characters from all the available ones in alpha_num_str, you should use: rand() % (strlen(alpha_num_str)) (without the -1). This is because x % n will return values of 0..n-1.
Also, for efficiency you can compute strlen(alpha_num_str) once before the loop and store it in a variable.
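Combining the fix with both notes, the function might be sketched as follows. The value of RAND_STR_LEN is an assumption, since the question does not show it, and the array name alphabet is a rename of alpha_num_str for this sketch.

```c
#include <stdlib.h>

#define RAND_STR_LEN 16   /* assumed value; the question does not show it */

/* Returns a newly allocated, null-terminated string of RAND_STR_LEN random
 * alphanumeric characters, or NULL if allocation fails. */
char *generate_random_string(unsigned int seed)
{
    static const char alphabet[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";
    /* computed once; sizeof counts the '\0', so subtract it */
    const size_t alphabet_len = sizeof alphabet - 1;

    if (seed != 0) {
        srand(seed);
    }
    char *random_str = malloc(RAND_STR_LEN + 1);   /* +1 for the terminator */
    if (random_str == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < RAND_STR_LEN; i++) {
        random_str[i] = alphabet[(size_t)rand() % alphabet_len];
    }
    random_str[RAND_STR_LEN] = '\0';
    return random_str;
}
```

At the call site it is worth keeping the returned pointer in a variable, so it can be checked for NULL before strcpy and passed to free afterwards. Passing the call directly to strcpy, as in the question, leaks the allocation.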

padding string by adding char to the string

I am trying to create a padding function that pads a string with underscores. The result should be 16 characters long: if the string is shorter than 16, the function should append underscores until the length is 16, then return the padded string. If the string is longer than 16, the function should ignore the extra characters and return the first 16 characters of the string.
char* padding(char* plaintext) {
    char ch = '_';
    size_t len = strlen(plaintext);
    size_t lench = 17 - len;
    char *p_text = malloc(len + 1 + 1);
    strcpy(p_text, plaintext);
    if (len < 17) {
        int i;
        for (i = lench; i < 16; i++) {
            p_text[i] = ch;
        }
    }
    // p_text[17] = '\0';
    return p_text;
}
int main() {
    char *key = "0123456789ABCDEF";
    while (1) {
        char plaintext[WIDTH + 1];
        printf("Enter a string: ");
        fgets(plaintext, WIDTH + 1, stdin);
        if (plaintext[0] == '\n' || plaintext[0] == '\r')
            break;
        char* padded_plaintext = padding(plaintext);
        printf("padded plaintext = %s\n", padded_plaintext);
        printf("\n");
    }
    return 0;
}
This code returns a weird result.
Consider a clean solution to this problem. Hopefully seeing this (and the accompanying explanation) helps.
char *padded_string(char *src, int width, char ch) {
    char *dest = calloc(1, width + 1);
    strncpy(dest, src, width);
    for (int i = 0; i < width; i++) {
        if (!dest[i]) {
            dest[i] = ch;
        }
    }
    return dest;
}
We provide ourselves a clean slate to work on by allocating width + 1 bytes using calloc. Using calloc will ensure all bytes are set to 0. When working with strings in C, they need to be null-terminated, so this is very useful.
We copy the contents of src into dest. Using strncpy ensures we don't get a buffer overflow if the source string is longer than the string we want to end up with.
Next we loop from 0, width times. If the character in the destination string at i is a '\0' we'll insert the padding character at that index.
Now, because we used calloc and allocated one byte more than the character count we needed, the string is already null-terminated, so we can simply return its pointer.
The space that you allocate for p_text should not depend on the input. It should always be 17. If len + 1 + 1 < 16, then accessing p_text[i] will lead to undefined behavior for certain values of i that you are using. You should replace:
char *p_text = malloc(len + 1 + 1);
with
char *p_text = malloc(17);
and check that p_text is not NULL before you write to it.
Also, the commented out //p_text[17] = '\0'; is wrong. That should be
p_text[16] = '\0';
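One detail neither answer covers is that fgets stores the line ending in the buffer, so it would be padded as if it were part of the text. A possible variant that stops at the line ending is sketched below. The name pad_line is hypothetical, and it uses memcpy and memset in place of the calloc and strncpy approach shown earlier.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string of exactly `width` characters: the first
 * `width` characters of `src` (line ending excluded), padded on the right
 * with `ch`. Returns NULL if allocation fails. */
char *pad_line(const char *src, size_t width, char ch)
{
    char *dest = malloc(width + 1);
    if (dest == NULL) {
        return NULL;
    }
    size_t len = strcspn(src, "\r\n");   /* stop at the line ending from fgets */
    if (len > width) {
        len = width;                     /* truncate over-long input */
    }
    memcpy(dest, src, len);
    memset(dest + len, ch, width - len); /* fill the remainder with padding */
    dest[width] = '\0';
    return dest;
}
```

In the question's main this would be called as pad_line(plaintext, 16, '_'), with the result passed to free once it has been printed.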

Trying to delete a specific character from a string in C?

I'm trying to delete a specific character (?) from the end of a string and return a pointer to a string, but it's not removing it at all at the moment. What am I doing wrong? Is there a better way to go about it?
char * word_copy = malloc(strlen(word)+1);
strcpy(word_copy, word);
int length = strlen(word_copy);
int i = 0;
int j = 0;
for (i = 0; word_copy[i] != '\0'; i++) {
    if (word_copy[length - 1] == '?' && i == length - 1){
        break;
    }
}
for (int j = i; word_copy[j] != '\0'; j++) {
    word_copy[j] = word_copy[j+1];
}
word = strdup(word_copy);
I'm immediately seeing a couple of problems.
The first for loop does no real work. It only finds the index at which to stop, so it could be replaced with a single if statement.
if (word_copy[length - 1] == '?') {
    i = length - 1;
} else {
    i = length;
}
The second for loop also acts as an if statement, since it starts at the end of the string and can only ever run 0 or 1 times.
You could instead do something like this to remove the ?. This code returns a new malloced string with the last character removed if it is a ?.
char *remove_question_mark(char *word) {
    unsigned int length = strlen(word);
    if (length == 0) {
        return calloc(1, 1);
    }
    if (word[length - 1] == '?') {
        char *word_copy = malloc(length);
        // Copy up to '?' and put null terminator
        memcpy(word_copy, word, length - 1);
        word_copy[length - 1] = 0;
        return word_copy;
    }
    char *word_copy = malloc(length + 1);
    memcpy(word_copy, word, length + 1);
    return word_copy;
}
Or, if you are feeling lazy, you could just overwrite the last character with a new null terminator. This leaves 1 byte of the buffer unused, but that is usually an acceptable loss. It should also be a fair bit faster, since it doesn't need to allocate any new memory or copy the string. Note that it modifies word in place, so word must not point to a string literal.
unsigned int length = strlen(word);
if (length > 0 && word[length - 1] == '?') {
    word[length - 1] = 0;
}

Reversing an array in C

I am new to cpp and have a question regarding arrays. The code I have below should create a reversed version of str and have it be stored in newStr. However, newStr always comes up empty. Can someone explain to me why this is happening even though I am assigning a value from str into it?
void reverse (char* str) {
    char* newStr = (char*)malloc(sizeof(str));
    for (int i=0;i<sizeof(str)/sizeof(char);i++) {
        int index = sizeof(str)/sizeof(char)-1-i;
        newStr [i] = str [index];
    }
}
PS: I know that it is much more efficient to reverse an array by moving the pointer or by using the std::reverse function but I am interested in why the above code does not work.
As the commenters above pointed out, sizeof(str) does not tell you the length of the string. You should use size_t len = strlen(str);
void reverse (char* str) {
    size_t len = strlen(str);
    char* newStr = (char*)malloc(len + 1);
    for (int i=0; i<len;i++) {
        int index = len-1-i;
        newStr[i] = str[index];
    }
    newStr[len] = '\0'; // Add terminator to the new string.
}
Don't forget to free any memory you malloc. I assume your function is going to return your new string?
Edit: +1 on the length to make room for the terminator.
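If the function is meant to hand the reversed string back to the caller, one possible shape is sketched here. The name reversed_copy and the decision to return NULL on allocation failure are assumptions, not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated reversed copy of `str`, or NULL if allocation
 * fails. The caller is responsible for calling free() on the result. */
char *reversed_copy(const char *str)
{
    size_t len = strlen(str);          /* number of characters, not sizeof */
    char *out = malloc(len + 1);       /* +1 for the terminator */
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        out[i] = str[len - 1 - i];
    }
    out[len] = '\0';
    return out;
}
```

Returning the pointer lets the caller free it. In the void version, newStr is lost when the function returns, so the memory can never be released.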
The sizeof operator (it is not a function!) is evaluated at compile time. You are passing it a pointer to a region of memory that you claim holds a string. However, the length of this string isn't fixed at compile time. sizeof(str)/sizeof(char) will always yield the size of a pointer on your architecture, probably 8 or 4.
What you want is to use strlen to determine the length of your string.
Alternatively, a more idiomatic way of doing this would be to use std::string (if you insist on reversing the string yourself)
std::string reverse(std::string str) {
    for (std::string::size_type i = 0, j = str.size(); i+1 < j--; ++i) {
        char const swap = str[i];
        str[i] = str[j];
        str[j] = swap;
    }
    return str;
}
Note that due to implicit conversion (see overload (5)), you can also call this function with your plain C-style char pointer.
There are two issues here:
The sizeof operator won't give you the length of the string. Rather, it gives you the size of a char* on the machine you are using. You can use strlen() instead to get the length of the string.
A C string is terminated by a null character ('\0'), which is how strlen() can find the length of the string. You need to make sure you are not accidentally copying the null character from your source string to the beginning of your destination string. Also, you need to add a null character at the end of your destination string, or you will get unexpected output.
#include <bits/stdc++.h>
using namespace std;
vector<string> split_string(string);
// Complete the reverseArray function below.
vector<int> reverseArray(vector<int> a) {
    return {a.rbegin(), a.rend()};
}
int main()
{
    ofstream fout(getenv("OUTPUT_PATH"));
    int arr_count;
    cin >> arr_count;
    cin.ignore(numeric_limits<streamsize>::max(), '\n');
    string arr_temp_temp;
    getline(cin, arr_temp_temp);
    vector<string> arr_temp = split_string(arr_temp_temp);
    vector<int> arr(arr_count);
    for (int i = 0; i < arr_count; i++) {
        int arr_item = stoi(arr_temp[i]);
        arr[i] = arr_item;
    }
    vector<int> res = reverseArray(arr);
    for (int i = 0; i < res.size(); i++) {
        fout << res[i];
        if (i != res.size() - 1) {
            fout << " ";
        }
    }
    fout << "\n";
    fout.close();
    return 0;
}
vector<string> split_string(string input_string) {
    string::iterator new_end = unique(input_string.begin(), input_string.end(), [] (const char &x, const char &y) {
        return x == y and x == ' ';
    });
    input_string.erase(new_end, input_string.end());
    while (input_string[input_string.length() - 1] == ' ') {
        input_string.pop_back();
    }
    vector<string> splits;
    char delimiter = ' ';
    size_t i = 0;
    size_t pos = input_string.find(delimiter);
    while (pos != string::npos) {
        splits.push_back(input_string.substr(i, pos - i));
        i = pos + 1;
        pos = input_string.find(delimiter, i);
    }
    splits.push_back(input_string.substr(i, min(pos, input_string.length()) - i + 1));
    return splits;
}
