C : using strlen for string including \0

Given a text or string like
\0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer
I need to sort it with qsort, using a comparison based on the rot encoding.
int my_rot_conv(int c) {
if ('a' <= tolower(c) && tolower(c) <= 'z')
return tolower(c)+13 <= 'z' ? c+13 : c-13;
return c;
}
int my_rot_comparison(const void *a, const void *b) {
char* ia = (char*) a;
char* ib = (char*) b;
int i=0;
ia++, ib++;
while (i<strlen(ia)) {
if (ia[i] == '\0' || ia[i] == '\n' || ia[i] == '\t' || ib[i] == '\0' || ib[i] == '\n' || ib[i] == '\t') {
i++;
}
if (my_rot_conv(ia[i]) > my_rot_conv(ib[i])) {
return 1;
} else if (my_rot_conv(ia[i]) < my_rot_conv(ib[i]))
return -1;
}
return 0;
}
I got to the point where I can compare two strings that start with \0; the following example returns -1.
printf("%d \n", my_rot_comparison("\0Abbsdf\n", "\0Csdf\n"));
But this wouldn't work with qsort on the whole string, because ia++, ib++; only works for a single-word comparison.
char *my_arr;
my_arr = malloc(sizeof(\0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer));
strcpy(my_arr, \0abc\n\0Def\n\0Heel\n\0Jijer\n\tlkjer);
qsort(my_arr, sizeof(my_arr), sizeof(char), my_rot_comparison);
and the array should be sorted like \0Def\n\0Heel\n\0Jijer\n\0\n\tlkjer
My question is how do I define the comparison function that works for the string that includes \0 and \t and \n characters?

strlen simply cannot operate properly on a string with embedded \0 bytes: by definition, strlen treats the first \0 byte at or after the beginning of the string as the end of the string.
The rest of the standard C string functions are defined in the same way.
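As a small illustration of that point, here is a sketch using a buffer shaped like the question's input. The names data, length_seen_by_strlen and real_length are made up for this example; the real size is only available because the array size is known at compile time.

```c
#include <string.h>

/* A buffer shaped like the question's input: 10 data bytes, plus the
   terminator the compiler appends to the string literal. */
static const char data[] = "\0abc\n\0Def\n";

/* What strlen sees: it stops at the very first byte, which is '\0'. */
size_t length_seen_by_strlen(void)
{
    return strlen(data);
}

/* The real payload size, known only because the array size is known. */
size_t real_length(void)
{
    return sizeof data - 1;
}
```

Here strlen reports 0 while the payload is 10 bytes long, which is why the length has to be carried separately.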
This means that you have to use a different set of functions to manipulate string(-like) data that can include \0 bytes. You will perhaps have to write these functions yourself.
Note that you will probably have to define a structure which has a length member in it, since you won't be able to rely on a particular sentinel byte (such as \0) to mark the end of the string. For example:
typedef struct {
    unsigned int length;
    char bytes[];
} MyString;
If there is some other byte (other than \0) which is forbidden in your input strings, then (per commenter #Sinn) you can swap it and \0, and then use normal C string functions. However, it is not clear whether this would work for you.
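One possible way to apply the length-member idea to the sorting problem is sketched below. It is an assumption-laden sketch, not the answer's own code: the Slice type, rot13 and slice_rot_cmp are hypothetical names, a pointer is used instead of a flexible array member so that records have a fixed size and can sit in an array for qsort, and the ROT13 step here is a plain ASCII one rather than the question's my_rot_conv.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record type: a pointer plus an explicit length, so embedded
   '\0' bytes are ordinary data. */
typedef struct {
    const char *bytes;
    size_t length;
} Slice;

/* ROT13 for ASCII letters; every other byte is returned unchanged. */
int rot13(int c)
{
    if (c >= 'a' && c <= 'z') return 'a' + (c - 'a' + 13) % 26;
    if (c >= 'A' && c <= 'Z') return 'A' + (c - 'A' + 13) % 26;
    return c;
}

/* qsort comparator: compares two Slices byte by byte after ROT13,
   using the stored lengths instead of strlen. */
int slice_rot_cmp(const void *a, const void *b)
{
    const Slice *sa = a;
    const Slice *sb = b;
    size_t n = sa->length < sb->length ? sa->length : sb->length;

    for (size_t i = 0; i < n; i++) {
        int ca = rot13((unsigned char)sa->bytes[i]);
        int cb = rot13((unsigned char)sb->bytes[i]);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* Equal common prefix: the shorter record sorts first. */
    return (sa->length > sb->length) - (sa->length < sb->length);
}
```

With an array of such records, the call would be qsort(records, count, sizeof records[0], slice_rot_cmp). Splitting the raw input into records (for example at each \n) is left out here, since the question does not pin down the exact record boundaries.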

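Assuming you use an extra \0 at the end to terminate (so the data ends with two consecutive \0 bytes):
int strlenzz(char *s)
{
    int length = 0;
    /* walk until two consecutive \0 bytes are found */
    while (!(*s == 0 && *(s+1) == 0))
    {
        s++;
        length++;
    }
    return length + 1;
}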

Personally I'd prefer something like danfuzz's suggestion, but for the sake of listing an alternative...
You could use an escaping convention, writing functions to:
"escape" / encode, expanding embedded (but not the terminating) '\0'/NUL to say '\' and '0' (adopting the convention used when writing C source code string literals), and
another to unescape.
That way you can still pass them around as C strings, your qsort/rot comparison code above will work as is, but you should be very conscious that strlen(escaped_value) will return the number of bytes in the escaped representation, which won't equal the number of bytes in the unescaped value when that value embeds NULs.
For example, something like:
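void unescape(char* p)
{
    char* escaped_p = p;               // read position; p is the write position
    for ( ; *escaped_p; ++escaped_p)
    {
        if (*escaped_p == '\\' && escaped_p[1] == '0')
        {
            ++escaped_p;               // skip the '0' of the two-character escape
            *p++ = '\0';
            continue;
        }
        *p++ = *escaped_p;
    }
    *p = '\0'; // terminate at the write position
}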
Escaping is trickier, as you need some way to ensure you have enough memory in the buffer, or to malloc a new buffer - either of the logical size of the unescaped_value * 2 + 1 length as an easy-to-calculate worst-case size, or by counting the NULs needing escaping and sizing tightly to logical-size + #NULs + 1....
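A minimal sketch of the escaping direction, using the tight sizing described above (logical size + number of NULs + 1). The function name escape and its (src, len) signature are assumptions made for this example; the caller has to supply the logical length, because strlen cannot be used on the unescaped data.

```c
#include <stdlib.h>

/* Hypothetical counterpart to unescape(): returns a newly malloc'd C string
   in which every '\0' among the first len bytes of src is written as the two
   characters '\' and '0'. Returns NULL if allocation fails. The caller frees
   the result. */
char *escape(const char *src, size_t len)
{
    size_t nuls = 0;
    for (size_t i = 0; i < len; i++)
        if (src[i] == '\0')
            nuls++;

    /* Tight sizing: logical size + one extra byte per NUL + terminator. */
    char *out = malloc(len + nuls + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\0') {
            *p++ = '\\';
            *p++ = '0';
        } else {
            *p++ = src[i];
        }
    }
    *p = '\0';
    return out;
}
```

Note that this sketch does not escape literal backslashes, so input that already contains a backslash followed by '0' would be ambiguous after a round trip. A fuller convention would escape the backslash as well.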

Related

Cannot assign chars from one string to another

So I have a function that takes a string, and strips out special format characters, and assigns it to another string for later processing.
A sample call would be:
act_new("$t does $d");
It should strip out the $t and the $d and leave the second string as " does ", but it's not assigning anything. I am getting back into programming after quite a few years of inactivity, and this is someone else's code (a MUD codebase, ROM), but I feel like I am missing something fundamental with pointer assignments. Any tips?
(This is truncated code, the rest has no operations on str or point until much later)
void act_new(const char *format)
{
const char *str;
char *point;
str = format;
while ( *str != '\0' ) {
if ( *str != '$' ) {
*point++ = *str++;
continue;
}
}
}

You need to increment str every time through the loop, not only when you assign to point. Otherwise you end up in an infinite loop when the character doesn't match the if condition.
You also want to skip the character after $, so you have to increment str twice when you encounter $.
The code is simpler if you use a for loop and array indexing rather than pointer arithmetic.
size_t len = strlen(format);
for (size_t i = 0; i < len; i++) {
if (format[i] == '$') {
i++; // extra increment to skip character after $
} else {
*point++ = format[i];
}
}

There are a few problems with your code, as pointed out in the comments:
point is not initialized (garbage pointer value)
continue doesn't do anything
infinite loop if a $ is encountered
When writing the function, one must also keep in mind to skip an extra character if a $ is encountered if and only if it's not the last character in the string (except for the '\0').
Since you know how many times you need to loop, a for loop is better suited and, as a bonus, you don't have to explicitly check if the character after a $ is '\0' when skipping an extra character in format (renamed src below). Also, don't forget to terminate the destination string.
This code will take care of those things for you:
void act_new(const char *src)
{
const size_t length = strlen(src);
char * const dst = (char*)malloc(sizeof(char)*(length+1));
if(dst == NULL)
// Error handling left out
return;
char *point = dst;
for(size_t i = 0; i < length; ++i)
{
if(src[i] == '$')
{
++i;
continue;
}
*point++ = src[i];
}
*point = '\0'; //Terminate string properly
printf("%s\n", dst);
free(dst);
}

void function that removes all the non alphabet chars

I am trying to write a program that gets several strings until it gets the 'Q' string (this string basically stops the scanf).
Each one of the strings is sent to a function that removes everything except the letters. For example, if I scan 'AJUYFEG78348', the printf should print 'AJUYFEG'.
The problem is that the function has to be void.
I have tried several ways to get the "new array with only letters" printed, but none of them worked.
(It is not allowed to use the strlen function.)
#include <stdio.h>
void RemoveNonAlphaBetChars(char*);
int main()
{
int flag=1;
char array[100]={0};
while (flag == 1)
{
scanf("%s", &array);
if(array[0] == 'Q' && array[1] =='\0') {
flag=0;
}
while (flag == 1)
{
RemoveNonAlphaBetChars(array);
}
}
return 0;
}
void RemoveNonAlphaBetChars(char* str)
{
int i=0, j=0;
char new_string[100]={0};
for (i=0; i<100; i++)
{
if (((str[i] >= 'a') && (str[i] <= 'z')) || ((str[i] >= 'A') && (str[i] <= 'Z')))
{
new_string[j] = str[i];
j++;
}
}
printf("%s", new_string);
return;
}

The fact that the function has only one argument, non-const char pointer, hints at the fact that the string is going to be changed in the call (better document it anyway), and it's perfectly all right.
A few fixes to your code can make it right:
First, don't loop to the end of the buffer, just to the end of the string (without strlen, it's probably faster too):
for (i=0; str[i] != '\0'; i++)
then don't forget to nul-terminate the new string after your processing:
new_string[j] = '\0';
Then, in the end (where you're printing the string) copy the new string into the old string. Since it's smaller, there's no risk:
strcpy(str,new_string);
now str contains the new stripped string.
Another approach would be to work in-place (without another buffer): each time you encounter a character to remove, copy the rest of the string at this position, and repeat. It can be inefficient if there are a lot of characters to remove, but uses less memory.
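The in-place approach just described could look roughly like the sketch below. It is only one way to do it: the length is counted by hand because strlen is off limits in the question, and memmove (from string.h) is assumed to be allowed for shifting the tail.

```c
#include <string.h>

void RemoveNonAlphaBetChars(char *str)
{
    size_t len = 0;
    while (str[len] != '\0')   /* manual length, since strlen is not allowed */
        len++;

    size_t i = 0;
    while (i < len) {
        int is_letter = (str[i] >= 'a' && str[i] <= 'z') ||
                        (str[i] >= 'A' && str[i] <= 'Z');
        if (is_letter) {
            i++;
        } else {
            /* Shift the tail (including the '\0') one place to the left,
               overwriting the unwanted character at position i. */
            memmove(str + i, str + i + 1, len - i);
            len--;
        }
    }
}
```

Each removal moves the whole remaining tail, which is where the inefficiency mentioned above comes from when many characters have to be dropped.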

The key here is that you are never inserting new characters into the string. That guarantees that the input buffer is large enough to hold the result. It also makes for an easy in-place solution, which is what the void return type is implying.
#include <ctype.h>
#include <stdio.h>
...
void RemoveNonAlphaBetChars(char* str)
{
char *from, *to;
for(from = to = str; *from; from++) {
if(isalpha(*from)) {
if(from > to) *to = *from;
to++;
}
}
*to = *from;
printf("%s\n", str);
return;
}
The pointer from steps along the string until it points to a NUL character, hence the simple condition in the loop. to only receives the value of from if it is a character. The final copy after the loop ensures NUL termination.
Update
If you are dealing with 1) particularly large strings, and 2) you have long stretches of letters with some numbers in between, and 3) your version of memmove is highly optimized compared to copying things manually (e.g. with a special processor instruction), you can do the following:
#include <stdio.h>
#include <ctype.h>
#include <string.h>
...
void RemoveNonAlphaBetChars(char* str)
{
char *from, *to, *end;
size_t len;
for(from = to = str; *from; from = end) {
for(; *from && !isalpha(*from); from++) ;
for(end = from; *end && isalpha(*end); end++) ;
len = end - from;
if(from > to) {
if(len > 1) {
memmove(to, from, len);
} else {
*to = *from;
}
}
to += len;
}
*to = '\0'; /* terminate; end is never assigned when str is empty */
printf("%s\n", str);
return;
}
The general idea is to find the limits of each range of letters (between from and end), and copy into to block by block. As I stated before though, this version should not be used for the general case. It will only give you a boost when there is a huge amount of data that meets particular conditions.

void return type is a common approach to making functions that produce C string results. You have two approaches to designing your API:
Make a non-destructive API that takes output buffer and its length, or
Make an API that changes the string in place.
The first approach would look like this:
void RemoveNonAlphaBetChars(const char* str, char *result, size_t resultSize) {
...
}
Use result in place of new_string, and make sure you do not go past resultSize. The call would look like this:
if (flag == 1) { // if (flag == 1), not while (flag == 1)
char result[100];
RemoveNonAlphaBetChars(array, result, 100);
printf("%s\n", result);
}
If you decide to use the second approach, move printf into main, and use strcpy to copy the content of new_string back into str:
strcpy(str, new_string);
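For the first (non-destructive) approach, the elided body could be filled in along these lines. This is a sketch under the assumption that silently truncating the output when result is too small is acceptable; the original answer does not say how that case should be handled.

```c
#include <stddef.h>

void RemoveNonAlphaBetChars(const char *str, char *result, size_t resultSize)
{
    size_t j = 0;

    if (resultSize == 0)
        return;                 /* no room even for the terminator */

    /* Stop at the end of the input, or when only the terminator still fits. */
    for (size_t i = 0; str[i] != '\0' && j + 1 < resultSize; i++) {
        if ((str[i] >= 'a' && str[i] <= 'z') ||
            (str[i] >= 'A' && str[i] <= 'Z')) {
            result[j++] = str[i];
        }
    }
    result[j] = '\0';
}
```

Because the input is const and the output goes to a separate buffer, the caller's original string stays intact, and printing stays in main as in the call shown above.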

Format "%s" expects an argument of type char* etc, I just want to print the alphabet

Why can't I print the alphabet using this code?
void ft_putchar(char c)
{
write(1, &c, 1);
}
int print_alf(char *str)
{
int i;
i = 0;
while (str[i])
{
if (i >= 'A' && i <= 'Z')
ft_putchar(str[i]);
else
ft_putchar('\n');
i++;
}
return (str);
}
int main ()
{
char a[26];
printf("%s", print_alf(a));
return (0);
}
I get this warning
format ' %s ' expects type 'char*' but argument 2 has type 'int'
How do I print the alphabet using a string, and write function?

Your entire print_alf function looks suspicious.
You are returning str, which is of type char *. Therefore the return type of print_alf should be char * instead of int.
Your while (str[i]) loop makes no sense at all since you are passing uninitialized memory to it. So your code will very likely corrupt the memory since the while loop will continue to run until a '\0' is found within the memory which does not need to be the case within the boundaries of the passed memory (a).
You are not adding a zero termination character ('\0') at the end of the string. This will result in printf("%s", print_alf(a)); printing as many characters beginning at the address of a until a '\0' is found within the memory.
Here is a suggestion for how to fix all those problems:
char *print_alf(char *str, size_t len)
{
    char letter;
    char *p = str; // write cursor, so that str keeps pointing at the start
    if ((str) && (len >= 27)) // is str a valid pointer and length is big enough?
    {
        for (letter = 'A'; letter <= 'Z'; letter++) // iterate all characters of the alphabet
        {
            *p = letter;
            p++;
        }
        *p = '\0'; // add zero termination!!!
    }
    else
    {
        str = NULL; // indicate an error!
    }
    return (str); // start of the filled buffer, or NULL on error
}
int main()
{
char a[26 + 1]; // ensure '\0' fits into buffer!
printf("%s", print_alf(a, sizeof(a)));
return (0);
}

Make up your mind whether print_alf should return a string which you then print with printf or whether print_alf should be a void function that does the printing, which you should then just call without printf. At the moment, your code tries to be a mixture of both.
The easiest way is to just print the alphabet:
void print_alf(void)
{
int c;
for (c = 'A'; c <= 'Z'; c++) putchar(c);
}
Call this function like so:
print_alf(); // print whole alphabet to terminal
A more complicated variant is to fill a string with the alphabet and then print that string. That's what you tried to achieve, I think. In that case, you must pass a sufficiently big buffer to the function and return it. Note that if you want to use the string functions and features of the standard lib (of which printf("%s", ...) is one) you must null-terminate your string.
char *fill_alf(char *str)
{
    int i;
    for (i = 0; i < 26; i++) str[i] = 'A' + i;
    str[26] = '\0';
    return str;
}
It is okay to return the buffer that was passed into the function, but beware of cases where you return local character buffers, which will lead to undefined behaviour.
You can call it as you intended in your original code, but note that you must make your buffer at least 27 characters big to hold the 26 letters and the null terminator:
char a[27];
printf("%s\n", fill_alf(a));
Alternatively, you could do the filling and printing in two separate steps:
char a[27];
fill_alf(a); // ignore return value, because it's 'a'
printf("%s\n", a); // print filled buffer
If you just want to print the alphabet, the print_alf variant is much simpler and straightforward. If you want to operate further on the alphabet, eg do a shuffle, consider using fill_alf.

Your print_alf(char *str) function is declared to return int, which is what causes the warning: the %s conversion of printf expects a char * (a pointer to a null-terminated string), not an int.
You can fix the warning by changing the return type of your function to char *, and if everything else in your code works, you'll be good to go.

How does C know the end of my string?

I have a program in which I wanted to remove the spaces from a string. I wanted to find an elegant way to do so, and I found the following code in a forum (I've changed it a little to make it more readable):
char* line_remove_spaces (char* line)
{
char *non_spaced = line;
int i;
int j = 0;
for (i = 0; i <= strlen(line); i++)
{
if ( line[i] != ' ' )
{
non_spaced[j] = line[i];
j++;
}
}
return non_spaced;
}
As you can see, the function takes a string and, using the same allocated memory space, selects only the non-spaced characters. It works!
Anyway, according to Wikipedia, a string in C is a "Null-terminated string". I always thought this way and everything was good. But the problem is: we put no "null-character" in the end of the non_spaced string. And somehow the compiler knows that it ends at the last character changed by the "non_spaced" string. How does it know?

This does not happen by magic. You have in your code:
for (i = 0; i <= strlen(line); i++)
^^
The loop index i runs till strlen(line) and at this index there is a nul character in the character array and this gets copied as well. As a result your end result has nul character at the desired index.
If you had
for (i = 0; i < strlen(line); i++)
^^
then you would have to put the nul character in manually:
for (i = 0; i < strlen(line); i++)
{
if ( line[i] != ' ' )
{
non_spaced[j] = line[i];
j++;
}
}
// put nul character
line[j] = 0;

Others have answered your question already, but here is a faster, and perhaps clearer version of the same code:
void line_remove_spaces (char* line)
{
char* non_spaced = line;
while(*line != '\0')
{
if(*line != ' ')
{
*non_spaced = *line;
non_spaced++;
}
line++;
}
*non_spaced = '\0';
}

The loop uses <= strlen so you will copy the null terminator as well (which is at i == strlen(line)).

You could try it. Debug it while it is processing a string containing only one space: " ". Watch carefully what happens to the index i.

How do you know that it "knows"? The most likely scenario is that you're simply having luck with your undefined behavior, and that there is a '\0'-character after the valid bytes of line end.
It's also highly likely that you're not seeing spaces at the end, which might be printed before hitting the stray "lucky '\0'".
A few other points:
There's no need to write this using indexing.
It's not very efficient to call strlen() on each loop iteration.
You might want to use isspace() to remove more whitespace characters.
Here's how I would write it, using isspace() and pointers:
char * remove_spaces(char *str)
{
char *ret = str, *put = str;
for(; *str != '\0'; str++)
{
if(!isspace((unsigned char) *str))
*put++ = *str;
}
*put = '\0';
return ret;
}
Note that this does terminate the space-less version of the string, so the returned pointer is guaranteed to point at a valid string.

The string parameter of your function is null-terminated, right?
And in the loop, the null character of the original string also gets copied into the non-spaced returned string. So the non-spaced string is actually null-terminated too!
For your compiler, the null character is just another byte that doesn't get any special treatment, but the string APIs use it as a handy marker to detect the end of a string.

If you use i <= strlen(line), the loop also visits index strlen(line), which is where the '\0' is stored, so the terminator gets copied and your program works. You can step through it in a debugger to confirm.

Strcat throws segmentation fault on simple getch-like password input

I am using Linux and there is a custom function which returns the ASCII int of the current key, sort of like getch(). When trying to get used to it and work out how to store the password, I ran into an issue. My code is as follows:
int main() {
int c;
char pass[20] = "";
printf("Enter password: ");
while(c != (int)'\n') {
c = mygetch();
strcat(pass, (char)c);
printf("*");
}
printf("\nPass: %s\n", pass);
return 0;
}
Unfortunately I get the warning from GCC:
pass.c:26: warning: passing argument 2 of ‘strcat’ makes pointer from integer without a cast
/usr/include/string.h:136: note: expected ‘const char * __restrict__’ but argument is of type ‘char’
I tried using pointers instead of a char array for pass, but the second I type a letter it segfaults. The function works on its own but not in the loop, at least not like getch() would on a Windows system.
What can you see is wrong with my example? I am enjoying learning this.
EDIT: Thanks to the answers I came up with the following silly code:
int c;
int i = 0;
char pass[PASS_SIZE] = "";
printf("Enter password: ");
while(c != LINEFEED && strlen(pass) != (PASS_SIZE - 1)) {
c = mygetch();
if(c == BACKSPACE) {
//ensure cannot backspace past prompt
if(i != 0) {
//simulate backspace by replacing with space
printf("\b \b");
//get rid of last character
pass[i-1] = 0; i--;
}
} else {
//passed a character
pass[i] = (char)c; i++;
printf("*");
}
}
pass[i] = '\0';
printf("\nPass: %s\n", pass);

The problem is that strcat expects a char * as its second argument (it concatenates two strings). You don't have two strings, you have one string and one char.
If you want to add c to the end of pass, just keep an int i that stores the current size of pass and then do something like
pass[i] = (char) c.
Make sure to null-terminate pass when you are done (by setting the last position to 0).
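The indexing approach described above can be sketched as a small helper. Everything named here is hypothetical: read_line_chars takes the character source as a function pointer, and demo_source is a canned stand-in for the question's mygetch() so that the sketch runs without a terminal. Treating a negative return value as end of input is also an assumption, since the question does not say what mygetch() does in that case.

```c
#include <stddef.h>

/* Stand-in for mygetch(): feeds characters from a fixed string and
   returns -1 once the string is exhausted. */
int demo_source(void)
{
    static const char *p = "secret\nrest";
    return *p ? (unsigned char)*p++ : -1;
}

/* Stores characters from next_char() into buf until '\n', a negative value,
   or until only the terminator still fits. Always null-terminates when
   size > 0. Returns the number of characters stored. */
size_t read_line_chars(char *buf, size_t size, int (*next_char)(void))
{
    size_t i = 0;

    if (size == 0)
        return 0;

    while (i + 1 < size) {
        int c = next_char();
        if (c == '\n' || c < 0)
            break;
        buf[i++] = (char)c;   /* pass[i] = (char) c, then advance i */
    }
    buf[i] = '\0';            /* null-terminate when done */
    return i;
}
```

In the question's program the call would be something like read_line_chars(pass, sizeof pass, mygetch), with the printf("*") echo added inside the loop if wanted.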

A single character is not the same as a string containing a single character.
In other words, 'a' and "a" are very different things.
A string, in C, is a null-terminated array of chars. Your "pass" is an array of 20 chars - a block of memory containing space for 20 chars.
The function mygetch() returns a char.
What you need to do is to insert c into one of the spaces.
Instead of "strcat(pass, c)", you want to do "pass[i] = c", where i starts at zero, and increments by one for every time you call mygetch().
Then you need to do a pass[i] = '\0', when the loop is done, with i equal to the number of times you called mygetch(), to add the null terminator.
Your other problem is that you haven't set a value for c the first time you check whether it's '\n'. You want to call mygetch() before you do the comparison:
int i = 0;
for (;;)
{
    c = mygetch();
    if (c == '\n')
        break;
    pass[i++] = c;
}
pass[i] = '\0';

Over and above the correctly diagnosed issue with strcat() taking two strings -- why did you ignore the compiler warnings, or if there were no warnings, why don't you have warnings turned on? As I was saying, over and above that problem, you also need to consider what happens if you get EOF, and you also need to worry about the initial value of 'c' (which could accidentally be '\n' though it probably isn't).
That leads to code like this:
int c;
char pass[20] = "";
char *end = pass + sizeof(pass) - 1;
char *dst = pass;
while ((c = getchar()) != EOF && c != '\n' && dst < end)
*dst++ = c;
*dst = '\0'; // Ensure null termination
I switched from 'mygetch()' to 'getchar()' - primarily because what I say applies to that and might not apply to your 'mygetch()' function; we don't have a specification of what that function does on EOF.
Alternatively, if you must use strcat(), you still need to keep a track on the length of the string, but you can do:
char c[2] = "";
char pass[20] = "";
char *end = pass + sizeof(pass) - 1;
char *dst = pass;
while (c[0] != '\n' && dst < end)
{
c[0] = mygetch();
strcat(dst, c);
dst++;
}
This is not as elegant: using strcat() in this context is overkill. You could, I suppose, do simple counting and repeatedly use strcat(pass, c), but that has quadratic behaviour, as strcat() has to skip over 0, 1, 2, 3, ... characters on successive iterations. By contrast, the solution where dst points to the NUL at the end of the string means that strcat() doesn't have to skip anything. With a fixed-size addition of 1 character, though, you're probably better off with the first loop.
