Creating a simplified version of strchr() [duplicate] - c

This question already has answers here:
Why does strchr take an int for the char to be found?
(4 answers)
Closed 6 years ago.
Trying to create a simple function that would look for a single char in a string "like strchr() would", I did the following:
char* findchar(char* str, char c)
{
char* position = NULL;
int i = 0;
for(i = 0; str[i]!='\0';i++)
{
if(str[i] == c)
{
position = &str[i];
break;
}
}
return position;
}
So far it works. However, when I looked at the prototype of strchr():
char *strchr(const char *str, int c);
The second parameter is an int? I'm curious to know: why not a char? Does this mean that we can use an int for storing characters just like we use a char?
Which brings me to the second question: I tried to change my function to accept an int as a second parameter, but I'm not sure if it's correct and safe to do the following:
char* findchar(char* str, int c)
{
char* position = NULL;
int i = 0;
for(i = 0; str[i]!='\0';i++)
{
if(str[i] == c) //Specifically, is this line correct? Can we test an int against a char?
{
position = &str[i];
break;
}
}
return position;
}

Before ANSI C89, functions were declared without prototypes. The declaration for strchr looked like this back then:
char *strchr();
That's it. No parameters are declared at all. Instead, there were these simple rules:
all pointers are passed as parameters as-is
all integer values of a smaller range than int are converted to int
all floating point values are converted to double
So when you called strchr, what really happened was:
strchr(str, (int)chr);
When ANSI C89 was introduced, it had to maintain backwards compatibility. Therefore it defined the prototype of strchr as:
char *strchr(const char *str, int chr);
This preserves the exact behavior of the above sample call, including the conversion to int. This is important since an implementation may define that passing a char argument works differently than passing an int argument, which makes sense on 8 bit platforms.
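As for the second question (can an int be tested against a char?), a minimal sketch may help. The helper name matches_first is hypothetical and not part of any library; it only illustrates that both operands of == end up as int, so a char that was promoted on the way in still compares equal to the same char stored in the string.

```c
// Hypothetical helper (not a library function): compares the first
// character of s with c. s[0] has type char and is promoted to int for
// the comparison, which is the same promotion a char argument goes
// through when it is passed for the int parameter c.
int matches_first(const char *s, int c)
{
    return s[0] == c;
}
```

This holds when the caller passes a char. A value in the unsigned char range, such as one returned by fgetc(), needs both sides converted to unsigned char first.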

Consider the return value of fgetc(): either a value in the range of unsigned char, or EOF, which is some negative value. This is the kind of value to pass to strchr().
@Roland Illig presents a very good explanation of the history that led to retaining use of int ch with strchr().
OP's code fails/has trouble as follows.
1) char* str is treated like unsigned char *str per §7.24.1 3
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char
2) i should be type size_t, to handle the entire range of the character array.
3) For the purpose of strchr(), the null character is considered part of the search.
The terminating null character is considered to be part of the string.
4) Better to use const as str is not changed.
char* findchar(const char* str, int c) {
const char* position = NULL;
size_t i = 0;
for(i = 0; ;i++) {
if((unsigned char) str[i] == c) {
position = &str[i];
break;
}
if (str[i]=='\0') break;
}
return (char *) position;
}
Further detail
The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. C11dr §7.24.5.2 2
So int c is treated like a char. This could imply
if((unsigned char) str[i] == (char) c) {
Yet I think what is meant is:
if((unsigned char) str[i] == (unsigned char)(char) c) {
or simply
if((unsigned char) str[i] == (unsigned char)c) {
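Putting the last comparison into a whole function gives something like the sketch below. The name findchar_uc is made up for illustration; the assumption is that both the stored character and the argument should be compared as unsigned char, and that the terminating null character can itself be searched for.

```c
#include <stddef.h>

// Hypothetical strchr-like helper: compares both sides as unsigned char,
// so it accepts either a plain char or an fgetc()-style value for c.
char *findchar_uc(const char *str, int c)
{
    unsigned char target = (unsigned char)c;
    for (size_t i = 0; ; i++) {
        if ((unsigned char)str[i] == target)
            return (char *)&str[i];   // also matches the terminator when c is 0
        if (str[i] == '\0')
            return NULL;              // reached the end without a match
    }
}
```

With this version, findchar_uc(s, '\0') returns a pointer to the terminator, and a byte such as 0xE9 is found whether the caller passes 0xE9 or (char)0xE9.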

Related

Having a problem only printing the first capital letter in a string

I need it to print only the first capital letter of each word. The point of this is to make acronyms from sentences by calling the function below.
EX.
input: Hello output: H.
input: MACinery Bean output: M.C.B.
Desired output: M.B.
void CreateAcronym(char userPhrase[50], char userAcronym[50]){
int len, up, uptwo;
char upper[50];
int isupper();
len = strlen(userPhrase);
for (int i = 0; i < len; i++){
up = isupper(userPhrase[i]);
uptwo = isupper(userPhrase[i+1]);
if (up != 0 && uptwo == 0){
strcpy(&upper[i], &userPhrase[i]);
printf("%c.", upper[i]);
}
}
printf("\n");
}
The first letter of each word is either the beginning of the string or the previous character being a space. So you must test for that in your condition, not compare the current character and the next character.
for (int i = 0; i < len; i++){
if ((i == 0 || userPhrase[i-1] == ' ') && isupper(userPhrase[i])) {
printf("%c.", userPhrase[i]);
}
}
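The same word-start test can also fill the output buffer instead of printing. This is only a sketch: build_acronym and its out_size parameter are hypothetical names, and it assumes words are separated by single spaces, as in the examples above.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: writes "X." for every word that starts with an
// uppercase letter. A word starts at index 0 or right after a space.
void build_acronym(const char *phrase, char *out, size_t out_size)
{
    size_t j = 0;
    if (out_size == 0)
        return;
    for (size_t i = 0; phrase[i] != '\0'; i++) {
        int word_start = (i == 0 || phrase[i - 1] == ' ');
        // j + 2 < out_size leaves room for the letter, the dot and the terminator
        if (word_start && isupper((unsigned char)phrase[i]) && j + 2 < out_size) {
            out[j++] = phrase[i];
            out[j++] = '.';
        }
    }
    out[j] = '\0';
}
```

Called with "MACinery Bean", this leaves "M.B." in out, because the A and C are not at the start of a word.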
int isupper();
This is a strange prototype for the <ctype.h> function isupper. I'm not really sure if it's well defined. There are two ways I know to get a correct prototype for this function:
#include <ctype.h>, typically somewhere near the top of your file. This is the way we most often see.
int isupper(int); will also introduce the correct prototype.
Nonetheless I have no doubt the default argument promotions that occur are causing the correct type to be passed, so even if the behaviour is undefined it's probably no trouble.
We need to talk some more about that int argument, though, hence bringing it to your attention. We're expected to ensure that int argument is either an unsigned char value or EOF, but the value you passed is instead a char value that may be neither. When passing char values to <ctype.h> functions, we should (generally, but not always) cast the argument to unsigned char like so:
up = isupper((unsigned char) userPhrase[i]);
As far as the rest of your code goes, I think perhaps you may need to think more about what your algorithm is meant to do. I'ma use a functional style, because I think it makes sense for the sake of explanation here. You should translate this to a procedural style for class. So for my example (which may not even compile 🤷‍♂️ but is there to help you think about your algorithm) you can expect my continuations to be facilitated by something like:
typedef int fun(), contin(char *, char const *, fun **);
It seems to me like you want to skip leading non-alphabet characters, extract the first alpha, then skip any alpha until you reach non-alpha again. For example, the first step:
int skip_nonalpha(char *dst, char const *src, fun **f) {
if (!*src) return *dst = 0;
if (!isalpha((unsigned char) *src)) return skip_nonalpha(dst, src + 1, f);
return ((contin *) *f)(dst, src, f + 1);
}
In the function above you can see the argument passed to isalpha is explicitly converted to unsigned char, as I mentioned earlier. Anyhow, the next step is to extract and convert to upper:
int extract_toupper(char *dst, char const *src, fun **f) {
*dst = toupper(*src);
return ((contin *) *f)(dst + 1, src + 1, f + 1);
}
We don't need the explicit conversion in the function above because we already know src is alpha, which makes it a positive value (within the range of [0..UCHAR_MAX]). Next, skip the rest of the alpha characters:
int skip_alpha(char *dst, char const *src, fun **f) {
if (!*src) return *dst = 0;
if (isalpha((unsigned char) *src)) return skip_alpha(dst, src + 1, f);
return ((contin *) *f)(dst, src, f + 1);
}
Finally, wrapping it all up:
int make_acronym_tail(char *dst, char const *src, fun **f) {
return skip_nonalpha(dst, src,
(fun *[]){ (fun *) extract_toupper
, (fun *) skip_alpha
, (fun *) make_acronym_tail
});
}
int make_acronym(char *dst, char const *src) { return make_acronym_tail(dst, src, NULL); }
Now assuming this compiles you should be able to use it like:
char phrase[] = "hello world", acronym[strlen(phrase) + 1];
make_acronym(acronym, phrase);
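For the procedural translation, one possible shape is a single loop that performs the same three steps in order: skip non-alpha, take one letter in upper case, skip the rest of the word. The name make_acronym_loop is made up here, and dst is assumed to have room for at least strlen(src) + 1 characters.

```c
#include <ctype.h>

// Hypothetical procedural version of the three continuation steps.
// Assumes dst can hold at least strlen(src) + 1 characters.
void make_acronym_loop(char *dst, const char *src)
{
    while (*src) {
        // step 1: skip leading non-alphabetic characters
        while (*src && !isalpha((unsigned char)*src))
            src++;
        if (!*src)
            break;
        // step 2: extract the first letter of the word, upper-cased
        *dst++ = (char)toupper((unsigned char)*src++);
        // step 3: skip the remaining letters of the word
        while (*src && isalpha((unsigned char)*src))
            src++;
    }
    *dst = '\0';
}
```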

What kind of useful function will return a char taking a char and its index in a string?

I can't explain it any better, so here's the prototype:
char *my_function(char *str, char (*f)(int, char))
What I'm trying to figure out is what f would stand for in here, given that the int is an index and char is str[index]. For even more clarity:
f(index, str[index]);
What could this be apart from my crappy test f that returns str[index] + index? Is there any USEFUL function with such prototype out there?
A useful pair of functions would be to lightly encrypt/decrypt the string.
As the goal includes working with a string, some useful ideas:
As an index to a string, better to use size_t than int. int is insufficient for long strings.
// char (*f)(int, char)
char (*f)(size_t offset, char ch)
With dst[i] = f(i, str[i]);, to maintain string length, f(offset, 0) should return 0 and f(offset, non_zero) should return non-zero.
If used to encrypt/decrypt, it would make sense that the function maps the 256 different inputs to 256 different results.
Example:
char encrypt(size_t offset, char ch) {
if (ch == 0) return 0;
// Simplistic encryption for illustration
offset = offset%255 + 1;
ch += offset;
if (ch == 0) ch = offset;
assert(ch);
return ch;
}
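To show how such an f might be used as a pair, here is a sketch with a matching decrypt function and a small driver. All names (key_for, encrypt_shift, decrypt_shift, map_indexed) are hypothetical, and the arithmetic is done in unsigned char on the assumption that converting a value above CHAR_MAX back to char round-trips, as it does on common implementations.

```c
#include <stddef.h>

// Hypothetical per-position key in the range 1..255.
static unsigned char key_for(size_t offset)
{
    return (unsigned char)(offset % 255 + 1);
}

// Maps 0 to 0 and every non-zero character to a non-zero character.
char encrypt_shift(size_t offset, char ch)
{
    if (ch == 0) return 0;
    unsigned char k = key_for(offset);
    unsigned char out = (unsigned char)((unsigned char)ch + k);
    if (out == 0) out = k;       // k is otherwise unused as an output
    return (char)out;
}

// Inverse of encrypt_shift for the same offset.
char decrypt_shift(size_t offset, char ch)
{
    if (ch == 0) return 0;
    unsigned char k = key_for(offset);
    unsigned char in = (unsigned char)ch;
    if (in == k) return (char)(unsigned char)(256 - k);
    return (char)(unsigned char)(in - k);
}

// Applies f(index, src[index]) to every character; dst needs
// strlen(src) + 1 characters.
void map_indexed(char *dst, const char *src, char (*f)(size_t, char))
{
    size_t i = 0;
    for (; src[i] != '\0'; i++)
        dst[i] = f(i, src[i]);
    dst[i] = '\0';
}
```

Because non-zero characters never map to zero, the transformed string keeps its length, and applying map_indexed with decrypt_shift restores the original.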

why does while(*s++ == *t++) not work to compare two strings

I'm implementing strcmp(char *s, char *t), which returns <0 if s<t, 0 if s==t, and >0 if s>t by comparing the first value that is different between the two strings.
Implementing it by separating the postfix increment and relational equals operators works:
for (; *s==*t; s++, t++)
if (*s=='\0')
return 0;
return *s - *t;
however, grouping the postfix increment and relational equals operators doesn't work (like so):
while (*s++ == *t++)
if (*s=='\0')
return 0;
return *s - *t;
The latter always returns 0. I thought this could be because we're incrementing the pointers too soon, but even a difference between the two strings at index 5 out of 10 still produces the same result.
Example input:
strcomp("hello world", "hello xorld");
return value:
0
My hunch is this is because of operator precedence but I'm not positive and if so, I cannot exactly pinpoint why.
Thank you for your time!
Because in the for loop, the increment (s++, t++ in your case) is not executed if the condition (*s==*t in your case) is false. But in your while loop, the increment is executed in that case too, so for strcomp("hello world", "hello xorld"), both pointers end up pointing at the o that follows the differing character in each string.
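The two loops from the question can be put side by side to see the difference. The function names strcomp_for and strcomp_while are invented for this sketch; the bodies are the question's two variants.

```c
// The question's for-loop variant: the pointers stay on the first
// differing pair when the loop ends.
int strcomp_for(const char *s, const char *t)
{
    for (; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return *s - *t;
}

// The question's while-loop variant: the post-increments run even when
// the comparison fails, so the subtraction looks one position too far.
// (With an empty string it would even read past the terminator.)
int strcomp_while(const char *s, const char *t)
{
    while (*s++ == *t++)
        if (*s == '\0')
            return 0;
    return *s - *t;
}
```

For "hello world" and "hello xorld", strcomp_for compares 'w' with 'x' and returns a negative value, while strcomp_while compares the two 'o' characters after them and returns 0.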
Since you always increment s and t in the test, you should refer to s[-1] for the termination in case of equal strings and s[-1] and t[-1] in case they differ.
Also note that the order is determined by the comparison as unsigned char.
Here is a modified version:
int strcmp(const char *s, const char *t) {
while (*s++ == *t++) {
if (s[-1] == '\0')
return 0;
}
return (unsigned char)s[-1] - (unsigned char)t[-1];
}
Following the comments from LL chux, here is a fully conforming implementation for perverse architectures with non two's complement representation and/or CHAR_MAX > INT_MAX:
int strcmp(const char *s0, const char *t0) {
const unsigned char *s = (const unsigned char *)s0;
const unsigned char *t = (const unsigned char *)t0;
while (*s++ == *t++) {
if (s[-1] == '\0')
return 0;
}
return (s[-1] > t[-1]) - (s[-1] < t[-1]);
}
Everyone is giving the right advice, but are still hardwired to inlining those increment operators within the comparison expression and doing weird off by 1 stuff.
The following just feels simpler and easier to read. No pointer is ever incremented or decremented to an invalid address.
while ((*s == *t) && *s)
{
s++;
t++;
}
return *s - *t;
For completeness in addition to what was already well answered about the wrong offset during subtraction:
*s - *t; is incorrect when *s or *t is negative.
The standard C library specifies that string functions compare as if char was unsigned char. Thus code that subtracts via a char * gives the wrong answer when the characters are negative.
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).
C17dr § 7.24.1 3
int strcmp(const char *s, const char *t) {
const unsigned char *us = (const unsigned char *) s;
const unsigned char *ut = (const unsigned char *) t;
while (*us == *ut && *us) {
us++;
ut++;
}
return (*us > *ut) - (*us < *ut);
}
This code also addresses obscure concerns of non-2's complement access of -0 and char range exceeding int.
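A short sketch of where the two approaches can disagree: a byte such as 0xE9 is negative when char is signed, so plain subtraction may order it before 'a'. The names cmp_plain_char and cmp_as_unsigned are hypothetical; the second follows the unsigned char rule quoted above.

```c
// Hypothetical contrast: subtraction through char *. On platforms where
// char is signed, bytes above 0x7F are negative and sort first.
int cmp_plain_char(const char *s, const char *t)
{
    while (*s == *t && *s) {
        s++;
        t++;
    }
    return *s - *t;   // result depends on the signedness of char
}

// Comparison through unsigned char *, as the library functions specify.
int cmp_as_unsigned(const char *s, const char *t)
{
    const unsigned char *us = (const unsigned char *)s;
    const unsigned char *ut = (const unsigned char *)t;
    while (*us == *ut && *us) {
        us++;
        ut++;
    }
    return (*us > *ut) - (*us < *ut);
}
```

cmp_as_unsigned("\xE9", "a") is positive on every platform, in agreement with the standard strcmp, whereas the sign of cmp_plain_char for the same input depends on whether char is signed.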

A way to pass an optional argument to a function

Take the following function:
char * slice(const char * str, unsigned int start, unsigned int end) {
int string_len = strlen(str);
int slice_len = (end - start < string_len) ? end - start : string_len;
char * sliced_str = (char *) malloc (slice_len + 1);
sliced_str[slice_len] = '\0';
// Make sure we have a string of length > 0, and it's within the string range
if (slice_len == 0 || start >= string_len || end <= 0) return "";
for (int i=0, j=start; i < slice_len; i++, j++)
sliced_str[i] = str[j];
return sliced_str;
}
I can call this as follows:
char * new_string = slice("old string", 3, 5)
Is there a way to be able to "omit" an argument somehow in C? For example, passing something like the following:
char * new_string = slice("old string", 3, NULL)
// NULL means ignore the `end` parameter and just go all the way to the end.
How would something like that be done? Or is that not possible to do in C?
Optional arguments (or arguments that have default values) are not really a thing in C. I think you have the right idea by passing in NULL, except that NULL is a null pointer constant: depending on how it is defined, it will either be read as the integer 0 or draw a pointer-to-integer conversion diagnostic. Instead, I would recommend changing the parameter to a signed integer instead of unsigned, and passing in -1 as your flag to indicate that the argument should be ignored.
There are only two ways to pass optional arguments in C, and only one is common. Either pass a pointer to the optional argument and understand NULL as not passed, or pass an out-of-range value as not passed.
Way 1:
char * slice(const char * str, const unsigned int *start, const unsigned int *end);
// ...
const unsigned int three = 3;
char * new_string = slice("old string", &three, NULL);
Way 2:
#include <limits.h>
char * slice(const char * str, const unsigned int start, const unsigned int end);
char * new_string = slice("old string", 3, UINT_MAX);
BTW, this example should really be using size_t and SIZE_MAX but I copied your prototype.
The proposed dupe target is talking about variadic functions, which do have optional arguments, but it's not like what you're asking for. It's always possible in such a function call to determine if the argument is (intended to be) present by looking at the arguments that come before. In this case, that won't help at all.
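The out-of-range approach with size_t and SIZE_MAX could look like the sketch below. The names slice_opt and SLICE_TO_END are hypothetical, and clamping out-of-range indices to the string length is a design assumption rather than anything the question specifies.

```c
#include <stdint.h>   // SIZE_MAX
#include <stdlib.h>
#include <string.h>

// Hypothetical sentinel meaning "no end given, go to the end of the string".
#define SLICE_TO_END SIZE_MAX

// Returns a newly allocated copy of str[start..end), or NULL if malloc
// fails. The caller frees the result.
char *slice_opt(const char *str, size_t start, size_t end)
{
    size_t len = strlen(str);
    if (end == SLICE_TO_END || end > len)
        end = len;                 // clamp a missing or oversized end
    if (start > end)
        start = end;               // yields an empty slice
    size_t n = end - start;
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, str + start, n);
    out[n] = '\0';
    return out;
}
```

A call such as slice_opt("old string", 3, SLICE_TO_END) then plays the role of the slice("old string", 3, NULL) call from the question. Always returning allocated memory, even for an empty slice, lets the caller free every result the same way.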

Concatenate string with error

I have this code in C but I don't know why I get an error like Argument of type “char” is incompatible with parameter of type const char*
char number_string[size] = { NULL };
int counter = 0;
for (counter = 0; counter < strlen(input_string - 1); counter++)
{
temp = input_string[counter];
if (isdigit(temp))
{
strcat(number_string, temp); //temp variable has the error only in this line
}
}
You're mixing char and char *. The declaration of number_string[] might not warn you, because NULL may simply be defined as 0, which is legal as a char and as a pointer. But the variable temp is definitely a problem: you don't show its declaration, but its first assignment makes it a char, and its second use in strcat assumes it is a char *.
If you want to add a single character at a time to a string, you'll have to do it by hand, something like this:
int nslen = strlen(number_string);
for ...
number_string[nslen++] = temp;
number_string[nslen] = '\0';
char *strcat(char *dest, const char *src); expects its last argument to have type const char *
temp is a variable of type char.
So you're getting the error Argument of type “char” is incompatible with parameter of type const char*.
You could try something like this...
strncat(number_string, &temp, 1)
I usually favor something like...
sprintf (buffer, "%s%c", number_string, temp)
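Neither call checks whether the destination has room. A bounds-checked alternative for appending one character might look like this sketch, where append_char and buf_size are hypothetical names and buf is assumed to already hold a null-terminated string.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: appends c to the string in buf if there is room
// for it plus the terminator. Returns 1 on success, 0 if buf is full.
int append_char(char *buf, size_t buf_size, char c)
{
    size_t len = strlen(buf);
    if (len + 1 >= buf_size)
        return 0;                 // no space for c and the terminator
    buf[len] = c;
    buf[len + 1] = '\0';
    return 1;
}
```

In the question's loop, strcat(number_string, temp) would become append_char(number_string, size, temp), provided number_string starts out as an empty string.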
This is not an answer to your problem with types; it is a suggestion to tackle the problem in another way.
You want to fill a string with all numeric digits from the input string. There is no standard function to do this.
You could use strcat, but that function operates on strings, which must be zero-terminated. You could create a temporary string of two chars (one digit and one null terminator), but that would be inefficient. strcat also requires you to ensure that you don't overflow the char buffer that you append to.
In cases like yours, it is usually easier to tackle the problem on a low level, where you create the char array yourself, one character at a time. For example, you can iterate through the input string with i and copy all digits to the number string by means of a second index, j:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
int main()
{
size_t size = 12;
char number[size];
const char *input = "Alpha 123/Bravo 456/Charlie 789/Delta 007";
int i, j;
j = 0;
for (i = 0; input[i] != '\0'; i++) {
if (isdigit((unsigned char) input[i]) && j + 1 < size) {
number[j++] = input[i];
}
}
number[j] = '\0';
puts(number);
return 0;
}
Note how the code keeps track of the characters in the number string and how it takes care not to overflow the buffer. The number string may be truncated, but it will always be null-terminated.
I've also used input[i] != '\0' to detect the end of the string (which is, by definition, the null terminator '\0') instead of calling strlen(input), which always starts looking for the terminator from the beginning of the string.
There are multiple problems in your code:
calling strlen(input_string - 1) most likely invokes undefined behaviour. You probably meant strlen(input_string) - 1 which would still cause the loop to run too far if input_string is an empty string;
calling strlen for each iteration is inefficient anyway;
calling isdigit(temp) is incorrect if temp is a char and char is signed by default;
strcat cannot be used the way you call it;
you should check for potential buffer overflow if all digits do not fit in the destination array.
Here is a much simpler function to extract all digits from the input_string:
char number_string[size];
int i, j;
for (i = j = 0; input_string[i] != '\0'; i++) {
unsigned char uc = input_string[i];
if (j < size - 1 && isdigit(uc)) {
number_string[j++] = uc;
}
}
number_string[j] = '\0';
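The same loop can be wrapped in a reusable function. This is a sketch under the assumption that the caller passes the destination size; extract_digits and its parameter names are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical wrapper: copies the digits of src into dst, truncating if
// dst is too small, and always null-terminates when dst_size > 0.
// Returns the number of digits stored.
size_t extract_digits(char *dst, size_t dst_size, const char *src)
{
    size_t j = 0;
    if (dst_size == 0)
        return 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        unsigned char uc = (unsigned char)src[i];
        if (isdigit(uc) && j + 1 < dst_size)   // keep one slot for '\0'
            dst[j++] = (char)uc;
    }
    dst[j] = '\0';
    return j;
}
```

Using size_t for both indices avoids the signed/unsigned comparison that arises when j is an int and size is a size_t.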
