How do I use tolower() with a char array? - c

I'm learning C in school and am doing some input and string comparisons and am running into what looks like a casting issue.
Here is my code:
size_t unit_match_index(char *userInput) {
char* unit = malloc(strlen(userInput) + 1);
strcpy(unit, userInput);
//convert to lowercase
for (size_t i = 0; i < strlen(unit); ++i) {
unit[i] = tolower(unit[i]);
/*C6385: invalid data from 'unit': the readable size is 'strlen(userInput)+1' bytes, but 2bytes may be read
C6386: buffer overrun while writing to 'unit': the writable size is [same as above]
*/
}
//...
}
After doing a little bit of research, it looks like tolower() looks for an int and returns an int (2 bytes), and thinks strlen(userInput)+1 may equate to 1, making the total unit array size only 1 byte.
Is there something I should be doing to avoid this, or is this just the analyzer being a computer (computers are dumb)? I'm concerned because I will lose marks on my assignment if there are errors.

As suggested in an answer to this related question, these two warnings are caused by a "bug" in the MSVC Code Analyser.
I even tried the 'fix' I suggested in my answer to that question (that is, using char* unit = malloc(max(strlen(userInput), 0) + 1);) – but it didn't work in your code (not sure why).
However, what did work (and I have no idea why) is to use the strdup function in place of your calls to malloc and strcpy – it does the same thing but in one fell swoop.
Adding the casts (correctly)1 suggested in the comments, here's a version of your code that doesn't generate the spurious C6385 and C6386 warnings:
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
size_t unit_match_index(char* userInput)
{
char* unit = strdup(userInput);
//convert to lowercase
for (size_t i = 0; i < strlen(unit); ++i) {
unit[i] = (char)tolower((unsigned char)unit[i]);
}
//...
return 0;
}
However, MSVC will now generate a different (but equally spurious) warning:
warning C4996: 'strdup': The POSIX name for this item is deprecated.
Instead, use the ISO C and C++ conformant name: _strdup. See online
help for details.
As it happens, the strdup function (without the leading underscore) has been part of the ISO C Standard since C23 (it was voted into the working draft in June 2019).
1 On the reasons for the casts when using the tolower function, see: Do I need to cast to unsigned char before calling toupper(), tolower(), et al.?. However, simply adding those casts does not silence the two code analysis warnings.
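For comparison, here is a minimal sketch (not taken from the answer above) of the same lowercase conversion with the allocation checked and the length computed once. The helper name lowercase_copy is hypothetical, and whether the MSVC analyzer stays quiet on this form has not been verified here.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated lowercase copy of src,
   or NULL if the allocation fails. The caller must free() the result. */
char *lowercase_copy(const char *src)
{
    size_t len = strlen(src);          /* measure once, not on every iteration */
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; ++i) {
        /* Cast to unsigned char first: tolower() has undefined behavior
           for negative arguments other than EOF. */
        copy[i] = (char)tolower((unsigned char)src[i]);
    }
    copy[len] = '\0';
    return copy;
}
```

A caller would use it as char *unit = lowercase_copy(userInput); and test unit against NULL before reading it.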

Related

Pointers in C and casting

I've just started learning pointers. I'm trying to read bytes from this array. The task is almost done, but Clang keeps giving me a warning I don't understand. Here's my code.
The warning says: "Function call argument is an uninitialized value"
int main(void)
{
int tab[] = {67305985,134678021,202050057};
int *pp=0;
pp=tab;
char *wsk=(char*)pp;
for (int i = 0; i < 12; i++)
{
if((wsk+i)!=(void*)NULL)
printf("%d ",*(wsk+i)); // warning on this line
else
return 0;
}
}
These warnings are from Clang Static Analyzer (or whatever they call it these days).
It looks like a false positive to me, if we assume that int is at least 4 bytes and the real code has #include <stdio.h> and no other changes.
If you're using the latest version of the analyzer, you could file a clang bug report. Well -- you could if they allowed people who don't already have accounts to file bug reports. Maybe someone else reading this thread can do it.
Note: it would help the question to post exactly which version of the analyzer you are running (this may be different to the compiler you're using to build -- some IDEs use different compilers for building than for these inline messages).
This if statement
if((wsk+i)!=(void*)NULL)
does not make sense. The macro NULL is already a null pointer constant, so there is no point in casting it to void *.
And the pointer wsk+i can not be equal to NULL in this loop because initially it points to an object.
Just remove the if statement and output each character in the loop.
And it is a bad idea to use magic numbers like 12 used in the loop.
You could write for example
const size_t N = sizeof( tab ) / sizeof( *tab );
and then in the loop
for ( size_t i = 0; i < N * sizeof( int ); i++ )
//...
As for the warning, it is unrelated to the presented code, provided that you included the header <stdio.h>.
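Putting those suggestions together, a minimal sketch might look like the following. The helper name print_bytes is hypothetical, and unsigned char is used here as an assumption about what the questioner wants (raw byte values rather than possibly negative char values).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints each byte of an object as an unsigned value
   and returns the number of bytes printed. */
size_t print_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;   /* any object may be read as bytes */
    size_t i;
    for (i = 0; i < size; i++) {
        printf("%u ", (unsigned)bytes[i]);
    }
    printf("\n");
    return i;
}
```

Calling print_bytes(tab, sizeof tab) with the array from the question visits every byte of tab without a hard-coded 12; the order of the bytes within each int depends on the platform's endianness.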
wsk will never be null, since you've set it to point to the same place as tab (in an expression like this, an array is converted to a pointer to its first element). The warning is telling you that no matter how much you add to wsk it will always be non-null. What you need to do is limit the iterations of the loop to the number of valid items in the array, since C/C++ arrays have no terminators of any kind.
The printf might just be confusing Clang. Try printf("%d ", wsk[i]);

How does this implementation of randomization work in C?

Since I found this particular documentation on https://www.tutorialspoint.com/c_standard_library/c_function_rand.htm, I have been thinking about this particular line of code: srand((unsigned)time(&t));. Whenever I had to generate some stuff, I used srand(time(NULL)) in order not to generate the same stuff every time I run the program, but when I came across this, I started wondering: is there any difference between srand((unsigned)time(&t)) and srand(time(NULL))? To me they seem to do the same thing. Why is a time_t variable used? And why is the address operator used in srand()?
#include <stdio.h>
#include<stdlib.h>
int main(){
int i,n;
time_t t;
n = 5;
srand((unsigned)time(&t));
for (i = 0; i < n; i++) {
printf("%d\n", rand() % 50);
}
return(0);
}
Yes, it will yield the same result. But the example is badly written.
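To illustrate why the two forms are equivalent, here is a small sketch: time() returns the current calendar time and, when its argument is not a null pointer, also stores that same value in the pointed-to object. The function name is hypothetical.

```c
#include <time.h>

/* Hypothetical check: returns 1 when time() stored in t the same value
   that it returned. */
int time_stores_what_it_returns(void)
{
    time_t t;
    time_t returned = time(&t);   /* the value is both returned and written to t */
    return returned == t;
}
```

Since srand() only sees the return value, passing &t or NULL makes no difference to the seed.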
I would be careful reading Tutorialspoint. It's a site known for bad C code, and many bad habits you see in questions here at SO can be traced to that site. Ok, it's anecdotal evidence, but I did ask a user here why they cast the result of malloc, and they responded that they had learned that on Tutorialspoint. You can actually see (at least) four examples in this short snippet.
1. They cast the result from the call to time() which is completely unnecessary and just clutters the code.
2. For some reason they use the variable t, which is completely useless in this example. If you read the documentation for time() you'll see that just passing NULL is perfectly adequate in this example.
3. Why use the variable n? For this short example it's perfectly ok with a hardcoded value. And when you use variables to avoid hardcoded values, you should declare them const and give them a much more descriptive name than n. (Ok, I realize I was a bit on the edge when writing this. Omitting const isn't that big of a deal, even if it's preferable. And "n" is a common name meaning "number of iterations". And using a variable instead of a hard coded value is in general a good thing. )
4. Omitted #include<time.h> which would be ok if they also omitted the rest of the includes.
5. Using int main() instead of int main(void).
For 5, I'd say that in most cases, this does not matter for the main function, but declaring other functions as for example int foo() with empty parenthesis instead of int foo(void) could cause problems, because they mean different things. From the C standard:
The use of function declarators with empty parentheses (not prototype-format parameter type declarators) is an obsolescent feature.
Here is a question related to that: What are the semantics of function pointers with empty parentheses in each C standard?
One could also argue about a few other things, but some people would disagree about these.
Why declare i outside the for loop? Declaring it inside has been legal since C99, which is 20 years old.
Why end the function with return 0? Omitting this is also ok since C99. You only need to have a return in main if you want to return something else than 0. Personally, in general I find "it's good practice" as a complete nonsense statement unless there are some good arguments to why it should be good practice.
These are good to remember if your goal is to maintain very old C code in environments where you don't have compilers that support C99. But how common is that?
So if I got to rewrite the example at tutorialspoint, I'd write it like this:
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
int main(void){
srand(time(NULL));
for (int i = 0; i < 5; i++) {
printf("%d\n", rand() % 50);
}
}
Another horrible example can be found here: https://www.tutorialspoint.com/c_standard_library/c_function_gets.htm
The function gets is removed from standard C, because it's very dangerous. Yet, the site does not even mention that.
Also, they teach you to cast the result of malloc https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm which is completely unnecessary. Read why here: Do I cast the result of malloc?
And although they mention that malloc returns NULL on failure, they don't show in the examples how to properly error check it. Same goes for functions like scanf.

Is redundant assignment elimination fairly typical optimization?

Given the following:
len = strlen(str);
/* code that does not read from len */
len = new_len;
Can I depend on the compiler to remove the first line?
I'm writing a code generating script. The first line is generated in one place, while everything that follows is generated elsewhere (by a virtual function in a descendant class). Most of the time, len isn't needed. Sometimes though, it is. I wonder if I can just set it in all cases and let the compiler get rid of the line if it isn't required.
No, of course you cannot "depend" on the optimization choices done by the compiler.
They can change at the user's whim (compiler command line options), or with different compiler versions.
Since the behavior you're describing is not required by the language standard, you cannot depend on compilers to implement it.
On the other hand, you can surely test it, and try to "coax" it into being optimized out, by (again) asking your compiler to do as much optimization as possible, for instance.
Compilers are pretty clever. But to rely on the compiler removing "unused" function calls is probably not a good idea. For one thing, the compiler will NEED to understand the function you are calling (so strlen is a good example here, because most compilers understand what strlen does and how it affects other things) - if the function is not one that the compiler "understands", then it can't optimise it out.
What if you did:
x = printf("Hello, World!\n");
x = printf("World, Hello!\n");
Would you think the compiler had done the right thing by removing the first printf? Probably not... So, with any function the compiler can't determine "is side-effect free", the compiler MUST call the function, even if the result is not used. Side-effect free means under normal circumstances - e.g. there is a side-effect of calling strlen() with an invalid pointer - your code will probably crash - but that's not "normal circumstances" - you'd be pretty daft to use strlen() just to check if your pointer is a valid one or not, right?
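The point about side effects can be made observable with a small sketch. The names noisy, calls and overwrite_result are hypothetical; the counter stands in for any effect the compiler is not allowed to drop.

```c
#include <stdio.h>

static int calls = 0;   /* counts how often noisy() actually ran */

/* Hypothetical function with visible side effects (output and a counter). */
int noisy(const char *msg)
{
    ++calls;
    return printf("%s\n", msg);
}

/* The first result is overwritten without being read, but both calls
   must still happen because each one has side effects. */
int overwrite_result(void)
{
    int x;
    x = noisy("Hello, World!");
    x = noisy("World, Hello!");
    (void)x;            /* the value itself is never needed */
    return calls;
}
```

Only the store to x is redundant here; the calls themselves have to stay, whatever the optimization level.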
So, in other words, you probably want to make sure your call to strlen() is really needed before you call strlen - or live with the fact that the compiler may well generate an unnecessary strlen call.
Modern compilers do seem to do an excellent job of eliminating redundant assignments. I ran the following snippet through VS 2008 and gcc 4.6 and the call to strlen (the inlined code, rather) is in fact removed. Both compilers managed to see that both something() and something_else() produce no side effects. Calls to these are removed as well.
This occurs at /O1 in VC and -O1 in gcc.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int something(int len) {
if(len != len) {
printf("%d\n", len);
}
return 0;
}
int something_else(int len) {
return len * len;
}
int main(void) {
char *s = malloc(1000);
size_t len;
strcpy(s, "Hello world");
len = strlen(s);
something(len);
printf("%s\n", s);
len += 8;
something_else(len);
len = 5;
printf("%d\n", (int)len); /* len is a size_t, so convert it to match %d */
return 0;
}
It depends on compiler & the optimization level you specify.
Here it gets assigned initially from a function call. What if that function has a side effect?
So you cannot just assume the compiler to optimize it for you.
If the initial assignment statement is free from any side effects & you specify optimization level good enough (e.g. -O3 in case of gcc) it may optimize it for you.

How to tell if an optional argument was passed to a function C

Edit 3: For the code itself all together check the first answer or the end of this post.
As stated in the title I'm trying to find a way to tell if an optional argument was passed to a function or not. What I'm trying to do is something like how almost all dynamic languages handle their substring function. Below is mine currently, but it doesn't work since I don't know how to tell if/when the thing is used.
char *substring(char *string,unsigned int start, ...){
va_list args;
int unsigned i=0;
long end=-1;
long long length=strlen(string);
va_start(args,start);
end=va_arg(args,int);
va_end(args);
if(end==-1){
end=length;
}
char *to_string=malloc(end);
strncpy(to_string,string+start,end);
return to_string;
}
Basically I want to still be able to not include the length of the string I want back and just have it go to the end of the string. But I cannot seem to find a way to do this. Since there's also no way to know the number of arguments passed in C, that took away my first thought of this.
Edit:
new way of doing it here's the current code.
#define substring(...) P99_CALL_DEFARG(substring, 3, __VA_ARGS__)
#define substring_defarg_2 (0)
char *substring(char *string,unsigned int start, int end){
int unsigned i=0;
int num=0;
long long length=strlen(string);
if(end==0){
end=length;
}
char *to_string=malloc(length);
strncpy(to_string,string+start,end);
return to_string;
}
and then in a file I call test.c to see if it works.
#include "functions.c"
int main(void){
printf("str:%s",substring("hello world",3,2));
printf("\nstr2:%s\n",substring("hello world",3));
return 0;
}
functions.c has an include for functions.h, which includes everything that is ever needed. Here's the clang output (since clang usually seems to give a bit more detail).
In file included from ./p99/p99.h:1307:
./p99/p99_generic.h:68:16: warning: '__error__' attribute ignored
__attribute__((__error__("Invalid choice in type generic expression")))
^
test.c:4:26: error: called object type 'int' is not a function or function
pointer
printf("\nstr2:%s\n",substring("hello world",3));
^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from test.c:1:
In file included from ./functions.c:34:
In file included from ./functions.h:50:
./string.c:77:24: note: instantiated from:
#define substring(...) P99_CALL_DEFARG(substring, 3, __VA_ARGS__)
GCC just says the object is not a function
Edit 2: Note that setting it to -1 doesn't change it either, it still throws the same thing. The compile options I'm using are as follows.
gcc -std=c99 -c test.c -o test -lm -Wall
Clang does the same thing (whether or not it works with it is another question).
ANSWER HERE
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include "p99/p99.h"
#define substring(...) P99_CALL_DEFARG(substring, 3, __VA_ARGS__)
#define substring_defarg_2() (-1)
char *substring(char *string, size_t start, size_t len) {
size_t length = strlen(string);
if(len == SIZE_MAX){
len = length - start;
}
char *to_string = malloc(len + 1);
memcpy(to_string, string+start, len);
to_string[len] = '\0';
return to_string;
}
You will need P99 for this; it is written by the author of the selected answer. Just drop it into your source directory and you should be OK. To summarize his answer on the license: you're able to use it however you want, but you basically cannot fork it. So for this purpose you're free to use it and the string function in any project, whether proprietary or open source.
The only thing I ask is that you at least give a link back to this thread so that others who happen upon it can learn of stack overflow, as that's how I do my comments for things I've gotten help with on here.
In C, there's no such thing as an optional argument. The common idiom for situations like this is to either have two functions; substr(char *, size_t start, size_t end) and substr_f(char *, size_t start) or to have a single function where end, if given a special value, will take on a special meaning (such as in this case, possibly any number smaller than start, or simply 0).
When using varargs, you need to either use a sentinel value (such as NULL) at the end of the argument list, or pass in as an earlier argument the argc (argument count).
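As a sketch of the second option (passing the argument count first), consider the following. The function sum_ints is a hypothetical example and not part of the question's code.

```c
#include <stdarg.h>

/* Hypothetical example: the first argument tells the function how many
   int arguments follow it. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* read exactly count values, no more */
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) works because the caller states the count; passing a count larger than the number of actual arguments is undefined behavior.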
C has a very low amount of runtime introspection, which is a feature, not a bug.
Edit: On a related note, the correct type to use for string lengths and offsets in C is size_t. It is the only integer type that is guaranteed to be both large enough to address any character in any string, and guaranteed to be small enough to not be wasting space if stored.
Note too that it is unsigned.
Contrary to common belief, functions with optional arguments can be implemented in C, but va_arg functions are not the right tool for such a thing. It can be implemented through variadic macros (__VA_ARGS__), since there are ways to capture the number of arguments that a function receives. The whole thing is a bit tedious to explain and to implement, but you can use P99 for immediate use.
You'd have to change your function signature to something like
char *substring(char *string, unsigned int start, int end);
and invent a special code for end if it is omitted at the call side, say -1. Then with P99 you can do
#include "p99.h"
#define substring(...) P99_CALL_DEFARG(substring, 3, __VA_ARGS__)
#define substring_defarg_2() (-1)
where you see that you declare a macro that "overloads" your function (yes this is possible, common C library implementations use this all the time) and provide the replacement with the knowledge about the number of arguments your function receives (3 in this case). For each argument for which you want to have a default value you'd then declare the second type of macro with the _defarg_N suffix, N starting at 0.
The declaration of such macros is not very pretty, but tells at least as much what is going on as the interface of a va_arg function would. The gain is on the caller ("user") side. There you now can do things like
substring("Hello", 2);
substring("Holla", 2, 2);
to your liking.
(You'd need a compiler that implements C99 for all of this.)
Edit: You can even go further than that if you don't want to implement that convention for end but want to have two distinct functions, instead. You'd implement the two functions:
char *substring2(char *string, unsigned int start);
char *substring3(char *string, unsigned int start, unsigned int end);
and then define the macro as
#define substring(...) \
P99_IF_LT(P99_NARG(__VA_ARGS__), 3) \
(substring2(__VA_ARGS__)) \
(substring3(__VA_ARGS__))
this would then ensure that the preprocessor chooses the appropriate function call by looking at the number of arguments it receives.
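To show the idea of dispatching on the argument count without depending on P99, here is a hand-rolled sketch in plain C99. It is not how P99 implements it; the macros PICK_4TH, COUNT_ARGS and PASTE are hypothetical helpers, the third parameter is treated as a length, and the functions assume that start and len stay within the string.

```c
#include <stdlib.h>
#include <string.h>

/* Copies len characters of string starting at start; caller frees the result. */
char *substring3(const char *string, size_t start, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, string + start, len);
    out[len] = '\0';
    return out;
}

/* Copies everything from start to the end of string. */
char *substring2(const char *string, size_t start)
{
    return substring3(string, start, strlen(string) - start);
}

/* Hypothetical helpers: COUNT_ARGS yields 1, 2 or 3 for that many arguments,
   by shifting a descending list so the right number lands in slot four. */
#define PICK_4TH(a1, a2, a3, n, ...) n
#define COUNT_ARGS(...) PICK_4TH(__VA_ARGS__, 3, 2, 1, 0)
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)

/* Expands to substring2(...) or substring3(...) depending on the argument count. */
#define substring(...) PASTE(substring, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
```

With these definitions, substring("Hello", 2) becomes a call to substring2 and substring("Holla", 2, 2) becomes a call to substring3.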
Edit2: Here a better suited version of a substring function:
use the types that are semantically correct for lengths and the like
the third parameter seems to be a length for you and not the end of the string, so name it accordingly
strncpy is almost never the correct function to choose; there are situations where it doesn't write the terminating '\0' character. When you know the size of a string, use memcpy.
char *substring(char *string, size_t start, size_t len) {
size_t length = strlen(string);
if(len == SIZE_MAX){
len = length - start;
}
char *to_string = malloc(len + 1);
memcpy(to_string, string+start, len);
to_string[len] = '\0';
return to_string;
}
Unfortunately, you cannot use va_arg like that:
Notice also that va_arg does not determine either whether the retrieved argument is the last argument passed to the function (or even if it is an element past the end of that list). The function should be designed in such a way that the amount of parameters can be inferred in some way by the values of either the named parameters or the additional arguments already read.
A common "workaround" is to give the other "overload" a nice mnemonic name, such as right_substr. It will not look as fancy, but it will certainly run faster.
If duplicating implementation is your concern, you could implement left_substr, substring, and right_substr as wrappers to a hidden function that takes start and length as signed integers, and interprets negative numbers as missing parameters. It is probably not a good idea to use this "convention" in your public interface, but it would probably work fine in a private implementation.
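A rough sketch of that wrapper arrangement follows. All names and the exact conventions (negative start means "from the beginning", negative length means "to the end", out-of-range values are clamped) are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hidden worker: negative start means "from the beginning", negative len
   means "up to the end". Out-of-range values are clamped to the string. */
static char *substr_impl(const char *s, long start, long len)
{
    size_t total = strlen(s);
    size_t from = start < 0 ? 0 : (size_t)start;
    if (from > total) {
        from = total;
    }
    size_t count = total - from;
    if (len >= 0 && (size_t)len < count) {
        count = (size_t)len;
    }
    char *out = malloc(count + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, s + from, count);
    out[count] = '\0';
    return out;
}

/* Public wrappers with mnemonic names; callers free() the result. */
char *left_substr(const char *s, long len)           { return substr_impl(s, -1, len); }
char *right_substr(const char *s, long start)        { return substr_impl(s, start, -1); }
char *substring(const char *s, long start, long len) { return substr_impl(s, start, len); }
```

The negative-number convention stays inside substr_impl, so the public functions never expose it.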
In standard C, when using variable argument prototypes (...), there is no way to tell directly how many arguments are being passed.
Behind the scenes, functions like printf() etc assume the number of arguments based on the format string.
Other functions that take, say, a variable number of pointers, expect the list to be terminated with a NULL.
Consider using one of these techniques.
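The NULL-terminated variant could be sketched like this. The function total_length is a hypothetical example; the caller is responsible for ending the list with a null pointer of type char *.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: adds up the lengths of all strings passed,
   stopping at the terminating null pointer. */
size_t total_length(const char *first, ...)
{
    va_list args;
    size_t total = 0;

    va_start(args, first);
    for (const char *s = first; s != NULL; s = va_arg(args, char *)) {
        total += strlen(s);
    }
    va_end(args);
    return total;
}
```

A call looks like total_length("ab", "cde", (char *)NULL); the cast matters because a bare NULL may be an integer constant that is not passed as a pointer through the variable argument list.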

Difference between byte and char in C

I am wondering why I can't compile an example from a book. I simplified the example here to avoid posting an example from a copyrighted book.
#include <stdio.h>
BYTE *data = "data";
int main()
{
printf("%s", data);
return 0;
}
When compiling with g++, I get the error:
error: invalid conversion from 'const char*' to 'BYTE*'
The program works by simply replacing BYTE with char, but I must be doing something wrong since the example comes from a book.
Please help pointing out the problem. Thanks.
BYTE isn't a part of the C language or C standard library so it is totally system dependent on whether it is defined after including just the standard stdio.h header file.
On many systems that do define a BYTE macro, it is often an unsigned char. Converting from a const char* to an unsigned char* would require an explicit cast.
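A minimal sketch of the explicit cast, assuming BYTE is an alias for unsigned char (the book's actual definition is not shown in the question) and using a hypothetical helper named as_bytes:

```c
/* Assumption: BYTE is an alias for unsigned char, as on many systems. */
typedef unsigned char BYTE;

/* Hypothetical helper: views a string's characters as bytes.
   The explicit cast is what g++ was asking for. */
const BYTE *as_bytes(const char *text)
{
    return (const BYTE *)text;
}
```

To print such data with "%s", cast it back, for example printf("%s", (const char *)data);.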
