Using difference of pointers with printf("%.*s") - c

The problem I'm facing has to do with the intptr_t data type and the way fprintf() takes arguments for the %.*s format. The %.*s format expects the field precision to have type int, and maybe that's not unreasonable per se.
Not in this case though:
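#include <stdio.h>
#include <stdint.h>
#include <string.h> /* strstr */
#define MAX_LN 501 /* size of the input buffer */
int main() {
char fname[] = "Some_File.txt";
FILE *write = fopen(fname, "w");
if (write != NULL) {
printf("\n\tType below :\n\n");
char in[MAX_LN] = ""; char *p;
while (1) {
if (fgets(in, MAX_LN, stdin) == NULL) { fclose(write); break; } /* EOF or read error */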
/*** Region with compiler warnings begins ***/
if ((p = strstr(in, "/end/")) != 0) {
intptr_t o = p - in;
fprintf(write, "%.*s", o, in);
/*** Region with compiler warnings ends ***/
fclose(write);
break;
} else {
fputs(in, write);
}
}
}
}
If I compile this, it doesn't play well with %.*s, and the compiler points that out:
warning: field precision should have type 'int', but argument has type 'intptr_t' (aka 'long') [-Wformat]
If I make it int o;, it plays well with %.*s but of course isn't ideal, and the compiler says as much:
warning: implicit conversion loses integer precision: 'long' to 'int' [-Wshorten-64-to-32]
Now, this is demo code, and the max size that o can hold is 500 here, however, in my actual code, it can be 10,000 or even 100,000 (still very much within the size of a 32-bit int, isn't it?)
So what will resolve this best with the least changes?
Compiled on Clang (might be very similar on GCC) with -Wall -Wextra -pedantic.

The difference of two pointers is type ptrdiff_t. "... is the signed integer type of the result of subtracting two pointers;"
// intptr_t o = p-in;
ptrdiff_t o = p - in;
Given these both point in char in[501], the difference also fits in an int.
Simply cast. The .* expects a matching int argument, not an intptr_t or a ptrdiff_t.
// fprintf(write,"%.*s",o,in);
fprintf(write,"%.*s", (int) o, in);
Or all at once:
fprintf(write,"%.*s", (int) (p - in), in);

The type of a pointer difference such as p-in is ptrdiff_t, not intptr_t. Anyway, one alternative in this case would be to use fwrite:
if ((p = strstr(in, "/end/")) != NULL) {
size_t len = (size_t)(p - in); // p - in is an offset into data with a size
fwrite(in, sizeof(char), len, write);
fclose(write);
break;
} else {
fputs(in, write);
}
Then add error checks.

Related

How to automatically add type casts to printf style functions in c source code?

I'm porting a large c project from Windows to Unix and the source contains many thousand calls for a logprint function which is declared like this:
VOID logprint(DWORD level, LPCSTR format, ...);
Now here are my two problems:
1.) Used format type specifiers are not portable
The code uses %lu for ULONG variables. On Windows this is fine because ULONG is a typedef for unsigned long. However when porting the code I cannot reproduce this typedef because ULONG must always be exactly 32-bit according to [MS-DTYP] (NB: With Microsoft's c compilers unsigned long is always 32-bit).
So I've created a windows types header file wtypes.h which defines the basic Windows data types with the help of stdint.h and limits.h.
Of course, this now results in invalid reads because of the %lu specifier if the system's unsigned long is 64-bit and my ULONG is 32-bit. So I also have to add an (unsigned long) cast to all ULONG logprint arguments.
And ULONG is just one example of course ...
2.) Invalid format type specifiers used
In addition that code uses lots of invalid format specifiers. E.g. %d for DWORD arguments.
Of course it is easy to solve:
identify all logprint calls
identify the type of each argument
verify that the correct format specifier is used
add the correct type casts to the arguments
Example:
Replace:
ULONG ulMin, ulMax;
...
logprint(FATAL, "specified interval is invalid %ld..%u out of range",
ulMin, ulMax);
with:
logprint(FATAL, "specified interval is invalid %lu..%lu",
(unsigned long) ulMin, (unsigned long) ulMax);
But it would take me at least two weeks and my brain will be garbled after that.
So my actual question:
Are there any automated tools for making this kind of change?
As a minimum requirement that tool would have to be able to identify the type of the arguments and prefix them with a type cast. Once the typecasts are there I can easily write a python script which fixes the format specifiers.
Is the source of the logprint accessible? If it is, the best way seems to change it directly. It must contain type casting code for va_arg such as:
ul = va_arg(argp, ULONG);
then just change ULONG as you needed.
If it is not, just make your own wrapper function, such as logprint64, that does a similar task but casts the types of the arguments as needed. Substituting logprint64 for logprint will take less than an hour, I guess.
Or, you may rewrite logprint. According to your post and reply, logprint seems to be in the following form:
#include <stdio.h>
#include <stdarg.h>
enum ErrCode { FATAL, MILD };
typedef unsigned short ULONG;
#define MAX 100
char Buf[MAX];
void logprint(enum ErrCode code, char *fmt, ...)
{
va_list aptr;
va_start(aptr, fmt);
vsprintf(Buf, fmt, aptr);
va_end(aptr);
}
int main()
{
ULONG ulMin = 97, ulMax = 99;
logprint(FATAL,"interval is invalid %c..%c", ulMin, ulMax);
printf("%s\n", Buf);
return(0);
}
You can replace it with the following definition simulating vsprintf:
void logprint(enum ErrCode code, const char *fmt, ...)
{ // add your types as needed
ULONG h;
unsigned long u;
long d;
int i;
const char *p;
char *buf;
va_list argp;
va_start(argp, fmt);
for (p = fmt, buf = Buf; *p != '\0'; p++) {
if (*p != '%') {
buf += sprintf(buf, "%c", *p); continue;
}
switch (*++p) { // change the type casting as needed
case 'l':
switch (*++p) {
case 'u':
// assumes ULONG is at least as wide as int, e.g. a 32-bit typedef
u = (unsigned long) va_arg(argp, ULONG);
buf += sprintf(buf, "%lu", u); continue;
case 'd':
d = va_arg(argp, long);
buf += sprintf(buf, "%ld", d); continue;
}
break; // unsupported %l? conversion: do not fall through into 'c'
case 'c':
i = va_arg(argp, int); // a char argument arrives promoted to int
buf += sprintf(buf, "%c", i); continue;
case 'd':
i = va_arg(argp, int);
buf += sprintf(buf, "%d", i); continue;
}
}
va_end(argp);
}
Hope this helps.

Error : format'%s' expects argument of type 'char *', but argument 2 has type 'int' [-Wformat=]

I am currently trying to do my own shell, and it has to be polyglot.
So I tried to implement a function that reads the lines in a .txt file.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// globals
char lang[16] = {'t','r','y'};
char aMsg[512];
// functions
void takeFile() {
int i =0;
char namFil[32];
char lg[16];
FILE * file;
char tmp[255];
char * line = tmp;
size_t len = 0;
ssize_t read;
strcpy(namFil,"/media/sf_Projet_C/");
strcpy(lg,lang);
strcat(lg, ".txt");
strcat(namFil, lg);
file = fopen(namFil, "r");
printf("%s\n", namFil);
while((read = getline(&line,&len, file)) != -1) {
aMsg[i] = *line;
i++;
}
}
enum nMsg {HI, QUIT};
int main(void) {
takeFile();
printf("%s\n%s\n", aMsg[HI], aMsg[QUIT]);
}
I am on win7 but I compile with gcc on a VM.
I have a warning saying :
format'%s' expects argument of type 'char *', but argument 2 (and 3) has type 'int' [-Wformat=]
I tried to execute the prog with %d instead of %s and it prints numbers.
I don't understand what converts my aMsg into a int.
My try.txt file is just :
Hi
Quit
The contents of your text file have nothing to do with the warning, which is generated by the compiler before your program ever runs. It is complaining about this statement:
printf("%s\n%s\n", aMsg[HI], aMsg[QUIT]);
Global variable aMsg is an array of char, so aMsg[HI] designates a single char. In this context its value is promoted to int before being passed to printf(). The %s field descriptor expects an argument of type char *, however, and GCC is smart enough to recognize that what you are passing is incompatible.
Perhaps you had in mind
printf("%s\n%s\n", &aMsg[HI], &aMsg[QUIT]);
or even the equivalent
printf("%s\n%s\n", aMsg + HI, aMsg + QUIT);
but though those are valid, I suspect they won't produce the result you actually want. In particular, given the input data you specified and the rest of your program, I would expect the output to be
HQ
Q
If you wanted to read in and echo back the whole contents of the input file then you need an altogether different approach to both reading in and writing out the data.
Let's take a closer look at the problematic line:
printf("%s\n%s\n", aMsg[HI], aMsg[QUIT]);
The format string expects two string arguments. You pass aMsg[HI] and aMsg[QUIT], and each of those is a single char, not a pointer to one. A char passed to a variadic function such as printf() is promoted to int, which is why the compiler reports the arguments as int.
One solution is simply to use %c instead of %s.
However, I suspect you want to achieve something else.
I'm completely guessing, but I think what you want is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// globals
char lang[16] = {'t','r','y'};
char *aMsg[512];
// functions
void takeFile() {
int i =0;
char namFil[32];
char lg[16];
FILE * file;
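char * line = NULL; /* getline() needs NULL or a malloc'd buffer, not an array */
size_t len = 0;
ssize_t read;
strcpy(namFil,"/media/sf_Projet_C/");
strcpy(lg,lang);
strcat(lg, ".txt");
strcat(namFil, lg);
file = fopen(namFil, "r");
printf("%s\n", namFil);
if (file == NULL) { perror(namFil); return; }
while(i < 512 && (read = getline(&line,&len, file)) != -1) {
aMsg[i] = malloc(strlen(line)+1);
if (aMsg[i] == NULL) break;
strcpy(aMsg[i], line);
i++;
}
free(line); /* release the buffer getline() allocated */
fclose(file);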
}
enum nMsg {HI, QUIT};
int main(void) {
takeFile();
printf("%s\n%s\n", aMsg[HI], aMsg[QUIT]);
free(aMsg[HI]);
free(aMsg[QUIT]);
return 0;
}

Pointer difference and size_t

I want to allocate memory for holding a field extracted from a given string. The size of the field is determined by the difference of two pointers, see the following minimal example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
main(int argc, char *argv[])
{
const char line[] = "foo,bar,baz";
char *field_start = line;
char *field_end;
char *field;
field_end = strchr(line, ',');
field = malloc(field_end - field_start + 1);
memcpy(field, field_start, field_end - field_start);
*(field + (field_end - field_start)) = '\0';
printf("field=\"%s\"\n", field);
/* ... */
return (0);
}
Compiling this code with clang -Weverything -o ex ex.c results in the following warnings:
ex.c:14:41: warning: implicit conversion changes signedness: 'long' to 'unsigned long'
[-Wsign-conversion]
field = malloc(field_end - field_start + 1);
~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~^~~
ex.c:15:39: warning: implicit conversion changes signedness: 'long' to 'unsigned long'
[-Wsign-conversion]
memcpy(field, field_start, field_end - field_start);
~~~~~~ ~~~~~~~~~~^~~~~~~~~~~~~
As I understand it, the result of the pointer difference is of ptrdiff_t type while the malloc/memcpy expect an argument of type size_t.
So my question is how to address this and to eliminate the warning? As
field_end >= field_start the difference cannot become negative, so could the
above be safely cast to size_t
field = malloc((size_t)(field_end - field_start + 1));
memcpy(field, field_start, (size_t)(field_end - field_start));
or are there any problems I'm overlooking?
Note:
There are no checks for return values in the above just for simplicity. field_start and _end should be const of course.
field_end >= field_start only holds in case strchr does not return NULL, i.e. nothing in the type system tells the compiler that this indeed always holds. Hence the warning is warranted. However, if you make sure that this is not the case, then (size_t)(field_end - field_start) should be fine. In order to not duplicate this all over, I'd add
size_t field_len;
/* strchr & null-check go here */
field_len = (size_t)(field_end - field_start);
...and then use field_len all over.
That being said, you may want to replace your malloc/memcpy combination with a call to strndup.

safe malloc/realloc: wrapping the call into a macro?

I would like to wrap my calls to malloc/realloc into a macro that would stop the program if the method returns NULL
can I safely use the following macro?
#define SAFEMALLOC(SIZEOF) (malloc(SIZEOF) || (void*)(fprintf(stderr,"[%s:%d]Out of memory(%d bytes)\n",__FILE__,__LINE__,SIZEOF),exit(EXIT_FAILURE),0))
char* p=(char*)SAFEMALLOC(10);
it compiles, it works here with SAFEMALLOC(1UL) and SAFEMALLOC(-1UL) but is it a safe way to do this?
static void* safe_malloc(size_t n, unsigned long line)
{
void* p = malloc(n);
if (!p)
{
fprintf(stderr, "[%s:%lu]Out of memory(%lu bytes)\n",
__FILE__, line, (unsigned long)n);
exit(EXIT_FAILURE);
}
return p;
}
#define SAFEMALLOC(n) safe_malloc(n, __LINE__)
No, it's broken.
It seems to assume that the boolean OR operator || returns its argument when that argument is deemed true. That's not how it works.
C's boolean operators always yield an int with the value 1 or 0; they never yield one of their operands.
Using your macro:
#define SAFEMALLOC(SIZEOF) (malloc(SIZEOF) || (void*)(fprintf(stderr,"[%s:%d]Out of memory(%d bytes)\n",__FILE__,__LINE__,SIZEOF),exit(EXIT_FAILURE),0))
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *p = SAFEMALLOC(10);
char *q = SAFEMALLOC(2000);
printf("p = %p, q = %p\n", (void *)p, (void *)q);
// Leak!
return 0;
}
Warnings (should be a clue):
weird.c:8: warning: cast to pointer from integer of different size
weird.c:8: warning: initialization makes pointer from integer without a cast
weird.c:9: warning: cast to pointer from integer of different size
weird.c:9: warning: initialization makes pointer from integer without a cast
Output:
p = 0x1, q = 0x1
In summary, no, it's not very safe! Writing a function would probably be less error prone.

How does the particular C function work?

I am trying to learn C and am very confused already.
In the OOP languages I have used, there is the ability to perform method overloading, where the same function can have different parameter types and the most appropriate version is called.
Now in C I know that this is not the case, so I can't figure out the following problem: how printf() works.
For example:
char chVar = 'A';
int intVar = 123;
float flVar = 99.999;
printf("%c - %i - %f \n",chVar, intVar, flVar);
printf("%i - %f - %c \n",intVar, flVar, chVar);
printf("%f - %c - %i \n",flVar, chVar, intVar);
Now, as C doesn't support function overloading, how does printf manage to take any number of arguments, of any type, and then work correctly with them?
I have tried to find how printf() works by downloading the glibc source package but can't quite seem to find it, though I'll keep looking.
Could anyone here explain how C performs the above task?
C supports a type of function signature called "varargs" meaning "variable (number of) arguments". Such a function must have at least one required argument. In the case of printf, the format string is a required argument.
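Generally, on a machine that passes arguments on the stack, when you call a C function the arguments are pushed onto the stack from right to left. In this way, the first argument to the function is the one found on the "top" of the stack, just after the return address. (Many modern calling conventions pass the first few arguments in registers instead, but the va_* macros hide that difference.)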
There are C macros defined which allow you to retrieve the variable arguments.
The key points are:
There is no type-safety for the variable arguments. In the case of printf(), if the format string is wrong, the code will read invalid results from memory, possibly crashing.
The variable arguments are read through a pointer which is incremented through the memory containing those arguments.
The argument pointer must be initialized with va_start, incremented with va_arg, and released with va_end.
I have posted a ton of code you may find interesting on the related question:
Best Way to Store a va_list for Later Use in C/C++
Here's a skeleton of a printf() which only formats integers ("%d"):
#include <stdarg.h>
int printf( const char * fmt, ... )
{
int d; /* Used to store any int arguments. */
va_list args; /* Used as a pointer to the next variable argument. */
va_start( args, fmt ); /* Initialize the pointer to arguments. */
while (*fmt)
{
if ('%' == *fmt)
{
fmt ++;
switch (*fmt)
{
case 'd': /* Format string says 'd'. */
/* ASSUME there is an integer at the args pointer. */
d = va_arg( args, int);
/* Print the integer stored in d... */
break;
}
if (*fmt)
fmt++; /* Step past the conversion character. */
}
else
{
/* Not a format character, copy it to output. */
fmt++;
}
}
va_end( args );
return 0; /* A real printf() returns the number of characters written. */
}
Internally, printf will (at least usually) use some macros from stdarg.h. The general idea is (a greatly expanded version of) something like this:
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
int my_vfprintf(FILE *file, char const *fmt, va_list arg) {
int int_temp;
char char_temp;
char *string_temp;
char ch;
int length = 0;
char buffer[512];
while ( ch = *fmt++) {
if ( '%' == ch ) {
switch (ch = *fmt++) {
/* %% - print out a single % */
case '%':
fputc('%', file);
length++;
break;
/* %c: print out a character */
case 'c':
char_temp = va_arg(arg, int);
fputc(char_temp, file);
length++;
break;
/* %s: print out a string */
case 's':
string_temp = va_arg(arg, char *);
fputs(string_temp, file);
length += strlen(string_temp);
break;
/* %d: print out an int */
case 'd':
int_temp = va_arg(arg, int);
snprintf(buffer, sizeof buffer, "%d", int_temp); /* itoa() is non-standard; a real printf would do this conversion itself */
fputs(buffer, file);
length += strlen(buffer);
break;
/* %x: print out an int in hex */
case 'x':
int_temp = va_arg(arg, int);
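snprintf(buffer, sizeof buffer, "%x", (unsigned int)int_temp); /* itoa() is non-standard; a real printf would do this conversion itself */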
fputs(buffer, file);
length += strlen(buffer);
break;
}
}
else {
putc(ch, file);
length++;
}
}
return length;
}
int my_printf(char const *fmt, ...) {
va_list arg;
int length;
va_start(arg, fmt);
length = my_vfprintf(stdout, fmt, arg);
va_end(arg);
return length;
}
int my_fprintf(FILE *file, char const *fmt, ...) {
va_list arg;
int length;
va_start(arg, fmt);
length = my_vfprintf(file, fmt, arg);
va_end(arg);
return length;
}
#ifdef TEST
int main() {
my_printf("%s", "Some string");
return 0;
}
#endif
Fleshing it out does involve quite a bit of work -- dealing with field width, precision, more conversions, etc. This is enough, however, to at least give a flavor of how you retrieve varying arguments of varying types inside your function.
(Don't forget that, if you're using gcc (and g++?), you can pass -Wformat in the compiler options to get the compiler to check that the types of the arguments match the formatting. I hope other compilers have similar options.)
Could anyone here explain how C performs the above task?
Blind faith. It assumes that you have ensured that the types of the arguments match perfectly with the corresponding letters in your format string. When printf is called, all the arguments are represented in binary and, conceptually, unceremoniously concatenated together and passed as a single big argument to printf, with no type information attached. If they don't match, you'll have problems. As printf iterates through the format string, every time it sees %d it will take 4 bytes from the arguments (assuming a 32-bit int; it would be 8 bytes for a 64-bit int, of course) and it will interpret them as an integer.
Now maybe you actually passed a double (typically taking up twice as much memory as an int), in which case, in this simple model, printf will just take 32 of those bits and represent them as an integer. Then the next format field (maybe a %d) will take the rest of the double.
So basically, if the types don't match perfectly you'll get badly garbled data. Formally, any such mismatch is undefined behaviour, so garbled output is the mild outcome and a crash is possible too.

Resources