Casting from unsigned into signed char in C

I am converting an input raw PCM stream into MP3 using LAME. The encoding function within that library returns the MP3-encoded samples in an array of type unsigned char. This MP3-encoded stream now needs to be placed within an FLV container, which uses a function that takes the encoded samples in a char array. My problem is that I am passing the array from LAME (of type unsigned char) into the FLV library. The following piece of code (only symbolic) illustrates my problem:
/* cast from unsigned char to char. */
#include <stdio.h>
#include <stdlib.h>

void display(char *buff, int len) {
    int i = 0;
    for(i = 0; i < len; i++) {
        printf("buff[%d] = %c\n", i, buff[i]);
    }
}

int main() {
    int len = 10;
    unsigned char* buff = (unsigned char*) malloc(len * sizeof(unsigned char));
    int i = 0;
    for(i = 65; i < (len + 65); i++) {
        buff[i] = (unsigned char) i;
        printf("char = %c", (char) i);
    }
    printf("Displaying array in main.\n");
    for(i = 0; i < len; i++) {
        printf("buff[%d] = %u\n", i, 'buff[i]');
    }
    printf("Displaying array in func.\n");
    display(buff, len);
    return 0;
}
My question(s):
1. Is the implicit type conversion in the code above (as demonstrated by passing buff into the function display) safe? Is some weird behaviour likely to occur?
2. Given that I have little option but to stick to the functions as they are present, is there a "safe" way of converting an array of unsigned chars into chars?

The only problem with converting unsigned char * into char * (or vice versa) is that it is supposed to be an error: the implicit conversion between the two pointer types is a constraint violation that the compiler must diagnose. Fix it with a cast.
display((char *) buff, len);
Note: This cast is unnecessary:
printf("char = %c", (char) i);
This is fine:
printf("char = %c", i);
The %c formatter takes an int argument to begin with, since it is impossible to pass a char to printf() anyway: as a variadic argument it always gets converted to int (or, in an extremely unlikely case, unsigned int).

You seem to worry a lot about type safety where there is no need for it. Since this is C and not C++, there is no strong typing system in place. Conversions from unsigned char to char are usually harmless, as long as the "sign bit" is never set. The key to avoiding problems is to actually understand them. The following problems/features exist in C:
The default char type has implementation-defined signedness. You should never make any assumptions about its signedness, nor use it in arithmetic of any kind, particularly not bit-wise operations. char should only be used for storing and printing text characters. Mixing it with hex literals or bit-wise operators invites subtle bugs (see the short sketch after this list).
The integer promotions in C implicitly promote all small integer types, among them char and unsigned char, to an integer type that can hold their value, which in practice is always int.
Formally, pointer conversions between different types can be undefined behavior, but accessing an object through a character type is explicitly permitted, so pointer conversions between unsigned char and char are safe in practice.
Character literals '\0' etc are of type int in C.
printf and similar functions default promote all character parameters to int.
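A small illustrative sketch (mine, not the answerer's) of the char signedness and promotion points above:

#include <stdio.h>

int main(void) {
    char c = 0xFF;          /* implementation-defined: 255, or commonly -1 if char is signed */
    /* Both operands are promoted to int before the comparison; on a
       signed-char platform c becomes -1, so comparing it against the
       hex literal 0xFF (int 255) fails. */
    if (c == 0xFF)
        puts("plain char is unsigned here");
    else
        puts("plain char is signed here: 0xFF became -1 after promotion");

    unsigned char u = 0xFF; /* always 255; promotes to int 255 */
    printf("u == 0xFF: %d\n", u == 0xFF); /* always prints 1 */
    return 0;
}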
You are also casting the void * result of malloc, which is completely pointless in C, and potentially harmful under older versions of the C standard, where a call to a function with no visible prototype was assumed to return int.
And then you have various weird logic-related bugs and bad practice, which I have fixed but won't comment in detail. Use this modified code:
#include <stdio.h>
#include <stdlib.h>

void display(const char *buff, int len) {
    for(int i = 0; i < len; i++) {
        printf("buff[%d] = %c\n", i, buff[i]);
    }
}

int main() {
    int len = 10;
    unsigned char* buff = malloc(len * sizeof(unsigned char));
    if(buff == NULL)
    {
        // error handling
    }

    char ch = 'A';
    for(int i=0; i<len; i++)
    {
        buff[i] = (unsigned char)ch + i;
        printf("char = %c\n", buff[i]);
    }

    printf("\nDisplaying array in main.\n");
    for(int i = 0; i < len; i++) {
        printf("buff[%d] = %u\n", i, buff[i]);
    }

    printf("\nDisplaying array in func.\n");
    display((char*)buff, len);
    free(buff);
    return 0;
}

C/C++ conversions from any integer type to a same-or-larger integer type are guaranteed not to lose bits. Conversions between signed and unsigned types can in general change the value (overflow and underflow hazards), but the buffer you're converting actually points to raw data, whose type is really void *, so the underlying byte values are untouched.
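An illustrative sketch (mine; the buffer contents and the consuming function are made up) showing that the byte values survive the pointer cast unchanged:

#include <stddef.h>
#include <stdio.h>

/* Hypothetical consumer that, like the FLV writer, takes plain char. */
static void consume(const char *buf, size_t len) {
    /* Reinterpret back as unsigned char to inspect the raw byte values. */
    const unsigned char *raw = (const unsigned char *)buf;
    for (size_t i = 0; i < len; i++)
        printf("%02x ", raw[i]);
    putchar('\n');
}

int main(void) {
    unsigned char mp3buf[4] = { 0xFF, 0xFB, 0x90, 0x00 }; /* made-up sample bytes */
    consume((const char *)mp3buf, sizeof mp3buf);         /* prints ff fb 90 00 */
    return 0;
}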

Related

Recoding printf %p with write function, no printf

I am currently working on a task where I need to print the address of a variable. It would be easy to use printf %p but I am only allowed to use write from unistd.
I tried casting the pointer into an unsigned integer and into uintptr_t and then converting it into a hexadecimal number. With uintptr_t it works, but with an unsigned integer it only prints half of the address. Maybe someone can explain to me why this is the case?
I also saw some solutions using ">>" and "<<" but I didn't get why they work. It would be nice if someone could explain a solution using "<<" and ">>" step by step, because I am not sure I am allowed to use uintptr_t.
This is the code I use to cast it into an unsigned int / uintptr_t / unsigned long long (I know that ft_rec_hex is missing leading 0's):
void ft_rec_hex(unsigned long long nbr)
{
    char tmp;

    if (nbr != 0)
    {
        ft_rec_hex(nbr / 16);
        if (nbr % 16 < 10)
            tmp = nbr % 16 + '0';
        else
            tmp = (nbr % 16) - 10 + 'a';
        write(1, &tmp, 1);
    }
}

int main(void)
{
    char c = 'd';
    unsigned long long ui = (unsigned long long)&c;

    ft_rec_hex(ui);
}
It looks like only half of the address is printed because the "unsigned integer" you used is only half the size of uintptr_t. (Note that uintptr_t is itself an unsigned integer type.)
You can use an array of unsigned char to store the bytes of a pointer variable and print that, to print the full pointer without uintptr_t.
Using character types to read an object of another type is allowed by the strict aliasing rule.
#include <stdio.h>
#include <unistd.h>

void printOne(unsigned char v) {
    const char* chars = "0123456789ABCDEF";
    char data[2];
    data[0] = chars[(v >> 4) & 0xf];
    data[1] = chars[v & 0xf];
    write(1, data, 2);
}

int main(void) {
    int a;
    int* p = &a;
    /* to make sure the value is correct */
    printf("p = %p\n", (void*)p);
    fflush(stdout);

    unsigned char ptrData[sizeof(int*)];
    for(size_t i = 0; i < sizeof(int*); i++) {
        ptrData[i] = ((unsigned char*)&p)[i];
    }
    /* print in reversed order, assuming little endian */
    for (size_t i = sizeof(int*); i > 0; i--) {
        printOne(ptrData[i - 1]);
    }
    return 0;
}
Or read the bytes in the pointer variable as an unsigned char array without copying:
#include <stdio.h>
#include <unistd.h>

void printOne(unsigned char v) {
    const char* chars = "0123456789ABCDEF";
    char data[2];
    data[0] = chars[(v >> 4) & 0xf];
    data[1] = chars[v & 0xf];
    write(1, data, 2);
}

int main(void) {
    int a;
    int* p = &a;
    /* to make sure the value is correct */
    printf("p = %p\n", (void*)p);
    fflush(stdout);

    /* print in reversed order, assuming little endian */
    for (size_t i = sizeof(int*); i > 0; i--) {
        printOne(((unsigned char*)&p)[i - 1]);
    }
    return 0;
}
It would be easy to use printf %p but I am only allowed to use write from unistd.
Then form a string and print that.
int n = snprintf(NULL, 0, "%p", (void *) p);
char buf[n+1];
snprintf(buf, sizeof buf, "%p", (void *) p);
write(1, buf, n);
Using a pointer converted to an integer marginally reduces portability and does not necessarily form the best textual representation of the pointer, which is something implementation dependent.
With uintptr_t it works but with an unsigned integer it only prints half of the address.
unsigned is not specified to be wide enough to contain all the information in a pointer.
uintptr_t, when available (which is very common), can hold the converted value of a void pointer, good enough to round-trip back to an equivalent pointer, even if the textual form differs.
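As an illustrative sketch (reusing ft_rec_hex from the question), the fix is to go through uintptr_t from <stdint.h> first and only then widen for printing:

#include <stdint.h>   /* uintptr_t */
#include <unistd.h>   /* write */

/* ft_rec_hex from the question, repeated here so the sketch is self-contained. */
void ft_rec_hex(unsigned long long nbr)
{
    char tmp;

    if (nbr != 0)
    {
        ft_rec_hex(nbr / 16);
        if (nbr % 16 < 10)
            tmp = nbr % 16 + '0';
        else
            tmp = (nbr % 16) - 10 + 'a';
        write(1, &tmp, 1);
    }
}

int main(void)
{
    char c = 'd';
    /* uintptr_t is wide enough to hold the converted pointer value;
       unsigned int is not guaranteed to be, which is why half the
       address disappeared. */
    uintptr_t addr = (uintptr_t)&c;

    ft_rec_hex((unsigned long long)addr);
    write(1, "\n", 1);
    return 0;
}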

Conversion of string constant to numeric value using C

I have written a C program which uses two different algorithms to convert a string constant representing a numeric value to its integer value. For some reason, the first algorithm, atoi(), doesn't execute properly on large values, while the second algorithm, atoi_imp(), works fine. Is this an optimization issue or some other error? The problem is that the first function causes the program to terminate with an error.
#include <stdio.h>
#include <string.h>

unsigned long long int atoi(const char[]);
unsigned long long int atoi_imp(const char[]);

int main(void) {
    printf("%llu\n", atoi("9417820179"));
    printf("%llu\n", atoi_imp("9417820179"));
    return 0;
}

unsigned long long int atoi(const char str[]) {
    unsigned long long int i, j, power, num = 0;
    for (i = strlen(str) - 1; i >= 0; --i) {
        power = 1;
        for (j = 0; j < strlen(str) - i - 1; ++j) {
            power *= 10;
        }
        num += (str[i] - '0') * power;
    }
    return num;
}

unsigned long long int atoi_imp(const char str[]) {
    unsigned long long int i, num = 0;
    for (i = 0; str[i] >= '0' && str[i] <= '9'; ++i) {
        num = num * 10 + (str[i] - '0');
    }
    return num;
}
atoi is part of the C standard library, with the signature int atoi(const char *);.
You are declaring that a function with that name exists, but giving it a different return type. Note that in C the function name is the only thing the linker matches on, and the toolchain can only trust what you tell it in the source code. If you lie to the compiler, like here, all bets are off.
You should select a different name for your own implementation to avoid issues.
As researched by #pmg, the C standard (C99 §7.1.3) says that using names from the C standard library for your own global symbols (functions or global variables) is explicitly undefined behavior. Beware of nasal demons!
OK, there is at least one problem with your function atoi.
You are looping down on an unsigned value and checking whether it is greater than or equal to zero; for an unsigned type that condition is always true, so decrementing past zero wraps around (underflows).
The easiest fix is index shifting, i.e.:
unsigned long long int my_atoi(const char str[]) {
    unsigned long long int i, j, power, num = 0;
    for (i = strlen(str); i != 0; --i) {
        power = 1;
        for (j = 0; j < strlen(str) - i; ++j) {
            power *= 10;
        }
        num += (str[i-1] - '0') * power;
    }
    return num;
}
Too late, but it may help. I did it for base 10; in case you change the base, you need to take care of how the digit value is computed in *p - '0'.
I would use Horner's rule to compute the value.
#include <stdio.h>

int main(void)
{
    char *a = "5363", *p = a;
    unsigned int base = 10;
    unsigned long x = 0;

    while(*p) {
        x *= base;
        x += (*p - '0');
        p++;
    }
    printf("%lu\n", x);
    return 0;
}
Your function has an infinite loop: as i is unsigned, i >= 0 is always true.
It can be improved in different ways:
you should compute the length of str just once. strlen() is not cheap, it must scan the string until it finds the null terminator. The compiler is not always capable of optimizing away redundant calls for the same argument.
power could be computed incrementally, avoiding the need for a nested loop.
you should not use the name atoi as it is a standard function in the C library. Unless you implement its specification exactly and correctly, you should use a different name.
Here is a corrected and improved version:
unsigned long long int atoi_power(const char str[]) {
    size_t i, len = strlen(str);
    unsigned long long int power = 1, num = 0;
    for (i = len; i-- > 0; ) {
        num += (str[i] - '0') * power;
        power *= 10;
    }
    return num;
}
Modified this way, the function should have similar performance to the atoi_imp version. Note however that they do not implement the same semantics: atoi_power must be given a string of digits only, whereas atoi_imp tolerates trailing characters.
As a matter of fact, neither atoi_imp nor atoi_power implements the specification of atoi extended to handle larger unsigned integers:
atoi ignores any leading white-space characters,
atoi accepts an optional sign, either '+' or '-',
atoi consumes all following decimal digits; the behavior on overflow is undefined,
atoi ignores any trailing characters that are not decimal digits.
Given these semantics, the natural implementation of atoi is that of atoi_imp with extra tests. Note that even strtoull(), which you could use to implement your function, handles white space and an optional sign, although the conversion of negative values may give surprising results.
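As an illustrative sketch (my code, with a hypothetical name atou_ext), strtoull() can carry most of that weight, since it already skips leading white space, accepts an optional sign and stops at the first non-digit:

#include <stdio.h>
#include <stdlib.h>   /* strtoull */

/* Hypothetical helper: atoi-like semantics extended to unsigned long long. */
static unsigned long long atou_ext(const char *str) {
    return strtoull(str, NULL, 10);
}

int main(void) {
    printf("%llu\n", atou_ext("  9417820179xyz")); /* prints 9417820179 */
    return 0;
}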

strlen is not always a good idea to use

I'm not sure if I chose the right title, but today I discovered (as a beginner in C) that, for me, strlen is not always the right choice when I need it.
So I tried the following:
#include <stdio.h>
#include <string.h>

int foo(char *s){
    int len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    int len = foo(name);
    int a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%d) < Len(%d)\n",c,len);
    }else{
        printf("False: C(%d) > Len(%d)\n",c,len);
    }
    return 0;
}
Output:
False: C(10) > Len(5)
But when I compile with "-Wconversion" I get:
program.c:5:19: warning: conversion to ‘int’ from ‘size_t’ may alter its value [-Wconversion]
int len = strlen(s);
^
A quick fix will be to cast strlen:
int len = (int)strlen(s);
But I did not agree, so I decided that I really needed something else, maybe another approach?
I tried the following:
#include <stdio.h>
#include <string.h>

unsigned int size(char *s){
    unsigned int len;
    /* code here */
    len = (unsigned int)strlen(s);
    return len;
}

int main(void){
    char *name = "Michi";
    unsigned int len = size(name);
    int a = 20, b = 10, c = a - b;

    if(c < (signed int)len){
        printf("True: C(%d) < Len(%d)\n",c,len);
    }else{
        printf("False: C(%d) > Len(%d)\n",c,len);
    }
    return 0;
}
But I still need to cast strlen because of its return type (size_t, which I know is an unsigned type: typedef long unsigned int size_t;).
Finally I decided on another approach: to create my own function, which makes things easier with fewer potential future problems, and I got:
#include <stdio.h>

long int stringLEN(char *s){
    int i = 0;
    long int len = 0;

    while (s[i] != '\0'){
        len++;
        i++;
    }
    return len;
}

long int foo(char *s){
    long int len = stringLEN(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    long int len = foo(name);
    int a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%d) < Len(%ld)\n",c,len);
    }else{
        printf("False: C(%d) > Len(%ld)\n",c,len);
    }
    return 0;
}
where no cast is needed anymore.
So my QUESTION is:
is this (for my case) a better approach?
If not, I need some explanation; my books (I have 3) do not explain these things in a way that I can understand.
I only know that at some point a cast could be a big problem, somehow.
EDIT:
This code will also not compile with -Wconversion:
#include <stdio.h>
#include <string.h>

size_t foo(char *s){
    size_t len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    size_t len = foo(name);
    int a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%d) < Len(%zu)\n",c,len);
    }else{
        printf("False: C(%d) > Len(%zu)\n",c,len);
    }
    return 0;
}
Output:
error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]|
But if I cast len, it works. I realized that if the size is bigger than what an int can hold, it will never fit.
Digging through all the other answers, your true question seems to be how to deal with a situation like this:
#include <string.h>
#include <libfoo.h>

extern void foo(void);
extern void bar(void);

void pick_foo_or_bar(const char *s)
{
    size_t slen = strlen(s);
    int value = libfoo_api_returning_an_int();

    if (slen > value) // -Wconversion warning on this line
        foo();
    else
        bar();
}
... where you can't change the type of either slen or value, because both are correct for the API they're receiving the result of.
The -Wconversion warning is trying to tell you something meaningful. Comparison of signed and unsigned integer types in C does something very strange, not what you would expect from the laws of arithmetic in ℤ; a naive comparison like what I wrote above can and has caused catastrophic bugs. But the cure is not casts or inventing your own strlen; the cure is to fix the comparison so it does what you expect from the laws of arithmetic. The principles for this are:
First check whether the signed quantity is negative. If so, treat it as smaller than the unsigned quantity.
Otherwise, cast the smaller type to the larger type before comparing them.
In this case, size_t is almost certain to be larger than, or the same size as, int, so you would write
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>
#include <libfoo.h>

extern void foo(void);
extern void bar(void);

// Code below is correct only if size_t is at least as large as int.
static_assert(SIZE_MAX >= INT_MAX, "size_t is narrower than int");

void pick_foo_or_bar(const char *s)
{
    size_t slen = strlen(s);
    int value = libfoo_api_returning_an_int();

    if (value < 0 || (size_t)value < slen)
        foo();
    else
        bar();
}
The static_assert is present because, if I remember correctly, the C standard does not guarantee size_t being at least as large as unsigned int. I could, for instance, imagine an ABI for the 80286 where int was four bytes wide but size_t only two. In that situation you would need to do the casting the other way around:
void pick_foo_or_bar(unsigned short a, long b)
{
    if (b < 0 || b < (long)a)
        foo();
    else
        bar();
}
If you don't know which of the two types is bigger, or if you don't know which of them is signed, your only recourse in standard C is (u)intmax_t:
void pick_foo_or_bar(uid_t a, gid_t b)
{
    if (a < 0 && b < 0) {
        if ((intmax_t)a < (intmax_t)b)
            bar();
        else
            foo();
    } else if (a < 0) {
        bar();
    } else if (b < 0) {
        foo();
    } else {
        if ((uintmax_t)a < (uintmax_t)b)
            bar();
        else
            foo();
    }
}
... and, given the exceedingly unfortunate precedent set by C99 wrt long, there probably will come a day when (u)intmax_t is not the biggest integer type supported by the compiler, and then you're just hosed.
The length of a string can never be negative, whilst an integer could be. The warning appears because the range of values of size_t is different from that of int, and some positive values of size_t would be treated as negative if cast to an int. The better option is to have the return type of your function match; in this case, have foo return a size_t. You'll soon see that the datatype permeates most of the code, and leaves some other oddities that could do odd things (size_t - size_t could underflow...).
This will compile without warnings:
#include <stdio.h>
#include <string.h>

size_t foo(char *s){
    size_t len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    size_t len = foo(name);
    size_t a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%zu) < Len(%zu)\n",c,len);
    } else {
        printf("False: C(%zu) > Len(%zu)\n",c,len);
    }
    return 0;
}
as well explained in the answers and comments by #thomasdickey, #rolandshaw, #andreaghidini, #olaf, #juanchopanza and others.
Did you really make a better approach? No: why should a string-length function return values that can be negative? There is no such thing as a string with negative size.
The standard strlen function is already there, is more efficient, is able to deal with strings whose maximum size is twice the maximum handled by stringLEN, and has a more precise definition of the return type.
There are 2 issues:
strlen() returns type size_t. size_t is some unsigned integer type, likely as wide as or wider than int. It is compiler/platform dependent.
Code needs to compare an int to a size_t. Since size_t is unsigned, and to prevent a warning about a mixed signed/unsigned comparison, explicitly convert the int to an unsigned integer. To convert a non-negative int to an unsigned integer, cast to (unsigned).
To compare, test whether c is negative and, if not, compare (unsigned)c directly to len. The compiler will convert types as needed and produce an arithmetically correct answer.
size_t len = strlen("SomeString");
int c = 20; // some int
if (c < 0 || (unsigned)c < len) puts("c less than len");
else puts("c >= len");
The normal way to solve this is to use variables typed size_t, and choose an appropriate format for printing them. Then no cast is needed. For printf, see these:
printf format specifiers for uint32_t and size_t
What's the correct way to use printf to print a size_t?
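For instance, a minimal sketch of printing a size_t directly with the %zu conversion (C99 and later), so neither a cast nor a custom length function is needed:

#include <stdio.h>
#include <string.h>

int main(void) {
    size_t len = strlen("Michi");
    printf("len = %zu\n", len); /* %zu matches size_t exactly */
    return 0;
}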
I think this must vary from compiler to compiler, because I tried it on an online compiler and it didn't show any warning.

Understanding unsigned 0 in C

I am trying to understand number representation in C.
I am working on a code segment which looks like the one below.
#include <stdio.h>
#include <string.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, int len)
{
    int i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

void show_int(int x) {
    show_bytes((byte_pointer) &x, sizeof(int));
}

void show_unsigned(short x) {
    show_bytes((byte_pointer) &x, sizeof(unsigned));
}

int main(int argc, char *argv[])
{
    int length = 0;
    unsigned g = (unsigned)length; // I also tried unsigned g = 0 and the bytes are the same
    show_unsigned(g);
    show_int(length);
    printf("%d", g); // this prints 0
    return 0;
}
Here, show_unsigned() and show_int() print the byte representations of the variables passed as arguments. For int length the byte representation is all zeroes, as expected, but for unsigned g the byte representation is 00 00 04 08. Yet when I print g with %d, I get 0 (so I suppose the numeric value is interpreted as 0).
Could somebody please explain how this is happening?
In:
void show_unsigned(short x) {
    show_bytes((byte_pointer) &x, sizeof(unsigned));
}
You declared the parameter short x, which is smaller than unsigned, so only sizeof(short) bytes belong to x; reading sizeof(unsigned) bytes from its address makes your print function display adjacent garbage.
You're reading sizeof(unsigned) bytes from a short. A short isn't guaranteed to be the same size as an unsigned, hence, when the bytes next to your short are read, garbage data comes out.
To fix this, either pass your argument as an unsigned, or, when using sizeof, use sizeof(short).
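A minimal corrected sketch (assuming the byte_pointer typedef and show_bytes from the question are in scope): make the parameter unsigned and let sizeof refer to that same parameter, so the two can never disagree:

void show_unsigned(unsigned x) {
    /* sizeof(x) == sizeof(unsigned); no adjacent bytes are read */
    show_bytes((byte_pointer) &x, sizeof(x));
}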
What you are doing doesn't make much sense, particularly with the type conversions you have occurring. Someone else already made my point about the conversion to short.
Rather than writing an absurd number of functions, try doing this:
void show_bytes( void *start, unsigned int len ) {
    unsigned char* ptr = (unsigned char *) start;
    unsigned int i = 0;
    for ( i = 0; i < len; ++i, ++ptr ) {
        printf( " %.2x", ptr[0] );
    }
}
Instead of calling it as you had been, just call it like:
show_bytes( (void *)&x, sizeof(x));
And if that's too much typing, make a macro out of it. Now it works for any type you come up with.
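For example, a hypothetical wrapper macro (not part of the answer) along those lines:

/* Hypothetical convenience macro: takes the address and size of whatever
   object you hand it, so it works for any type. */
#define SHOW_BYTES(x) show_bytes((void *)&(x), sizeof(x))

/* usage: int v = 0x1234; SHOW_BYTES(v); */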

Convert int to array of bytes in C?

I need to convert a decimal number stored in an int to an array of bytes (i.e. stored in an unsigned char array).
Any clues?
Or if you know what you are doing:
int n = 12345;
char* a = (char*)&n;
Simplest possible approach - use sprintf (or snprintf, if you have it):
unsigned char a[SOMESIZE];
int n = 1234;
sprintf( (char *)a, "%d", n );
Or if you want it stored in binary:
unsigned char a[sizeof( int ) ];
int n = 1234;
memcpy( a, & n, sizeof( int ) );
This could work
int n = 1234;
const int arrayLength = sizeof(int);
unsigned char *bytePtr = (unsigned char*)&n;

for(int i = 0; i < arrayLength; i++)
{
    printf("[%X]", bytePtr[i]);
}
Take care of the byte order, which depends on endianness.
I understand the problem as converting a number to its string representation (as Neil does).
Below is a simple way to do it without using any library.
/* assumes n holds the number and a[] is a char array large enough for the digits */
int i = 0;
int j = 0;
do { a[i++] = '0' + n % 10; n /= 10; } while (n);
a[i--] = 0;
for (; j < i; j++, i--) { int tmp = a[i]; a[i] = a[j]; a[j] = tmp; }
The question probably needs some clarification, as others obviously understood you wanted the underlying bytes used in the internal representation of int (but if you want to do that kind of thing, you'd better use some fixed-size type defined in <stdint.h> instead of an int, or you won't know for sure the length of your byte array).
Warning: untested code.
This should be an endianness-agnostic conversion. It goes from low to high. There's probably a more efficient way to do it, but I can't think of it at the moment.
#include <limits.h> // CHAR_BIT, UCHAR_MAX

int num = 68465; // insert number here
unsigned char bytes[sizeof(int)];

for (int i = 0; i < sizeof(int); i++)
{
    bytes[i] = num & UCHAR_MAX;
    num >>= CHAR_BIT;
}
I'm posting this mostly because I don't see another solution here for which the results don't change depending on what endianness your processor is.
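Combining that with the earlier advice about fixed-size types, an illustrative sketch (mine, not the answerer's) using uint32_t: the size of the byte array is then known exactly, and explicit shifts keep the result independent of endianness:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t n = 68465;
    unsigned char bytes[4]; /* exactly 4 bytes, by definition of uint32_t */

    /* Extract bytes least significant first, regardless of the machine's
       byte order. */
    for (int i = 0; i < 4; i++)
        bytes[i] = (unsigned char)(n >> (8 * i));

    for (int i = 0; i < 4; i++)
        printf("%02x ", bytes[i]);
    putchar('\n');
    return 0;
}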
