Directly cast a char *nib as int hex - C

In pure and portable C.
I am having trouble casting from one variable to another: a char *nib to an int hex. The idea is that I have char *nib = "ab"; (or "0xab", or anything that directly represents two characters as a char *) and I want to cast it to an integer and write it to a file to get a one-to-one write. So I start with char *nib = "0xab";, write it out, presumably as an int, and when I open the file in a hex dump or editor the result is just ab.
I've been able to do this with a constant, declaring it directly... but then the nib is always static.
This has to be one-to-one, starting from a two-character string (the nib). Not converting anything, purely casting.
So can you write it directly to a file without converting it? Three lookup tables seems like a bit much for a value the same length as its name.

There is no way to cast 2 characters (2 bytes) into one byte, because a cast does not change the binary representation of the value.
The closest you can get to casting a string that looks like hex to a value that shows something similar is to put the 0-15 values into the string via escape sequences, like char *nib = "\x0A\x0B", cast that to a 2-byte value with *((short *) nib) (0x0A0B on a big-endian machine), and store it to a file. (I'm not sure there is a portable 2-byte integer type; short is often 2 bytes wide but does not have to be.) Unfortunately, I don't think there is a portable way to store a 2-byte integer value to a file, as different architectures may have different byte order.
Writing the string value character by character is likely the safest approach. Or convert the string to an int in the usual way and use your own read/write code for integers to ensure portability.
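For what it's worth, here is a minimal sketch of the "convert, then write one byte" route described above; the output file name and the two-digit string are just examples:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *nib = "ab";                    /* two hex digits as text */
    unsigned long v = strtoul(nib, NULL, 16);  /* a conversion, not a cast */

    FILE *f = fopen("out.bin", "wb");
    if (f == NULL)
        return 1;
    fputc((unsigned char) v, f);               /* writes the single byte 0xAB */
    fclose(f);
    return 0;
}

A hex dump of out.bin then shows exactly ab, which is the one-to-one result the question asks for, but it is obtained by conversion rather than by a cast.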

Related

Internals binary saving of C chars

Hey, I stumbled on something pretty weird while programming. I tried to transform a UTF-8 char into a hexadecimal byte representation like 0x89 or 0xff.
char test[3] = "ü";
for (int x = 0; x < 3; x++) {
    printf("%x\n", test[x]);
}
And I get the following output:
ffffffc3
ffffffbc
0
I know that C uses one byte of data for every char, and therefore if I want to store a non-ASCII char like "ü" it counts as 2 chars.
Transforming ASCII chars is no problem, but once I get to non-ASCII chars (from German to Chinese), instead of getting outputs like 0xc3 and 0xbc, C adds 0xFFFFFF00 to them.
I know that I can just do something like & 0xFF to fix that weird representation, but I can't wrap my head around why it happens in the first place.
C allows type char to behave either as a signed type or as an unsigned type, as the C implementation chooses. You are observing the effect of it being a signed type, which is pretty common. When the char value of test[x] is passed to printf, it is promoted to type int, in value-preserving manner. When the value is negative, that involves sign-extension, whose effect is exactly what you describe. To avoid that, add an explicit cast to unsigned char:
printf("%x\n", (unsigned char) test[x]);
Note also that C itself does not require any particular characters outside the 7-bit ASCII range to be supported in source code, and it does not specify the execution-time encoding with which ordinary string contents are encoded. It is not safe to assume UTF-8 will be the execution character set, nor to assume that all compilers will accept UTF-8 source code, or will default to assuming that encoding even if they do support it.
The encoding of source code is a matter you need to sort out with your implementation, but if you are using at least C11 then you can ensure execution-time UTF-8 encoding for specific string literals by using UTF-8 literals, which are prefixed with u8:
char test[3] = u8"ü";
Be aware also that UTF-8 code sequences can be up to four bytes long, and most of the characters in the basic multilingual plane require 3. The safest way to declare your array, then, would be to let the compiler figure out the needed size:
// better
char test[] = u8"ü";
... and then to use sizeof to determine the size chosen:
for (int x = 0; x < sizeof(test); x++) {
    // ...
}
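Putting those pieces together, here is a minimal sketch, assuming a C11 compiler (for the u8 prefix) and a UTF-8 capable terminal:

#include <stdio.h>

int main(void) {
    char test[] = u8"ü";                    /* compiler picks the size: 3 bytes here */
    for (size_t x = 0; x < sizeof(test); x++) {
        /* cast to unsigned char so the promotion to int does not sign-extend */
        printf("%x\n", (unsigned char) test[x]);
    }
    return 0;
}

With a UTF-8 execution character set this prints c3, bc and 0 instead of the sign-extended values from the question.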

C Language: Why int variable can store char?

I am currently reading The C Programming Language by Kernighan and Ritchie.
There is an example which defines a variable as int type but uses getchar() to store a character in it.
int x;
x = getchar();
Why can we store char data in an int variable?
The only thing that I can think of is ASCII and Unicode.
Am I right?
The getchar function (and similar character input functions) returns an int because of EOF. There are cases when (char) EOF != EOF (like when char is an unsigned type).
Also, in many places where one uses a char variable, it will silently be promoted to int anyway. And that includes character constants like 'A'.
getchar() attempts to read a byte from the standard input stream. The return value can be any possible value of the type unsigned char (from 0 to UCHAR_MAX), or the special value EOF which is specified to be negative.
On most current systems, UCHAR_MAX is 255 as bytes have 8 bits, and EOF is defined as -1, but the C Standard does not guarantee this: some systems have larger unsigned char types (9 bits, 16 bits...) and it is possible, although I have never seen it, that EOF be defined as another negative value.
Storing the return value of getchar() (or getc(fp)) to a char would prevent proper detection of end of file. Consider these cases (on common systems):
if char is an 8-bit signed type, a byte value of 255, which is the character ÿ in the ISO8859-1 character set, has the value -1 when converted to a char. Comparing this char to EOF will yield a false positive.
if char is unsigned, converting EOF to char will produce the value 255, which is different from EOF, preventing the detection of end of file.
These are the reasons for storing the return value of getchar() into an int variable. This value can later be converted to a char, once the test for end of file has failed.
Storing an int to a char has implementation defined behavior if the char type is signed and the value of the int is outside the range of the char type. This is a technical problem, which should have mandated the char type to be unsigned, but the C Standard allowed for many existing implementations where the char type was signed. It would take a vicious implementation to have unexpected behavior for this simple conversion.
The value of the char does indeed depend on the execution character set. Most current systems use ASCII or some extension of ASCII such as ISO8859-x, UTF-8, etc. But the C Standard supports other character sets such as EBCDIC, where the lowercase letters do not form a contiguous range.
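To illustrate, here is a minimal sketch of the usual idiom: the result stays in an int until the EOF test has been made, and only then is it treated as a character:

#include <stdio.h>

int main(void) {
    int c;                            /* int, not char, so EOF stays distinguishable */
    while ((c = getchar()) != EOF) {
        putchar(c);                   /* safe to treat c as a character here */
    }
    return 0;
}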
getchar is an old C standard function and the philosophy back then was closer to how the language gets translated to assembly than type correctness and readability. Keep in mind that compilers were not optimizing code as much as they are today. In C, int is the default return type (i.e. if you don't have a declaration of a function in C, compilers will assume that it returns int), and returning a value is done using a register - therefore returning a char instead of an int actually generates additional implicit code to mask out the extra bytes of your value. Thus, many old C functions prefer to return int.
C requires int be at least as many bits as char. Therefore, int can store the same values as char (allowing for signed/unsigned differences). In most cases, int is a lot larger than char.
char is an integer type that is intended to store a character code from the implementation-defined character set, which is required to be compatible with C's abstract basic character set. (ASCII qualifies, so do the source-charset and execution-charset allowed by your compiler, including the one you are actually using.)
For the sizes and ranges of the integer types (char included), see your <limits.h>. Here is somebody else's limits.h.
C was designed as a very low-level language, so it is close to the hardware. Usually, after a bit of experience, you can predict how the compiler will allocate memory, and even pretty accurately what the machine code will look like.
Your intuition is right: it goes back to ASCII. ASCII is really a simple 1:1 mapping from letters (which make sense in human language) to integer values (that hardware can work with); for every letter there is a unique integer. For example, the 'letter' CTRL-A is represented by the decimal number 1. (For historical reasons, lots of control characters came first, so CTRL-G, which rang the bell on an old teletype terminal, is ASCII code 7. Upper-case 'A' and the 25 remaining upper-case letters start at 65, and so on. See http://www.asciitable.com/ for a full list.)
C lets you 'coerce' variables into other types. In other words, the compiler cares about (1) the size, in memory, of the var (see 'pointer arithmetic' in K&R), and (2) what operations you can do on it.
If memory serves me right, you can't do arithmetic on a char. But, if you call it an int, you can. So, to convert all LC letters to UC, you can do something like:
char letter;
....
if (letter >= 'a' && letter <= 'z') {   /* letter is lower-case */
    letter = (int) letter - 32;
}
Some (or most) C compilers would complain if you did not reinterpret the var as an int before adding/subtracting.
But, in the end, the type 'char' is really just another integer type, since ASCII assigns a unique integer to each letter.

uint64 to string in C

I have a uint64 value that I want to convert into a string because it has to be inserted as the payload of an HTTP POST request.
I've already tried many solutions (ltoa, this solution), but my problem still remains.
My function is the following:
void check2(char *fingerprint, guint64 s_id) {
    // stuff
    char poststr[400] = "action=CheckFingerprint&sessionid=";
    // convert s_id to, for example, char *myChar
    strcat(poststr, myChar);
}
I want to convert s_id to char*. I've tried:
1) char ses[8]; ltoa(s_id, ses, 10); but I get a segmentation fault;
2) char *buf; sprintf(buf, "%" PRIu64, s_id);
I'm working with an API, and I have seen that when this guint64 variable is printed, it has the following form:
JANUS_LOG(LOG_INFO, "Creating new session: %"SCNu64"\n", session_id);
sprintf is the right way to go, with an unsigned 64-bit format specifier.
You'll need to allocate enough space for 16 hex digits, a leading 0x, and the null byte. That's 19 bytes, which I've rounded up to 20 for no good reason other than that it feels better than 19.
char foo[20];
sprintf(foo, "0x%016" PRIx64, (uint64_t)numberToConvert);
will print the number in hex with a leading 0x and zero-padded to 16 digits. You do not need the cast if numberToConvert is already a uint64_t.
I have a uint64 value that I want to convert into char* because it has to be inserted as the payload of an HTTP POST request.
What you have is a fundamental misunderstanding.
To insert a text representation of your value into a document, you need to convert it to a sequence of characters, which is quite a different thing from a pointer to a character (char *). One of your options, which seems to be what you're really after, is to convert the value to a sequence of characters in the form of a C string -- that is, a null-terminated array of characters. You would then have or be able to obtain a pointer to the first character in the sequence.
That explains what's wrong with this attempted solution:
char *buf;
sprintf(buf, "%" PRIu64, s_id);
You are trying to write the string representation of your number into the array pointed-to by buf, but it doesn't point to one. Not having been initialized or assigned, its value is indeterminate.
Even if your buf pointed to an array, it is essential that the array be long enough to accommodate all the digits of the value's decimal representation, plus a terminator. That's probably what's wrong with your other attempt:
char ses[8]; ltoa(s_id,ses,10)
An unsigned, 64-bit binary number may require up to 20 decimal digits, plus you need space for a terminator. The array you're providing is not nearly large enough, unless you can be confident that the actual values you're going to write will not exceed 9,999,999 (which is well within the range of a 32-bit integer).
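To tie both answers together, here is a minimal sketch of the function with a correctly sized buffer. It assumes guint64 is interchangeable with uint64_t (as it is in GLib), so standard C types stand in for the GLib ones:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void check2(const char *fingerprint, uint64_t s_id) {
    char ses[21];                                /* 20 decimal digits max + terminator */
    snprintf(ses, sizeof ses, "%" PRIu64, s_id);

    char poststr[400] = "action=CheckFingerprint&sessionid=";
    strcat(poststr, ses);                        /* fits: 35-char prefix + up to 20 digits */

    (void) fingerprint;                          /* unused in this sketch */
    puts(poststr);                               /* stand-in for building the real POST body */
}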

Separated integers with comma delimiters and sscanf

I need help with sscanf.
I have a data file. I read it line by line.
One line looks like this: 23,13,45;
I want to read the integers.
I tried this code:
unsigned char a, b, c;
sscanf(line, "%d,%d,%d;", &a, &b, &c);
But this is not working: only the first number is read, the others are 0.
This is because %d expects a pointer to a 4-byte integer, and you are passing a pointer to a 1-byte char. Because variables a, b and c are stored in the order of decreasing memory addresses, sscanf first fills a with 23, at the same time filling 3 other bytes of stack memory with zeros (this is a memory violation, BTW). Then it fills b with 13, also filling a and two other bytes with zeros. In the end it fills c with 45, also filling a and b and one other byte with zeros. This way you end up with zeros in both b and a, and an expected value only in c.
Of course this is only one possible scenario of what can happen, as it depends on the architecture and compiler.
A proper way to read the three integers would be to use int instead of unsigned char, or to change the format specifier.
Correct format specifier for unsigned char is %hhu.
Other than that I don't see any problem as long as line does contain the string in the format you expect.
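For illustration, a minimal sketch of both fixes, using the sample line from the question:

#include <stdio.h>

int main(void) {
    const char *line = "23,13,45;";

    unsigned char a, b, c;
    sscanf(line, "%hhu,%hhu,%hhu;", &a, &b, &c);   /* specifier matches unsigned char */
    printf("%d %d %d\n", a, b, c);

    int x, y, z;
    sscanf(line, "%d,%d,%d;", &x, &y, &z);         /* or read into plain ints with %d */
    printf("%d %d %d\n", x, y, z);
    return 0;
}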

I fail to understand the following assignment

char errorString[20];
*(UInt32*)(errorString + 1) = CFSwapInt32HostToBig(statusCode);
I found this in a book about audio programming and, considering CFSwapInt32HostToBig returns an Int32, I can't understand why it needs to make that strange cast, or why it assigns starting at the address of the second element (+1) in the char buffer.
What will errorString contain after this assignment?
errorString + 1 (which is of type char *) is cast to a pointer to UInt32 and then dereferenced. Hence, the four consecutive bytes of errorString, from the second to the fifth (errorString[1] ... errorString[4]), will contain the binary representation of the integer returned by CFSwapInt32HostToBig(statusCode).
I can't understand why it needs to make that strange cast
The cast is necessary to avoid truncating the data to a single char: if you drop the cast, like this
*(errorString + 1) = CFSwapInt32HostToBig(statusCode);
the assignment will modify a single char. Effectively, it's this:
*(errorString + 1) = (char)CFSwapInt32HostToBig(statusCode);
which is not what the author of the code wanted.
As far as the offset of one byte goes, the answer depends on how errorString is used: most likely, some other piece of data is supposed to go into the first byte.
CFSwapInt32HostToBig returns a value of a 32-bit type but errorString is an array of char.
The programmer wants to store the 4 bytes into the array of char starting from position &errorString[1].
Note that this is not safe and should be avoided, as it breaks aliasing rules and may violate alignment requirements.
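For comparison, a memcpy-based sketch that avoids the pointer cast entirely; UInt32 and CFSwapInt32HostToBig are Core Foundation names, so a placeholder value stands in for the swapped status code here:

#include <stdint.h>
#include <string.h>

int main(void) {
    char errorString[20];
    uint32_t big = 0x41424344u;   /* placeholder for CFSwapInt32HostToBig(statusCode) */

    /* copy the 4 bytes into errorString[1] .. errorString[4] without aliasing trouble */
    memcpy(errorString + 1, &big, sizeof big);
    return 0;
}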

Resources