I am trying to compare two chars with each other but treating them as integers. These are struct values in a linked list. I have printed out temp->next->variable and temp->variable and confirmed that the if statement should hold, ex: 3 > 2. But I'm thinking it might not work because they are char.
Will the fact that they are char values have an impact on the comparison?
if(temp->next->variable > temp->variable)
{
....
}
Chars are integers, of 8-bit length, so if you're always treating the chars as ints, this will have the exact behavior you desire. Be wary, as some compilers allow char to default to unsigned, some to signed, which may create issues seemingly at random. (If both numbers have the same highest order bit, it won't make a difference, otherwise the result will be opposite what you expect, if the signed-ness is also opposite what you expect.)
If you're treating them as characters, then this will give you their lexicographical comparison, which is based on the interal integer representation of that character. It may be good to check what your locale is -- whether your program will be using look up tables of ASCII or Unicode 8-bit, or anything else.
If you still get hidden issues, a common mistake is having many layered pointers, and even though the arrow [->] is used throughout, you may still need to apply a de-reference [*], or else you'll secretly be testing their relative locations in memory.
According to the C Standard (6.5.8 Relational operators)
3 If both of the operands have arithmetic type, the usual arithmetic
conversions are performed.
The usual arithmetic conversion includes the integer promotion that particularly means that objects of type char are converted to objects of type int
Take into account that type char can behave either as type unsigned char or signed char. So you can get different results of the comparison if sign bits of objects of type char are set.
Related
in some code seen online, i saw that in read function in C, someone uses a uint8_t array for buffer insted of a char array buffer.
what are the differences?
thanks
The C standard allows char to be signed or unsigned. It also allows it to be more than eight bits.
uint8_t, if it is defined, is unsigned and eight bits. This allows programmers to be completely sure of the type that will be used. In particular, signed char types sometimes cause problems with bitwise and shift operations, due to how these operations are defined (or are not defined) when negative values are involved.
So every char corresponds to a number(see ascii table here). I think people use this to avoid some problems(sorry I don't use c I come from c++)
I was able to use bit operations to "turn off" binary digits of a number.
Ex:
x = x & ~(1<<0)
x = x & ~(1<<1)
(and repeat until desired number of digits starting from the right are changed to 0)
I would like to apply this technique to a pointer's address.
Unfortunately, the & operator cannot be used with pointers. Using the same lines of code as above, where x is a pointer, the compiler says "invalid operands to binary & (have int and int)."
I tried to typecast the pointers as ints, but that doesn't work as I assume the ints are too small (and I just realized I'm not allowed to cast).
(note: though this is part of a homework problem, I've already reasoned out why I need to turn off some digits after a good couple hours, so I'm fine in that regard. I'm simply trying to see if I can get a clever technique to do what I want to do here).
Restrictions: I cannot use loops, conditionals, any special functions, constants greater than 255, division, mod.
(edit: added restrictions to the bottom)
Use uintptr_t from <stdint.h>. You should always use unsigned types for bit twiddling, and (u)intptr_t is specifically chosen to be able to hold a pointer's value.
Note however that adjusting a pointer manually and dereferencing it is undefined behaviour, so watch your step. You shall be able to recover the exact original value of the pointer (or another valid pointer) before doing so.
Edit : from your comment I understand that you don't plan on dereferencing the twiddled pointer at all, so no undefined behaviour for you. Here is how you can check if your pointers share the same 64-byte block :
uintptr_t p1 = (uintptr_t)yourPointer1;
uintptr_t p2 = (uintptr_t)yourPointer2;
uintptr_t mask = ~(uintptr_t)63u; // Shave off 5 low-order bits
return (p1 & mask) == (p2 & mask);
C language standard library includes the (optional though) type intptr_t, for which there is guarantee that "any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Of course if you perform bitwise operation on the integer than the result is undefined behaviour.
Edit:
How unfortunate haha. I need a function to show two pointers are in
the same 64-byte block of memory. This holds true so long as every
digit but the least significant 6 digits of their binary
representations are equal. By making sure the last 6 digits are all
the same (ex: 0), I can return true if both pointers are equal. Well,
at least I hope so.
You should be able to check if they're in the same 64 block of memory by something like this:
if ((char *)high_pointer - (char *)low_pointer < 64) {
// do stuff
}
Edit2: This is likely to be undefined behaviour as pointed out by chris.
Original post:
You're probably looking for intptr_t or uintptr_t. The standard says you can cast to and from these types to pointers and have the value equal to the original.
However, despite it being a standard type, it is optional so some library implementations may choose not to implement it. Some architectures might not even represent pointers as integers so such a type wouldn't make sense.
It is still better than casting to and from an int or a long since it is guaranteed to work on implementations that supply it. Otherwise, at least you'll know at compile time that your program will break on a certain implementation/architecture.
(Oh, and as other answers have stated, manually changing the pointer when casted to an integer type and dereferencing it is undefined behaviour)
Some dynamically-typed languages use pointer tagging as a quick way to identify or narrow down the runtime type of the value being represented. A classic way to do this is to convert pointers to a suitably sized integer, and add a tag value over the least significant bits which are assumed to be zero for aligned objects. When the object needs to be accessed, the tag bits are masked away, the integer is converted to a pointer, and the pointer is dereferenced as normal.
This by itself is all in order, except it all hinges on one colossal assumption: that the aligned pointer will convert to an integer guaranteed to have zero bits in the right places.
Is it possible to guarantee this according to the letter of the standard?
Although standard section 6.3.2.3 (references are to the C11 draft) says that the result of a conversion from pointer to integer is implementation-defined, what I'm wondering is whether the pointer arithmetic rules in 6.5.2.1 and 6.5.6 effectively constrain the result of pointer->integer conversion to follow the same predictable arithmetic rules that many programs already assume. (6.3.2.3 note 67 seemingly suggests that this is the intended spirit of the standard anyway, not that that means much.)
I'm specifically thinking of the case where one might allocate a large array to act as a heap for the dynamic language, and therefore the pointers we're talking about are to elements of this array. I'm assuming that the start of the C-allocated array itself can be placed at an aligned position by some secondary means (by all means discuss this too though). Say we have an array of eight-byte "cons cells"; can we guarantee that the pointer to any given cell will convert to an integer with the lowest three bits free for a tag?
For instance:
typedef Cell ...; // such that sizeof(Cell) == 8
Cell heap[1024]; // such that ((uintptr_t)&heap[0]) & 7 == 0
((char *)&heap[11]) - ((char *)&heap[10]); // == 8
(Cell *)(((char *)&heap[10]) + 8); // == &heap[11]
&(&heap[10])[0]; // == &heap[10]
0[heap]; // == heap[0]
// So...
&((char *)0)[(uintptr_t)&heap[10]]; // == &heap[10] ?
&((char *)0)[(uintptr_t)&heap[10] + 8]; // == &heap[11] ?
// ...implies?
(Cell *)((uintptr_t)&heap[10] + 8); // == &heap[11] ?
(If I understand correctly, if an implementation provides uintptr_t then the undefined behaviour hinted at in 6.3.2.3 paragraph 6 is irrelevant, right?)
If all of these hold, then I would assume that it means that you can in fact rely on the low bits of any converted pointer to an element of an aligned Cell array to be free for tagging. Do they && does it?
(As far as I'm aware this question is hypothetical since the normal assumption holds for common platforms anyway, and if you found one where it didn't, you probably wouldn't want to look to the C standard for guidance rather than the platform docs; but that's beside the point.)
This by itself is all in order, except it all hinges on one colossal
assumption: that the aligned pointer will convert to an integer
guaranteed to have zero bits in the right places.
Is it possible to guarantee this according to the letter of the
standard?
It's possible for an implementation to guarantee this. The result of converting a pointer to an integer is implementation-defined, and an implementation can define it any way it likes, as long as it meets the standard's requirements.
The standard absolutely does not guarantee this in general.
A concrete example: I've worked on a Cray T90 system, which had a C compiler running under a UNIX-like operating system. In the hardware, an address is a 64-bit word containing the address of a 64-bit word; there were no hardware byte addresses. Byte pointers (void*, char*) were implemented in software by storing a 3-bit offset in the otherwise unused high-order 3 bits of a 64-bit word pointer.
All pointer-to-pointer, pointer-to-integer, and integer-to-pointer conversions simply copied the representation.
Which means that a pointer to an 8-byte aligned object, when converted to an integer, could have any bit pattern in its low-order 3 bits.
Nothing in the standard forbids this.
The bottom line: A scheme like the one you describe, that plays games with pointer representations, can work if you make certain assumptions about how the current system represents pointers -- as long as those assumptions happen to be valid for the current system.
But no such assumptions can be 100% reliable, because the standard says nothing about how pointers are represented (other than that they're of a fixed size for each pointer type, and that the representation can be viewed as an array of unsigned char).
(The standard doesn't even guarantee that all pointers are the same size.)
You're right about the relevant parts of the standard. For reference:
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
Since the conversions are implementation defined (except when the integer type is too small, in which case it's undefined), there's nothing the standard is going to tell you about this behaviour. If your implementation makes the guarantees you want, you're set. Otherwise, too bad.
I guess the answer to your explicit question:
Is it possible to guarantee this according to the letter of the standard?
Is "yes", since the standard punts on this behaviour and says the implementation has to define it. Arguably, "no" is just as good an answer for the same reason.
My friend says he read it on some page on SO that they are different,but how could the two be possibly different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In first case,we are initializing the integer i to 999,then initializing c with i,which is in fact 999.In the second case, we initialize c directly with 999.The truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says --It's not the same thing. The first is an assignment, the second is an initialization
So isn't it a lot more than only a question of optimization by the compiler?
They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.
Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)
It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.
Aside from the fact that the first also defines an object iof type int, the semantics are identical.
i,which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.
I am currently dealing with code purchased from a third party contractor. One struct has an unsigned char field while the function that they are passing that field to requires a signed char. The compiler does not like this, as it considers them to be mismatched types. However, it apparently compiles for that contractor. Some Googling has told me that "[i]t is implementation-defined whether a char object can hold negative values". Could the contractor's compiler basically ignore the signed/unsigned type and treat them the same? Or is there a compiler flag that will treat them the same?
C is not my strongest language--just look at my tags on my user page--so any help would be much appreciated.
Actually char, signed char and unsigned char are three different types. From the standard (ISO/IEC 9899:1990):
6.1.2.5 Types
...
The three types char, signed char and
unsigned char are collectively called
the character types.
(and in C++ for instance you have to (or at least should) write override functions with three variants of them if you have a char argument)
Plain char might be treated signed or unsigned by the compiler, but the standard says (also in 6.1.2.5):
An object declared as type char is
large enough to store any member of
the basic execution character set. If
a member of the required source
character set in 5.2.1 is stored in a
char object, its value is guarantied
to be positive. If other quantities
are stored in a char object, the
behavior is implementation-defined:
the values are treated as either
signed or nonnegative integers.
and
An object declared as type signed char occupies the same amount of storage as a ''plain'' char object.
The characters referred to in 5.2.1 are A-Z, a-z, 0-9, space, tab, newline and the following 29 graphic characters:
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
Answer
All of that I interpret to basically mean that ascii characters with value less than 128 are guarantied to be positive. So if the values stored always are less than 128 it should be safe (from a value preserving perspective), although not so good practice.
This is compiler-dependent. For example, in VC++ there's a compiler option and a corresponding _CHAR_UNSIGNED macro defined if that option instructs to use unsigned char by default.
I take it that you're talking about fields of type signed char and unsigned char, so they're explicitly wrong. If one of them was simply char, it might match in whatever compiler the contractor is using (IIRC, it's implementation-defined whether char is signed or unsigned), but not in yours. In that case, you might be able to get by with a command-line option or something to change yours.
Alternatively, the contractor might be using a compiler, or compiler options, that allow him to compile while ignoring errors or warnings. Do you know what sort of compilation environment he has?
In any case, this is not good C. If one of the types is just char, it relies on implementation-defined behavior, and therefore isn't portable. If not, it's flat wrong. I'd take this up with the contractor.