What is wrong with the following C code? [duplicate]

Possible Duplicate:
Confused about C macro expansion and integer arithmetic
A riddle (in C)
The following C program is expected to print the elements of the array, but when actually run, it doesn't.
#include <stdio.h>

#define TOTAL_ELEMENTS (sizeof(array) / sizeof(array[0]))

int array[] = {23, 34, 12, 17, 204, 99, 16};

int main(void)
{
    int d;
    for (d = -1; d <= (TOTAL_ELEMENTS - 2); d++)
        printf("%d\n", array[d + 1]);
    return 0;
}

Because sizeof gives you an unsigned value, which you probably would have noticed had you turned up the warning level, such as using -Wall -Wextra with gcc (a):
xyzzy.c: In function 'main':
xyzzy.c:8: warning: comparison between signed and unsigned
If you force it to signed, it works fine:
#define TOTAL_ELEMENTS (int)((sizeof(array) / sizeof(array[0])))
What happens in detail can be gleaned from the ISO standard. In comparisons between different types, promotions are performed to make the types compatible. The compatible type chosen depends on several factors such as sign compatibility, precision and rank but, in this case, it was deemed that the unsigned type size_t was the compatible type, so d was converted to that type for the comparison.
Unfortunately, converting -1 to an unsigned type results in a rather large positive number: the standard defines the conversion as wrapping modulo one more than the type's maximum value, whatever the underlying representation.
One that's certainly larger than the 5 you get from (TOTAL_ELEMENTS-2). In other words, your for statement effectively becomes:
for (d = some big honking number way greater than five;
     d <= 5;
     d++
) {
    // fat chance of getting in here !!
}
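Here is a minimal sketch of that conversion in isolation (the exact value printed depends on the width of size_t on your platform):

#include <stdio.h>

int main(void)
{
    /* Converting -1 to an unsigned type yields that type's maximum value. */
    printf("%zu\n", (size_t)-1);        /* SIZE_MAX, e.g. 18446744073709551615 */

    /* The same conversion happens implicitly in a mixed comparison. */
    printf("%d\n", -1 <= (size_t)5);    /* prints 0 where size_t is at least as wide as int */
    return 0;
}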
(a) This requirement to use -Wextra remains a point of contention between the gcc developers and myself. They're obviously using some new definition of the word "all" of which I was previously unaware (with apologies to Douglas Adams).

TOTAL_ELEMENTS is of type size_t; subtracting 2 is done at compile time and so it is 5UL (emphasis on the unsigned suffix). In the comparison, the signed integer d is converted to that unsigned type, so with d starting at -1 the test fails immediately. Try
for (d = -1; d <= (ssize_t)(TOTAL_ELEMENTS - 2); d++)
FWIW, the Intel compiler warns about exactly this when you try to compile the code.

To clarify what went wrong: sizeof yields a result of type size_t, which is an unsigned integer type.
So in (sizeof(array) / sizeof(array[0])), both operands of the division have type size_t. Since the two operands have the same type, the division needs no conversion, and its result is also of type size_t, which is therefore the type that TOTAL_ELEMENTS evaluates to.
The expression (TOTAL_ELEMENTS-2) therefore has operand types size_t and int, since the integer literal 2 is of type int.
Here we have two different types. What happens then is something called balancing (formally "the usual arithmetic conversions"), which happens when the compiler spots two different types. The balancing rules state that if one operand is signed and the other unsigned, then the signed one is silently, implicitly converted to an unsigned type.
This is what happens in this code. size_t - int is converted to size_t - size_t, then the subtraction is executed, the result is size_t. Then int <= size_t is converted to size_t <= size_t. The variable d turns unsigned, and if it had a negative value, the code goes haywire.
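Here is the balancing rule replayed in isolation, a sketch assuming the typical case where size_t is at least as wide as int:

#include <stdio.h>

int main(void)
{
    int d = -1;
    size_t n = 5;

    /* d is converted to size_t before the comparison, so -1 becomes SIZE_MAX. */
    if (d <= n)
        printf("true\n");
    else
        printf("false\n");          /* this branch runs */

    /* Converting the unsigned side to a signed type keeps the comparison intuitive. */
    if (d <= (int)n)
        printf("with a cast: true\n");
    return 0;
}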

Related

Adding or assigning an integer literal to a size_t

In C I see a lot of code that adds or assigns an integer literal to a size_t variable.
size_t foo = 1;
foo += 1;
What conversion takes place here, and can it ever happen that a size_t is "upgraded" to an int and then converted back to a size_t? Would that still wrap around if I was at the max?
size_t foo = SIZE_MAX;
foo += 1;
Is that defined behavior? It's an unsigned type size_t which is having a signed int added to it (that may be a larger type?) and then converted back to a size_t. Is there risk of signed integer overflow?
Would it make sense to write something like foo + bar + (size_t)1 instead of foo + bar + 1? I never see code like that, but I'm wondering if it's necessary if integer promotions are troublesome.
C89 doesn't say how a size_t will be ranked or what exactly it is:
The value of the result is implementation-defined, and its type (an unsigned integral type) is size_t, defined in the <stddef.h> header.
The current C standard allows for a possibility of an implementation that would cause undefined behavior when executing the following code, however such implementation does not exist, and probably never will:
size_t foo = SIZE_MAX;
foo += 1;
The type size_t is an unsigned type1, with a minimum range2 of [0,65535].
The type size_t may be defined as a synonym for the type unsigned short. The type unsigned short may be defined having 16 precision bits, with the range: [0,65535]. In that case the value of SIZE_MAX is 65535.
The type int may be defined having 16 precision bits (plus one sign bit), two's complement representation, and range: [-65536,65535].
The expression foo += 1, is equivalent to foo = foo + 1 (except that foo is evaluated only once but that is irrelevant here). The variable foo will get promoted using integer promotions3. It will get promoted to type int because type int can represent all values of type size_t and rank of size_t, being a synonym for unsigned short, is lower than the rank of int. Since the maximum values of size_t, and int are the same, the computation causes a signed overflow, causing undefined behavior.
This holds for the current standard, and it should also hold for C89 since it doesn't have any stricter restrictions on types.
Solution for avoiding signed overflow for any imaginable implementation is to use an unsigned int integer constant:
foo += 1u;
In that case if foo has a lower rank than int, it will be promoted to unsigned int using usual arithmetic conversions.
1 (Quoted from ISO/IEC 9899/201x 7.19 Common definitions 2)
size_t
which is the unsigned integer type of the result of the sizeof operator;
2 (Quoted from ISO/IEC 9899/201x 7.20.3 Limits of other integer types 2)
limit of size_t
SIZE_MAX 65535
3 (Quoted from ISO/IEC 9899/201x 6.3.1.1 Boolean, characters, and integers 2)
The following may be used in an expression wherever an int or unsigned int may
be used:
An object or expression with an integer type (other than int or unsigned int)
whose integer conversion rank is less than or equal to the rank of int and
unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned
int. These are called the integer promotions. All other types are unchanged by the
integer promotions.
It depends, since size_t is an implementation-defined unsigned integral type.
Operations involving a size_t will therefore introduce promotions, but these depend on what size_t actually is, and what other types involved in the expression actually are.
If size_t was equivalent to an unsigned short (e.g. a 16-bit type) then
size_t foo = 1;
foo += 1;
would (semantically) promote foo to a int, add 1, and then convert the result back to size_t for storing in foo. (I say "semantically", because that is the meaning of the code according to the standard. A compiler is free to apply the "as if" rule - i.e. do anything it likes, as long as it delivers the same net effect).
On the other hand, if size_t was equivalent to a long long unsigned (e.g. a 64-bit unsigned type), then the same code would promote 1 to be of type long long unsigned, add that to the value of foo, and store the result back into foo.
In both cases, the net result is the same unless an overflow occurs. In this case, there is no overflow, since both int and size_t are guaranteed to be able to represent the values 1 and 2.
If an overflow occurs (e.g. adding a larger integral value), then the behaviour can vary. Overflow of a signed integral type (e.g. int) results in undefined behaviour. Overflow of an unsigned integral type uses modulo arithmetic.
As to the code
size_t foo = SIZE_MAX;
foo += 1;
it is possible to do the same sort of analysis.
If size_t is equivalent to an unsigned short, the first question is whether int can represent all of its values. If it cannot (say int and unsigned short have the same width), then foo is promoted to unsigned int rather than int, and the arithmetic stays well-defined under modulo rules. If int can represent SIZE_MAX (e.g. it is wider than short), the conversion of foo to int will succeed; incrementing it overflows only in the corner case where INT_MAX == SIZE_MAX, and otherwise storing back to size_t will use modulo arithmetic and produce the result of 0.
If size_t is equivalent to unsigned long, then the value 1 will be converted to unsigned long, adding that to foo will use modulo arithmetic (i.e. produce a result of zero), and that will be stored into foo.
It is possible to do similar analyses assuming that size_t is actually other unsigned integral types.
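On a typical implementation where size_t is wider than int (so no promotion to int occurs), the wraparound is directly observable; a small sketch:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    size_t foo = SIZE_MAX;
    foo += 1u;                /* unsigned arithmetic wraps modulo SIZE_MAX + 1 */
    printf("%zu\n", foo);     /* prints 0 */
    return 0;
}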
Note: In modern systems, a size_t that is the same size or smaller than an int is unusual. However, such systems have existed (e.g. Microsoft and Borland C compilers targeting 16-bit MS-DOS on hardware with an 80286 CPU). There are also 16-bit microprocessors still in production, mostly for use in embedded systems with lower power usage and low throughput requirements, and C compilers that target them (e.g. the Keil C166 compiler, which targets the Infineon XE166 microprocessor family). [Note: I've never had reason to use the Keil compiler but, given its target platform, it would not be a surprise if it supports a 16-bit size_t that is the same size or smaller than the native int type on that platform].
foo += 1 means foo = foo + 1. If size_t is narrower than int (that is, int can represent all values of size_t), then foo is promoted to int in the expression foo + 1.
The only way this could overflow is if INT_MAX == SIZE_MAX. Theoretically that is possible, e.g. 16-bit int and 15-bit size_t. (The latter probably would have 1 padding bit).
More likely, SIZE_MAX will be less than INT_MAX, in which case the assignment back to size_t is well-defined: the out-of-range int result is reduced modulo SIZE_MAX + 1, the high bits are in effect discarded, and the result will be 0.
As a practical decision I would not recommend mangling your code to cater to these cases (e.g. a 15-bit size_t), which probably have never happened and never will. Instead, you could do some compile-time tests that will give an error if these cases do occur. A compile-time assertion that INT_MAX < SIZE_MAX would be practical in this day and age.
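A sketch of such a guard, assuming a C11 compiler (older compilers can emulate this with a negative-array-size trick):

#include <limits.h>
#include <stdint.h>

/* Fail the build on any implementation where size_t could promote to int,
   i.e. where the signed-overflow hazard discussed above could exist. */
_Static_assert(SIZE_MAX > INT_MAX,
               "size_t promotes to int on this platform; audit size_t arithmetic");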

Why should the output of this C code be "no"? [duplicate]

This question already has answers here:
unsigned int and signed char comparison
I came across this question.
#include <stdio.h>

int main(void)
{
    unsigned int i = 23;
    signed char c = -23;
    if (i > c)
        printf("yes");
    else
        printf("no");
    return 0;
}
I am unable to understand why the output of this code is no.
Can someone help me understand how the comparison operator works when the comparison is done between int and char in C?
You are comparing an unsigned int to a signed char. The semantics of this kind of comparison are counter-intuitive: most binary operations involving signed and unsigned operands are performed on unsigned operands, after converting the signed value to unsigned (if both operands have the same size after promotion). Here are the steps:
The signed char value is promoted to an int with the same value -23.
The comparison is to be performed on int and unsigned int; the common type is unsigned int as defined in the C Standard.
The int is converted to an unsigned int with value UINT_MAX - 23, a very large number.
The comparison is performed on the unsigned int values: 23 is the smaller value, the comparison evaluates to false.
The else branch is evaluated, no is printed.
To make matters worse, if c had been defined as a long, the result would have depended on whether long and int have the same size or not. On Windows, it would print no, whereas on 64 bit Linux, it would print yes.
Never mix signed and unsigned values in comparisons. Enable compiler warnings to prevent this kind of mistake (-Wall or -Weverything). You might as well make all these warnings fatal with -Werror to avoid this kind of ill-fated code entirely.
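If you do need to mix them, converting explicitly to a common signed type gives the intuitive result; a sketch assuming both values fit in int (they do here):

#include <stdio.h>

int main(void)
{
    unsigned int i = 23;
    signed char c = -23;

    /* Compare in a signed type wide enough for both values. */
    if ((int)i > c)
        printf("yes");   /* now prints yes */
    else
        printf("no");
    return 0;
}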
For a complete reference, read the following sections of the C Standard (C11) under 6.3 Conversions:
integer promotions are explained in 6.3.1.1 Boolean, characters, and integers.
operand conversions are detailed in 6.3.1.8 Usual arithmetic conversions.
You can download the latest draft of the C11 Standard from the working group website: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

What is the proper way to store narrower data types into a wider data type in the C language?

I'm currently fixing a legacy bug in C code. In the process of fixing this bug, I stored an unsigned int into an unsigned long long. But to my surprise, math stopped working when I compiled this code on a 64 bit version of GCC. I discovered that when I assigned an int value to a long long, I got a number that looked like 0x0000000012345678, but on the 64-bit machine, that number became 0xFFFFFFFF12345678.
Can someone explain to me or point me to some sort of spec or documentation on what is supposed to happen when storing a smaller data type in a larger one and perhaps what the appropriate pattern for doing this in C is?
Update - Code Sample
Here's what I'm doing:
// Results in 0xFFFFFFFFC0000000 in 64 bit gcc 4.1.2
// Results in 0x00000000C0000000 in 32 bit gcc 3.4.6
u_long foo = 3 * 1024 * 1024 * 1024;
I think you have to tell the compiler that the number on the right is unsigned. Otherwise it thinks it's a normal signed int, and since the sign bit is set, it thinks it's negative, and then it sign-extends it into the receiver.
So do some unsigned casting on the right.
Expressions are generally evaluated independently; their results are not affected by the context in which they appear.
An integer constant like 1024 has the first of the types int, long int, and long long int in which its value fits; in the particular case of 1024 that's always int.
I'll assume here that u_long is a typedef for unsigned long (though you also mentioned long long in your question).
So given:
unsigned long foo = 3 * 1024 * 1024 * 1024;
the 4 constants in the initialization expression are all of type int, and all three multiplications are int-by-int. The result happens to be greater (by a factor of 1.5) than 2^31, which means it won't fit in an int on a system where int is 32 bits. The int result, whatever it is, will be implicitly converted to the target type unsigned long, but by that time it's too late; the overflow has already occurred.
The overflow means that your code has undefined behavior (and since this can be determined at compile time, I'd expect your compiler to warn about it). In practice, signed overflow typically wraps around, so the above will typically set foo to -1073741824. You can't count on that (and it's not what you want anyway).
The ideal solution is to avoid the implicit conversions by ensuring that everything is of the target type in the first place:
unsigned long foo = 3UL * 1024UL * 1024UL * 1024UL;
(Strictly speaking only the first operand needs to be of type unsigned long, but it's simpler to be consistent.)
Let's look at the more general case:
int a, b, c, d; /* assume these are initialized */
unsigned long foo = a * b * c * d;
You can't add a UL suffix to a variable. If possible, you should change the declarations of a, b, c, and d so they're of type unsigned long, but perhaps there's some other reason they need to be of type int. You can add casts to explicitly convert each one to the correct type. By using casts, you can control exactly when the conversions are performed:
unsigned long foo = (unsigned long)a *
                    (unsigned long)b *
                    (unsigned long)c *
                    (unsigned long)d;
This gets a bit verbose; you might consider applying the cast only to the leftmost operand (after making sure you understand how the expression is parsed).
NOTE: This will not work:
unsigned long foo = (unsigned long)(a * b * c * d);
The cast converts the int result to unsigned long, but only after the overflow has already occurred. It merely specifies explicitly the cast that would have been performed implicitly.
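The leftmost-operand version mentioned above would look like the sketch below; because * associates left to right, every intermediate product is already computed in unsigned long:

/* Only the first operand is cast; each multiplication then converts
   the other int operand to unsigned long before multiplying. */
unsigned long foo = (unsigned long)a * b * c * d;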
Integer literals without a suffix are of type int if they can fit in one; in your case 3 and 1024 can definitely fit. This is covered in the draft C99 standard section 6.4.4.1 Integer constants, a quote of this section can be found in my answer to Are C macros implicitly cast?.
Next we have the multiplication, which performs the usual arithmetic conversions on its operands, but since they are all int the arithmetic is done in int, and the result is too large to fit in a signed int, which results in overflow. This is undefined behavior as per section 5 which says:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
We can discover this undefined behavior empirically using clang and the -fsanitize=undefined flags (see it live) which says:
runtime error: signed integer overflow: 3145728 * 1024 cannot be represented in type 'int'
Although in two's complement this will just end up being a negative number. One way to fix this would be to use the ul suffix:
3ul * 1024ul * 1024ul * 1024ul
So why does a negative number converted to an unsigned value give a very large unsigned value? This is covered in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.49)
which basically means unsigned long max + 1 (ULONG_MAX + 1) is added to the negative number, which results in a very large unsigned value.
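As a sketch of that rule with concrete numbers, assuming a 64-bit unsigned long:

#include <stdio.h>

int main(void)
{
    long n = -1073741824L;                /* the wrapped-around int result from above */
    unsigned long u = (unsigned long)n;   /* -1073741824 + (ULONG_MAX + 1) */
    printf("%lu\n", u);                   /* 18446744072635809792 with a 64-bit unsigned long */
    return 0;
}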

Inconsistent behaviour of implicit conversion between unsigned and bigger signed types

Consider following example:
#include <stdio.h>

int main(void)
{
    unsigned char a = 15;  /* one byte */
    unsigned short b = 15; /* two bytes */
    unsigned int c = 15;   /* four bytes */
    long x = -a;           /* eight bytes */
    printf("%ld\n", x);
    x = -b;
    printf("%ld\n", x);
    x = -c;
    printf("%ld\n", x);
    return 0;
}
To compile I am using GCC 4.4.7 (and it gave me no warnings):
gcc -g -std=c99 -pedantic-errors -Wall -W check.c
My result is:
-15
-15
4294967281
The question is why both unsigned char and unsigned short values are "propagated" correctly to (signed) long, while unsigned int is not? Is there any reference or rule on this?
Here are results from gdb (words are in little-endian order) accordingly:
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 11111111111111111111111111111111
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 00000000000000000000000000000000
This is due to how the integer promotions are applied to the operand and the requirement that the result of unary minus have the promoted type. This is covered in section 6.5.3.3 Unary arithmetic operators and says (emphasis mine going forward):
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
and integer promotion which is covered in the draft c99 standard section 6.3 Conversions and says:
if an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.48) All other types are unchanged by the integer promotions.
In the first two cases, the promotion will be to int and the result will be int. In the case of unsigned int no promotion is required but the result will require a conversion back to unsigned int.
The -15 is converted to unsigned int using the rules set out in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49)
So we end up with -15 + (UMAX + 1), which results in UMAX - 14, a large unsigned value. This is why you will sometimes see code convert -1 to an unsigned type to obtain the maximum unsigned value of that type, since it will always end up being -1 + UMAX + 1, which is UMAX.
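The idiom looks like this; it is portable precisely because the conversion rule quoted above is independent of representation (size_t here is just one convenient type to demonstrate with):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* -1 + (UMAX + 1) == UMAX for any unsigned type. */
    size_t all_ones = (size_t)-1;
    printf("%d\n", all_ones == SIZE_MAX);   /* prints 1 */
    return 0;
}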
int is special. Everything smaller than int gets promoted to int in arithmetic operations.
Thus -a and -b are applications of unary minus to int values of 15, which just work and produce -15. This value is then converted to long.
-c is different. c is not promoted to an int as it is not smaller than int. The result of unary minus applied to an unsigned int value of k is again an unsigned int, computed as 2^N - k (N is the number of bits).
Now this unsigned int value is converted to long normally.
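A short sketch of that computation, with values assuming a 32-bit unsigned int:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int c = 15;
    /* -c is computed modulo UINT_MAX + 1, i.e. as 2^N - 15. */
    printf("%u\n", -c);              /* 4294967281 with a 32-bit unsigned int */
    printf("%u\n", UINT_MAX - 14);   /* the same value */
    return 0;
}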
This behavior is correct. Quotes are from C 9899:TC2.
6.5.3.3/3:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
6.2.5/9:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
6.3.1.1/2:
The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
So for long x = -a;, since the operand a, an unsigned char, has conversion rank less than the rank of int and unsigned int, and all unsigned char values can be represented as int (on your platform), we first promote to type int. The negative of that is simple: the int with value -15.
Same logic for unsigned short (on your platform).
The unsigned int c is not changed by promotion. So the value of -c is calculated using modular arithmetic, giving the result UINT_MAX-14.
C's integer promotion rules are what they are because standards-writers wanted to allow a wide variety of existing implementations that did different things, in some cases because they were created before there were "standards", to keep on doing what they were doing, while defining rules for new implementations that were more specific than "do whatever you feel like". Unfortunately, the rules as written make it extremely difficult to write code which doesn't depend upon a compiler's integer size. Even if future processors would be able to perform 64-bit operations faster than 32-bit ones, the rules dictated by the standards would cause a lot of code to break if int ever grew beyond 32 bits.
It would probably in retrospect have been better to have handled "weird" compilers by explicitly recognizing the existence of multiple dialects of C, and recommending that compilers implement a dialect that handles various things in consistent ways, but providing that they may also implement dialects which do them differently. Such an approach may end up ultimately being the only way that int can grow beyond 32 bits, but I've not heard of anyone even considering such a thing.
I think the root of the problem with unsigned integer types stems from the fact that they are sometimes used to represent numerical quantities, and are sometimes used to represent members of a wrapping abstract algebraic ring. Unsigned types behave in a manner consistent with an abstract algebraic ring in circumstances which do not involve type promotion. Applying a unary minus to a member of a ring should (and does) yield a member of that same ring which, when added to the original, will yield zero [i.e. the additive inverse]. There is exactly one way to map integer quantities to ring elements, but multiple ways exist to map ring elements back to integer quantities. Thus, adding a ring element to an integer quantity should yield an element of the same ring regardless of the size of the integer, and conversion from rings to integer quantities should require that code specify how the conversion should be performed. Unfortunately, C implicitly converts rings to integers in cases where either the size of the ring is smaller than the default integer type, or when an operation uses a ring member with an integer of a larger type.
The proper solution to solve this problem would be to allow code to specify that certain variables, return values, etc. should be regarded as ring types rather than numbers; an expression like -(ring16_t)2 should yield 65534 regardless of the size of int, rather than yielding 65534 on systems where int is 16 bits, and -2 on systems where it's larger. Likewise, (ring32)0xC0000001 * (ring32)0xC0000001 should yield (ring32)0x80000001 even if int happens to be 64 bits [note that if int is 64 bits, the compiler could legally do anything it likes if code tries to multiply two unsigned 32-bit values which equal 0xC0000001, since the result would be too large to represent in a 64-bit signed integer].
Negatives are tricky. Especially when it comes to unsigned values. If you look at the C standard, you'll notice that (contrary to what you'd expect) unsigned chars and shorts are promoted to signed ints for computing, while an unsigned int will be computed as an unsigned int.
When you compute -c, c stays an unsigned int: the negation is computed as an unsigned value, giving UINT_MAX - 14, and that large positive value is what then gets converted to long and stored in x.
For clarification - no ACTUAL promotion is done when negating an unsigned int. The two's complement of the bit pattern is used instead. Since the only practical difference between unsigned and signed values is whether the MSB acts as a sign flag, the pattern that would read as -15 in a signed type is taken as a very large positive number instead.

Why use "strlen30()" instead of "strlen()"?

I've read the source code of sqlite and wondered about this function:
static int strlen30(const char *z){
  const char *z2 = z;
  while( *z2 ){ z2++; }
  return 0x3fffffff & (int)(z2 - z);
}
Why use strlen30() instead of strlen() (in string.h)??
The commit message that went in with this change states:
[793aaebd8024896c] part of check-in [c872d55493] Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007) (user: drh branch: trunk)
(this is my answer from Why reimplement strlen as loop+subtraction? , but it was closed)
I can't tell you the reason why they had to re-implement it, and why they chose int instead of size_t as the return type. But about the function:
/*
** Compute a string length that is limited to what can be stored in
** lower 30 bits of a 32-bit signed integer.
*/
static int strlen30(const char *z){
  const char *z2 = z;
  while( *z2 ){ z2++; }
  return 0x3fffffff & (int)(z2 - z);
}
Standard References
The standard says in (ISO/IEC 14882:2003(E)) 3.9.1 Fundamental Types, 4.:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. 41)
...
41): This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer
type
That part of the standard does not define overflow-behaviour for signed integers. If we look at 5. Expressions, 5.:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression
(5.19), in which case the program is ill-formed. [Note: most existing implementations of C++ ignore integer
overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point
exceptions vary among machines, and is usually adjustable by a library function. ]
So much for overflow.
As for subtracting two pointers to array elements, 5.7 Additive operators, 6.:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as ptrdiff_t in the cstddef header (18.1). [...]
Looking at 18.1:
The contents are the same as the Standard C library header stddef.h
So let's look at the C standard (I only have a copy of C99, though), 7.17 Common Definitions :
The types used for size_t and ptrdiff_t should not have an integer conversion rank
greater than that of signed long int unless the implementation supports objects
large enough to make this necessary.
No further guarantee made about ptrdiff_t. Then, Annex E (still in ISO/IEC 9899:TC2) gives the minimum magnitude for signed long int, but not a maximum:
#define LONG_MAX +2147483647
Now what are the maxima for int, the return type of sqlite's strlen30()? Let's skip the C++ quotation that forwards us to the C standard once again, and we'll see in C99, Annex E, the minimum maximum for int:
#define INT_MAX +32767
Summary
Usually, ptrdiff_t is not bigger than signed long, which is not smaller than 32 bits.
int is just defined to be at least 16 bits long.
Therefore, subtracting two pointers may give a result that does not fit into the int of your platform.
We remember from above that for signed types, a result that does not fit yields undefined behaviour.
strlen30 applies a bitwise AND to the pointer-subtraction result:
         | 32 bit |
ptr_diff |10111101111110011110111110011111| // could be even larger
       & |00111111111111111111111111111111| // == 0x3FFFFFFF
         ----------------------------------
       = |00111101111110011110111110011111| // truncated
That prevents undefined behaviour by truncating the pointer-subtraction result to a maximum value of 0x3FFFFFFF = 1073741823.
I am not sure about why they chose exactly that value, because on most machines, only the most significant bit tells the signedness. It could have made sense versus the standard to choose the minimum INT_MAX, but 1073741823 is indeed slightly strange without knowing more details (though it of course perfectly does what the comment above their function says: truncate to 30 bits and prevent overflow).
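Replaying the diagram above as a quick sketch (the bit patterns are the hypothetical values from the diagram, assuming a 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    unsigned int raw  = 0xBDF9EF9Fu;           /* the ptr_diff pattern above */
    unsigned int kept = raw & 0x3fffffffu;     /* the mask keeps the low 30 bits */
    printf("0x%X\n", kept);                    /* prints 0x3DF9EF9F */
    return 0;
}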
The CVS commit message says:
Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007)
I couldn't find any further reference to this commit or explanation how they got an overflow in that place. I believe that it was an error reported by some static code analysis tool.
