Short-circuiting on boolean operands without side effects - c

For the bounty: How can this behavior be disabled on a case-by-case basis without disabling or lowering the optimization level?
The following conditional expression was compiled with MinGW GCC 3.4.5, where a is of type signed long and m is of type unsigned long.
if (!a && m > 0x002 && m < 0x111)
The CFLAGS used were -g -O2. Here is the corresponding assembly output from GCC (dumped with objdump):
120: 8b 5d d0 mov ebx,DWORD PTR [ebp-0x30]
123: 85 db test ebx,ebx
125: 0f 94 c0 sete al
128: 31 d2 xor edx,edx
12a: 83 7d d4 02 cmp DWORD PTR [ebp-0x2c],0x2
12e: 0f 97 c2 seta dl
131: 85 c2 test edx,eax
133: 0f 84 1e 01 00 00 je 257 <_MyFunction+0x227>
139: 81 7d d4 10 01 00 00 cmp DWORD PTR [ebp-0x2c],0x110
140: 0f 87 11 01 00 00 ja 257 <_MyFunction+0x227>
Addresses 120-131 can easily be traced as first evaluating !a, followed by the evaluation of m > 0x002. The first conditional jump does not occur until 133. By this time, two expressions have been evaluated, regardless of the outcome of the first expression, !a. If a was non-zero, the whole expression could (and should) be concluded immediately, which is not done here.
How does this relate to the C standard, which requires Boolean operators to short-circuit as soon as the outcome can be determined?

The C standard only specifies the behavior of an "abstract machine"; it does not specify the generation of assembly. As long as the observable behavior of a program matches that on the abstract machine, the implementation can use whatever physical mechanism it likes for implementing the language constructs. The relevant section in the standard (C99) is 5.1.2.3 Program execution.

It is probably a compiler optimization since comparing integral types has no side effects. You could try compiling without optimizations or using a function that has side effects instead of the comparison operator and see if it still does this.
For example, try
if (printf("a") || printf("b")) {
printf("c\n");
}
and it should print ac

As others have mentioned, this assembly output is a compiler optimization that doesn't affect program execution (as far as the compiler can tell). If you want to selectively disable this optimization, you need to tell the compiler that your variables should not be optimized across the sequence points in the code.
Sequence points occur at the controlling expressions (the evaluations in if, switch, while, do and all three sections of for), after the first operand of the logical OR and AND operators, at the conditional operator (?:) and the comma operator, and at the return statement.
To prevent compiler optimization across these points, you must declare your variable volatile. In your example, you can specify
volatile long a;
unsigned long m;
{...}
if (!a && m > 0x002 && m < 0x111) {...}
The reason that this works is that volatile is used to instruct the compiler that it can't predict the behavior of an equivalent machine with respect to the variable. Therefore, it must strictly obey the sequence points in your code.

The compiler's optimising - it loads a into EBX, sets AL (part of EAX) from the !a test, does the second check into EDX, then branches once on the combination (test) of EAX and EDX. This saves a branch and leaves the code running faster, without making any difference at all in terms of side effects.
If you compile with -O0 rather than -O2, I imagine it will produce more naive assembly that more closely matches your expectations.
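In C terms, the optimised code is doing something like the following (my illustration of the transformation, not code the compiler literally produces; t1, t2 and the skip label are invented names):
int t1 = (a == 0);      /* test ebx,ebx ; sete al  - evaluated unconditionally */
int t2 = (m > 0x002);   /* cmp ... ; seta dl       - also evaluated unconditionally */
if (!(t1 & t2))         /* test edx,eax ; je ...   - one branch covers both operands */
    goto skip;
if (!(m < 0x111))       /* cmp ... ; ja ...        */
    goto skip;
/* ... body of the if ... */
skip: ;
Since both comparisons are free of side effects, evaluating them eagerly is indistinguishable from short-circuiting as far as the abstract machine is concerned.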

The code is behaving correctly (i.e., in accordance with the requirements of the language standard) either way.
It appears that you're trying to find a way to generate specific assembly code. Of two possible assembly code sequences, both of which behave the same way, you find one satisfactory and the other unsatisfactory.
The only really reliable way to guarantee the satisfactory assembly code sequence is to write the assembly code explicitly. gcc does support inline assembly.
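A lighter-weight option in the same spirit is to use GCC's extended asm as a per-variable optimization barrier. This is only a hedged sketch of the general idiom (an empty asm statement with a "+r" constraint), not something guaranteed to restore a separate branch on every compiler version:
long a_copy = a;
/* The empty asm claims to read and possibly modify a_copy, so the optimizer
   can no longer assume anything about its value past this point. */
__asm__ volatile ("" : "+r" (a_copy));
if (!a_copy && m > 0x002 && m < 0x111) { /* ... */ }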
C code specifies behavior. Assembly code specifies machine code.
But all this raises the question: why does it matter to you? (I'm not saying it shouldn't, I just don't understand why it should.)
EDIT: How exactly are a and m defined? If, as you suggest, they're related to memory-mapped devices, then they should be declared volatile -- and that might be exactly the solution to your problem. If they're just ordinary variables, then the compiler can do whatever it likes with them (as long as it doesn't affect the program's visible behavior) because you didn't ask it not to.
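If they really are memory-mapped device registers, the usual pattern is a volatile-qualified access. This is only a sketch; the register name and address are invented for illustration:
#define DEV_STATUS (*(volatile unsigned long *)0x40001000)  /* hypothetical device address */

/* The volatile read must be performed exactly as written; the compiler
   may not cache it, fold it away, or speculate about its value. */
if (!DEV_STATUS && m > 0x002 && m < 0x111) { /* ... */ }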

Related

using definitions in c inefficient?

is using definitions in c like this:
#define C1 42
#define C2 10
#define finalC C1 * C2
inefficient as this code will be ALWAYS run during run-time even though the calculation might not be needed?
as this code will be ALWAYS run during run-time even though the calculation might not be needed?
The multiplication will most likely not be executed at run-time. For example, this short function:
#define C1 42
#define C2 10
#define finalC C1 * C2
int foo() { return finalC; }
may compile into:
foo():
push rbp
mov rbp, rsp
mov eax, 420
pop rbp
ret
and that's without any optimization flags!
The macro expansion happens before the compiler processes the code; and the compiler is allowed to evaluate constant expressions at compile time, and tends to do so.
PS - The compiler used was GCC 11.3 and the target architecture is x86_64.
No reasonable compiler will multiply 42 by 10 for this code at run-time. The product, 420, will be computed during compilation (unless optimized out because it is not even needed in the final program).
The macro definitions and replacements are evaluated during program translation, not while the program is running.
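If you want to see this for yourself, you can run only the preprocessor (gcc -E is the standard flag; foo.c is a made-up file name holding the snippet above) and observe that finalC expands to the constant expression 42 * 10, which the compiler then folds to 420 during translation:
gcc -E foo.c     # foo() now reads: return 42 * 10;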

Assembly procedures called and defined in C code

I have the following assembly code:
global stuff
stuff:
;do stuff
I wish to call this from C code, so would it be possible to call it from a C program which contains it in _asm()?
Just because Linux does it is not a reason to do something. I am baffled by the use/desire to use inline assembly, but yes, sure, there are ways to do this which you could easily figure out. If you are asking this you are not quite ready to write an operating system. Keep working on it though. You need C and tool basics before you begin. An operating system is essentially a big bare-metal program.
If you tagged this nasm then you are not interested in inline asm anyway; just use real asm with gas or nasm.
This
int fun ( void )
{
return 5;
}
does/can become:
0000000000000000 <fun>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
so that means I can do this
.globl fun
fun:
mov $0x5,%eax
retq
and this
#include <stdio.h>
int fun ( void );
int main ( void )
{
printf("%d\n",fun());
return 0;
}
and build a binary linking the two parts which prints 5 when run.
So then with nasm I can
global fun
fun:
mov eax,5
ret
confirming it is the same machine code in this case or at least an equivalent.
0000000000000000 <fun>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
so I can link that in instead and it prints 5 as well.
So now I can do a simple inline version, very much like real asm, perhaps what you were asking about:
#include <stdio.h>
int fun ( void );
asm(".globl fun ; fun: mov $0x5,%eax ; retq");
int main ( void )
{
printf("%d\n",fun());
return 0;
}
This was using gcc, inline asm is tool specific and not assumed to be portable.
And now you can grossly over complicate it from there.
Using an abstraction to perform I/O operations ("it's basically writing byte x to port y") in an OS (or anywhere) is absolutely the right thing to do (you do not want to inline something like that), so a separate function, be it real asm or C or some hybrid, is a good idea worth pursuing. At the end of the day, though, for an access function like that you need to be in complete control over the instruction used, so however you choose to do that is up to you.
But elementary use of the tools and the language is required before starting any kind of work like this. You can examine different operating systems that exist now as a reference, but this is yours, not theirs: your personal preferences, not someone else's; your knowledge of the language, tools and assumptions, not someone else's. They may have a system-level implementation of something that you may not see all of, and you can fall into traps by simply copying a piece here or there.
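As a concrete illustration of the port-I/O abstraction mentioned above ("writing byte x to port y"), a wrapper on x86 with GCC often looks like the following. This is a hedged sketch of the common extended-asm idiom, not code from the answer:
/* Write one byte to an x86 I/O port. */
static inline void outb(unsigned short port, unsigned char value)
{
    /* "a"  : value must be in AL
       "Nd" : port is an 8-bit constant or goes in DX */
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}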

C reference to structure member without initializing value [duplicate]

If in C I write:
int num;
Before I assign anything to num, is the value of num indeterminate?
Static variables (file scope and function static) are initialized to zero:
int x; // zero
int y = 0; // also zero
void foo() {
static int x; // also zero
}
Non-static variables (local variables) are indeterminate. Reading them prior to assigning a value results in undefined behavior.
void foo() {
int x;
printf("%d", x); // the compiler is free to crash here
}
In practice, they tend to just have some nonsensical value in there initially - some compilers may even put in specific, fixed values to make it obvious when looking in a debugger - but strictly speaking, the compiler is free to do anything from crashing to summoning demons through your nasal passages.
As for why it's undefined behavior instead of simply "undefined/arbitrary value", there are a number of CPU architectures that have additional flag bits in their representation for various types. A modern example would be the Itanium, which has a "Not a Thing" bit in its registers; of course, the C standard drafters were considering some older architectures.
Attempting to work with a value with these flag bits set can result in a CPU exception in an operation that really shouldn't fail (eg, integer addition, or assigning to another variable). And if you go and leave a variable uninitialized, the compiler might pick up some random garbage with these flag bits set - meaning touching that uninitialized variable may be deadly.
0 if static or global, indeterminate if storage class is auto
C has always been very specific about the initial values of objects. If global or static, they will be zeroed. If auto, the value is indeterminate.
This was the case in pre-C89 compilers and was so specified by K&R and in DMR's original C report.
This was the case in C89, see section 6.5.7 Initialization.
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.
This was the case in C99, see section 6.7.8 Initialization.
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
As to what exactly indeterminate means, I'm not sure for C89; C99 says:
3.17.2 indeterminate value: either an unspecified value or a trap representation
But regardless of what standards say, in real life, each stack page actually does start off as zero, but when your program looks at any auto storage class values, it sees whatever was left behind by your own program when it last used those stack addresses. If you allocate a lot of auto arrays you will see them eventually start neatly with zeroes.
You might wonder, why is it this way? A different SO answer deals with that question, see: https://stackoverflow.com/a/2091505/140740
It depends on the storage duration of the variable. A variable with static storage duration is always implicitly initialized with zero.
As for automatic (local) variables, an uninitialized variable has indeterminate value. Indeterminate value, among other things, mean that whatever "value" you might "see" in that variable is not only unpredictable, it is not even guaranteed to be stable. For example, in practice (i.e. ignoring the UB for a second) this code
int num;
int a = num;
int b = num;
does not guarantee that variables a and b will receive identical values. Interestingly, this is not some pedantic theoretical concept; it readily happens in practice as a consequence of optimization.
So in general, the popular answer that "it is initialized with whatever garbage was in memory" is not even remotely correct. Uninitialized variable's behavior is different from that of a variable initialized with garbage.
Ubuntu 15.10, Kernel 4.2.0, x86-64, GCC 5.2.1 example
Enough standards, let's look at an implementation :-)
Local variable
Standards: undefined behavior.
Implementation: the program allocates stack space, and never moves anything to that address, so whatever was there previously is used.
#include <stdio.h>
int main() {
int i;
printf("%d\n", i);
}
compile with:
gcc -O0 -std=c99 a.c
outputs:
0
and decompiles with:
objdump -dr a.out
to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 10 sub $0x10,%rsp
40053e: 8b 45 fc mov -0x4(%rbp),%eax
400541: 89 c6 mov %eax,%esi
400543: bf e4 05 40 00 mov $0x4005e4,%edi
400548: b8 00 00 00 00 mov $0x0,%eax
40054d: e8 be fe ff ff callq 400410 <printf@plt>
400552: b8 00 00 00 00 mov $0x0,%eax
400557: c9 leaveq
400558: c3 retq
From our knowledge of x86-64 calling conventions:
%rdi is the first printf argument, thus the string "%d\n" at address 0x4005e4
%rsi is the second printf argument, thus i.
It comes from -0x4(%rbp), which is the first 4-byte local variable.
At this point, rbp points into the first page of the stack, which has been allocated by the kernel, so to understand that value we would have to look into the kernel code and find out what it sets that memory to.
TODO does the kernel set that memory to something before reusing it for other processes when a process dies? If not, the new process would be able to read the memory of other finished programs, leaking data. See: Are uninitialized values ever a security risk?
We can then also play with our own stack modifications and write fun things like:
#include <assert.h>
int f() {
int i = 13;
return i;
}
int g() {
int i;
return i;
}
int main() {
f();
assert(g() == 13);
}
Note that GCC 11 seems to produce a different assembly output, and the above code stops "working", it is undefined behavior after all: Why does -O3 in gcc seem to initialize my local variable to 0, while -O0 does not?
Local variable in -O3
Implementation analysis at: What does <value optimized out> mean in gdb?
Global variables
Standards: 0
Implementation: .bss section.
#include <stdio.h>
int i;
int main() {
printf("%d\n", i);
}
gcc -O0 -std=c99 a.c
compiles to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 8b 05 04 0b 20 00 mov 0x200b04(%rip),%eax # 601044 <i>
400540: 89 c6 mov %eax,%esi
400542: bf e4 05 40 00 mov $0x4005e4,%edi
400547: b8 00 00 00 00 mov $0x0,%eax
40054c: e8 bf fe ff ff callq 400410 <printf@plt>
400551: b8 00 00 00 00 mov $0x0,%eax
400556: 5d pop %rbp
400557: c3 retq
400558: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40055f: 00
# 601044 <i> says that i is at address 0x601044 and:
readelf -SW a.out
contains:
[25] .bss NOBITS 0000000000601040 001040 000008 00 WA 0 0 4
which says 0x601044 is right in the middle of the .bss section, which starts at 0x601040 and is 8 bytes long.
The ELF standard then guarantees that the section named .bss is completely filled with zeros:
.bss: This section holds uninitialized data that contribute to the program's memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
Furthermore, the type SHT_NOBITS is efficient and occupies no space on the executable file:
sh_size: This member gives the section's size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.
Then it is up to the Linux kernel to zero out that memory region when loading the program into memory when it gets started.
That depends. If that definition is global (outside any function) then num will be initialized to zero. If it's local (inside a function) then its value is indeterminate. In theory, even attempting to read the value has undefined behavior -- C allows for the possibility of bits that don't contribute to the value, but have to be set in specific ways for you to even get defined results from reading the variable.
The basic answer is: yes, it is undefined.
If you are seeing odd behavior because of this, it may depend on where the variable is declared. If it is within a function, on the stack, then the contents will more than likely be different every time the function gets called. If it is static or at module scope, it is zero-initialized and will not change.
Because computers have finite storage capacity, automatic variables will typically be held in storage elements (whether registers or RAM) that have previously been used for some other arbitrary purpose. If a such a variable is used before a value has been assigned to it, that storage may hold whatever it held previously, and so the contents of the variable will be unpredictable.
As an additional wrinkle, many compilers may keep variables in registers which are larger than the associated types. Although a compiler would be required to ensure that any value which is written to a variable and read back will be truncated and/or sign-extended to its proper size, many compilers will perform such truncation when variables are written and expect that it will have been performed before the variable is read. On such compilers, something like:
#include <stdint.h>

uint16_t hey(uint32_t x, uint32_t mode)
{
  uint16_t q;               /* q is never assigned when mode is neither 1 nor 3 */
  if (mode==1) q=2;
  if (mode==3) q=4;
  return q;
}

uint32_t wow(uint32_t mode)
{
  return hey(1234567, mode);
}
might very well result in wow() storing the values 1234567 and mode into registers 0 and 1, respectively, and calling hey(). Since x isn't needed within hey(), and since functions are supposed to put their return value into register 0, the compiler may allocate register 0 to q. If mode is 1 or 3, register 0 will be loaded with 2 or 4, respectively, but if it is some other value, the function may return whatever was in register 0 (i.e. the value 1234567) even though that value is not within the range of uint16_t.
To avoid requiring compilers to do extra work to ensure that uninitialized variables never seem to hold values outside their domain, and to avoid needing to specify indeterminate behaviors in excessive detail, the Standard says that use of uninitialized automatic variables is Undefined Behavior. In some cases, the consequences of this may be even more surprising than a value being outside the range of its type. For example, given:
void moo(int mode)
{
if (mode < 5)
launch_nukes();
hey(0, mode);
}
a compiler could infer that because invoking moo() with a mode which is greater than 3 will inevitably lead to the program invoking Undefined Behavior, the compiler may omit any code which would only be relevant if mode is 4 or greater, such as the code which would normally prevent the launch of nukes in such cases. Note that neither the Standard, nor modern compiler philosophy, would care about the fact that the return value from hey() is ignored; the act of trying to return it gives a compiler unlimited license to generate arbitrary code.
If the storage class is static or global, then during loading the BSS initialises the variable or memory location to 0, unless the variable is initially assigned some value. In the case of local uninitialized variables, a trap representation may end up in the memory location. So if any of your registers containing important info is overwritten by the compiler, the program may crash.
Some compilers may have mechanisms to avoid such a problem.
I was working with the NEC V850 series when I realised there is a trap representation, with bit patterns that represent undefined values, for data types except char. When I took an uninitialized char I got a zero default value due to the trap representation. This might be useful for anyone using the NEC V850ES.
As far as I have seen, it mostly depends on the compiler, but in most cases the value is assumed to be 0 by the compiler.
I got a garbage value in the case of VC++, while TC gave the value 0.
I printed it like below:
int i;
printf("%d", i);

Finding variable name from instruction pointer using debugging symbols

I'm looking for a way to find the names of the variables accessed by a given instruction (that performs a memory access).
Using debugging symbols and, for example, addr2line or objdump it's easy to convert instruction addresses into source code files + line numbers, but unfortunately often a single source code line contains more than one variable so this method does not have sufficiently fine granularity.
I've found that objdump is able to convert instruction addresses to global variables. But I haven't yet found a way to do this for local variables. For example, in the example below, I'd like to know that the instruction at address 0x4004c4 is accessing the local variable "local_hello" and that the instruction at address 0x4004c9 is accessing the local variable "local_hello2".
Hello.c:
int global_hello = 4;
int main(){
int local_hello = 3;
int local_hello2 = 0;
local_hello2 = global_hello + local_hello;
return local_hello2;
}
Using "objdump -S hello":
local_hello2 = global_hello + local_hello;
4004be: 8b 15 cc 03 20 00 mov 0x2003cc(%rip),%edx # 600890 <global_hello>
4004c4: 8b 45 fc mov -0x4(%rbp),%eax
4004c7: 01 d0 add %edx,%eax
4004c9: 89 45 f8 mov %eax,-0x8(%rbp)
This might work for simple programs with no or only moderate optimization levels but will become difficult with compiler optimization.
You might want to look into gdb sources to learn about the efforts to connect variables to optimized compiler output.
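For a concrete starting point (my suggestion, not part of the original answer): the DWARF debug info that gdb relies on can be dumped directly. Each local variable has a DW_TAG_variable entry whose DW_AT_location expression (typically an offset from the frame base) can be matched against the memory operands in the disassembly, such as -0x4(%rbp) for local_hello, once you account for how DW_AT_frame_base is defined:
readelf --debug-dump=info hello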
What's your objective, after all?

What is the simplest standard conform way to produce a Segfault in C?

I think the question says it all. An example covering most standards from C89 to C11 would be helpful. I thought of this one, but I guess it is just undefined behaviour:
#include <stdio.h>
int main( int argc, char* argv[] )
{
const char *s = NULL;
printf( "%c\n", s[0] );
return 0;
}
EDIT:
As some votes requested clarification: I wanted to have a program with a usual programming error (the simplest I could think of was a segfault) that is guaranteed (by the standard) to abort. This is a bit different from the minimal segfault question, which doesn't care about this guarantee.
raise() can be used to raise a segfault:
raise(SIGSEGV);
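A complete minimal program for that (my wrapping of the one-liner; only the signal.h include and main are added):
#include <signal.h>

int main(void)
{
    raise(SIGSEGV);   /* deliver SIGSEGV to this process */
    return 0;
}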
A segmentation fault is an implementation defined behavior. The standard does not define how the implementation should deal with undefined behavior and in fact the implementation could optimize out undefined behavior and still be compliant. To be clear, implementation defined behavior is behavior which is not specified by the standard but the implementation should document. Undefined behavior is code that is non-portable or erroneous and whose behavior is unpredictable and therefore can not be relied on.
If we look at the C99 draft standard §3.4.3 undefined behavior which comes under the Terms, definitions and symbols section in paragraph 1 it says (emphasis mine going forward):
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
and in paragraph 2 says:
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
If, on the other hand, you simply want a method defined in the standard that will cause a segmentation fault on most Unix-like systems then raise(SIGSEGV) should accomplish that goal. Although, strictly speaking, SIGSEGV is defined as follows:
SIGSEGV an invalid access to storage
and §7.14 Signal handling <signal.h> says:
An implementation need not generate any of these signals, except as a result of explicit calls to the raise function. Additional signals and pointers to undeclarable functions, with macro definitions beginning, respectively, with the letters SIG and an uppercase letter or with SIG_ and an uppercase letter,219) may also be specified by the implementation. The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive.
The standard only mentions undefined behavior. It knows nothing about memory segmentation. Also note that the code that produces the error is not standard-conformant. Your code cannot invoke undefined behavior and be standard conformant at the same time.
Nonetheless, the shortest way to produce a segmentation fault on architectures that do generate such faults would be:
int main()
{
*(int*)0 = 0;
}
Why is this sure to produce a segfault? Because access to memory address 0 is always trapped by the system; it can never be a valid access (at least not by userspace code.)
Note of course that not all architectures work the same way. On some of them, the above could not crash at all, but rather produce other kinds of errors. Or the statement could be perfectly fine, even, and memory location 0 is accessible just fine. Which is one of the reasons why the standard doesn't actually define what happens.
A correct program doesn't produce a segfault. And you cannot describe deterministic behaviour of an incorrect program.
A "segmentation fault" is a thing that an x86 CPU does. You get it by attempting to reference memory in an incorrect way. It can also refer to a situation where memory access causes a page fault (i.e. trying to access memory that's not loaded into the page tables) and the OS decides that you had no right to request that memory. To trigger those conditions, you need to program directly for your OS and your hardware. It is nothing that is specified by the C language.
If we assume we are not raising a signal by calling raise, a segmentation fault is likely to come from undefined behavior. Undefined behavior is undefined, and a compiler is free to refuse to translate the program, so no answer relying on undefined behavior is guaranteed to fail on all implementations. Moreover, a program which invokes undefined behavior is an erroneous program.
But this one is the shortest I can get that segfault on my system:
main(){main();}
(I compile with gcc and -std=c89 -O0).
And by the way, does this program really invoke undefined behavior?
main;
That's it.
Really.
Essentially, what this does is it defines main as a variable.
In C, variables and functions are both symbols -- pointers in memory, so the compiler does not distinguish them, and this code does not throw an error.
However, the problem rests in how the system runs executables. In a nutshell, the C standard requires that all C executables have an environment-preparing entrypoint built into them, which basically boils down to "call main".
In this particular case, however, main is a variable, so it is placed in a non-executable section of memory called .bss, intended for variables (as opposed to .text for the code). Trying to execute code in .bss violates its specific segmentation, so the system throws a segmentation fault.
To illustrate, here's (part of) an objdump of the resulting file:
# (unimportant)
Disassembly of section .text:
0000000000001020 <_start>:
1020: f3 0f 1e fa endbr64
1024: 31 ed xor %ebp,%ebp
1026: 49 89 d1 mov %rdx,%r9
1029: 5e pop %rsi
102a: 48 89 e2 mov %rsp,%rdx
102d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1031: 50 push %rax
1032: 54 push %rsp
1033: 4c 8d 05 56 01 00 00 lea 0x156(%rip),%r8 # 1190 <__libc_csu_fini>
103a: 48 8d 0d df 00 00 00 lea 0xdf(%rip),%rcx # 1120 <__libc_csu_init>
# This is where the program should call main
1041: 48 8d 3d e4 2f 00 00 lea 0x2fe4(%rip),%rdi # 402c <main>
1048: ff 15 92 2f 00 00 callq *0x2f92(%rip) # 3fe0 <__libc_start_main@GLIBC_2.2.5>
104e: f4 hlt
104f: 90 nop
# (nice things we still don't care about)
Disassembly of section .data:
0000000000004018 <__data_start>:
...
0000000000004020 <__dso_handle>:
4020: 20 40 00 and %al,0x0(%rax)
4023: 00 00 add %al,(%rax)
4025: 00 00 add %al,(%rax)
...
Disassembly of section .bss:
0000000000004028 <__bss_start>:
4028: 00 00 add %al,(%rax)
...
# main is in .bss (variables) instead of .text (code)
000000000000402c <main>:
402c: 00 00 add %al,(%rax)
...
# aaand that's it!
PS: This won't work if you compile to a flat executable. Instead, you will cause undefined behaviour.
On some platforms, a standard-conforming C program can fail with a segmentation fault if it requests too many resources from the system. For instance, allocating a large object with malloc can appear to succeed, but later, when the object is accessed, it will crash.
Note that such a program is not strictly conforming; programs which meet that definition have to stay within each of the minimum implementation limits.
A standard-conforming C program cannot produce a segmentation fault otherwise, because the only other ways are via undefined behavior.
The SIGSEGV signal can be raised explicitly, but there is no SIGSEGV symbol in the standard C library.
(In this answer, "standard-conforming" means: "Uses only the features described in some version of the ISO C standard, avoiding unspecified, implementation-defined or undefined behavior, but not necessarily confined to the minimum implementation limits.")
The simplest form considering the smallest number of characters is:
++*(int*)0;
Most of the answers to this question are talking around the key point, which is: The C standard does not include the concept of a segmentation fault. (Since C99 it includes the signal number SIGSEGV, but it does not define any circumstance where that signal is delivered, other than raise(SIGSEGV), which as discussed in other answers doesn't count.)
Therefore, there is no "strictly conforming" program (i.e. program that uses only constructs whose behavior is fully defined by the C standard, alone) that is guaranteed to cause a segmentation fault.
Segmentation faults are defined by a different standard, POSIX. This program is guaranteed to provoke either a segmentation fault, or the functionally equivalent "bus error" (SIGBUS), on any system that is fully conforming with POSIX.1-2008 including the Memory Protection and Advanced Realtime options, provided that the calls to sysconf, posix_memalign and mprotect succeed. My reading of C99 is that this program has implementation-defined (not undefined!) behavior considering only that standard, and therefore it is conforming but not strictly conforming.
#define _XOPEN_SOURCE 700
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(void)
{
size_t pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == (size_t)-1) {
fprintf(stderr, "sysconf: %s\n", strerror(errno));
return 1;
}
void *page;
int err = posix_memalign(&page, pagesize, pagesize);
if (err || !page) {
fprintf(stderr, "posix_memalign: %s\n", strerror(err));
return 1;
}
if (mprotect(page, pagesize, PROT_NONE)) {
fprintf(stderr, "mprotect: %s\n", strerror(errno));
return 1;
}
*(long *)page = 0xDEADBEEF;
return 0;
}
It's hard to define a method to produce a segmentation fault on an unspecified platform. A segmentation fault is a loose term that is not defined for all platforms (e.g. simple small computers).
Considering only the operating systems that support processes, processes can receive notification that a segmentation fault occurred.
Further, limiting operating systems to 'unix like' OSes, a reliable method for a process to receive a SIGSEGV signal is kill(getpid(),SIGSEGV)
As is the case with most cross-platform problems, each platform may (and usually does) have a different definition of seg-faulting.
But to be practical, current mac, lin and win OSes will segfault on
*(int*)0 = 0;
Further, it's not bad behaviour to cause a segfault. Some implementations of assert() cause a SIGSEGV signal, which might produce a core file - very useful when you need to do an autopsy.
What's worse than causing a segfault is hiding it:
try
{
anyfunc();
}
catch (...)
{
printf("?\n");
}
which hides the origin of an error and all you've got to go on is:
?
.
Here's another way I haven't seen mentioned here:
int main() {
void (*f)(void);
f();
}
In this case f is an uninitialized function pointer, which causes a segmentation fault when you try to call it.

Resources