Declaration of wchar_t by myself in C

Please tell me where I can find the declaration of wchar_t.
I use Linux, and I think it is 32 bits there.
I need to declare this type myself, because I can't use the standard library (this is for my boot program).
The files /usr/include/wchar.h and /usr/include/linux/stddef.h
don't contain its declaration.
Also, what about mbstate_t?

If you are not using the standard library, then you do not need wchar_t. Or at least the standard library's idea of wchar_t doesn't matter to you. How could it? If you want wide character handling then you'll need to write whatever functions are needed for it, and you are free to define and use whatever types are most suitable / convenient for that purpose.
You might think you would need the kernel headers if you intended to make system calls, and the types needed for that purpose are indeed defined among the kernel headers. On reflection, though, you will not need kernel headers either: since the point of your boot program is to load and start the kernel, it cannot rely on system calls into the kernel to do its job.
The bottom line is that since you have to provide everything not available directly from the hardware / firmware -- which does not present a C interface -- no C definitions outside your own code are relevant to you. In particular, wchar_t is not a characteristic of any system; rather, it is a characteristic of a particular C library. Different C libraries for the very same system, including yours, can freely define it differently. If you choose to implement your own wide-character functions, there is no advantage whatever to choosing a wchar_t definition drawn from some other C library.
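For illustration, a boot program that wants wide-character handling could define its own types along these lines. This is a minimal sketch; the 32-bit choice merely mirrors glibc's definition on Linux and is an assumption, not a requirement:

/* Freestanding, self-chosen definitions (hypothetical). */
typedef int wchar_t;            /* 32 bits; the choice is entirely yours */

typedef struct {
    unsigned int conv_state;    /* whatever your conversion code needs */
} mbstate_t;

/* An example wide-character helper written against these types. */
static unsigned long my_wcslen(const wchar_t *s)
{
    unsigned long n = 0;
    while (s[n])
        n++;
    return n;
}

In C (unlike C++), wchar_t is not a keyword, so a translation unit that does not include the standard headers is free to define it this way.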

You don't need to go to the files containing the definition of wchar_t.
The standard header stdint.h provides information about the integer types of your particular implementation.
For example, the constants WCHAR_MIN and WCHAR_MAX give the range of values of wchar_t.
Moreover, if WCHAR_MIN == 0, then wchar_t is an unsigned integer type; otherwise it is a signed integer type (in that case it is preferable to check the condition WCHAR_MIN < 0).
To learn the number of bytes used to represent a wchar_t object, you have, of course, the expression sizeof(wchar_t).
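For example, on a hosted system these properties can be inspected with a short program; on a typical glibc/Linux build it reports a signed 4-byte type:

#include <stdio.h>
#include <stddef.h>   /* wchar_t */
#include <stdint.h>   /* WCHAR_MIN, WCHAR_MAX, intmax_t */

int main(void)
{
    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    printf("WCHAR_MIN = %jd, WCHAR_MAX = %jd\n",
           (intmax_t)WCHAR_MIN, (intmax_t)WCHAR_MAX);
    printf("wchar_t is %s\n", WCHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}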

Related

Binary compatibility of struct in separately compiled code

Given a CPU architecture, is the exact binary form of a struct determined exactly?
For example, struct stat64 is used by glibc and the Linux kernel. I see glibc define it in sysdeps/unix/sysv/linux/x86/bits/stat.h as:
struct stat64 {
    __dev_t st_dev;          /* Device. */
# ifdef __x86_64__
    __ino64_t st_ino;        /* File serial number. */
    __nlink_t st_nlink;      /* Link count. */
# endif
    /* ... et cetera ... */
};
My kernel was compiled already. Now, when I compile new code using this definition, the two have binary compatibility. Where is this guaranteed? The only guarantees I know of are:
The first element has offset 0
Elements declared later have higher offsets
So if the kernel code declares struct stat64 in the exact same way (in the C code), then I know that the binary form has:
st_dev # offset 0
st_ino # offset at least sizeof(__dev_t)
But I'm not currently aware of any way to determine the offset of st_ino. Kernighan & Ritchie give the simple example
struct X {
    char c;
    int i;
};
where on my x86-64 machine, offsetof(struct X, i) == 4. Perhaps there are some general alignment rules that determine the exact binary form of a struct for each CPU architecture?
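For reference, here is a minimal program reproducing that observation (it prints 4 and 8 on my x86-64 machine):

#include <stdio.h>
#include <stddef.h>   /* offsetof */

struct X {
    char c;   /* offset 0 */
    int  i;   /* offset 4 here: three bytes of padding follow c */
};

int main(void)
{
    printf("offsetof(struct X, i) = %zu\n", offsetof(struct X, i));
    printf("sizeof(struct X)      = %zu\n", sizeof(struct X));
    return 0;
}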
Given a CPU architecture, is the exact binary form of a struct determined exactly?
No, the representation or layout (“binary form”) of a structure is ultimately determined by the C implementation, not by the CPU architecture. Most C implementations intended for normal purposes follow recommendations provided by the manufacturer and/or the operating system. However, there may be circumstances where, for example, a certain alignment for a particular type might give slightly better performance but is not required, and so one C implementation might choose to require that alignment while another does not, and this can result in different structure layout.
In addition, a C implementation might be designed for special purposes, such as providing compatibility with legacy code, in which case it might choose to replicate the alignment of some old compiler for another architecture rather than to use the alignment required by the target processor.
However, let’s consider structures in separate compilations using one C implementation. Then C 2018 6.2.7 1 says:
… Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths…
Therefore, if two structures are declared identically in separate translation units, or with the minor variations permitted in that passage, then they are compatible, which effectively means they have the same layout or representation.
Technically, that passage applies only to separate translation units of the same program. The C standard defines behaviors for one program; it does not explicitly define interactions between programs (or fragments of programs, such as kernel extensions) and the operating system, although to some extent you might consider the operating system and everything running in it as one program. However, for practical purposes, it applies to everything compiled with that C implementation.
This means that as long as you use the same C implementation as the kernel is compiled with, identically declared structures will have the same representation.
Another consideration is that we might use different compilers for compiling the kernel and compiling programs. The kernel might be compiled with Clang while a user prefers to use GCC. In this case, it is a matter for the compilers to document their behaviors. The C standard does not guarantee compatibility, but the compilers can, if they choose to, perhaps by both documenting that they adhere to a particular Application Binary Interface (ABI).
Also note that a "C implementation" as discussed above is not just a particular compiler but a particular compiler with particular switches. Various switches may change how a compiler behaves in ways that make it effectively a different C implementation, such as switches to conform to one version of the C standard or another, switches affecting whether structures are packed, switches affecting sizes of integer types, and so on.
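When depending on a particular layout in practice, one safeguard is to encode the expected offsets as compile-time assertions, so that building with a mismatched implementation fails loudly. A sketch in C11; the struct and the expected values are illustrative, not taken from any real ABI:

#include <stddef.h>   /* offsetof */
#include <assert.h>   /* static_assert (C11) */

struct sample {
    unsigned long long dev;
    unsigned long long ino;
};

/* Illustrative expectations; the real values come from the ABI
   you intend to match. */
static_assert(offsetof(struct sample, dev) == 0, "dev offset");
static_assert(offsetof(struct sample, ino) == 8, "ino offset");
static_assert(sizeof(struct sample) == 16, "struct size");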

C compiler - list __builtin_ types

Here is a small (and working) piece of C code:
typedef __builtin_va_list __va_list;

int main() {
    return 0;
}
I found an answer explaining how GCC finds the base type:
Pycparser not working on preprocessed code
But how can I list all of the __builtin_ "base" types which are not defined explicitly?
how can I list all of the __builtin_ "base" types which are not defined explicitly?
TL;DR: there is no general-purpose way to do it.
The standard does not define any such types, so they can only be implementation-specific. In particular, C does not attribute any significance to the __builtin_ name prefix (though such identifiers are reserved), nor does it acknowledge that any types exist that are not derived from those it does define. Thus, for most purposes, the types you are asking about should be considered an implementation detail.
If there were a way to list implementation-specific built-in types, it would necessarily be implementation-specific itself. For example, you might be able to find a list such as you are after in the compiler's documentation. You could surely derive one from the compiler's own source code, if that's available to you. You could maybe extract strings from the compiler binary, and filter for a characteristic name pattern, such as strings starting with "__builtin_".
You could also consider parsing all the standard library headers (with the assumption that they are correct) to find undeclared types, though that's not guaranteed to find all the available types. Moreover, with some systems, for example GNU's, the C standard library (to which the headers belong) is separate from the compiler.
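As a quick check, such builtins really are usable with no declarations in scope at all. The following sketch relies on the __builtin_va_* extensions of GCC and Clang and compiles without including any header:

/* GCC/Clang-specific: no headers, yet the builtins are available. */
typedef __builtin_va_list my_va_list;

int sum(int n, ...)
{
    my_va_list ap;
    int total = 0;

    __builtin_va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += __builtin_va_arg(ap, int);
    __builtin_va_end(ap);
    return total;
}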

How do most embedded C compilers define symbols for memory mapped I/O?

I oftentimes write to memory-mapped I/O pins like this:
P3OUT |= BIT1;
I assumed that P3OUT was being replaced with something like this by my preprocessor:
*((unsigned short *) 0x0222u)
But I dug into an H file today and saw something along these lines:
volatile unsigned short P3OUT # 0x0222u;
There's some more expansion going on before that, but it is generally that. A '#' symbol is being used. Above that there are some #pragmas about using an extended set of the C language. I am assuming this is some sort of directive to the linker, and effectively a symbol is being defined as being at that location in the memory map.
Was my assumption right for what happens most of the time on most compilers? Does it matter one way or the other? Where did that # notation come from, is it some sort of standard?
I am using IAR Embedded workbench.
This question is similar to this one: How to place a variable at a given absolute address in memory (with GCC).
It matches what I assumed my compiler was doing anyway.
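For reference, a minimal sketch of that conventional pointer-cast macro approach; the register name and address are taken from the question, and whether 0x0222 really is P3OUT depends on the specific part:

#define P3OUT (*(volatile unsigned short *)0x0222u)
#define BIT1  (1u << 1)

static void set_pin(void)
{
    P3OUT |= BIT1;   /* read-modify-write of the memory-mapped register */
}

The volatile qualifier is essential here: it keeps the compiler from caching or eliding accesses to the hardware register.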
Although an expression like (unsigned char *)0x1234 will, on many compilers, yield a pointer to hardware address 0x1234, nothing in the standard requires any particular relationship between an integer which is cast to a pointer and the resulting address. The only thing which the standard specifies is that if a particular integer type is at least as large as intptr_t, and casting a pointer to that particular type yields some value, then casting that particular value back to the original pointer type will yield a pointer equivalent to the original.
The IAR compiler offers a non-standard extension which allows the compiler to request that variables be placed at specified hard-coded addresses. This offers some advantages compared to using macros to create pointer expressions. For one thing, it ensures that such variables will be regarded syntactically as variables; while pointer-kludge expressions will generally be interpreted correctly when used in legitimate code, it's possible for illegitimate code which should fail with a compile-time error to compile but produce something other than the desired effect. Further, the IAR syntax defines symbols which are available to the linker and may thus be used within assembly-language modules. By contrast, a .H file which defines pointer-kludge macros will not be usable within an assembly-language module; any hardware which will be used in both C and assembly code will need to have its address specified in two separate places.
The short answer to the question in your title is "differently". What's worse is that compilers from different vendors for the same target processor will use different approaches. This one
volatile unsigned short P3OUT # 0x0222u;
is a common way to place a variable at a fixed address. But you will also see it used to identify individual bits within a memory-mapped location, especially for microcontrollers which have bit-wide instructions, like the PIC families.
These are things that the C Standard does not address, and IMHO should, as small embedded microcontrollers will eventually end up being the main market for C (yes, I know the kernel is written in C, but a lot of user-space stuff is moving to C++).
I actually joined the C committee to try and drive for changes in this area, but my sponsorship went away and it's a very expensive hobby.
A similar area is declaring a function to be an ISR.
This document shows one of the approaches we considered

Type specifications in platform ABIs

Which of these items can safely be assumed to be defined in any practically-usable platform ABI?
Value of CHAR_BIT
Size, alignment requirements and object representation of:
void*, size_t, ptrdiff_t
unsigned char and signed char
intptr_t and uintptr_t
float, double and long double
short and long long
int and long (but here I expect a "no")
Pointer to an object type for which the platform ABI specifies these properties
Pointer to function whose type only involves types for which the platform ABI specifies these properties
Object representation of a null object pointer
Object representation of a null function pointer
For example, if I have a library (compiled by an unknown, but ABI-conforming compiler) which publishes this function:
void* foo(void *bar, size_t baz, void* (*qux)());
can I assume to be able to safely call it in my program regardless of the compiler I use?
Or, taken the other way round, if I am writing a library, is there a set of types such that if I limit the library's public interface to this set, it will be guaranteed to be usable on all platforms where it builds?
I don't see how you can expect any library to be universally compatible. If that were possible, there would not be so many compiled variations of libraries.
For example, you could call a 64-bit library from a 16-bit program as long as you set up the call correctly. But you would have to know you're calling a 64-bit library.
Portability is a much-talked-about goal, but few truly achieve it. After 30+ years of system-level, firmware and application programming, I think of it as more of a fantasy than a goal. Unfortunately, hardware forces us to optimize for the hardware. Therefore, when I write a library, I use the following:
Compile for ABI
Use a pointer to a structure for input and output for all function calls:
int lib_func(struct lib_input *input, struct lib_output *output);  /* illustrative tag names */
where the returned int indicates errors only. I make all error codes unique. I require the user to call an init function prior to any use of the library. The user calls it as:
lib_init(sizeof(int), sizeof(char *), sizeof(long), sizeof(long long));
So that I can decide whether there will be any trouble, or modify my assumptions if needed. I also add a function allowing the user to learn my data sizes and alignment, in addition to version numbers.
This is not to say that the user or I am expected to modify code "on the fly" or spend lots of CPU power reworking structures. But this allows the application to make absolutely sure it's compatible with me, and vice versa.
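A minimal sketch of what such an init-time check might look like; every name here is illustrative rather than a real API:

#include <stddef.h>

enum { LIB_OK = 0, LIB_ERR_ABI = -1 };

/* The caller reports its idea of the basic type sizes; the library
   compares them with the sizes it was itself compiled with. */
int lib_init(size_t int_size, size_t ptr_size,
             size_t long_size, size_t llong_size)
{
    if (int_size  != sizeof(int)  || ptr_size   != sizeof(char *) ||
        long_size != sizeof(long) || llong_size != sizeof(long long))
        return LIB_ERR_ABI;   /* caller and library disagree */
    return LIB_OK;
}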
The other option, which I have employed in the past, is to simply include several entry-point functions with my library. For example:
int lib_func32();
int lib_func16();
int lib_func64();
It makes a bit of a mess for you, but you can then fix it up using the preprocessor:
#ifdef LIB_USE32
#define lib_function lib_func32
#endif
You can do the same with data structures, but I'd recommend using the same size data structure regardless of CPU size -- unless performance is a top priority. Again, back to the hardware!
The final option I explore is whether to have entry functions of all sizes and styles which convert the input to my library's expectations, as well as my library's output.
For example, your lib_func32(&input, &output) can be compiled to expect a 32-bit-aligned 32-bit pointer, but it converts the 32-bit struct into your internal 64-bit struct and then calls your 64-bit function. When that returns, it reformats the 64-bit struct into its 32-bit equivalent as pointed to by the caller.
int lib_func32(struct lib_input32 *input32, struct lib_output32 *output32)
{
    struct lib_input64 input64;
    struct lib_output64 output64;
    int retval;

    lib_convert32_to_64(input32, &input64);
    retval = lib_func64(&input64, &output64);
    lib_convert64_to_32(&output64, output32);
    return retval;
}
In summary, a totally portable solution is not viable. Even if you begin with total portability, eventually you will have to deviate. This is when things truly get messy. You break your style to accommodate the deviations, which then breaks your documentation and confuses users. I think it's better to just plan for it from the start.
Hardware will always cause you to have deviations. Just consider how much trouble 'endianness' causes -- not to mention the number of CPU cycles which are used each day swapping byte orders.
The C standard contains an entire section in the appendix summarizing just that:
J.3 Implementation-defined behavior
A completely random subset:
The number of bits in a byte
Which of signed char and unsigned char is the same as char
The text encodings for multibyte and wide strings
Signed integer representation
The result of converting a pointer to an integer and vice versa (6.3.2.3). Note that this means any pointer, not just object pointers.
Update: To address your question about ABIs: An ABI (application binary interface) is not a standardized concept, and it isn't said anywhere that an implementation must even specify an ABI. The ingredients of an ABI are partly the implementation-defined behaviour of the language (though not all of it; e.g. signed-to-unsigned conversion is implementation defined, but not part of an ABI), and most of the implementation-defined aspects of the language are dictated by the hardware (e.g. signed integer representation, floating point representation, size of pointers).
However, more important aspects of an ABI are things like how function calls work, i.e. where the arguments are stored, who's responsible for cleaning up the memory, etc. It is crucial for two compilers to agree on those conventions in order for their code to be binary compatible.
In practice, an ABI is usually the result of an implementation. Once the compiler is complete, it determines -- by virtue of its implementation -- an ABI. It may document this ABI, and other compilers, and future versions of the same compiler, may like to stick to those conventions. For C implementations on x86, this has worked rather well and there are only a few, usually well documented, free parameters that need to be communicated for code to be interoperable. But for other languages, most notably C++, you have a completely different picture: There is nothing coming near a standard ABI for C++ at all. Microsoft's compiler breaks the C++ ABI with every release. GCC tries hard to maintain ABI compatibility across versions and uses the published Itanium ABI (ironically for a now dead architecture). Other compilers may do their own, completely different thing. (And then you have of course issues with C++ standard library implementations, e.g. does your string contain one, two, or three pointers, and in which order?)
To summarize: many aspects of a compiler's ABI, especially pertaining to C, are dictated by the hardware architecture. Different C compilers for the same hardware ought to produce compatible binary code as long as certain aspects like function calling conventions are communicated properly. However, for higher-level languages all bets are off, and whether two different compilers can produce interoperable code has to be decided on a case-by-case basis.
If I understand your needs correctly, the fixed-width uintN_t-style types are the only ones that give you a binary compatibility guarantee (and in practice char and int do as well), but other types tend to differ. For example, long differs between Windows and Linux on 64-bit systems: Windows considers it 4 bytes and Linux 8 bytes. If you really depend on the ABI, you have to plan for the platforms you are going to deliver on, and maybe use typedefs to make things standardized and readable.
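Following that advice, a library's public header built only from fixed-width types might look like this; a sketch with made-up names:

#include <stdint.h>

/* Every field has the same size on any two implementations that
   agree on the stdint.h types and on struct packing, which most
   platform ABIs pin down. */
typedef struct {
    uint32_t version;
    uint32_t flags;
    uint64_t payload_len;
} lib_request;

int32_t lib_process(const lib_request *req);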

C: Why isn't size_t a C keyword?

sizeof is a C keyword. It yields a value of a type named size_t. However, size_t is not a keyword; it is defined primarily in stddef.h, and probably in other C standard header files too.
Consider a scenario where you want to create a C program which does not include any C standard headers or libraries (for example, if you are creating an OS kernel). In such code, sizeof can be used (it is a C keyword, so it is part of the language), but the type that it returns (size_t) is not available!
Doesn't this signify some kind of problem in the C standard specification? Can you clarify this?
It does not literally return a value of type size_t, since size_t is not a concrete type in itself, but rather a typedef for an unspecified built-in type. Typedef identifiers (such as size_t) are completely equivalent to their respective underlying types (and are converted thereto at compile time). If size_t is defined as an unsigned int on your platform, then sizeof returns an unsigned int when it is compiled on your system. size_t is just a handy way of maintaining portability, and you only need to include stddef.h if you use it explicitly by name.
sizeof is a keyword because, despite its name and usage, it is an operator like + or = or < rather than a function like printf() or atoi() or fgets(). A lot of people forget (or just don't know) that sizeof is actually an operator, and it is always resolved at compile time rather than at runtime.
The C language doesn't need size_t to be a usable, consistent language; that's just part of the standard library. The C language does need all of its operators, though. If, instead of +, C used the word plus to add numbers, then plus would be a keyword too.
Besides, I semi-implicitly recast size_t to unsigned int (and to plain int, though Kernighan and Ritchie will someday smite me for it) all the time. You can assign the result of a sizeof to an int if you like, but in my work I'm usually just passing it straight on to a malloc() or something.
Some headers from the C standard are defined for a freestanding environment, i.e. fit for use e.g. in an operating system kernel. They do not define any functions, merely macros and typedefs.
They are float.h, iso646.h, limits.h, stdarg.h, stdbool.h, stddef.h and stdint.h.
When working on an operating system, it isn't a bad idea to start with these headers. Having them available makes many things in your kernel easier. Especially stdint.h will come in handy (uint32_t et al.).
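As a sketch, a kernel source file restricted to those freestanding headers might look like the following (compiled with something like gcc -ffreestanding; the function is just an example):

#include <stddef.h>   /* size_t, NULL, offsetof */
#include <stdint.h>   /* uint8_t, uint32_t, ... */

/* Count bytes up to a terminator, using only freestanding headers. */
static size_t count_until(const uint8_t *buf, uint8_t terminator)
{
    size_t n = 0;
    while (buf[n] != terminator)
        n++;
    return n;
}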
Doesn't this signify some kind of problem in the C standard specification?
Look up the difference between a hosted implementation of C and a freestanding C implementation. A freestanding (C99) implementation is required to provide these headers:
<float.h>
<iso646.h>
<limits.h>
<stdarg.h>
<stdbool.h>
<stddef.h>
<stdint.h>
These headers do not define any functions at all. They define parts of the language that are somewhat compiler specific (for example, the offsetof macro in <stddef.h>, and the variable argument list macros and types in <stdarg.h>), but they can be handled without actually being built into the language as full keywords.
This means that even in your hypothetical kernel, you should expect the C compiler to provide these headers and any underlying support functions - even though you provide everything else.
I think that the main reasons that size_t is not a keyword are:
there's no compelling reason for it to be. The designers of the C and C++ languages have always preferred to implement language features in the library when possible and reasonable;
adding keywords to a language can create problems for an existing body of legacy code, which is another reason they are generally resistant to adding new keywords.
For example, in discussing the next major revision of the C++ standard, Stroustrup had this to say:
The C++0x improvements should be done in such a way that the resulting language is easier to learn and use. Among the rules of thumb for the committee are:
...
Prefer standard library facilities to language extensions
...
There is no reason not to include stddef.h, even if you are working on a kernel - it defines type sizes for your specific compiler that any code will need.
Note also that almost all C compilers are self-compiled. The compiler's own code for the sizeof operator will therefore itself use size_t and reference the same stddef.h file as user code does.
From MSDN:
When the sizeof operator is applied to an object of type char, it yields 1.
Even if you don't have stddef.h available/included and don't know about size_t, using sizeof you can get the size of objects relative to char.
size_t is actually a type, often an unsigned int. sizeof is an operator that gives the size of a type. The exact type that sizeof yields is implementation-specific, not fixed by the C standard; it's just an unsigned integer.
Edit:
To be very clear: you do not need the size_t type in order to use sizeof. I think the answer you're looking for is: yes, it is inconsistent. However, it doesn't matter; you can still use sizeof correctly in practice without having a size_t definition from a header file.
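To illustrate, here is a small sketch that uses sizeof with no headers at all; the result is simply stored in an unsigned long, a common fallback when no size_t typedef is in scope:

/* No headers included: sizeof still works, because it is part of
   the language, not the library. */
struct X { char c; int i; };

unsigned long size_of_x(void)
{
    return (unsigned long)sizeof(struct X);
}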
size_t is, of necessity, not a keyword: different architectures often have different sizes for the integral types. For example, a 64-bit machine is likely to define size_t as unsigned long long if it didn't decide to make int a 64-bit data type.
If size_t were built into the compiler as a fixed type, it would take away the power to do cross compilation.
Also, sizeof is more like a magic compile-time macro (think C++ template), which explains why it is a keyword rather than a defined type.
The simple reason is that it is not a fundamental type. If you look in the C standard you will find that the fundamental types include int, char, etc., but not size_t. Why so? As others have already pointed out, size_t is an implementation-specific type (i.e. a type capable of holding the size, in "C bytes", of any object).
On the other hand, sizeof is a (unary) operator, and operators spelled as words, like sizeof, have to be keywords.
