Reading registers in the GCC using a char pointer

Reading registers in the GCC using a char pointer - c

I have recently started learning how to use the inline assembly in C Code and came across an interesting feature where you can specify registers for local variables (https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables).
The usage of this feature is as follows:
register int *foo asm ("r12");
Then I started to wonder whether it was possible to insert a char pointer such as
const char d[4] = "r12";
register int *foo asm (d);
but got the error: expected string literal before ‘d’ (as expected)
I can understand why this would be a bad practice, but is there any possible way to achieve a similar effect where I can use a char pointer to access the register? If not, is there any particular reason why this is not allowed besides the potential security issues?
Additionally, I read this StackOverflow question: String literals: pointer vs. char array
Thank you.

The syntax to initialize the variable would be register char *foo asm ("r12") = d; to point an asm-register variable at a string. You can't use a runtime-variable string as the register name; register choices have to get assembled into machine code at compile time.
If that's what you're trying to do, you're misunderstanding something fundamental about assembly language and/or how ahead-of-time compiled languages compile into machine code. GCC won't make self-modifying code (and even if it wanted to, doing that safely would require redoing register allocation done by the ahead-of-time optimizer), or code that re-JITs itself based on a string.
(The first time I looked at your question, I didn't understand what you were even trying to do, because I was only considering things that are possible. #FelixG's comment was the clue I needed to make sense of the question.)
(Also note that registers aren't indexable; even in asm you can't use a single instruction to read a register number selected by an integer in another register. You could branch on it, or store all the registers in memory and index that like variadic functions do for their incoming register args.)
And if you do want a compile-time constant string literal, just use it with the normal syntax. Use a CPP macro if you want the same string to initialize a char array.

Related

Can a string literal in C be modified?

I recently had a question, I know that a pointer to a constant array initialized as it is in the code below, is in the .rodata region and that this region is only readable.
However, I saw in pattern C11, that writing in this memory address behavior will be undefined.
I was aware that the Borland's Turbo-C compiler can write where the pointer points, this would be because the processor operated in real mode on some systems of the time, such as MS-DOS? Or is it independent of the operating mode of the processor? Is there any other compiler that writes to the pointer and does not take any memory breach failure using the processor in protected mode?
#include <stdio.h>
int main(void) {
char *st = "aaa";
*st = 'b';
return 0;
}
In this code compiling with Turbo-C in MS-DOS, you will be able to write to memory

As has been pointed out, trying to modify a constant string in C results in undefined behavior. There are several reasons for this.
One reason is that the string may be placed in read-only memory. This allows it to be shared across multiple instances of the same program, and doesn't require the memory to be saved to disk if the page it's on is paged out (since the page is read-only and thus can be reloaded later from the executable). It also helps detect run-time errors by giving an error (e.g. a segmentation fault) if an attempt is made to modify it.
Another reason is that the string may be shared. Many compilers (e.g., gcc) will notice when the same literal string appears more than once in a compilation unit, and will share the same storage for it. So if a program modifies one instance, it could affect others as well.
There is also never a need to do this, since the same intended effect can easily be achieved by using a static character array. For instance:
#include <stdio.h>
int main(void) {
static char st_arr[] = "aaa";
char *st = st_arr;
*st = 'b';
return 0;
}
This does exactly what the posted code attempted to do, but without any undefined behavior. It also takes the same amount of memory. In this example, the string "aaa" is used as an array initializer, and does not have any storage of its own. The array st_arr takes the place of the constant string from the original example, but (1) it will not be placed in read-only memory, and (2) it will not be shared with any other references to the string. So it's safe to modify it, if in fact that's what you want.

Is there any other compiler that writes to the pointer and does not take any memory breach failure using the processor in protected mode?
GCC 3 and earlier used to support gcc -fwriteable-strings to let you compile old K&R C where this was apparently legal, according to https://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Incompatibilities.html. (It's undefined behaviour in ISO C and thus a bug in an ISO C program). That option will define the behaviour of the assignment which ISO C leaves undefined.
GCC 3.3.6 manual - C Dialect options
-fwritable-strings
Store string constants in the writable data segment and don't uniquize them. This is for compatibility with old programs which assume they can write into string constants.
Writing into string constants is a very bad idea; “constants” should be constant.
GCC 4.0 removed that option (release notes); the last GCC3 series was gcc3.4.6 in March 2006. Although apparently it had become buggy in that version.
gcc -fwritable-strings would treat string literals like non-const anonymous character arrays (see #gnasher's answer), so they go in the .data section instead of .rodata, and thus get linked into a segment of the executable that's mapped to read+write pages, not read-only. (Executable segments have basically nothing to do with x86 segmentation, it's just a start+range memory-mapping from the executable file to memory.)
And it would disable duplicate-string merging, so char *foo() { return "hello"; } and char *bar() { return "hello"; } would return different pointer values, instead of merging identical string literals.
Related:
How can some GCC compilers modify a constant char pointer?
https://softwareengineering.stackexchange.com/questions/294748/why-are-c-string-literals-read-only
Linker option: still Undefined Behaviour so probably not viable
On GNU/Linux, linking with ld -N (--omagic) will make the text (as well as data) section read+write. This may apply to .rodata even though modern GNU Binutils ld puts .rodata in its own section (normally with read but not exec permission) instead of making it part of .text. Having .text writeable could easily be a security problem: you never want a page with write+exec at the same time, otherwise some bugs like buffer overflows can turn into code-injection attacks.
To do this from gcc, use gcc -Wl,-N to pass on that option to ld when linking.
This doesn't do anything about it being Undefined Behaviour to write const objects. e.g. the compiler will still merge duplicate strings, so writing into one char *foo = "hello"; will affect all other uses of "hello" in the whole program, even across files.
What to use instead:
If you want something writeable, use static char foo[] = "hello"; where the quoted string is just an array initializer for a non-const array. As a bonus, this is more efficient than static char *foo = "hello"; at global scope, because there's one fewer level of indirection to get to the data: it's just an array instead a pointer stored in memory.

You are asking whether or not the platform may cause undefined behavior to be defined. The answer to that question is yes.
But you are also asking whether or not the platform defines this behavior. In fact it does not.
Under some optimization hints, the compiler will merge string constants, so that writing to one constant will write to the other uses of that constant. I used this compiler once, it was quite capable of merging strings.
Don't write this code. It's not good. You will regret writing code in this style when you move onto a more modern platform.

Your literal "aaa" produces a static array of four const char 'a', 'a', 'a', '\0' in an anonymous location and returns a pointer to the first 'a', cast to char*.
Trying to modify any of the four characters is undefined behaviour. Undefined behaviour can do anything, from modifying the char as intended, pretending to modify the char, doing nothing, or crashing.
It's basically the same as static const char anonymous[4] = { 'a', 'a', 'a', '\0' }; char* st = (char*) &anonymous [0];

To add to the correct answers above, DOS runs in real mode, so there is no read only memory. All memory is flat and writable. Hence, writing to the literal was well defined (as it was in any sort of const variable) at the time.

Assigning a "string" to a varible previously declared as "int"

I am new to programming and on learning dynamic typing in python, it arisess a doubt in "static typing". I tried out this code (assigning a string to an integer variable which was previously declared) and printing the variable as printf(var_name) and its gives output; can anyone explain this concept?
#include<stdio.h>
#include<conio.h>
void main()
{
int i = 20 ;
i = "hello";
printf(i);
}

Besides your question might be a duplicate, let me append something missing of the read worthy answer https://stackoverflow.com/a/430414/3537677
C is strongly/statically typed but weakly checked
This is one of the biggest core language features which sets C apart from other languages like C++. (Which people are used to mistake a simply "C with classes"
Meaning although C has a strong type system in the context of needing and using it for knowing sizes of types at compile time, the C languages does not have a type system in order to check them for misuse. So compilers are neither mandated to check it nor are they allowed to error your code, because its legal C code. Modern compilers will issue a warning dough.
C compilers are only ensuring "their type system" for the mentioned size management. Meaning, if you just type int i = 42; this variable has so called automatic storage duration or what many people are calling more or less correctly "the stack". It means the compiler will take care of getting space for the variable and cleaning it up. If it can not know the size of it, but needs it then it will indeed generate an error. But this can be circumvented by doing things at run-time and using of types without any type whats so ever, i.e. pointers and void* aka void-pointers.
Regarding your code
Your code seems to be an old, non standard C compiler judging by the #include<conio.h> and void returning main. With a few modifications one can compile your code, but by calling printf with an illegal format string, you are causing so called undefined behaviour (UB), meaning it might work on your machine, but crashes on mine.

How do most embedded C compilers define symbols for memory mapped I/O?

I often times write to memory mapped I/O pins like this
P3OUT |= BIT1;
I assumed that P3OUT was being replaced with something like this by my preprocessor:
*((unsigned short *) 0x0222u)
But I dug into an H file today and saw something along these lines:
volatile unsigned short P3OUT # 0x0222u;
There's some more expansion going on before that, but it is generally that. A symbol '#' is being used. Above that there are some #pragma's about using an extended set of the C language. I am assuming this is some sort of directive to the linker and effectively a symbol is being defined as being at that location in the memory map.
Was my assumption right for what happens most of the time on most compilers? Does it matter one way or the other? Where did that # notation come from, is it some sort of standard?
I am using IAR Embedded workbench.
This question is similar to this one: How to place a variable at a given absolute address in memory (with GCC).
It matches what I assumed my compiler was doing anyway.

Although an expression like (unsigned char *)0x1234 will, on many compilers, yield a pointer to hardware address 0x1234, nothing in the standard requires any particular relationship between an integer which is cast to a pointer and the resulting address. The only thing which the standard specifies is that if a particular integer type is at least as large as intptr_t, and casting a pointer to that particular type yields some value, then casting that particular value back to the original pointer type will yield a pointer equivalent to the original.
The IAR compiler offers a non-standard extension which allows the compiler to request that variables be placed at specified hard-coded addresses. This offers some advantages compared to using macros to create pointer expressions. For one thing, it ensures that such variables will be regarded syntactically as variables; while pointer-kludge expressions will generally be interpreted correctly when used in legitimate code, it's possible for illegitimate code which should fail with a compile-time error to compile but produce something other than the desired effect. Further, the IAR syntax defines symbols which are available to the linker and may thus be used within assembly-language modules. By contrast, a .H file which defines pointer-kludge macros will not be usable within an assembly-language module; any hardware which will be used in both C and assembly code will need to have its address specified in two separate places.

The short answer to the question in your title is "differently". What's worse is that compilers from different vendors for the same target processor will use different approaches. This one
volatile unsigned short P3OUT # 0x0222u;
Is a common way to place a variable at a fixed address. But you will also see it used to identify individual bits within a memory mapped location = especially for microcontrollers which have bit-wide instructions like the PIC families.
These are things that the C Standard does not address, and should IMHO, as small embedded microcontrollers will eventually end up being the main market for C (yes, I know the kernel is written in C, but a lot of user-space stuff is moving to C++).
I actually joined the C committee to try and drive for changes in this area, but my sponsorship went away and it's a very expensive hobby.
A similar area is declaring a function to be an ISR.
This document shows one of the approaches we considered

C string literals linking

As far as I am aware, the linker tries to merge two string literals into one single literal, if they are both the same, e.g.:
file1.c
char const* firstString = "foo";
file2.c
char const* secondString = "foo";
Would result in only one occurence of foo\0 in the respective memory section (saving 4 Bytes). This is especially important for embedded applications (how does avr-gcc vs. gcc behave).
But I was wondering if I can actually count on this to happen, and rely, that if two strings are equal, also their pointers are equal (provided that in the whole program, you only pass string literals around and no runtime generated strings exist -- which is a reasonable assumption in my case). Obviously, I want to speed up speed comparisons with this, and allow a commonly used function to receive a string literal like so:
void lock(char const*);
void unlock(char const*);
lock("test");
dosmth();
unlock("test");
In essence, I want to avoid having a huge enum and huge switches inside the lock/unlock functions.

Actually they can't point to the same memory, since you can write to them.
Now, if you were to say:
char *s1 = "MyString";
char *s2 = "MyString";
then indeed it is possible that s1 == s2. Don't think it's guaranteed.

I would probably not count on it as it is compiler dependent on how symbols get resolved like this. Most compilers create 1 instance of the constant string and then simply reference that but I would not depend on this as there are cases when this does not work. Personally I wouldn't use strings like that in your lock and unlock method. An enum will probably serve you better.

The linker doesn't have to merge anything. It just has to map a declared type to a defined type. That involves finding the defined type and filling out the address offsets to jump to the right item.
What you are talking about would be an optimizing linker. Many linkers don't optimize at all, and those that do aren't held to an optimizing standard, so you'll never be able to generalize beyond the observed findings for the linker you discover (on that machine, at that time).

I think you are on the wrong track. Not only that the linker doesn't merge string literals from different compilation units, it most probably doesn't even create an external symbol for them at all.
Even inside the same compilation unit, the compiler may merge two occurrences of the same string literal into one, but it is not obliged to do so:
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values.
Now to the real problem that seems to be at the source of your question. Doing lock/unlock pairs based on string literals is probably not a good idea in C. As you say this has the tendency to bloat your code with switch or similar stuff.
More natural would be to have a lock type and to declare a global variable with a distinguishable name for each of your distinctive lock/unlock events. This forces you to declare each such event in a way that is visible in all your compilation units, and to pick out one compilation unit where you actually define it.

How do I know if gcc agrees that something is volatile?

Consider the following:
volatile uint32_t i;
How do I know if gcc did or did not treat i as volatile? It would be declared as such because no nearby code is going to modify it, and modification of it is likely due to some interrupt.
I am not the world's worst assembly programmer, but I play one on TV. Can someone help me to understand how it would differ?
If you take the following stupid code:
#include <stdio.h>
#include <inttypes.h>
volatile uint32_t i;
int main(void)
{
if (i == 64738)
return 0;
else
return 1;
}
Compile it to object format and disassemble it via objdump, then do the same after removing 'volatile', there is no difference (according to diff). Is the volatile declaration just too close to where its checked or modified or should I just always use some atomic type when declaring something volatile? Do some optimization flags influence this?
Note, my stupid sample does not fully match my question, I realize this. I'm only trying to find out if gcc did or did not treat the variable as volatile, so I'm studying small dumps to try to find the difference.

Many compilers in some situations don't treat volatile the way they should. See this paper if you deal much with volatiles to avoid nasty surprises: Volatiles are Miscompiled, and What to Do about It. It also contains the pretty good description of the volatile backed with the quotations from the standard.
To be 100% sure, and for such a simple example check out the assembly output.

Try setting the variable outside a loop and reading it inside the loop. In a non-volatile case, the compiler might (or might not) shove it into a register or make it a compile time constant or something before the loop, since it "knows" it's not going to change, whereas if it's volatile it will read it from the variable space every time through the loop.
Basically, when you declare something as volatile, you're telling the compiler not to make certain optimizations. If it decided not to make those optimizations, you don't know that it didn't do them because it was declared volatile, or just that it decided it needed those registers for something else, or it didn't notice that it could turn it into a compile time constant.

As far as I know, volatile helps the optimizer. For example, if your code looked like this:
int foo() {
int x = 0;
while (x);
return 42;
}
The "while" loop would be optimized out of the binary.
But if you define 'x' as being volatile (ie, volatile int x;), then the compiler will leave the loop alone.

Your little sample is inadequate to show anything. The difference between a volatile variable and one that isn't is that each load or store in the code has to generate precisely one load or store in the executable for a volatile variable, whereas the compiler is free to optimize away loads or stores of non-volatile variables. If you're getting one load of i in your sample, that's what I'd expect for volatile and non-volatile.
To show a difference, you're going to have to have redundant loads and/or stores. Try something like
int i = 5;
int j = i + 2;
i = 5;
i = 5;
printf("%d %d\n", i, j);
changing i between non-volatile and volatile. You may have to enable some level of optimization to see the difference.
The code there has three stores and two loads of i, which can be optimized away to one store and probably one load if i is not volatile. If i is declared volatile, all stores and loads should show up in the object code in order, no matter what the optimization. If they don't, you've got a compiler bug.

It should always treat it as volatile.
The reason the code is the same is that volatile just instructs the compiler to load the variable from memory each time it accesses it. Even with optimization on, the compiler still needs to load i from memory once in the code you've written, because it can't infer the value of i at compile time. If you access it repeatedly, you'll see a difference.

Any modern compiler has multiple stages. One of the fairly easy yet interesting questions is whether the declaration of the variable itself was parsed correctly. This is easy because the C++ name mangling should differ depending on the volatile-ness. Hence, if you compile twice, once with volatile defined away, the symbol tables should differ slightly.

Read the standard before you misquote or downvote. Here's a quote from n2798:
7.1.6.1 The cv-qualifiers
7 Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C.
The keyword volatile acts as a hint. Much like the register keyword. However, volatile asks the compiler to keep all its optimizations at bay. This way, it won't keep a copy of the variable in a register or a cache (to optimize speed of access) but rather fetch it from the memory everytime you request for it.
Since there is so much of confusion: some more. The C99 standard does in fact say that a volatile qualified object must be looked up every time it is read and so on as others have noted. But, there is also another section that says that what constitutes a volatile access is implementation defined. So, a compiler, which knows the hardware inside out, will know, for example, when you have an automatic volatile qualified variable and whose address is never taken, that it will not be put in a sensitive region of memory and will almost certainly ignore the hint and optimize it away.
This keyword finds usage in setjmp and longjmp type of error handling. The only thing you have to bear in mind is that: You supply the volatile keyword when you think the variable may change. That is, you could take an ordinary object and manage with a few casts.
Another thing to keep in mind is the definition of what constitutes a volatile access is left by standard to the implementation.
If you really wanted different assembly compile with optimization

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight