Is strlen of a const char* optimised? - c

Can I use strlen of a const char* in a loop like this and expect O(n) time complexity, or would it result in O(n^2)? I think it looks cleaner than using a variable for the string length.
void send_str( const char* data ) {
for (uint8_t i = 0; i < strlen(data); i++)
send_byte( data[i] );
}
Does it depend on the optimization level?

I don't think you can ever depend on an optimization happening.
Why not do it like this, if you really want to avoid an extra variable:
void send_str(const char *data)
{
for(size_t i = strlen(data); i != 0; --i)
send_byte(*data++);
}
Or, less silly, and more like an actual production-quality C program:
void send_str(const char *data)
{
while(*data != '\0')
send_byte(*data++);
}
There's no point at all in iterating over the characters twice, so don't call strlen() at all, detect the end-of-string yourself.

The compiler might optimize that, but perhaps not to the point you want it to; it obviously depends on the compiler version and the optimization level.
You might consider using the MELT probe to understand what GCC is doing with your code (e.g. what internal Gimple representation the compiler transforms it to), or gcc -fverbose-asm -O -S to get the generated assembler code.
However, for your example, it is simpler to code:
void send_str(const char*data) {
for (const char*p = data; *p != 0; p++)
send_byte(*p);
}

If the definition of send_byte() is not available when this code is compiled (eg. it's in another translation unit), then this probably can't be optimised in the way you describe. This is because the const char * pointer doesn't guarantee that the object pointed to really is const, and therefore the send_byte() function could legally modify it (send_byte() may have access to that object via a global variable).
For example, were send_byte() defined like so:
int count;
char str[10];
void send_byte(char c)
{
if (c == 'X')
str[2] = 0;
count++;
}
and we called the provided send_str() function like so:
count = 0;
strcpy(str, "AXAXAX");
send_str(str);
then the result in count must be 2. If the compiler hoisted the strlen() call in send_str() out of the loop, then it would instead be 6 - so this would not be a legal transform for the compiler to make.
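To see the difference concretely, the sketch below runs both variants against the send_byte() defined above. The names send_str_inloop, send_str_hoisted and run are illustrative and not part of the original code; the hoisted variant does by hand what the compiler is not allowed to do on its own.

```c
#include <stddef.h>
#include <string.h>

static int count;
static char str[10];

/* Same side effect as the send_byte() above: truncates str when it sees 'X'. */
static void send_byte(char c)
{
    if (c == 'X')
        str[2] = 0;
    count++;
}

/* strlen() is re-evaluated on every iteration, as in the question. */
static void send_str_inloop(const char *data)
{
    for (size_t i = 0; i < strlen(data); i++)
        send_byte(data[i]);
}

/* strlen() hoisted by hand: the length is read once, before str changes. */
static void send_str_hoisted(const char *data)
{
    size_t n = strlen(data);
    for (size_t i = 0; i < n; i++)
        send_byte(data[i]);
}

/* Illustrative helper: reset the globals, run one variant, return count. */
static int run(void (*variant)(const char *))
{
    count = 0;
    strcpy(str, "AXAXAX");
    variant(str);
    return count;
}
```

Calling run(send_str_inloop) yields 2 and run(send_str_hoisted) yields 6, which is exactly the observable difference that makes the transform illegal here.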

Related

Specify layout of global variables in C

Consider this piece of code, where two global variables are defined:
int a;
int b;
As far as I know, the compiler may or may not place a and b in adjacent memory locations (please let me know if this is incorrect). For example, with GCC one may compile with -fdata-sections and reorder the two sections or whatever.
Is it possible to specify that a and b must be adjacent (in the sense that &a + 1 == &b), in either standard or GNU extended C?
Background: I am making an OpenGL loader, which is literally (omitting casts):
void (*glActiveShaderProgram)(GLuint, GLuint);
void (*glActiveTexture)(GLenum);
...
void load_gl(void (*(*loader)(char *))()) {
glActiveShaderProgram = load("glActiveShaderProgram");
glActiveTexture = load("glActiveTexture");
...
}
Simple enough, but every one of these assignments compiles into a separate call to load. Since there is a relatively large number of functions to load, that can take up a lot of code space. (That is the reason I dropped glad.)
So I had something like this, which reduces binary size by ~30kB, which is extremely important for me:
char names[] = "glActiveShaderProgram glActiveTexture ...";
char *p = names, *pp;
for (int i = 0; i < COUNT; ++i) {
pp = strchr(p, ' '); /* search from the current name, not from the start of names */
*pp = '\0';
(&glActiveShaderProgram)[i] = load(p);
p = pp + 1;
}
But this does assume the specific layout of these function pointers. Currently I wrap the function pointers in a struct which is type-punned into an array of pointers, like this:
union { struct {
void (*glActiveShaderProgram)(GLuint, GLuint);
void (*glActiveTexture)(GLenum);
...
}; void (*table[COUNT])(); } gl;
But then one #define for every function is required to make the user happy. So I wonder if there exists some more elegant way to specify the layout of global variables.
As Ted suggested in the comments, you could put the variables next to each other inside an array:
int ab[2]; /* ab[0] takes the place of a, ab[1] the place of b */
Another way to keep them together is to make them members of a struct (a packed one, if padding between members must be ruled out).
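As a rough sketch of the array idea applied to the loader from the question: keep one table of generic function pointers, fill it in a loop, and give each entry a typed name through a macro. Everything here is hypothetical (generic_fn, fake_loader, and the twice/negate functions standing in for GL entry points), and it still needs the one #define per function that the question hoped to avoid.

```c
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

/* One index per loaded function; the order must match the names string. */
enum { FN_TWICE, FN_NEGATE, FN_COUNT };

static generic_fn table[FN_COUNT];

/* Typed access: cast back to the real signature before calling. */
#define fn_twice  ((int (*)(int))table[FN_TWICE])
#define fn_negate ((int (*)(int))table[FN_NEGATE])

/* Stand-ins for the functions a real loader would resolve. */
static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

/* Hypothetical loader: maps a name to a function pointer, NULL if unknown. */
static generic_fn fake_loader(const char *name)
{
    if (strcmp(name, "twice") == 0)  return (generic_fn)twice;
    if (strcmp(name, "negate") == 0) return (generic_fn)negate;
    return NULL;
}

/* Fill the whole table from one space-separated string of names. */
static void load_all(generic_fn (*loader)(const char *))
{
    char names[] = "twice negate";
    char *p = names;
    for (size_t i = 0; i < FN_COUNT; ++i) {
        char *end = strchr(p, ' ');   /* search from the current name */
        if (end != NULL)
            *end = '\0';
        table[i] = loader(p);
        if (end == NULL)
            break;                    /* the last name has no trailing space */
        p = end + 1;
    }
}
```

Because the table is a real array, its elements are guaranteed to be contiguous, which separate global variables are not.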

Is it possible to assign data to this "static array" in C using for-loop?

There is a line of code inside a method similar to:
static char data[] = "123456789";
I want to fill the above data array with a million characters not just nine.
But since it is tedious to type it, I want to do that in for loop.
Is that possible to do it keeping it as "static char data[]"?
edit:
static char data[1000000];
for(int i=0; i<1000000; i++)
{
data[i] = 1;
}
There are multiple ways to achieve this in C:
you can declare the global static array as uninitialized, write an initialization function and call this function at the beginning of the program. Unlike C++, C does not have a standard way to invoke such an initialisation function at program startup time, yet some compilers might provide an extension for this.
static char data[1000000];
void init_data(void) {
//the loop below will generate the same code as
//memset(data, 1, sizeof data);
for (int i = 0; i < 1000000; i++) {
data[i] = 1;
}
}
int main() {
init_data();
...
}
you can change your program logic so the array can be initialized to 0 instead of 1. This will remove the need for an initialization function and might simplify the code and reduce the executable size.
you can create the initializer for the array using an external program and include its output:
static char data[1000000] = {
#include "init_data.def"
};
you can initialize the array using macros
#define X10(s) s,s,s,s,s,s,s,s,s,s
#define X100(s) X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s)
#define X1000(s) X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s)
#define X10000(s) X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s)
#define X100000(s) X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s)
static char data[1000000] = {
X100000(1), X100000(1), X100000(1), X100000(1), X100000(1),
X100000(1), X100000(1), X100000(1), X100000(1), X100000(1),
};
Note however that this approach will be a stress test for both your compiler and readers of your code. Here are some timings:
clang: 1.867s
gcc: 5.575s
tcc: 0.690s
The last 2 solutions allow for data to be defined as a constant object.
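If you are willing to depend on a compiler extension, GCC and Clang accept range designators in initializers. This is not one of the options listed above and it is not standard C, so treat the following as a sketch for those compilers only; DATA_SIZE and count_ones are names made up for the example.

```c
#include <stddef.h>

#define DATA_SIZE 1000000

/* GNU extension (not standard C): set every element in the range to 1. */
static char data[DATA_SIZE] = { [0 ... DATA_SIZE - 1] = 1 };

/* Illustrative helper for checking the result: counts elements equal to 1. */
static size_t count_ones(void)
{
    size_t n = 0;
    for (size_t i = 0; i < DATA_SIZE; i++)
        n += (data[i] == 1);
    return n;
}
```

As with the include-file and macro approaches, the array could also be declared const here.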
Is that possible to do it keeping it as "static char data[]"?
No, you have to specify the size explicitly. If you wish to compile-time initialize the array rather than assigning to it in run-time with a for loop or memset, you can use tricks such as this.
Another option might be to use dynamic allocation with malloc instead, but then you have to assign everything in run-time.
You can define statically allocated arrays in various ways. Incidentally, this has nothing to do with the static keyword; see this if you need more information about static variables. The following discussion doesn't depend on it, so I will omit your static keyword for simplicity.
An array declared as:
char data[] = "123456789";
has its size fixed at compile time. The compiler can do that because the size of the array is given implicitly by the string "123456789": 10 characters, 9 for the data plus 1 for the terminating null character.
char data[];
This, on the other hand, will not compile as a local variable, and your compiler will complain about the missing array size. Since the declaration fixes the size of the array at compile time, the compiler wants to know how much to allocate.
char data[1000000];
This will compile just fine, since now the compiler knows how much to allocate. You can then assign the elements in a for loop, as you did:
for(int i=0; i<1000000; i++)
{
data[i] = 1;
}
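As a small check of the size rule for char data[] = "123456789" described above, the following sketch compares sizeof with strlen. The names digits, digits_size and digits_length are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* The array size is deduced from the initializer. */
static char digits[] = "123456789";

/* Bytes occupied by the array, including the terminating '\0'. */
static size_t digits_size(void)   { return sizeof digits; }

/* Characters before the terminating '\0'. */
static size_t digits_length(void) { return strlen(digits); }
```

digits_size() returns 10 and digits_length() returns 9, matching the "9 for the data plus 1 for the terminator" count.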
Note:
An array of a million chars has quite a respectable size, typically 1 MB, and as a local (non-static) variable it may overflow your stack. Whether or not it actually will depends on pretty much everything it can depend on, but it will certainly raise some eyebrows even if your code works fine. And eventually, if you keep increasing the size, you will end up overflowing the stack.
If you have truly large arrays you need to work with, you can allocate them on the heap, i.e., in the vast empty oceans of your RAM.
The part above should hopefully have answered your question. Below is simply an alternative way to assign a fixed value, such as your 1, to a char array without a for loop. It is nothing but a more convenient way (and perhaps better practice); feel free to ignore it if it causes confusion.
#include <string.h>
#define SIZE 1000000
// Create the array; with static storage duration it starts out zero-filled.
static char data[SIZE];
int main( void )
{
// Initialise the array: assigns *integer 1* to each element.
memset( data, 1, sizeof data );
//^___ This single line is equivalent to:
// for ( int i = 0; i < SIZE; i++ )
// {
// data[i] = 1;
// }
// ...
return 0;
}
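For the heap option mentioned above, a minimal sketch could look like this. The helper name make_filled is an assumption for illustration; the caller owns the returned buffer and has to free() it.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative helper: allocate n bytes on the heap and set each to value.
   Returns NULL if the allocation fails; the caller must free() the result. */
static char *make_filled(size_t n, int value)
{
    char *buf = malloc(n);
    if (buf != NULL)
        memset(buf, value, n);
    return buf;
}
```

A call such as make_filled(1000000, 1) gives the same contents as the static array, at the cost of doing the fill at run time and having to check for a NULL result.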

Can I convert char*[20] to char[][20]?

I have since corrected the program myself, but this is still the never-answered question:
I have a 2D array of chars in which every row will contain one word. I split a char* word by word with a function that places the words in the array. My problem is that it doesn't print the words but random characters. Might it be a pointer problem? I'm not sure about the conversion of char*[20] to char[][20],
because I want to filter a char* spamArray[20] into a char[][20].
I need to pass a char*[20] to the filter, which takes a char[][20] argument.
This is the call:
char* spam = "this is a string";
//spam isn't actually initialized this way, but this is just for explaining what it contains
//OLD QUESTION CODE:char (*spamArray)[20] = (char(*)[20])malloc((sizeof(char) * 20) * nSpam);
//new:
char spamArray[nSpam][20];
//nSpam is the number of words
splitstring(spam, &spamArray[0], nSpam);
This is the function splitstring into words
inline void splitstring(char *str, char (*arr)[20], size_t t)
{
size_t i = 0; //character index within the current string of the array
while(t != 0)
{
if (*str != ' ' && *str != '\0')
{
(*arr)[i] = *str;
*str++;
i++;
}
else
{
t--;
*str++;
(*arr)[i] = '\0';
*arr++;
i = 0;
}
}
}
then I'll call a function which is for testing and printing the words in the 2D array (spamArray)
filter_mail(&txt, spamArray) //i call the function this way
void filter_mail(char **line, char spam[][20], int nSpam)
{
char *line_tmp = *line;
bool isSpam = 0;
int a = 0;
int c = 0;
while(nSpam!= 0)
{
if (spam[a][c] != '\0')
{
printf("%c", spam[a][c]);
c++;
}
else
{
a++;
c = 0;
nSpam--;
}
}
}
Then it prints random things every time and the program crashes.
Also, how should I free a spamArray?
is it correct to free it this way?
free(spamArray)
I haven't got any answer right now because everyone pointed out that using char[][] doesn't work. Well of course it doesn't. I don't even use it in the source code. That was just the title of the question. Please read everything before any other answer.
i have a 2D Array
No, you don't. 2D arrays don't exist in C99 or C11, and don't exist in C++11. BTW, even if C++17 added more containers to the C++11 and C++14 standards, they did not add matrixes.
Arrays (both in C and in C++) are always unidimensional. In some weird cases, you could have arrays of arrays (each component should have the same type, so same dimension, same size, same alignment), but this is so confusing that you should not even try.
(and your code and your numerous comments show that you are very confused. It is OK, programming is difficult to learn; you need years of work)
Can i convert char*[] to char[][]?
No, because the char[][] type does not exist and cannot exist (both in C++11 or C++14 and in C99 or C11) because arrays should have elements of the same fixed and known size and type.
Look into existing libraries (such as Glib), at least for inspiration. Study the source code of relevant free software projects (e.g. on github).
Beware of undefined behavior; it may happen that code (like yours) is wrong but doesn't crash properly. Be scared of UB!
Then it prints random things every time and the program crashes
Typical case of UB (probably elsewhere in your code). You are lucky to observe a crash. Sometimes UB is much more insidious.
coding in C99 or C11
First, spend more time in reading documentation. Read several books first. Then look into some reference site. At last, read carefully the n1570 standard of C11. Allocate a week of dense work for that purpose (and don't touch your code at all during that time; perhaps carry on some tiny experiments on toy code unrelated to your project and use the debugger to understand what is going on in the computer).
You may have an array of 16-byte wide strings; I usually don't do that, but if I did I would prefer to name intermediate types:
typedef char sixteenbytes_ty[16];
extern sixteenbytes_ty array[];
You might code extern char array[][16]; but that is so confusing that I got it wrong -because I never do that- and you really should never code that.
This declares a global array containing elements of 16 bytes arrays. Again, I don't recommend doing that.
As a rule of thumb: never use so called "2D arrays" (in reality arrays of arrays) in C. If you need matrixes of variable dimensions (and you probably don't) implement them as an abstract data type like here.
If you manipulate data which happens to be 16 bytes long, make a struct of it:
struct mydata_st {
char bytes[16];
};
it is much more readable.
You may have an array of pointers, e.g. char*[] (each pointer has a fixed size, 8 bytes on my Linux/x86-64 machine, which is not the same as the allocated size of the memory zone pointed by it).
You probably should start over entirely (and throw away your code) and think in terms of abstract data types. I strongly recommend reading SICP (freely downloadable). So first, write on paper the specification (the complete list of operations, or the interface or API of your library), using some natural language like English or Italian.
Perhaps you want some kind of vector of strings, or matrix of chars (I don't understand what you want, and you probably did not specify it clearly enough on paper).
If coding in C99, consider using some flexible array members in some of your (internal) type implementations.
Perhaps you decide that you handle some array of dynamically allocated strings (each obtained by strdup or asprintf etc...).
So perhaps you want some kind of dynamic vector of dynamically allocated types. Then define first the exact list of operations on them. Read the wikipage on flexible array members. It could be very useful.
BTW, compile with all warnings and debug info, so compile with gcc -Wall -Wextra -g your C code (if using GCC). Use the debugger gdb to understand better the behavior of your program on your system, run it step by step, query the state of the debugged process.
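To make the "vector of dynamically allocated strings" idea a bit more tangible, here is one possible sketch built on a C99 flexible array member. The type and function names (strvec, strvec_new, strvec_set, strvec_free) are invented for this example and are only one of many reasonable designs.

```c
#include <stdlib.h>
#include <string.h>

/* A fixed-capacity vector of heap-allocated strings. */
struct strvec {
    size_t count;
    char *items[];   /* flexible array member: count pointers follow */
};

/* Allocate a vector with room for count strings, all initially NULL. */
static struct strvec *strvec_new(size_t count)
{
    struct strvec *v = malloc(sizeof *v + count * sizeof v->items[0]);
    if (v == NULL)
        return NULL;
    v->count = count;
    for (size_t i = 0; i < count; i++)
        v->items[i] = NULL;
    return v;
}

/* Store a private copy of s at index i; returns 0 on success, -1 on error. */
static int strvec_set(struct strvec *v, size_t i, const char *s)
{
    if (i >= v->count)
        return -1;
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, s);
    free(v->items[i]);   /* free(NULL) is a no-op */
    v->items[i] = copy;
    return 0;
}

/* Release every string and then the vector itself. */
static void strvec_free(struct strvec *v)
{
    if (v == NULL)
        return;
    for (size_t i = 0; i < v->count; i++)
        free(v->items[i]);
    free(v);
}
```

Writing down the operations first (create, set, free) and only then the representation is the point of the abstract data type advice above.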
coding in C++11 or newer
If coding in C++11 (which is not the same language as C99), use existing types and containers. Again, read some good book (like this) and more documentation and reference material. C++ is a very difficult programming language, so spend several weeks reading.
No, you can't. That's because char[][] is an array of incomplete type, thus is invalid (so it doesn't exist at all). Array elements must be of complete types, that is, the size, alignment and layout of the type must be determined at compile time.
Please stop arguing the existence of char[][]. I can assure you that it doesn't exist, at all.
Or go Google.
Fixed-length array is a candidate solution:
char[][32]
but dynamic memory allocation (with an array of pointers) is better because the size of the allocated memory is flexibly changeable. Then you can declare the function like:
void filter_mail(char **line, char **spam);
Or as suggested. At the very least, you should do it like this (but you can't omit m):
void foo(size_t m, char (*)[m]);
You can never declare char[][] because pointer-array conversion can be done only at the top level, i.e. char*[] and char[][] (that's because of operator precedence).
I bet you don't know at all what you're doing here in splitstring():
while(t != 0)
{
if (*str != ' ' && *str != '\0')
{
*arr[c] = *str;
*str++;
c++;
}
else
{
t--;
*str++;
*arr[c] = '\0';
*arr++;
c = 0;
}
}
Because *arr[c] is equivalent to arr[c][0], you're just copying str to the first element in each string of arr. Get parentheses so it looks like (*arr)[c]. Then remove the asterisk before pointer increment (you don't use the value from dereferencing at all):
while(t != 0)
{
if (*str != ' ' && *str != '\0')
{
(*arr)[c] = *str;
str++;
c++;
}
else
{
t--;
str++;
(*arr)[c] = '\0';
arr++;
c = 0;
}
}
It should be fine now.
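The precedence point can be checked in isolation. In this sketch (grid, paren_form and bare_form are names invented for the example) both functions return the address of the element that each expression designates.

```c
#include <stddef.h>

static char grid[2][20];

/* (*arr)[c]: character c of the row that arr points to. */
static char *paren_form(char (*arr)[20], size_t c) { return &(*arr)[c]; }

/* *arr[c]: parsed as *(arr[c]), i.e. the first character of row c. */
static char *bare_form(char (*arr)[20], size_t c) { return &*arr[c]; }
```

With arr pointing at grid, paren_form(grid, 1) is &grid[0][1] while bare_form(grid, 1) is &grid[1][0], which is why the unparenthesized version wrote one character into each row.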
Finally, don't cast the result of malloc. Freeing spamArray with free() is just the standard way and it should be fine.
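On the title question itself: an array of pointers and an array of char[20] rows have different memory layouts, so one cannot be cast into the other; the strings have to be copied. A minimal sketch, assuming each word should be truncated to fit a 20-byte row (copy_words and WORD_LEN are names made up here):

```c
#include <stddef.h>
#include <string.h>

#define WORD_LEN 20

/* Copy n strings from an array of pointers into fixed-width rows.
   Words longer than WORD_LEN - 1 characters are truncated. */
static void copy_words(const char *const src[], size_t n, char dst[][WORD_LEN])
{
    for (size_t i = 0; i < n; i++) {
        strncpy(dst[i], src[i], WORD_LEN - 1);
        dst[i][WORD_LEN - 1] = '\0';   /* strncpy does not terminate on truncation */
    }
}
```

The destination can then be passed to a function declared with a char spam[][20] parameter, such as filter_mail() in the question.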
This is a program that's a version of your program that does what you seem to want to do:
#include <stdlib.h>
#include <ctype.h>
#include <assert.h>
#include <stdio.h>
const int nSpam = 30;
char* spam = "this is a string";
char spamArray[30][20];
//nSpam is the number of words
void splitstring(char *str, char arr[][20], size_t nSpam)
{
int word_num = 0;
int char_num = 0;
int inspace = 0;
for (char *i = str; *i != '\0'; ++i)
{
if (!isspace(*i))
{
inspace = 0;
assert(word_num < nSpam);
arr[word_num][char_num++] = *i;
assert(char_num < 20);
}
else
{
if (!inspace)
{
arr[word_num++][char_num] = '\0';
char_num = 0;
inspace = 1;
}
}
}
if (!inspace)
{
arr[word_num++][char_num] = '\0';
}
while (word_num < nSpam)
{
arr[word_num++][0] = '\0';
}
}
void filter_mail(char const * const *line, char spam[][20], size_t nSpam)
{
int a = 0;
while (a < nSpam)
{
int c = 0;
while (spam[a][c] != '\0')
{
printf("%c", spam[a][c++]);
}
printf("\n");
++a;
}
}
char const * const mail_msg[] = {
"This is a line.\n",
"This is another line.\n",
0
};
int main()
{
splitstring(spam, spamArray, 30);
filter_mail(mail_msg, spamArray, 30);
return 0;
}
I warn you that this is a poor design that will cause you no end of problems. It's very much the wrong approach.
There is no need to free anything here because it's all statically allocated.

Jump directly to certain parts of a function

I would like a hint on how to complete this puzzle. I need to be able to print what is normally printed out in reverse. Normally this prints, hello there. I need it to print there hello. I am only allowed to add code where it is commented.
Here are some of my thoughts that don't work.
I thought about using the heap to store some data but I can't because stdlib.h is not included.
Can't use goto because I can't add labels and what I would goto is all in one line
Can't use recursion because there are no input parameters and no global variables to modify.
Can't possibly think of a way assembly could help but hey maybe?
I can't do anything obvious like just calling printf's and quitting the program early.
Thoughts on something to do with function pointers? I still don't see how they would help.
#include <stdio.h>
void f1(); void f2(); void f3();
int main() { f1(); printf("\n"); return 0; }
void f1() { f2(); printf(" there "); }
void f2() { f3(); printf(" hello "); }
void f3(){
int x;
//Can add whatever under here
}
I think the only purpose of int x; is to get the stack pointer without using inline assembly.
The solution on how to do this exactly will depend on your platform, the compiler you use and the optimization levels you have used.
I would say first you need to analyze the call stack.
You can do this -
int i;
for (i = 0; i < 10; i++) {
printf ("%p\n", *(void**)((char*) &x + i * 8)); // I am assuming a 64-bit machine. On 32-bit, replace 8 with 4
}
This will give you ten 8-byte values from the stack, starting at x and moving towards the callers' frames (assuming the stack grows downwards). Now you need to find the two that look like return addresses. One way to recognize them would be to print the function pointer values of f1 and f2 and look for values close to them.
You now know the indexes where they are stored. Just go ahead and swap them.
For swapping them, say the indices are 12 and 14.
Then you can do this -
*(void**)&x = *((void**)&x + 12);
*((void**)&x + 12) = *((void**)&x + 14);
*((void**)&x + 14) = *(void**)&x;
Also make sure you don't change the stack layout once you get the indices. This means don't remove/add any variables. Don't apply the & operator to any new variables (or remove from any) and don't remove any function calls.
Another suggestion: instead of using int x, you could declare another variable, unsigned long long y, and use that for the swap, because it has enough bytes to hold a pointer (on 64-bit machines). Usually there will be padding after int x too, which should save you from the problem, but it is better to be safe.
Here's an alternate solution that doesn't depend on stack manipulation. The fundamental 'trick' is that the program provides its own implementation of printf() instead of using the standard library's.
Tested on gcc (Mingw, Linux x86 and Linux x64) and MSVC:
#include <stdio.h>
void f1(); void f2(); void f3();
int main() { f1(); printf("\n"); return 0; }
void f1() { f2(); printf(" there "); }
void f2() { f3(); printf(" hello "); }
void f3(){
int x;
//Can add whatever under here
return;
}
void putstr( char const* s)
{
for (;*s;++s) {
putchar(*s);
}
}
int printf(char const* fmt, ...)
{
static char const* pushed_fmt = 0;
if (*fmt == '\n') {
putstr(fmt);
return 0;
}
if (pushed_fmt == 0) {
pushed_fmt = fmt;
return 0;
}
putstr(fmt);
putstr(pushed_fmt);
return 0;
}
I think the idea is that from f3() you have the return addresses on the stack all the way up, and nothing has been printed so far.
You'd have to play with the stack contents so that f3() returns into f1(), which would then return into f2().
Can you take it from here? Depending on the compiler, there will be different ways to accomplish this. Inline assembly might or might not be required.
EDIT: specifically for GCC, see the GCC return address intrinsics.
This should be a portable solution that doesn't mess with the stack and return addresses. Most probably not what was expected by who wrote that challenge, but it's way more fun to think out of the box.
void f3(){
int x;
//Can add whatever under here
static int count = 0;
static char buf[256];
if(count==0) {
setvbuf(stdout, buf, _IOFBF, sizeof(buf));
int atexit (void (*func)(void));
atexit(f3);
count = 1;
} else {
const char *src = " there hello \n";
char *dest = buf;
for(; *src;) *dest++ = *src++;
}
}
http://ideone.com/S4zMHP
This works by first using setvbuf to replace the stdout buffer with one that we provide, and switching it to full buffering (instead of line buffering) to make sure that no flush happens before the end of the program (notice that no output has been written yet, so calling setvbuf is legal). We also call atexit to make sure we get called before the end of the program (we don't have stdlib.h, but who needs headers when the required prototypes are already known).
When we are called again (thanks to atexit), the two printf have been called, but the buffer hasn't been flushed yet. We outright replace its content with the string of our interest (which is just as big as what has been written), and return. The subsequent implicit fclose will dump out the modified content of our buffer instead of what was written by the printf.

Why isn't this zeroization code optimized out by most c compilers?

Many crypto libraries include code similar to the following snippet:
/* Implementation that should never be optimized out by the compiler */
static void optimize_proof_zeroize( void *v, size_t n )
{
volatile unsigned char *p = v;
while( n-- ) *p++ = 0;
}
But my naive implementation doesn't survive an optimizing compiler:
/* Naive zeroization implementation */
static void naive_zeroize( unsigned char *c, size_t n)
{
int i;
for( i = 0; i < n; i++ )
c[i] = 0;
}
The code is used to zeroize sensitive data before freeing the memory. Since the buffer is not used again, optimizing compilers assume that they can safely remove the zeroization from the compiled code.
What prevents the first implementation from being optimized out?
The key word here is volatile. When a variable is declared as volatile, it tells the compiler that this variable can be modified/accessed outside that program (by hardware for example) thus it forces the compiler not to optimize that variable and access the memory each time that variable is referenced.
The usage of it in crypto is usually to clear out the secret (keys) from the stack (local variables). Since the stack is used for the local variable, the regular code (like in your /* Naive zeroization implementation */) might not seem to have any impact on the other variables/state of the program, thus the compiler might (and probably will) optimize that code out. To prevent it, the volatile qualifier is used which makes the compiler leave that code and zero the memory content of the local variables.
Edit
Example:
void decrypt(void* src, void* dest, crypto_stuff_t* params)
{
crypto_key_t decryption_key; // will hold the decryption key
....
....
// end of flow
// we want to zero the content of decryption_key, otherwise its value
// will remain on the stack
// this:
// decryption_key <-- 0;
// will be just optimized out by the compiler
// but this won't:
volatile uint8_t* key_ptr = (uint8_t*)&decryption_key;
int i;
for(i = 0; i < sizeof(crypto_key_t); i++)
key_ptr[i] = 0;
}
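A self-contained version of the same pattern, small enough to compile and inspect, might look like the following. The function use_key_then_wipe and its XOR "work" are placeholders invented for this sketch; only the volatile-pointer loop is taken from the code discussed above.

```c
#include <stddef.h>
#include <string.h>

/* Same technique as optimize_proof_zeroize(): every store goes through a
   pointer to volatile, so compilers in practice do not elide the writes. */
static void zeroize(void *v, size_t n)
{
    volatile unsigned char *p = v;
    while (n--)
        *p++ = 0;
}

/* Placeholder for real work: XOR-folds a temporary copy of the key,
   then wipes the copy before the stack frame is released. */
static unsigned char use_key_then_wipe(const unsigned char *key, size_t n)
{
    unsigned char local[32];
    unsigned char acc = 0;
    if (n > sizeof local)
        n = sizeof local;
    memcpy(local, key, n);
    for (size_t i = 0; i < n; i++)
        acc ^= local[i];
    zeroize(local, sizeof local);   /* local is never read again after this */
    return acc;
}
```

Comparing the generated assembly (for example with gcc -O2 -S) against a variant that uses a plain loop is a quick way to see whether your compiler drops the non-volatile version.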
