There is a line of code inside a method similar to:
static char data[] = "123456789";
I want to fill the above data array with a million characters not just nine.
But since it is tedious to type it, I want to do that in for loop.
Is that possible to do it keeping it as "static char data[]"?
edit:
static char data[1000000];
for(int i=0; i<1000000; i++)
{
data[i] = 1;
}
There are multiple ways to achieve this in C:
you can declare the global static array as uninitialized, write an initialization function and call this function at the beginning of the program. Unlike C++, C does not have a standard way to invoke such an initialisation function at program startup time, yet some compilers might provide an extension for this.
static char data[1000000];
void init_data(void) {
//a compiler may well generate the same code for the loop below as for
//memset(data, 1, sizeof data);
for (int i = 0; i < 1000000; i++) {
data[i] = 1;
}
}
int main() {
init_data();
...
}
you can change your program logic so the array can be initialized to 0 instead of 1. This will remove the need for an initialization function and might simplify the code and reduce the executable size.
you can create the initializer for the array using an external program and include its output:
static char data[1000000] = {
#include "init_data.def"
};
you can initialize the array using macros
#define X10(s) s,s,s,s,s,s,s,s,s,s
#define X100(s) X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s),X10(s)
#define X1000(s) X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s),X100(s)
#define X10000(s) X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s),X1000(s)
#define X100000(s) X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s),X10000(s)
static char data[1000000] = {
X100000(1), X100000(1), X100000(1), X100000(1), X100000(1),
X100000(1), X100000(1), X100000(1), X100000(1), X100000(1),
};
Note however that this approach will be a stress test for both your compiler and readers of your code. Here are some timings:
clang: 1.867s
gcc: 5.575s
tcc: 0.690s
The last 2 solutions allow for data to be defined as a constant object.
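For the external-program approach above, the generator can itself be a few lines of C. This is only a sketch: the function names write_init_def and generate_init_file are made up for illustration, and the layout of 20 values per line is an arbitrary choice.

```c
#include <stdio.h>

/* Hypothetical helper: writes `count` copies of `value` as a
   comma-separated initializer list, 20 values per line. */
int write_init_def(FILE *out, unsigned long count, int value)
{
    for (unsigned long i = 0; i < count; i++) {
        const char *sep = (i + 1 == count) ? "\n"
                        : ((i + 1) % 20 == 0) ? ",\n" : ",";
        if (fprintf(out, "%d%s", value, sep) < 0)
            return -1;  /* report write errors to the caller */
    }
    return 0;
}

/* Hypothetical entry point for the generator: creates the file at `path`. */
int generate_init_file(const char *path, unsigned long count, int value)
{
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;
    int rc = write_init_def(out, count, value);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

A small generator program would call generate_init_file("init_data.def", 1000000, 1) as a build step, before the file that includes init_data.def is compiled.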
Is that possible to do it keeping it as "static char data[]"?
No, you have to specify the size explicitly. If you wish to compile-time initialize the array rather than assigning to it in run-time with a for loop or memset, you can use tricks such as this.
Another option might be to use dynamic allocation with malloc instead, but then you have to assign everything in run-time.
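A minimal sketch of the dynamic-allocation option, assuming a hosted environment with the standard library available. The helper name make_filled_buffer is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates `count` bytes on the heap and sets
   each one to `value`. Returns NULL if the allocation fails. The
   caller owns the buffer and must release it with free(). */
char *make_filled_buffer(size_t count, char value)
{
    char *buf = malloc(count);
    if (buf == NULL)
        return NULL;
    memset(buf, value, count);
    return buf;
}
```

Unlike the static array, the result is a pointer, so sizeof no longer reports the array size and the caller has to keep track of the length separately.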
You can define statically allocated arrays in various ways. Incidentally, this has nothing to do with the static keyword; see this if you need more information about static variables. The following discussion does not depend on it, so I will omit your static keyword for simplicity.
An array declared as:
char data[] = "123456789";
is allocated on the stack (when declared inside a function), and its size is fixed at compile time. The compiler can do that since the size of the array is implicitly given by the string "123456789": 10 characters, 9 for the data plus 1 for the terminating null character.
char data[];
This, on the other hand, will not compile, and your compiler will complain about the missing array size. As I said, since the size of the array has to be known at compile time, your compiler wants to know how much to allocate.
char data[1000000];
This, on the other hand, will compile just fine, since the compiler now knows how much to allocate. You can then assign elements in a for loop, as you did:
for(int i=0; i<1000000; i++)
{
data[i] = 1;
}
Note:
An array of a million chars has quite a respectable size, typically 1 MB, and may overflow your stack if it is a local (non-static) variable. Whether it actually will depends on pretty much everything it can depend on, but it will certainly raise some eyebrows even if your code works fine. And eventually, if you keep increasing the size, you will end up overflowing your stack.
If you have truly large arrays you need to work with, you can allocate them on the heap, i.e., in the vast empty oceans of your RAM.
The part above hopefully should have answered your question. Below is simply an alternative way to assign a fixed value, such as your (1), to a char array, instead of using for loops. This is nothing but a more convenient way (and perhaps a better practice), you are free to ignore it if it causes confusion.
#include <string.h>
#define SIZE 100000
// Create the array; as a static object it starts out zero-filled.
static char data[SIZE];
int main( void )
{
    // Initialise the array: assigns *integer 1* to each element.
    memset( data, 1, sizeof data );
    //^___ This single line is equivalent to:
    // for ( int i = 0; i < SIZE; i++ )
    // {
    //     data[i] = 1;
    // }
    /* ... rest of the program ... */
    return 0;
}
I am writing my own OS and had to implement my own malloc and realloc functions. However, I think that what I have written may not be safe and may also cause a memory leak, because the variable isn't really destroyed: its memory is set to zero, but the variable name still exists. Could someone tell me if there are any vulnerabilities in this code? The project will be added to GitHub as soon as it's finished, under user subado512.
Code:
void * malloc(int nbytes)
{
char variable[nbytes];
return &variable;
}
void * free(string s) {
s= (string)malloc(0);
return &s;
}
void memory_copy(char *source, char *dest, int nbytes) {
int i;
for (i = 0; i < nbytes; i++) {
*(dest + i) = *(source + i); // dest[i] = source[i]
}
}
void *realloc(string s,uint8_t i) {
string ret;
ret=(string)malloc(i);
memory_copy(s,ret,i);
free(s);
return &ret;
}
Context in which code is used : Bit of pseudo code to increase readability
string buffstr = (string) malloc(200);
uint8_t i = 0;
while(reading)
{
buffstr=(string)realloc(buffstr,i+128);
buffstr[i]=readinput();
}
Using the pointer returned by your malloc is undefined behaviour: you are returning the address of an array with automatic storage duration, which ceases to exist as soon as the function returns.
As a rough start, consider using a static char array to model your memory pool and returning segments of it to the caller, while maintaining a table that records which parts of the array are currently in use. Note that you'll have to do clever things with alignment here to guarantee that the returned void* meets the alignment requirements of any type. free will then be little more than releasing a record in that table.
Do note that the memory management systems that a typical C runtime library uses are very sophisticated. With that in mind, do appreciate that your undertaking may be little more than a good programming exercise.
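The static-pool idea described above can be sketched as a simple bump allocator. This is an illustration under stated assumptions, not a complete allocator: it assumes a C11 compiler (for alignas and max_align_t), the names pool_alloc and pool_reset and the 4096-byte pool size are made up, and the in-use table needed for a per-block free is deliberately left out.

```c
#include <stdalign.h>
#include <stddef.h>

#define POOL_SIZE 4096u   /* assumed pool size, chosen for illustration */

/* The pool itself: static storage, aligned for any standard type. */
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;  /* offset of the first unused byte */

/* Hypothetical bump allocator: hands out consecutive aligned slices of
   the pool. Returns NULL when the request does not fit. */
void *pool_alloc(size_t nbytes)
{
    const size_t align = alignof(max_align_t);
    /* Round the request up to a multiple of the alignment so that the
       next slice also starts at an aligned offset. */
    size_t rounded = (nbytes + align - 1) / align * align;
    if (rounded < nbytes || rounded > POOL_SIZE - pool_used)
        return NULL;
    void *p = &pool[pool_used];
    pool_used += rounded;
    return p;
}

/* Releases everything at once; freeing individual blocks would need
   the usage table, which this sketch does not implement. */
void pool_reset(void)
{
    pool_used = 0;
}
```

Because the pool has static storage duration, the pointers it hands out stay valid after pool_alloc returns, which is exactly what the version in the question lacks.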
Can I use strlen of a const char* in a loop like this and expect O(n) time complexity, or would it result in O(n^2)? I think it looks cleaner than using a variable for the string length.
void send_str( const char* data ) {
for (uint8_t i = 0; i < strlen(data); i++)
send_byte( data[i] );
}
Does it depend on the optimization level?
I don't think you can ever depend on an optimization happening.
Why not do it like this, if you really want to avoid an extra variable:
void send_str(const char *data)
{
for(size_t i = strlen(data); i != 0; --i)
send_byte(*data++);
}
Or, less silly, and more like an actual production-quality C program:
void send_str(const char *data)
{
while(*data != '\0')
send_byte(*data++);
}
There's no point at all in iterating over the characters twice, so don't call strlen() at all, detect the end-of-string yourself.
The compiler might optimize that, but perhaps not to the point you want it to; this obviously depends on the compiler version and the optimization level.
You might consider using the MELT probe to understand what GCC is doing with your code (e.g. which internal Gimple representation the compiler transforms it into), or use gcc -fverbose-asm -O -S to get the generated assembler code.
However, for your example, it is simpler to code:
void send_str(const char*data) {
for (const char*p = data; *p != 0; p++)
send_byte(*p);
}
If the definition of send_byte() is not available when this code is compiled (eg. it's in another translation unit), then this probably can't be optimised in the way you describe. This is because the const char * pointer doesn't guarantee that the object pointed to really is const, and therefore the send_byte() function could legally modify it (send_byte() may have access to that object via a global variable).
For example, were send_byte() defined like so:
int count;
char str[10];
void send_byte(char c)
{
if (c == 'X')
str[2] = 0;
count++;
}
and we called the provided send_str() function like so:
count = 0;
strcpy(str, "AXAXAX");
send_str(str);
then the result in count must be 2. If the compiler hoisted the strlen() call in send_str() out of the loop, then it would instead be 6 - so this would not be a legal transform for the compiler to make.
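The scenario above can be reproduced with a short program. The sketch below puts the question's loop next to a manually hoisted variant; the names send_str_hoisted, run_original and run_hoisted are made up for the demonstration.

```c
#include <stdint.h>
#include <string.h>

static int count;
static char str[10];

/* send_byte() from the example above: it truncates str as a side effect. */
static void send_byte(char c)
{
    if (c == 'X')
        str[2] = 0;
    count++;
}

/* The loop as written in the question: strlen() is re-evaluated on every pass. */
static void send_str(const char *data)
{
    for (uint8_t i = 0; i < strlen(data); i++)
        send_byte(data[i]);
}

/* What hoisting strlen() out of the loop would amount to. */
static void send_str_hoisted(const char *data)
{
    size_t len = strlen(data);
    for (uint8_t i = 0; i < len; i++)
        send_byte(data[i]);
}

/* Runs the original loop on "AXAXAX" and returns the number of bytes sent. */
int run_original(void)
{
    count = 0;
    strcpy(str, "AXAXAX");
    send_str(str);
    return count;   /* 2: the string is cut short after the first 'X' */
}

/* Same input, but with the length computed only once. */
int run_hoisted(void)
{
    count = 0;
    strcpy(str, "AXAXAX");
    send_str_hoisted(str);
    return count;   /* 6: the stale length is used for the whole loop */
}
```

The two functions return different values for the same input, which is why the compiler may not substitute one loop for the other unless it can prove that send_byte() leaves the string alone.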
The code I'm looking at is this:
for (i = 0; i < linesToFree; ++i ){
printf("Parsing line[%d]\n", i);
memset( &line, 0x00, 65 );
strcpy( line, lines[i] );
//get Number of words:
int numWords = 0;
tok = strtok(line , " \t");
while (tok != NULL) {
++numWords;
printf("Number of words is: %d\n", numWords);
println(tok);
tok = strtok(NULL, " \t");
}
}
My question centers around the use of numWords. Does the runtime system reuse this variable or does it allocate a new int every time it runs through the for loop? If you're wondering why I'm asking this, I'm a Java programmer by trade who wants to get into HPC and am therefore trying to learn C. Typically I know you want to avoid code like this, so this question is really exploratory.
I'm aware the answer is probably reliant upon the compiler... I'm looking for a deeper explanation than that. Assume the compiler of your choice.
Your conception about how this works in Java might be misinformed - Java doesn't "allocate" a new int every time through a loop like that either. Primitive type variables like int aren't allocated on the Java heap, and the compiler will reuse the same local storage for each loop iteration.
On the other hand, if you call new anything in Java every time through a loop, then yes, a new object will be allocated every time. However, you're not doing that in this case. C also won't allocate anything from the heap unless you call malloc or similar (or in C++, new).
Please note the difference between automatic and dynamic memory allocation. In Java only the latter exists.
This is automatic allocation:
int numWords = 0;
This is dynamic allocation:
int *pNumWords = malloc(sizeof(int));
*pNumWords = 0;
The dynamic allocation in C only happens explicitly (when you call malloc or its derivatives).
In your code, the variable is only assigned a value; no new variable is allocated.
From a performance standpoint, it's not going to matter. (Variables map to registers or memory locations, so it has to be reused.)
From a logical standpoint, yes, it will be reused because you declared it outside the loop.
From a logical standpoint:
numWords will not be reused in the outer loop because it is declared inside it.
numWords will be reused in the inner loop because it isn't declared inside.
This is what is called "block", "automatic" or "local" scope in C. It is a form of lexical scoping, i.e., a name refers to its local environment. In C, it is top down, meaning that scope is resolved as the file is parsed and compiled, and a name is visible only after the point where it is declared.
When the variable goes out of scope, the lexical name is no longer valid (visible) and the memory may be reused.
The variable is declared in a local scope or a block defined by curly braces { /* block */ }. This defines a whole group of C and C99 idioms, such as:
for(int i=0; i<10; ++i){ // C99 only. int i is local to the loop
// do something with i
} // i goes out of scope here...
There are subtleties, such as:
int x = 5;
int y = x + 10; // this works
// whereas, in a different scope:
int x = y + 10; // compiler error: y is not yet declared
int y = 5;
and:
int g; // static by default and init to 0
extern int x; // defined and allocated elsewhere - resolved by the linker
int main (int argc, const char * argv[])
{
int j=0; // automatic by default
while (++j<=2) {
int i=1,j=22,k=3; // j from outer scope is lexically redefined
for (int i=0; i<10; i++){
int j=i+10,k=0;
k++; // k will always be 1 when printed below
printf("INNER: i=%i, j=%i, k=%i\n",i,j,k);
}
printf("MIDDLE: i=%i, j=%i, k=%i\n",i,j,k); // prints middle j
}
// printf("i=%i, j=%i, k=%i\n",i,j,k); compiler error
return 0;
}
There are idiosyncrasies:
In K&R C, ANSI C89, and Visual Studio, all variables must be declared at the beginning of the function or compound statement (i.e., before the first statement).
In gcc, variables may be declared anywhere in the function or compound statement and are only visible from that point on.
In C99 and C++, loop variables may be declared in the for statement and are visible until the end of the loop body.
In a loop block, the allocation is performed ONCE and the RH assignment (if any) is performed each time.
In the particular example you posted, you enquired about int numWords = 0; and if a new int is allocated each time through the loop. No, there is only one int allocated in a loop block, but the right hand side of the = is executed every time. This can be demonstrated so:
#include <stdio.h>
#include <time.h>
#include <unistd.h>
time_t ti(void){
    return time(NULL);
}
void t1(void){
    time_t t1=ti(); // initialised so that printing it is well defined
    for(int i=0; i<=10; i++){
        time_t t2=ti(); // The allocation once, the assignment every time
        sleep(1);
        printf("t1=%ld:%p t2=%ld:%p\n",(long)t1,(void *)&t1,(long)t2,(void *)&t2);
    }
}
Compile that with any gcc-compatible compiler (gcc, clang, etc.) with optimizations off (-O0) or on. In practice, the address of t2 will be the same on every iteration.
Now compare with a recursive function:
int factorial(int n) {
if(n <= 1)
return 1;
printf("n=%i:%p\n",n,(void *)&n);
return n * factorial(n - 1);
}
The address of n will be different each time because a new automatic n is allocated with each recursive call.
Compare with an iterative version of factorial forced to use a loop-block allocation:
int fac2(int num) {
    int r=1; // carries the running product, because 'result' goes out of scope
    for (int i=1; i<=num; i++) {
        int result=r*i; // the initializer is executed on every iteration
        r=result;
        printf("result=%i:%p\n",result,(void *)&result); // address is typically the same
    }
    return r;
}
In conclusion, you asked about int numWords = 0; inside the for loop. The variable is reused in this example.
The way the code is written, the programmer is relying on the right-hand side of int numWords = 0; being executed on every iteration, not just the first, so that the variable is reset to 0 before the while loop that follows.
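To make that reliance concrete, here is a sketch of the question's loop reshaped into a function that records one word count per line. The name count_words_per_line and its signature are made up for illustration; the 65-byte line buffer is taken from the question's memset call.

```c
#include <string.h>

/* Counts the words in each of the `count` strings in `lines` and
   stores one total per line in `words_out`. Lines longer than 64
   characters are truncated, matching the 65-byte buffer. */
void count_words_per_line(const char *lines[], int count, int *words_out)
{
    char line[65];
    for (int i = 0; i < count; ++i) {
        /* strtok() modifies its input, so work on a private copy. */
        strncpy(line, lines[i], sizeof line - 1);
        line[sizeof line - 1] = '\0';

        int numWords = 0;  /* the initializer runs again on every iteration */
        for (char *tok = strtok(line, " \t"); tok != NULL;
             tok = strtok(NULL, " \t"))
            ++numWords;
        words_out[i] = numWords;
    }
}
```

If the initializer only ran once, the second entry of words_out would contain the running total of both lines instead of the count for the second line alone.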
The scope of the numWords variable is the body of the for loop. Just as in Java, you can only use the variable inside the loop, so theoretically its memory would have to be freed at the end of each iteration, since it is also on the stack in your case.
Any good compiler however would use the same memory and simply re-set the variable to 0 on each iteration.
If you were using a class instead of an int, you would see the destructor being called every time the for loop iterates.
Even consider this:
class A;
A* pA = new A;
delete pA;
pA = new A;
The two objects created here will probably reside at the same memory address.
It will be allocated every time through the loop (the compiler can optimize out that allocation)
for (i = 0; i < 100; i++) {
int n = 0;
printf("%d : %p\n", i, (void*)&n);
}
No guarantees all 100 lines will have the same address (though probably they will).
Edit: The C99 Standard, in 6.2.4/5 says: "[the object] lifetime extends from entry into the block with which it is associated until execution of that block ends in any way." and, in 6.8.5/5, it says that the body of a for statement is in fact a block ... so the paragraph 6.2.4/5 applies.
I have this code
#define BUFFER_LEN (2048)
static float buffer[BUFFER_LEN];
int readcount;
while ((readcount = sf_read_float(handle, buffer, BUFFER_LEN))) {
// alsa play
}
which reads up to BUFFER_LEN floats into buffer and returns the number of floats it actually read. "handle" identifies the source and lets sf_read_float keep track of how much of it has been read so far.
E.g. if the source contains 5 floats and BUFFER_LEN is 3, the first call would return 3, the next one 2, and the following one 0, so the while loop would exit.
I would like to have a function that does the same.
Update
After a lot of coding, I think this is the solution.
#include <stdio.h>
int copy_buffer(double* src, int src_length, int* src_pos,
float* dest, int dest_length) {
int copy_length = 0;
if (src_length - *src_pos > dest_length) {
copy_length = dest_length;
printf("copy_length1 %i\n", copy_length);
} else {
copy_length = src_length - *src_pos;
printf("copy_length2 %i\n", copy_length);
}
for (int i = 0; i < copy_length; i++) {
dest[i] = (float) src[*src_pos + i];
}
// remember where to continue next time the copy_buffer() is called
*src_pos += copy_length;
return copy_length;
}
int main() {
double src[] = {1,2,3,4,5};
int src_length = 5;
float dest[] = {0,0};
int dest_length = 2;
int read;
int src_pos = 0;
read = copy_buffer(src, src_length, &src_pos, dest, dest_length);
printf("read %i\n", read);
printf("src_pos %i\n", src_pos);
for (int i = 0; i < src_length; i++) {
printf("src %f\n", src[i]);
}
for (int i = 0; i < dest_length; i++) {
printf("dest %f\n", dest[i]);
}
return 0;
}
Next time copy_buffer() is called, dest contains 3,4. Running copy_buffer() again only copies the value "5". So I think it works now.
It is not very pretty, though, that I have to keep int src_pos = 0; outside of copy_buffer().
It would be a lot better, if I instead could give copy_buffer() a unique handle instead of &src_pos, just like sndfile does.
Does anyone know how that could be done?
If you would like to create unique handles, you can do so with malloc() and a struct:
#include <stddef.h> /* size_t */
#include <stdint.h> /* intptr_t */
#include <stdlib.h> /* malloc, free */
typedef intptr_t HANDLE_TYPE;
HANDLE_TYPE init_buffer_traverse(double * src, size_t src_len);
int copy_buffer(HANDLE_TYPE h_traverse, double * dest, size_t dest_len);
void close_handle_buffer_traverse(HANDLE_TYPE h);
typedef struct
{
double * source;
size_t source_length;
size_t position;
} TRAVERSAL;
#define INVALID_HANDLE 0
/*
* Returns a new traversal handle, or 0 (INVALID_HANDLE) on failure.
*
* Allocates memory to contain the traversal state.
* Resets traversal state to beginning of source buffer.
*/
HANDLE_TYPE init_buffer_traverse(double *src, size_t src_len)
{
TRAVERSAL * trav = malloc(sizeof(TRAVERSAL));
if (NULL == trav)
return INVALID_HANDLE;
trav->source = src;
trav->source_length = src_len;
trav->position = 0;
return (HANDLE_TYPE)trav;
}
/*
* Returns the system resources (memory) associated with the traversal handle.
*/
void close_handle_buffer_traverse(HANDLE_TYPE h)
{
    if (INVALID_HANDLE != h)
        free((TRAVERSAL *)h);
}
int copy_buffer(HANDLE_TYPE h, double * dest, size_t dest_len)
{
    TRAVERSAL * trav = NULL;
    if (INVALID_HANDLE == h)
        return -1;
    trav = (TRAVERSAL *)h;
    // number of elements still unread in the source
    size_t copy_length = trav->source_length - trav->position;
    if (dest_len < copy_length)
        copy_length = dest_len;
    for (size_t i = 0; i < copy_length; i++)
        dest[i] = trav->source[trav->position + i];
    // remember where to continue next time the copy_buffer() is called
    trav->position += copy_length;
    return (int)copy_length;
}
This sort of style is what some C coders used before C++ came into being. The style involves a data structure which contains all the data elements of our 'class'. Most of the API for the class takes, as its first argument, a pointer to one of these structs. This pointer is similar to the this pointer. In our example it is passed in as the handle h and converted back to the pointer trav.
The exceptions in the API are those functions which allocate the handle type; these are similar to constructors and have the handle type as their return value. In our case, init_buffer_traverse might as well have been called construct_traversal_handle.
There are many other methods than this method for implementing an "opaque handle" value. In fact, some coders would manipulate the bits (via an XOR, for example) in order to obscure the true nature of the handles. (This obscurity does not provide security where such is needed.)
In the example given, I'm not sure (didn't look at sndlib) whether it would make most sense for the destination buffer pointer and length to be held in the handle structure or not. If so, that would make it a "copy buffer" handle rather than a "traversal" handle and you would want to change all the terminology from this answer.
These handles are only valid for the lifetime of the current process, so they are not appropriate for handles which must survive restarts of the handle server. For that, use an ISAM database and the column ID as handle. The database approach is much slower than the in-memory/pointer approach but for persistent handles, you can't use in-memory values, anyway.
On the other hand, it sounds like you are implementing a library which will be running within a single process lifetime. In which case, the answer I've written should be usable, after modifying to your requirements.
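One of the other ways to get an opaque handle, mentioned above only in passing, is an incomplete struct type, the same pattern the standard library uses for FILE *. The sketch below is an assumption-laden illustration: the names traversal, traversal_open, traversal_read and traversal_close are hypothetical, and in a real library the first part would live in a header while the struct definition stays in the implementation file.

```c
#include <stddef.h>
#include <stdlib.h>

/* --- what a public header would expose (hypothetical names) --- */
typedef struct traversal traversal;   /* incomplete type: layout is hidden */

traversal *traversal_open(const double *src, size_t src_len);
size_t     traversal_read(traversal *t, double *dest, size_t dest_len);
void       traversal_close(traversal *t);

/* --- what only the implementation file would see --- */
struct traversal {
    const double *source;
    size_t        source_length;
    size_t        position;
};

traversal *traversal_open(const double *src, size_t src_len)
{
    traversal *t = malloc(sizeof *t);
    if (t != NULL) {
        t->source = src;
        t->source_length = src_len;
        t->position = 0;
    }
    return t;   /* NULL signals failure */
}

size_t traversal_read(traversal *t, double *dest, size_t dest_len)
{
    size_t n = t->source_length - t->position;
    if (dest_len < n)
        n = dest_len;
    for (size_t i = 0; i < n; i++)
        dest[i] = t->source[t->position + i];
    t->position += n;   /* remember where the next call continues */
    return n;           /* 0 once the source is exhausted */
}

void traversal_close(traversal *t)
{
    free(t);   /* free(NULL) is a no-op */
}
```

A caller can then drain the source with while ((n = traversal_read(t, buf, len)) > 0) { ... }, which mirrors the sf_read_float() loop from the question. Compared with the intptr_t handle, the compiler still type-checks the pointer, but callers cannot reach the members without the struct definition.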
Addendum
You asked for some clarification of the similarity with C++ that I mention above. To be specific, some equivalent (to the above C code) C++ code might be:
class TRAVERSAL
{
    double * source;
    size_t source_length;
    size_t position;
public:
    TRAVERSAL(double *src, size_t src_len)
    {
        source = src;
        source_length = src_len;
        position = 0;
    }
    int copy_buffer(double * dest, size_t dest_len)
    {
        size_t copy_length = source_length - position;
        if (dest_len < copy_length)
            copy_length = dest_len;
        for (size_t i = 0; i < copy_length; i++)
            dest[i] = source[position + i];
        // remember where to continue next time the copy_buffer() is called
        position += copy_length;
        return (int)copy_length;
    }
};
There are some apparent differences. The C++ version is a little bit less verbose-seeming. Some of this is illusory; the equivalent of close_handle_buffer_traverse is now to delete the C++ object. Of course delete is not part of the class implementation of TRAVERSAL, delete comes with the language.
In the C++ version, there is no "opaque" handle.
The C version is more explicit and perhaps makes more apparent what operations are being performed by the hardware in response to the program execution.
The C version is more amenable to using the cast to HANDLE_TYPE in order to create an "opaque ID" rather than a pointer type. The C++ version could be "wrapped" in an API which accomplished the same thing while adding another layer. In the current example, users of this class will maintain a copy of a TRAVERSAL *, which is not quite "opaque."
In the function copy_buffer(), the C++ version need not mention the trav pointer because instead it implicitly dereferences the compiler-supplied this pointer.
sizeof(TRAVERSAL) should be the same for both the C and C++ examples -- with no vtable, also assuming run-time-type-identification for C++ is turned off, the C++ class contains only the same memory layout as the C struct in our first example.
It is less common to use the "opaque ID" style in C++, because the penalty for "transparency" is lower in C++. The data members of class TRAVERSAL are private and so the TRAVERSAL * cannot be accidentally used to break our API contract with the API user.
Please note that both the opaque ID and the class pointer are vulnerable to abuse from a malicious API user -- either the opaque ID or class pointer could be cast directly to, e.g., double **, allowing the holder of the ID to change the source member directly via memory. Of course, you must trust the API caller already, because in this case the API calling code is in the same address space. In an example of a network file server, there could be security implications if "opaque ID" based on a memory address is exposed to the outside.
I would not normally make the digression into trustedness of the API user, but I want to clarify that the C++ keyword private has no "enforcement powers," it only specifies an agreement between programmers, which the compiler respects also unless told otherwise by the human.
Finally, the C++ class pointer can be converted to an opaque ID as follows:
typedef intptr_t HANDLE_TYPE;
HANDLE_TYPE init_buffer_traverse(double *src, size_t src_len)
{
return (HANDLE_TYPE)(new TRAVERSAL(src, src_len));
}
int copy_buffer(HANDLE_TYPE h_traverse, double * dest, size_t dest_len)
{
return ((TRAVERSAL *)h_traverse)->copy_buffer(dest, dest_len);
}
void close_handle_buffer_traverse(HANDLE_TYPE h)
{
delete ((TRAVERSAL *)h);
}
And now our brevity of "equivalent" C++ may be further questioned.
What I wrote about the old style of C programming which relates to C++ was not meant to say that C++ is better for this task. I only mean that encapsulation of data and hiding of implementation details could be done in C via a style that is almost isomorphic to a C++ style. This can be good to know if you find yourself programming in C but unfortunately having learned C++ first.
PS
I just noticed that our implementation to date had used:
dest[i] = (float)source[position + i];
when copying the bytes. Because both dest and source are double * (that is, they both point to double values), there is no need for a cast here. Also, casting from double to float may lose digits of precision in the floating-point representation. So this is best removed and restated as:
dest[i] = source[position + i];
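The loss of precision mentioned above is easy to observe. A minimal sketch, with a made-up function name, that checks whether a double survives being narrowed to float and widened again:

```c
/* Returns 1 if x is unchanged after a round trip through float,
   0 if the narrowing conversion discarded some of its digits. */
int survives_float_round_trip(double x)
{
    float narrowed = (float)x;   /* may discard low-order bits */
    return (double)narrowed == x;
}
```

Values such as 0.5 are exactly representable in both types and survive; a value such as 0.1 is stored with more significant bits as a double than a float can hold, so it does not.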
I started to look at it, but you could probably do it just as well: libsndfile is open source, so one could look at how sf_read_float() works and create a function that does the same thing from a buffer. http://www.mega-nerd.com/libsndfile/ has a download link.