Why can't `execv` implicitly convert from char** to char* const*?


Consider the following code:
#include <stdio.h>
#include <unistd.h>
void foo(char * const arg[]) {
printf("success\n");
}
int main() {
char myargs[2][64] = { "/bin/ls", NULL };
foo(myargs);
execv(myargs[0], myargs);
return 0;
}
Both foo and execv require char * const * argument, but while my foo works (I get success in the output) the system call execv fails.
I would like to know why. Does this have something to do with the implementation of execv?
Also, assuming I have a char** variable - how can I send it to execv?

A two-dimensional array looks like this:
char myargs[2][16];
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
I reduced the size from 64 to 16 to keep the diagram from being annoyingly big.
With an initializer, it can look like this:
char myargs[2][16] = { "/bin/ls", "" };
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Notice I didn't try to put a null pointer in the second row. It doesn't make sense to do that, since that's an array of chars. There's no place in it for a pointer.
The rows are contiguous in memory, so if you look at a lower level, it's actually more like this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
When you pass myargs to a function, the famous "array decay" produces a pointer. That looks like this:
void foo(char (*arg)[16]);
...
char myargs[2][16] = { "/bin/ls", "" };
foo(myargs);
+-----------+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| POINTER==|===>| /| b| i| n| /| l| s|\0| | | | | | | | |\0| | | | | | | | | | | | | | | |
+-----------+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The pointer arg contains a value which locates the beginning of the array. Notice there is no pointer pointing to the second row. If foo wants to find the value in the second row, it needs to know how big the rows are so it can break down this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
into this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
That's why arg must be char (*arg)[16] and not char **arg or the equivalent char *arg[].
The exec family of functions doesn't work with this data layout. It wants this:
+-----------+    +-----------+-----------+
|  POINTER==|===>|  POINTER  |    NULL   |
+-----------+    +-----|-----+-----------+
                       |
/----------------------/
|
|
|    +--+--+--+--+--+--+--+--+
\--->| /| b| i| n| /| l| s|\0|
     +--+--+--+--+--+--+--+--+
And when you want to add more arguments, it wants this:
+-----------+    +-----------+-----------+-   -+-----------+
|  POINTER==|===>|  POINTER  |  POINTER  | ... |    NULL   |
+-----------+    +-----|-----+-----|-----+-   -+-----------+
                       |           |
/----------------------/           |
|                                  |
| /--------------------------------/
| |
| |
| |  +--+--+--+--+--+--+--+--+
\-+->| /| b| i| n| /| l| s|\0|
  |  +--+--+--+--+--+--+--+--+
  |
  |  +--+--+--+--+--+--+
  \->| /| h| o| m| e|\0|
     +--+--+--+--+--+--+
If you compare this to the two-dimensional array diagram, hopefully you can understand why this can't be an implicit conversion. It actually involves moving stuff around in memory.
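To make the "moving stuff around" concrete, here is a minimal sketch that builds the pointer layout from a two-dimensional array. The helper name rows_to_argv and the row width ROW_LEN are assumptions made up for illustration; they are not part of the question's code.

```c
#include <stddef.h>

#define ROW_LEN 16  /* assumed row width, matching the diagrams */

/* Fill out[0..nrows-1] with pointers to each row of a 2D char array and
 * terminate the list with a null pointer. out must have room for
 * nrows + 1 pointers. (Hypothetical helper, for illustration only.) */
static void rows_to_argv(char rows[][ROW_LEN], size_t nrows, char *out[])
{
    for (size_t i = 0; i < nrows; i++)
        out[i] = rows[i];   /* each row decays to a char * */
    out[nrows] = NULL;      /* the terminator the exec functions look for */
}
```

Given char myargs[1][ROW_LEN] = { "/bin/ls" } and char *argv_ptrs[2], calling rows_to_argv(myargs, 1, argv_ptrs) and then execv(argv_ptrs[0], argv_ptrs) passes an argument of the shape shown in the last two diagrams.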

Both foo and execv require char * const * argument,
Yes.
but while my foo works (I get success in the output) the system call execv fails.
Getting the output you expect does not prove that your code is correct. The call exhibits undefined behavior because its argument does not match the parameter type, but it is plausible that that has little practical effect because the implementation of foo() does not use the parameter in any way. More generally, your code could, in principle, exhibit absolutely any behavior at all, because that's what "undefined" means.
I would like to know why. Does this have something to do with the implementation of execv?
From the standard's perspective, both calls exhibit equally undefined behavior. As a practical matter, however, we know that execv does use its arguments, so it would be much more surprising for that call to produce the behavior you expected than it is for the call to foo to produce the behavior you expected.
The main problem is that 2D arrays are arrays of arrays, and arrays are not pointers. Thus, your 2D array myargs does not at all have the correct type for an argument to either function.
Also, assuming I have a char** variable - how can I send it to execv?
You do not have such a variable in your code, but if you did have, you could cast it to the appropriate type:
char *some_args[] = { "/bin/ls", NULL };
execv(some_args[0], (char * const *) some_args);
In C the cast is not actually required: converting char ** to char * const * only adds a qualifier to the pointed-to type, which is an allowed implicit conversion, so compilers accept the call without it. Best would be to declare a variable that has the correct type in the first place:
char * const correct_args[] = { "/bin/ls", NULL };
execv(correct_args[0], correct_args);
Note also that although arrays are not pointers, they are converted to pointers in most contexts -- which I use in the example code -- but only the top level. An array of arrays thus "decays" to a pointer to an array, not a pointer to a pointer.


Valgrind warns of overlap when trying to copy a string into a struct member variable

This is what the struct looks like, for reference:
struct thread_data {
struct ringbuf_t *rb;
char *file_name;
};
I need to take command line arguments and store them in the struct members of each thread_data element in the threads array, like so:
for (int index = optind; index < argc; index++) {
threads[length].rb = rb;
memmove(&threads[length].file_name, &argv[index], strlen(argv[index]));
strcpy(threads[length].file_name, argv[index]);
++length;
}
Previously I used memcpy, and it appeared to work when I printed the variable. However, Valgrind is giving me this:
==465645== Source and destination overlap in strcpy(0x1fff000b54, 0x1fff000b54)
==465645== at 0x4C3C180: strcpy (vg_replace_strmem.c:523)
==465645== by 0x400F85: main (bytemincer.c:55)
So I used memmove and I still got the same Valgrind result. Any solution for this?

This is what you want to end up with:
(I'm using "fn" instead of "file_name" in the post.)
*(argv[0]) # 0x2000
+---+---+- -+---+
+--------------->| | | … | 0 |
argv # 0x1000 | +---+---+- -+---+
+---------------+ |
| 0x2000 -------+ *(argv[1]) # 0x2100
+---------------+ +---+---+- -+---+
| 0x2100 -----------+----------->| | | … | 0 |
+---------------+ | +---+---+- -+---+
| 0x2200 -----------)----+
+---------------+ | | *(argv[2]) # 0x2200
| ⋮ | | | +---+---+- -+---+
| +------->| | | … | 0 |
rb # 0x3000 | | +---+---+- -+---+
+---------------+ | |
| 0x4000 -------+ | | *rb # 0x4000
+---------------+ | | | +---------------+
+---)---)------->| |
threads # 0x5000 | | | +---------------+
+---------------+ | | |
| +-----------+| | | |
|rb| 0x4000 --------+ | |
| +-----------+| | | |
|fn| 0x2100 --------)---+ |
| +-----------+| | |
+---------------+ | |
| +-----------+| | |
|rb| 0x4000 --------+ |
| +-----------+| |
|fn| 0x2200 ----------------+
| +-----------+|
+---------------+
| ⋮ |
(This assumes threads is an array rather than a pointer to an array. This doesn't affect the rest of the post.)
All addresses are made up, of course. But you can see how more than one variable holds the same address as its value. That's because it's perfectly fine to have multiple pointers point to the same memory block. All we need to do is copy the pointer (the address).
To copy a pointer, all you need to do is
dst = src;
So all you need is
threads[length].rb = rb;
threads[length].fn = argv[index];
While
memmove(&threads[length].rb, &rb, sizeof(threads[length].rb));
memmove(&threads[length].fn, &argv[index], sizeof(threads[length].fn));
and
memmove(&threads[length].rb, &rb, sizeof(rb));
memmove(&threads[length].fn, &argv[index], sizeof(argv[index]));
are equivalent to the assignments, it doesn't make sense to do something that complicated.
(Note the use of sizeof(argv[index]) rather than strlen(argv[index]). It's the pointer we're copying, so we need the size of the pointer.)
The warning came from trying to copy the string that's in the buffer at 0x2100 into the buffer at 0x2100. Remember that threads[length].fn and argv[index] both have the same value (address) after the memmove.
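As a minimal sketch of the two options, the first helper below shares the string by copying only the address, and the second makes a private copy for cases where the thread needs its own storage. The helper names set_name_shared and set_name_copied are hypothetical, and the struct uses the question's member name file_name.

```c
#include <stdlib.h>
#include <string.h>

struct ringbuf_t;               /* opaque here; only pointers are used */

struct thread_data {
    struct ringbuf_t *rb;
    char *file_name;
};

/* Share the caller's string: copies only the address. */
static void set_name_shared(struct thread_data *t, char *name)
{
    t->file_name = name;
}

/* Own a private copy: allocates storage, then copies the characters.
 * Returns 0 on success, -1 if allocation fails. The caller must
 * free(t->file_name) later. */
static int set_name_copied(struct thread_data *t, const char *name)
{
    size_t n = strlen(name) + 1;    /* +1 for the terminating '\0' */
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, n);          /* distinct buffers, so no overlap */
    t->file_name = copy;
    return 0;
}
```

Since argv stays valid for the lifetime of the program, the shared form is usually enough for command line arguments.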

What happens when I add character array with unsigned long in c?

#include <stdio.h>
#include<string.h>
void double_string(char ary[])
{
char *start = ary;
// dont exactly know what is happening here nothing is getting printed when consoled
char *end = ary + strlen(ary);
char *org = end;
while(start<org)
{
*end = *start;
start++;
end++;
}
*end = '\0';
}
int main(void) {
char word[255] = {"TACsasasa"};
double_string(word);
printf("%s",word);
return 0;
}
I am unable to understand what is being stored through the pointer "end". I tried printing it, but I am not getting any output.

char *end = ary + strlen(ary);
This line takes the starting address of the char array, adds the length of the string it holds (as returned by strlen(), which does not count the null terminator), and so produces a pointer to the end of the string. What sits at that position is the null terminator, so printing from there shows nothing: it is treated as an empty string.

Adding an integer (unsigned or signed) to a pointer is known as pointer arithmetic. It's legal and quite common in C code. You do have to be very careful about going out of the bounds of your memory buffer, though, or you will get undefined behavior. Fortunately, this code is well behaved as long as the original string is less than half the length of its memory buffer.
Allow me to try some ASCII art to see if I can make clear what's going on in the double_string function. It starts with this:
char *start = ary;
char *end = ary + strlen(ary);
char *org = end;
At this point, your pointers look like this:
start end
| |
| org
| |
v V
----------------------------------------------------------------------------------
ary | T | A | C | s | a | s | a | s | a |\0 | | | | | | | | | | | ...
----------------------------------------------------------------------------------
Then we have the loop.
while(start<org)
{
*end = *start;
start++;
end++;
}
After the first loop iteration, it looks like this:
start end
| |
| org |
| | |
v V V
----------------------------------------------------------------------------------
ary | T | A | C | s | a | s | a | s | a | T | | | | | | | | | | | ...
----------------------------------------------------------------------------------
Second iteration:
start end
| |
| org |
| | |
v V V
----------------------------------------------------------------------------------
ary | T | A | C | s | a | s | a | s | a | T | A | | | | | | | | | | ...
----------------------------------------------------------------------------------
And so on. The loop continues as long as start is less than (to the left of, in my illustration) org:
start end
| |
org |
| |
V V
----------------------------------------------------------------------------------
ary | T | A | C | s | a | s | a | s | a | T | A | C | s | a | s | a | s | a | | | ...
----------------------------------------------------------------------------------
Now start<org is no longer true, because they're equal. They point to the same location. The loop terminates. All that's left to do is terminate the string with *end = '\0';:
start end
| |
org |
| |
V V
----------------------------------------------------------------------------------
ary | T | A | C | s | a | s | a | s | a | T | A | C | s | a | s | a | s | a |\0 | | ...
----------------------------------------------------------------------------------
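The "less than half the buffer" condition can be checked rather than assumed. Below is a minimal sketch of a variant that takes the buffer size and refuses to write past it; the name double_string_checked and the cap parameter are assumptions for illustration, not part of the original code.

```c
#include <stddef.h>
#include <string.h>

/* Append a copy of the string in buf to itself, in place.
 * cap is the total size of buf in bytes. Returns 0 on success, or -1
 * (leaving buf unchanged) if the doubled string plus its terminator
 * would not fit. (Hypothetical variant, for illustration only.) */
static int double_string_checked(char *buf, size_t cap)
{
    size_t len;

    if (cap == 0)
        return -1;
    len = strlen(buf);
    if (len > (cap - 1) / 2)        /* need 2 * len + 1 <= cap */
        return -1;
    memcpy(buf + len, buf, len);    /* the two halves do not overlap */
    buf[2 * len] = '\0';
    return 0;
}
```

Called as double_string_checked(word, sizeof word) on the question's 255-byte array, it produces the same doubled string as the loop illustrated above.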

Not understanding a struct in Barnyard2's sf_ip header file

I am looking at Barnyard2's sf_ip.h source code. I do not understand the sfip_t struct, particularly the union block.
typedef struct _ip {
int family;
int bits;
/* see sfip_size(): these address bytes
* must be the last field in this struct */
union
{
u_int8_t u6_addr8[16];
u_int16_t u6_addr16[8];
u_int32_t u6_addr32[4];
// u_int64_t u6_addr64[2];
} ip;
#define ip8 ip.u6_addr8
#define ip16 ip.u6_addr16
#define ip32 ip.u6_addr32
// #define ip64 ip.u6_addr64
} sfip_t;
Why is it using arrays? I tried to look for documentation, but Google has been no help. Can anyone explain what is being done here, please?

A union in C uses the same memory block for all its elements. This is distinct from a structure, in which the elements are consecutive in memory.
So, while struct {int x; int y;} would be laid out thus if your variable started at memory location 0x40000000:
           +-------------+
0x40000000 | x (4 bytes) |
           +-------------+
0x40000004 | y (4 bytes) |
           +-------------+
a related union {int x; int y;} exists like this:
Address
           +-------------+-------------+
0x40000000 | x (4 bytes) | y (4 bytes) |
           +-------------+-------------+
In other words, it can only hold one thing at a time. Reading y after last writing x reinterprets the stored bytes as y's type, which is risky when the two types differ (and is undefined behaviour in C++) - though in this case it works, since the two members have the same type.
In your particular case, you have the following memory layout (assuming your variable was located at 0x40000000):
+--------------+--------------+--------------+
0x40000000 | u6_addr8[ 0] | | |
+--------------+ u6_addr16[0] | |
0x40000001 | u6_addr8[ 1] | | |
+--------------+--------------+ u6_addr32[0] |
0x40000002 | u6_addr8[ 2] | | |
+--------------+ u6_addr16[1] | |
0x40000003 | u6_addr8[ 3] | | |
+--------------+--------------+--------------+
0x40000004 | u6_addr8[ 4] | | |
+--------------+ u6_addr16[2] | |
0x40000005 | u6_addr8[ 5] | | |
+--------------+--------------+ u6_addr32[1] |
0x40000006 | u6_addr8[ 6] | | |
+--------------+ u6_addr16[3] | |
0x40000007 | u6_addr8[ 7] | | |
+--------------+--------------+--------------+
0x40000008 | u6_addr8[ 8] | | |
+--------------+ u6_addr16[4] | |
0x40000009 | u6_addr8[ 9] | | |
+--------------+--------------+ u6_addr32[2] |
0x4000000a | u6_addr8[10] | | |
+--------------+ u6_addr16[5] | |
0x4000000b | u6_addr8[11] | | |
+--------------+--------------+--------------+
0x4000000c | u6_addr8[12] | | |
+--------------+ u6_addr16[6] | |
0x4000000d | u6_addr8[13] | | |
+--------------+--------------+ u6_addr32[3] |
0x4000000e | u6_addr8[14] | | |
+--------------+ u6_addr16[7] | |
0x4000000f | u6_addr8[15] | | |
+--------------+--------------+--------------+
Assuming you understand how your particular C implementation lays out various types, this provides a way to reference the same data in different ways.
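A minimal sketch of that idea, using a union of the same shape with the standard fixed-width types from <stdint.h> in place of u_int8_t and friends. The names ip_bytes and ipv4_as_word are made up for illustration and do not come from Barnyard2.

```c
#include <stdint.h>
#include <string.h>

/* Same shape as the union inside sfip_t. All three arrays start at the
 * same address and cover the same 16 bytes. */
union ip_bytes {
    uint8_t  u6_addr8[16];
    uint16_t u6_addr16[8];
    uint32_t u6_addr32[4];
};

/* Store an IPv4 address a.b.c.d in the first four bytes and return the
 * same four bytes viewed as one 32-bit word. The numeric value of the
 * result depends on the machine's byte order. */
static uint32_t ipv4_as_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    union ip_bytes ip;

    memset(&ip, 0, sizeof ip);
    ip.u6_addr8[0] = a;
    ip.u6_addr8[1] = b;
    ip.u6_addr8[2] = c;
    ip.u6_addr8[3] = d;
    return ip.u6_addr32[0];     /* bytes 0..3 read through the wider view */
}
```

The byte view is convenient for filling in or printing an address one octet at a time, while the 32-bit view lets code compare or copy four bytes in one operation.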

How to show several loops, and the work done in them, in a sequence diagram?

I want to show what my UserControl/Control does when I plug a list of data into it, what happens when the user presses certain keys, selects text, etc.
I feel that a sequence diagram is not really suited to showing several loops and the work done within them.
Am I wrong, or how can I cope with that case?

If you are talking about a loop, then you have a series of operations that take place for all elements in the loop.
I would model the operations done in the loop as a sequence diagram by itself, if the operation in the loop are fairly complex.
I don't think we can have rules of thumb here, but when the process with the loop itself is complex, and the loop is relatively less complex, we can have them in a single sequence diagram.
If the process that has the loop is not very complex, but the loop is complex, then I would draw a sequence diagram for the operations of the loop and have a note that this entire sequence is called by a loop.
You can also have both sequence diagrams if needed.
Update:
We have to add some notes to the diagram because it is not straightforward to denote a "condition" in a sequence diagram.
The validation part is something like
do validation
if validation succeeds
proceed to next (business or other) logic
if validation fails
feedback to user (or some other logic)
+----+ +----+ +----------------+ +----------------+
|User| | UI | | Your Validator | | Business Logic |
+----+ +----+ +----------------+ +----------------+
| select | | |
|--------------->| doValidation | |
| |------------------>|----+ |
| | | | Validate |
| | |<---+ |
| | | |
| | | (validation fails: |
| | Validation Fail | feedback to client) |
| |<------------------| |
| | | |
| | | |
| | | (validation succeeds: |
| | | proceed to |
| | | business logic) |
| | | |
| | | someLogic |
| | |----------------------->|
| | | |
UPDATE 2
Why use sequence diagram in a case as mine?
Because you still have to show the sequence of operations, and the developer still needs this information for coding :-)
With UML, as you probably already know, nothing is imposed. You are at your freedom to denote something in some fashion, provided your team also understands it the way you intended. These notes are also helpful.
I should have mentioned this before: some use an "option" fragment to denote an if/else. This is more or less a note (I see it this way) but is perhaps more evident. I use them only when the IF and the ELSE parts are both complex.
+----+ +----+ +----------------+ +----------------+
|User| | UI | | UI - Backend | | Business Logic |
+----+ +----+ +----------------+ +----------------+
| Add Record | | |
|--------------->| doinsertOrUpdate | |
| |------------------>| |
| | | exists(record) |
| | |----------------------->|
| | | |
____|________________|___________________|________________________|__________
|[Record exists] | | | |
| | | | Get Record | |
| | | |----------------------->| |
| | | | | |
| | | |--------+ | |
| | | | | Set UI Values | |
| | | |<-------+ | |
| | | | | |
| | | | Update Record | |
| | | |----------------------->| |
| | | | | |
| | | Send Message | | |
| | |<------------------| | |
| | | "Record found, | | |
| | | Updated" | | |
|___|________________|___________________|________________________|_________|
| | | |
| | | |
______|________________|___________________|________________________|_________
| [Record does not | | | |
| exist] | | | |
| | | |--------+ | |
| | | | | Generate | |
| | | | | Sequence | |
| | | |<-------+ | |
| | | | | |
| | | | Create New Record | |
| | | |----------------------->| |
| | | Send Message | | |
| | |<------------------| | |
| | | "New Record | | |
| | | Created" | | |
|_____|________________|___________________|________________________|_________|
| | | |
| | | |
| | | |
See this for an example using an alt block.

How do I find the most “natural” direct route using A-star (A*)?

I have implemented the A* algorithm in AS3 and it works great except for one thing.
Often the resulting path does not take the most “natural” or smooth route to the target.
In my environment the object can move diagonally as inexpensively as it can move horizontally or vertically.
Here is a very simple example; the start point is marked by the S, and the end (or finish) point by the F.
| | | | | | | | | |
|S| | | | | | | | |
x| | | | | | | | | |
x| | | | | | | | | |
x| | | | | | | | | |
x| | | | | | | | | |
x| | | | | | | | | |
|F| | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
As you can see, during the first round of searching, nodes [0,2], [1,2], [2,2] will all be added to the list of possible nodes, as they all have a score of N.
The issue I’m having comes at the next point when I’m trying to decide which node to proceed with. In the example above I am using possibleNodes[0] to choose the next node. If I change this to possibleNodes[possibleNodes.length-1] I get the following path.
| | | | | | | | | |
|S| | | | | | | | |
| |x| | | | | | | |
| | |x| | | | | | |
| | | |x| | | | | |
| | |x| | | | | | |
| |x| | | | | | | |
|F| | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
And then with possibleNextNodes[Math.round(possibleNextNodes.length / 2)-1]
| | | | | | | | | |
|S| | | | | | | | |
|x| | | | | | | | |
x| | | | | | | | | |
x| | | | | | | | | |
x| | | | | | | | | |
x| | | | | | | | | |
|F| | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
All these paths have the same cost as they all contain the same number of steps but, in this situation, the most sensible path would be as follows...
| | | | | | | | | |
|S| | | | | | | | |
|x| | | | | | | | |
|x| | | | | | | | |
|x| | | | | | | | |
|x| | | | | | | | |
|x| | | | | | | | |
|F| | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
Is there a formally accepted method of making the path appear more sensible rather than just mathematically correct?

You need to add a Tie-breaker to your heuristic function. The problem here is that there are many paths with the same costs.
For a simple Tie-breaker that favors the direct route you can use the cross-product. I.e. if S is the start and E is the end, and X is the current position in the algorithm, you could calculate the cross-products of S-E and X-E and add a penalty to the heuristic the further it deviates from 0 (= the direct route).
In code:
dx1 = current.x - goal.x
dy1 = current.y - goal.y
dx2 = start.x - goal.x
dy2 = start.y - goal.y
cross = abs(dx1*dy2 - dx2*dy1)
heuristic += cross*0.001
See also http://theory.stanford.edu/~amitp/GameProgramming/Heuristics.html#S12, which is an excellent tutorial about A* in general.
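The same tie-breaker as a minimal C sketch, assuming integer grid coordinates. The struct point type and the function names cross_deviation and tie_broken are hypothetical; the 0.001 scale is taken from the pseudocode above.

```c
#include <stdlib.h>

struct point { int x, y; };     /* hypothetical grid coordinate type */

/* Magnitude of the cross product of (current - goal) and (start - goal).
 * Zero when current lies on the straight line through start and goal,
 * growing as it drifts away from that line. */
static long cross_deviation(struct point start, struct point goal,
                            struct point current)
{
    long dx1 = current.x - goal.x;
    long dy1 = current.y - goal.y;
    long dx2 = start.x - goal.x;
    long dy2 = start.y - goal.y;
    return labs(dx1 * dy2 - dx2 * dy1);
}

/* Base heuristic h plus a small nudge toward the straight line. */
static double tie_broken(double h, struct point start, struct point goal,
                         struct point current)
{
    return h + (double)cross_deviation(start, goal, current) * 0.001;
}
```

The scale factor should stay small enough that the penalty only separates nodes whose scores are otherwise equal, rather than outweighing a genuinely cheaper path.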

If you want paths that look natural, you need to make sure that your costs correspond to the length on a cartesian coordinate system. That means the cost of moving diagonally should be sqrt(2) times the cost of moving vertically or horizontally.
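A minimal sketch of that cost model in C, assuming an 8-connected grid with unit spacing. The names SQRT2, step_cost, and octile_distance are made up for illustration; the second function is a matching heuristic, commonly called the octile distance.

```c
#include <stdlib.h>

#define SQRT2 1.4142135623730951    /* sqrt(2), written out as a constant */

/* Cost of one step that changes the position by (dx, dy), each in
 * {-1, 0, 1}: 1 for a straight step, sqrt(2) for a diagonal one. */
static double step_cost(int dx, int dy)
{
    return (dx != 0 && dy != 0) ? SQRT2 : 1.0;
}

/* Length of the shortest 8-connected path between two cells on an
 * empty grid, under the step costs above. */
static double octile_distance(int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0);
    int dy = abs(y1 - y0);
    int diag = dx < dy ? dx : dy;               /* diagonal steps */
    int straight = (dx > dy ? dx : dy) - diag;  /* remaining straight steps */
    return diag * SQRT2 + straight;
}
```

With these costs, the zig-zag paths in the question become strictly more expensive than the straight vertical one, so A* no longer treats them as ties.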

You can add 'control effort' to the cost calculations for each square. The actor will try not to turn or change direction too much as that will add a cost to the path:
http://angryee.blogspot.com/2009/03/better-pathfinding.html

If I remember correctly, the trick to this is to add an extra parameter to the cost function (for every step between adjacent nodes, or squares in your case) that penalises turns slightly more than normal (for example, having a relative cost of greater than sqrt(2) for diagonal moves). There's probably a fine line between smoothing out the path and actually decreasing the optimality of the route (elongating it), however, and you're not going to be able to avoid this entirely. There's a certain trade-off you'll need to discover specific to your own application, and this can only really be achieved by testing.
There was an article on a game dev site, I believe, that detailed exactly how this could be done, but I can't seem to find it at the moment. Have a play around with your cost function anyway and see what results you get - I'm pretty sure that's the way to go.
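One way such a turn penalty could look, as a minimal sketch in C. The function name step_cost_with_turns and the TURN_PENALTY value are assumptions for illustration and would need tuning by experiment, as described above.

```c
#define TURN_PENALTY 0.1    /* assumed tuning value, not from any source */

/* Cost of a step with a small surcharge whenever the direction of travel
 * changes. (prev_dx, prev_dy) is the direction of the step that reached
 * the current node and (dx, dy) is the direction of the candidate step. */
static double step_cost_with_turns(double base_cost,
                                   int prev_dx, int prev_dy,
                                   int dx, int dy)
{
    int turned = (dx != prev_dx) || (dy != prev_dy);
    return base_cost + (turned ? TURN_PENALTY : 0.0);
}
```

Using this requires each node in the open list to remember the direction it was entered from, which is the extra parameter the answer refers to.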

What is more 'sensible'? Straighter? You need to quantify it properly if the algorithm is going to do anything about it.
Since moving diagonally is as inexpensive as moving horizontally/vertically, all the paths are equivalent according to all the criteria available to A*. If you want a more 'sensible' path, you need to tell the algorithm that some paths are more desirable than others, effectively weighting horizontal/vertical as 'better' than diagonal. As far as I can see, that would be altering the parameters of your environment.