Could anyone explain this strange behaviour of appending to golang slices - arrays

The program below has unexpected output.
package main

import "fmt"

func main() {
	s := []int{5}
	s = append(s, 7)
	s = append(s, 9)
	x := append(s, 11)
	y := append(s, 12)
	fmt.Println(s, x, y)
}
output: [5 7 9] [5 7 9 12] [5 7 9 12]
Why is the last element of x 12?

A slice is only a window over part of an array; it has no storage of its own.
This means that if two slices cover the same part of an array, both slices must "contain" the same values.
Here is exactly what happens:
When you do the first append, you get a new slice of length 2 over an underlying array of size 2.
When you do the next append, you get a new slice of length 3, but the underlying array has size 4 (append usually allocates more space than is immediately needed so that it doesn't have to allocate on every append).
This means the next append doesn't need a new array. So x and y both use the same underlying array as the preceding slice s. You write 11 and then 12 into the same slot of this array, even though you get two different slices (remember, they're just windows).
You can check this by printing the capacity of the slice after each append:
fmt.Println(cap(s))
If you want different values in x and y, you should make a copy, for example like this:
s := []int{5}
s = append(s, 7)
s = append(s, 9)
x := make([]int,len(s))
copy(x,s)
x = append(x, 11)
y := append(s, 12)
fmt.Println(s, x, y)
Another solution is to force the capacity of the slice s to be no greater than its length (thus ensuring that each of the two following appends has to allocate a new array):
s := []int{5}
s = append(s, 7)
s = append(s, 9)
s = s[0:len(s):len(s)]
x := append(s, 11)
y := append(s, 12)
fmt.Println(s, x, y)
See also Re-slicing slices in Golang

dystroy explained it very well. I'd like to add a visual explanation of the behaviour.
A slice is only a descriptor of an array segment. It consists of a pointer to the array (ptr), the length of the segment (len), and its capacity (cap).
+-----+
| ptr |
|*Elem|
+-----+
| len |
|int  |
+-----+
| cap |
|int  |
+-----+
So, the code can be explained as follows:
func main() {
                      +
                      |
  s := []int{5}       |  s -> +-----+
                      | []int | ptr +-----> +---+
                      |       |*int | [1]int| 5 |
                      |       +-----+       +---+
                      |       |len=1|
                      |       |int  |
                      |       +-----+
                      |       |cap=1|
                      |       |int  |
                      |       +-----+
                      |
  s = append(s,7)     |  s -> +-----+
                      | []int | ptr +-----> +---+---+
                      |       |*int | [2]int| 5 | 7 |
                      |       +-----+       +---+---+
                      |       |len=2|
                      |       |int  |
                      |       +-----+
                      |       |cap=2|
                      |       |int  |
                      |       +-----+
                      |
  s = append(s,9)     |  s -> +-----+
                      | []int | ptr +-----> +---+---+---+---+
                      |       |*int | [4]int| 5 | 7 | 9 |   |
                      |       +-----+       +---+---+---+---+
                      |       |len=3|
                      |       |int  |
                      |       +-----+
                      |       |cap=4|
                      |       |int  |
                      |       +-----+
                      |
  x := append(s,11)   |          +-------------+-----> +---+---+---+---+
                      |          |             | [4]int| 5 | 7 | 9 |11 |
                      |          |             |       +---+---+---+---+
                      |  s -> +--+--+  x -> +--+--+
                      | []int | ptr | []int | ptr |
                      |       |*int |       |*int |
                      |       +-----+       +-----+
                      |       |len=3|       |len=4|
                      |       |int  |       |int  |
                      |       +-----+       +-----+
                      |       |cap=4|       |cap=4|
                      |       |int  |       |int  |
                      |       +-----+       +-----+
                      |
  y := append(s,12)   |                        +-----> +---+---+---+---+
                      |                        | [4]int| 5 | 7 | 9 |12 |
                      |                        |       +---+---+---+---+
                      |                        |
                      |          +-------------+-------------+
                      |          |             |             |
                      |  s -> +--+--+  x -> +--+--+  y -> +--+--+
                      | []int | ptr | []int | ptr | []int | ptr |
                      |       |*int |       |*int |       |*int |
                      |       +-----+       +-----+       +-----+
                      |       |len=3|       |len=4|       |len=4|
                      |       |int  |       |int  |       |int  |
                      |       +-----+       +-----+       +-----+
                      |       |cap=4|       |cap=4|       |cap=4|
                      |       |int  |       |int  |       |int  |
                      +       +-----+       +-----+       +-----+
  fmt.Println(s,x,y)
}

Related

DataFrame column (Array type) contains Null values and empty array (len =0). How to convert Null to empty array?

I have a Spark DataFrame with an array column (elements of StringType).
Sample DataFrame:
df = spark.createDataFrame([
[None],
[[]],
[['foo']]
]).toDF("a")
Current Output:
+-----+
| a|
+-----+
| null|
| []|
|[foo]|
+-----+
Desired Output:
+-----+
| a|
+-----+
| []|
| []|
|[foo]|
+-----+
I need to convert the Null values to an empty Array to concat with another array column.
I already tried this, but it does not work:
df.withColumn("a",F.coalesce(F.col("a"),F.from_json(F.lit("[]"), T.ArrayType(T.StringType()))))
Convert null values to empty array in Spark DataFrame
Use array function.
df = spark.createDataFrame([
[None],
[[]],
[['foo']]
]).toDF("a")
import pyspark.sql.functions as F
df.withColumn('a', F.coalesce(F.col('a'), F.array(F.lit(None)))).show(10, False)
+-----+
|a |
+-----+
|[] |
|[] |
|[foo]|
+-----+
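The column is now array(string) and contains no null values. Note that F.array(F.lit(None)) builds a one-element array holding a null, which older Spark versions display as []; if a truly empty array is needed, F.array().cast('array<string>') may be a better fallback. You can check the results: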
temp = spark.sql("SELECT a FROM table WHERE a is NULL")
temp.show(10, False)
temp = spark.sql("SELECT a FROM table WHERE a = array(NULL)")
temp.show(10, False)
temp = spark.sql("SELECT a FROM table")
temp.show(10, False)
+---+
|a |
+---+
+---+
+---+
|a |
+---+
|[] |
+---+
+-----+
|a |
+-----+
|[] |
|[] |
|[foo]|
+-----+

How to return first not empty cell from importrange values?

My Google Sheets document contains data like this:
+---+---+---+---+---+---+
| | A | B | C | D | E |
+---+---+---+---+---+---+
| 1 | | c | | x | |
+---+---+---+---+---+---+
| 2 | | r | | 4 | |
+---+---+---+---+---+---+
| 3 | | | | m | |
+---+---+---+---+---+---+
| 4 | | | | | |
+---+---+---+---+---+---+
Columns B and D contain data provided by the IMPORTRANGE function; the data is stored in different files.
I would like to fill column A with the first non-empty value in each row. In other words, the desired result looks like this:
+---+---+---+---+---+---+
| | A | B | C | D | E |
+---+---+---+---+---+---+
| 1 | c | c | | x | |
+---+---+---+---+---+---+
| 2 | r | r | | 4 | |
+---+---+---+---+---+---+
| 3 | m | | | m | |
+---+---+---+---+---+---+
| 4 | | | | | |
+---+---+---+---+---+---+
I tried the ISBLANK function, but apparently a cell in an imported column is not blank even when its value is empty, so this function doesn't work in my case. Then I tried the QUERY function in two variants:
1) =QUERY({B1;D1}; "select Col1 where Col1 is not null limit 1"; 0), but the result is wrong when the row contains cells with numbers. The result with this query is the following:
+---+---+---+---+---+---+
| | A | B | C | D | E |
+---+---+---+---+---+---+
| 1 | c | c | | x | |
+---+---+---+---+---+---+
| 2 | 4 | r | | 4 | |
+---+---+---+---+---+---+
| 3 | m | | | m | |
+---+---+---+---+---+---+
| 4 | | | | | |
+---+---+---+---+---+---+
2) =QUERY({B1;D1};"select Col1 where Col1 <> '' limit 1"; 0) / =QUERY({B1;D1};"select Col1 where Col1 != '' limit 1"; 0), and this doesn't work at all; the result is always #N/A.
I would also like to avoid nested IFs and JavaScript scripts if possible, as a solution with the QUERY function suits my case best because it extends easily to other columns without any deeper programming knowledge. Is there a simple way to do this with QUERY alone that I am just missing, or do I have to use IFs/JavaScript?
try:
=ARRAYFORMULA(SUBSTITUTE(INDEX(IFERROR(SPLIT(TRIM(TRANSPOSE(QUERY(
TRANSPOSE(SUBSTITUTE(B:G, " ", "♦")),,99^99))), " ")),,1), "♦", " "))
selective columns:

why `execv` can't use implicit convert from char** to char* const*?

Consider the following code:
#include <stdio.h>
#include <unistd.h>

void foo(char * const arg[]) {
    printf("success\n");
}

int main() {
    char myargs[2][64] = { "/bin/ls", NULL };
    foo(myargs);
    execv(myargs[0], myargs);
    return 0;
}
Both foo and execv require char * const * argument, but while my foo works (I get success in the output) the system call execv fails.
I would like to know why. Does this have something to do with the implementation of execv?
Also, assuming I have a char** variable - how can I send it to execv?
A two-dimensional array looks like this:
char myargs[2][16];
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
I reduced the size from 64 to 16 to keep the diagram from being annoyingly big.
With an initializer, it can look like this:
char myargs[2][16] = { "/bin/ls", "" };
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Notice I didn't try to put a null pointer in the second row. It doesn't make sense to do that, since that's an array of chars. There's no place in it for a pointer.
The rows are contiguous in memory, so if you look at a lower level, it's actually more like this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
When you pass myargs to a function, the famous "array decay" produces a pointer. That looks like this:
void foo(char (*arg)[16]);
...
char myargs[2][16] = { "/bin/ls", "" };
foo(myargs);
+-----------+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| POINTER==|===>| /| b| i| n| /| l| s|\0| | | | | | | | |\0| | | | | | | | | | | | | | | |
+-----------+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The pointer arg contains a value that locates the beginning of the array. Notice there is no pointer pointing to the second row. If foo wants to find the value in the second row, it needs to know how big the rows are so it can break down this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
into this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| /| b| i| n| /| l| s|\0| | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|\0| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
That's why arg must be char (*arg)[16] and not char **arg or the equivalent char *arg[].
The exec family of functions doesn't work with this data layout. It wants this:
+-----------+    +-----------+-----------+
|  POINTER==|===>|  POINTER  |   NULL    |
+-----------+    +-----|-----+-----------+
                       |
/----------------------/
|
|
|    +--+--+--+--+--+--+--+--+
\--->| /| b| i| n| /| l| s|\0|
     +--+--+--+--+--+--+--+--+
And when you want to add more arguments, it wants this:
+-----------+    +-----------+-----------+-   -+-----------+
|  POINTER==|===>|  POINTER  |  POINTER  | ... |   NULL    |
+-----------+    +-----|-----+-----|-----+-   -+-----------+
                       |           |
/----------------------/           |
|                                  |
| /--------------------------------/
| |
| |
| |  +--+--+--+--+--+--+--+--+
\-+->| /| b| i| n| /| l| s|\0|
  |  +--+--+--+--+--+--+--+--+
  |
  |  +--+--+--+--+--+--+
  \->| /| h| o| m| e|\0|
     +--+--+--+--+--+--+
If you compare this to the two-dimensional array diagram, hopefully you can understand why this can't be an implicit conversion. It actually involves moving stuff around in memory.
Both foo and execv require char * const * argument,
Yes.
but while my foo works (I get success in the output) the system call execv fails.
Getting the output you expect does not prove that your code is correct. The call exhibits undefined behavior because its argument does not match the parameter type, but it is plausible that that has little practical effect because the implementation of foo() does not use the parameter in any way. More generally, your code could, in principle, exhibit absolutely any behavior at all, because that's what "undefined" means.
I would like to know why. Does this have something to do with the implementation of execv?
From the standard's perspective, both calls exhibit equally undefined behavior. As a practical matter, however, we know that execv does use its arguments, so it would be much more surprising for that call to produce the behavior you expected than it is for the call to foo to produce the behavior you expected.
The main problem is that 2D arrays are arrays of arrays, and arrays are not pointers. Thus, your 2D array myargs does not at all have the correct type for an argument to either function.
Also, assuming I have a char** variable - how can I send it to execv?
You do not have such a variable in your code, but if you did, you could pass it directly, because C converts char ** to char * const * implicitly (adding a qualifier to the pointed-to type is always allowed):
char *some_args[] = { "/bin/ls", NULL };
execv(some_args[0], some_args);
A cast to (char * const *) is permitted but not required. You could also declare the variable with the exact parameter type in the first place:
char * const correct_args[] = { "/bin/ls", NULL };
execv(correct_args[0], correct_args);
Note also that although arrays are not pointers, they are converted to pointers in most contexts -- which I use in the example code -- but only the top level. An array of arrays thus "decays" to a pointer to an array, not a pointer to a pointer.

Binary Search Trees Switching subtrees

So I'm trying to write a function that when given two pointers to nodes in the BST, will 'switch' the subtree locations.
typedef struct NODE {
    struct NODE* parent;
    struct NODE* left;
    struct NODE* right;
} node_t;
This is the node struct I have for the BST.
My function goes along the lines of:
void switch_subtree(node_t* a, node_t* b)
{
    if (a==NULL || b==NULL)
    {
        return;
    }
    if (a->parent->left == a)
    {
        a->parent->left = b;
    }
    else
    {
        a->parent->right = b;
    }
    if (b->parent->left == b)
    {
        b->parent->left = a;
    }
    else
    {
        b->parent->right = a;
    }
    node_t * temp = a;
    a->parent = b->parent;
    b->parent = temp->parent;
}
However, when I run it, it does not properly switch the subtrees.
Can anyone point out any errors I'm making and point me in the right direction?
Thanks!!!
Your problem is here:
node_t * temp = a;
a->parent = b->parent;
b->parent = temp->parent;
It should read:
node_t * temp = a->parent;
a->parent = b->parent;
b->parent = temp;
Otherwise the original a->parent is lost for good after the second line.
rationale
wrong approach
The line temp = a will make both pointers temp and a point to the same NODE structure:
+- > +--------+ +- > +--------+
| | | | | |
| | ... | | | ... |
| +--------+ | +--------+
| |
+--------------+ +----------------+
| |
+--------+- > +--------+ | +- > +---------+ |
| | | parent |-+ | | parent |-+
| | | ... | | | ... |
| | +--------+ | +---------+
| | |
+------+ | +---+ | +---+ |
| temp |-+ | a |-+ | b |-+
+------+ +---+ +---+
Changing a->parent in line 2 (a->parent = b->parent) will also change temp->parent as both are just different names for the same component (parent) of the same NODE structure:
+--------+ +---+- > +--------+
| | | | | |
| ... | | | | ... |
+--------+ | | +--------+
| |
| +--------------+
| |
+--------+- > +--------+ | +- > +---------+
| | | parent |---+ | | parent |
| | | ... | | | ... |
| | +--------+ | +---------+
| | |
+------+ | +---+ | +---+ |
| temp |-+ | a |-+ | b |-+
+------+ +---+ +---+
The assignment b->parent = temp->parent doesn't change anything at all, as both b->parent and temp->parent are already pointing at the same node.
- mistake !
alternative
Taking a look at the proposed alternative, temp = a->parent will leave you with the situation sketched below:
+---------+- > +--------+ +- > +--------+
| | | | | | |
| | | ... | | | ... |
| | +--------+ | +--------+
| | |
| +--------------+ +----------------+
| | |
| +- > +--------+ | +- > +---------+ |
| | | parent |-+ | | parent |-+
| | | ... | | | ... |
| | +--------+ | +---------+
| | |
+------+ | +---+ | +---+ |
| temp |-+ | a |-+ | b |-+
+------+ +---+ +---+
After a->parent = b->parent temp is still pointing to the original parent node of the node pointed to by a:
+----------- > +--------+ +- > +--------+
| | | | | |
| | ... | | | ... |
| +--------+ | +--------+
| |
| +-----+----------------+
| | |
| +- > +--------+ | +- > +---------+ |
| | | parent |-+ | | parent |-+
| | | ... | | | ... |
| | +--------+ | +---------+
| | |
+------+ | +---+ | +---+ |
| temp |-+ | a |-+ | b |-+
+------+ +---+ +---+
Finally assigning b->parent = temp will give the node pointed to by b the right parent:
+--------+-- > +--------+ +----- > +--------+
| | | | | | |
| | | ... | | | ... |
| | +--------+ | +--------+
| | |
| +-----------------|--------------------+
| | |
| +- > +--------+ | +- > +---------+ |
| | | parent |---+ | | parent |-+
| | | ... | | | ... |
| | +--------+ | +---------+
| | |
+------+ | +---+ | +---+ |
| temp |-+ | a |-+ | b |-+
+------+ +---+ +---+

Not understand struct Barnyard2's sf_ip header file

I am looking at Barnyard2's sf_ip.h source code. I do not understand the sfip_t struct, particularly the union block.
typedef struct _ip {
    int family;
    int bits;
    /* see sfip_size(): these address bytes
     * must be the last field in this struct */
    union
    {
        u_int8_t  u6_addr8[16];
        u_int16_t u6_addr16[8];
        u_int32_t u6_addr32[4];
//      u_int64_t u6_addr64[2];
    } ip;
    #define ip8  ip.u6_addr8
    #define ip16 ip.u6_addr16
    #define ip32 ip.u6_addr32
//  #define ip64 ip.u6_addr64
} sfip_t;
Why is it using arrays? I looked for documentation but had no luck with Google. Can anyone explain what is being done here, please?
A union in C uses the same memory block for all its elements. This is distinct from a structure, in which the elements are consecutive in memory.
So, while struct {int x; int y;} would be laid out thus if your variable started at memory location 0x40000000:
           +-------------+
0x40000000 | x (4 bytes) |
           +-------------+
0x40000004 | y (4 bytes) |
           +-------------+
a related union {int x; int y;} exists like this:
Address
           +-------------+-------------+
0x40000000 | x (4 bytes) | y (4 bytes) |
           +-------------+-------------+
In other words, both members share the same storage, so the union holds only one value at a time. Reading y after writing x reinterprets the stored bytes as y's type (C99 and later define this; C++ treats it as undefined behaviour), and in this case you simply get the value of x back, since the two members have the same type.
In your particular case, you have the following memory layout (assuming your variable was located at 0x40000000):
           +--------------+--------------+--------------+
0x40000000 | u6_addr8[ 0] |              |              |
           +--------------+ u6_addr16[0] |              |
0x40000001 | u6_addr8[ 1] |              |              |
           +--------------+--------------+ u6_addr32[0] |
0x40000002 | u6_addr8[ 2] |              |              |
           +--------------+ u6_addr16[1] |              |
0x40000003 | u6_addr8[ 3] |              |              |
           +--------------+--------------+--------------+
0x40000004 | u6_addr8[ 4] |              |              |
           +--------------+ u6_addr16[2] |              |
0x40000005 | u6_addr8[ 5] |              |              |
           +--------------+--------------+ u6_addr32[1] |
0x40000006 | u6_addr8[ 6] |              |              |
           +--------------+ u6_addr16[3] |              |
0x40000007 | u6_addr8[ 7] |              |              |
           +--------------+--------------+--------------+
0x40000008 | u6_addr8[ 8] |              |              |
           +--------------+ u6_addr16[4] |              |
0x40000009 | u6_addr8[ 9] |              |              |
           +--------------+--------------+ u6_addr32[2] |
0x4000000a | u6_addr8[10] |              |              |
           +--------------+ u6_addr16[5] |              |
0x4000000b | u6_addr8[11] |              |              |
           +--------------+--------------+--------------+
0x4000000c | u6_addr8[12] |              |              |
           +--------------+ u6_addr16[6] |              |
0x4000000d | u6_addr8[13] |              |              |
           +--------------+--------------+ u6_addr32[3] |
0x4000000e | u6_addr8[14] |              |              |
           +--------------+ u6_addr16[7] |              |
0x4000000f | u6_addr8[15] |              |              |
           +--------------+--------------+--------------+
Assuming you understand how your particular C implementation lays out various types, this provides a way to reference the same data in different ways.

Resources