How does CPython match variables to values? - heap-memory

From what I read on other forum posts, the way CPython handles variables is (and please correct me if I'm wrong) that the virtual machine creates a heap where values are stored, and the variable names, the identifiers, are stored somewhere in the stack for that thread. So if num = 5 is in my code, the value 5 would be somewhere in the heap, and the string num would appear somewhere in the stack.
What I can't find is how CPython knows how to match a variable name from the stack to its value. I understand it doesn't use pointers, so how DOES it work?

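In Python, variables are names stored in a namespace, and a namespace behaves like a dictionary. (CPython optimizes function locals into an array internally, but locals() still presents them as a dictionary.)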
>>> def foo():
...     bar = 123
...     baz = 'Mike!'
...     print(locals())
...
>>> foo()
{'bar': 123, 'baz': 'Mike!'}
The locals() function returns a dictionary of all the variables created in the local scope.
When you label 5 with 'num', there is a dictionary which stores the string 'num' and its associated value of 5. Python objects are all stored in the heap: the dictionary, the string 'num', and the number 5 are all in the heap.
>>> num = 5
>>> print(globals())
{'__builtins__': ..., 'num': 5, ...}
When you call a function in Python, e.g.
foo(123, 456, name='Mike', is_valid=True)
Python treats it like
foo(*(123, 456), **{'name': 'Mike', 'is_valid': True})
which corresponds roughly to the following C API call
PyObject_Call(PyObject* function, PyObject* arguments, PyObject* keywords)
i.e. foo the function is actually a Python object, and so are arguments and keywords. arguments is a tuple, while keywords is a dictionary. They are all Python objects!
So what is left on the stack frame? Just a standard C function's stack frame, which contains the return address and pointers to 3 Python objects.
If you dig into the sources, you will find that Python is a beautiful piece of work.
Reading dictionary values
Python dictionaries are Python objects which implement the Mapping protocol. The way dictionaries are stored is internal to the implementation. I've linked to the official API to read values from dictionaries. But here's a taste of it
PyObject* PyMapping_GetItemString(PyObject *o, char *key)
which is equivalent to o[key] in Python. The Python dictionary implements the function prototype above in
PyObject* PyDict_GetItemString(PyObject *p, const char *key)
The implementation of PyDict_GetItem is here, but I'm not familiar with it. I had a quick read of the comments in the file, and the answer to your question appears to be both. A dictionary may store its values and keys in a single block, or switch to storing them separately. My guess is that one optimises for the case of array-like indexers, while the other is more like associative arrays.

Related

Statically allocate array with struct containing a string

I am writing a program that has an array whose size I know; it is a fixed instruction set. Each entry in the array maps to a struct of the opcode and some metadata, such as the function that implements the opcode and its name, a String.
I need to allocate the array before I actually compute and fill the instruction set on each opcode.
Rust won't let me allocate such an array statically, even though I know the size and the initialized state does NOT require ANY pointers: I would like all strings to be empty, which means no allocations, and you could have a "zero value" of the string that should be fine to copy.
Any suggestions? Is there any way to do it?
It seems Rust really likes you to use a Vec for everything, even when static memory, which should be preferable in a non-GC programming language, would have been possible in many cases such as this. This is confusing.
From the rust reference:
If the length operand has a value greater than 1 then this requires that the type of the repeat operand is Copy or that it must be a path to a constant item.
Therefore, to initialize an array using the [VALUE; NUM] syntax, either the type must implement the Copy trait or VALUE must be a const.
As pointed out by Chayim Friedman, you can use code like below:
const EMPTY: Instruction = Instruction { name: String::new() };
static ARRAY: [Instruction; 5] = [EMPTY; 5];
However, if you are working with a fixed set of data, it might be easier to just populate the list with predefined names, using &'static str.
If you need to create an array from a value that cannot be made into a constant and is not Copy, you can use std::array::from_fn like so:
let value = String::new();
let array: [_; 6] = std::array::from_fn(|_| value.clone());

In Swift why is char * becoming UnsafePointer<Int8>?

I have added files containing Objective-C classes from another project into my Swift project. In one of these Obj-C classes, there is a data member called "decodedBuffer" of type char *. I'm not well-versed in C, but I thought that a char pointer was C's way of representing a String. When I try to get this data member while in my Swift code, for instance by writing the following line
myObjCClass.decodedBuffer = "a new value"
I get an error saying decodedBuffer is of type UnsafePointer<Int8>. So I have two questions:
Why is a char * an UnsafePointer<Int8> in Swift? If anything I would expect it to be UnsafePointer<CChar>.
How do I work with this value in Swift? The library I'm using is kind of a black box, as well as being completely in Objective-C, and so I'm not 100% how the data I pass by assigning a value to decodedBuffer will be processed. Any tips on how to interface with C objects in Swift? Yes I have read the "Interacting With C API's" portion of the Swift docs, but I would like some user-gained wisdom on interfacing with C API's if possible.
Just like Swift, in C you can declare mutable (var) or immutable (let) variables.
Your C library is passing you an UnsafePointer<Int8>, which is an immutable value.
One solution, if you have access to the C code, is to change the declaration to a mutable value. You would then be dealing with an UnsafeMutablePointer, which is a mutable value.
Your question about why it is represented as an <Int8> type gets to the heart of pointers in C. You don't copy strings of data between different memory locations; you pass a pointer. The pointer holds the memory location of the first character of the char array (the string), which is much more efficient. As for the spelling, CChar is a type alias for Int8 on Apple platforms, so UnsafePointer<CChar> and UnsafePointer<Int8> are the same type.
A Swift example to show the int value of a C pointer:
let greeting: String = "Hello World."
let greetingPointer = strdup(greeting) // UnsafeMutablePointer<Int8>?
print("I am your \(greetingPointer). I point to memory location of Greeting.")

Efficiently "Wrapping" Constant Strings in C

I'm currently writing a parser in C, and one of the things that I needed when designing it was a mutable string "class" (a set of functions that operate on opaque structs representing instances), which I called my_string. Instances of the string class are little more than structs that wrap a char *, along with some metadata.
A problem arises, though, with constant strings. For example, I have several methods that return my_string * pointers, but sometimes I want to return a constant string. Consider this contrived pseudo-code:
my_string *element_get_data(my_element *el)
{
    if (element_has_character_data(el))
        return element_get_character_data(el); /* returns a (my_string *) */
    else
        return my_string_constant("#data"); /* ditto */
}
… where in some cases I want to fetch a pre-built my_string instance, but in others I want to just return the string "#data" wrapped in a my_string struct.
The problem with this code is that it creates a new (heap-allocated) my_string instance every time element_get_data(...) is called. C constant strings have nice semantics in that they're statically allocated in the program's DATA section, so every time a constant string is encountered, the address of that string is always the same.
It therefore seems silly to have several different my_string instances all pointing to the exact same char *. What's an efficient way to deduplicate this? Should I keep a hash table of const char * -> my_string * mappings? Or is there a way to use similar semantics to C constant strings? On the Mac, Core Foundation manages to do this with the CFSTR(...) macro.
The ideal solution to me would be to somehow craft a macro like my_string_constant(...) that stores a my_string struct in the DATA section of the program, so it too can be constant. Is such a thing possible?
While I was writing up this question (or rather, almost immediately after I finished), I found the answer in the form of GNUstep's implementation of Core Foundation's CFSTR() macro. My similar implementation looks like this:
#define MY_STR(str) ({\
    static struct { const char *buffer; my_bool shouldFree; my_bool mutable; my_bool constant; } s = {NULL, MY_FALSE, MY_FALSE, MY_TRUE};\
    s.buffer = str;\
    (my_string *)&s;\
})
This works because the macro is expanded at each place it is used (the ({ ... }) form is a GNU C statement expression, supported by GCC and Clang), so each use creates its own statically-allocated struct local to that scope. Thus, if (e.g.) a function containing MY_STR("Hello, world!") is called multiple times, the same statically-allocated struct will always be returned, resulting in our desired behavior.
This concept could easily be extended beyond things like strings, allowing you to easily create your own statically-allocated object types. Neat!

When to use array and when to use cell array?

In Matlab, I was trying to put anonymous functions in an array:
>> a=[@(k)0.1/(k+1) @(k)0.1/(k+1)^0.501]
??? Error using ==> horzcat
Nonscalar arrays of function handles are not allowed; use cell arrays
instead.
So I wonder what kinds of elements are allowed in an array, and in a cell array?
For example, I know that in an array, the elements can be numerical or strings. What else?
In short: Cell array is a heterogeneous container, regular array is homogeneous. This means that in a regular array all of the elements are of the same type, whereas in cell array, they can be different. You can read more about cell array here.
Use cell array when:
You have different types in your array
You are not sure whether you might extend it to other types in the future
You are working with objects that have an inheritance pattern
You are working with an array of strings - in almost every case it is preferable to char(n,m)
You have a large array, and you often update a single element in a function - due to Matlab's copy-on-write policy
You are working with function handles (as @Pursuit explained)
Prefer regular array when:
All of the elements have the same type
You are updating the whole array in one shot - like mathematical operations.
You want to have type safety
You will not change the data type of the array in the future
You are working with mathematical matrices.
You are working with objects that have no inheritance
More explanation about copy-on-write:
When you pass an array to a function, a pointer/reference is passed.
function foo(x)
    disp(x);
end
x = [1 2 3 4 5];
foo(x); %No copy is done here! A pointer is passed.
But when you change it (or a part of it), a copy is created.
function foo(x)
    x(4) = x(4) + 1;
end
x = [1 2 3 4 5];
foo(x); %x is being copied! At least twice the memory is needed.
In a cell array, only the cell is copied.
function foo(x)
    x{4} = x{4} + 1;
end
x = {1 2 3 4 5};
foo(x); %Only x{4} will be copied
Thus, if you call a function that changes a single element on a large array, you are making a lot of copies - that makes it slower. But in a cell array, it is not the case.
Function handles are actually the exception here, and the reason is that the Matlab syntax becomes surprising if you allow function handles to be part of a non-cell array. For example
a = @(x)x+1;
a(2); %This returns 3
But, if arrays of function handles were supported, then
b = [@(x)x+1, @(x)x+2];
b(2); %This would return @(x)x+2
b(3) = @(x)x+3; %This would extend the size of the array
So then would this be allowed?
a(2) = @(x)x+2; %Would this extend the size of the previously scalar array
Longwinded edit: This is documented in the release notes accompanying release R14, which was the first release allowing anonymous functions. Prior to R14 you could create function handles as references to m-file functions, and they could be placed in non-cell arrays. These could only be called using feval (e.g.: fnSin = @sin; output = feval(fnSin, pi)).
When anonymous functions were introduced, the Mathworks updated the syntax to allow a simpler calling convention (e.g. fnSin = @sin; output = fnSin(pi)), which had the effect of causing an ambiguity when using a non-cell array of function handles. It looks like they did their best to grandfather this new behavior in, but those grandfathered conditions have certainly expired (this was 2004).
Arrays can store only data with a fixed length, for instance double, single, char, logical, or integer types.
The reason is (I guess) that they are stored directly in a block of memory. Cells, on the other hand, are stored as a list of pointers, and each pointer can point to data of a different size.
That's why arrays cannot store strings, function handles, arrays, or multiple data types.
Those types can have different lengths. For instance 'bla' has 3 characters and 'blabla' has 6. Therefore, if they were stored in the same memory block and you wanted to change 'bla' into 'blabla', you would have to shift all the rest of the memory, which would be very slow, and so it's not handled.

a few beginner C questions

I'm sort of learning C. I'm not a beginner to programming though: I "know" Java and Python, and by the way I'm on a Mac (Leopard).
Firstly,
1: could someone explain when to use a pointer and when not to?
2:
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
And then all but the last would set indexes 0, 1, 2 and 3 to 'f', 'u', 'n' and '\0' respectively. My question is, why isn't the second one a pointer? Why char fun[4] and not char *fun[4]? And how come it seems that a pointer to a struct or an int is always an array?
3:
I understand this:
typedef struct car
{
...
};
is a shortcut for
struct car
{
...
};
typedef struct car car;
Correct? But something I am really confused about:
typedef struct A
{
...
}B;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
4. I understand what pointers do, but I don't understand what the point of them is (no pun intended). And when does something get allocated on the stack vs. the heap? How do I know where it gets allocated? Do pointers have something to do with it?
5. And lastly, know any good tutorial for C game programming (simple) ? And for mac/OS X, not windows?
PS. Is there any other name people use to refer to just C, not C++? I hate how they're all named almost the same thing, so hard to try to google specifically C and not just get C++ and C# stuff.
Thanks!!
It was hard to pick a best answer, they were all great, but the one I picked was the only one that made me understand my 3rd question, which was the only one I was originally going to ask. Thanks again!
My question is, why isn't the second one a pointer?
Because it declares an array. In the two other cases, you have a pointer that refers to data that lives somewhere else. Your array declaration, however, declares an array of data that lives where it's declared. If you declared it within a function, then data will die when you return from that function. Finally char *fun[4] would be an array of 4 pointers - it wouldn't be a char pointer. In case you just want to point to a block of 4 chars, then char* would fully suffice, no need to tell it that there are exactly 4 chars to be pointed to.
The first way which creates an object on the heap is used if you need data to live from thereon until the matching free call. The data will survive a return from a function.
The last way just creates data that's not intended to be written to. It's a pointer which refers to a string literal - it's often stored in read-only memory. If you write to it, then the behavior is undefined.
I understand what pointers do, but I don't understand what the point of them is (no pun intended).
Pointers are used to point to something (no pun, of course). Look at it like this: If you have a row of items on the table, and your friend says "pick the second item", then the item won't magically walk its way to you. You have to grab it. Your hand acts like a pointer, and when you move your hand back to you, you dereference that pointer and get the item. The row of items can be seen as an array of items:
And how come it seems that a pointer to a struct or an int is always an array?
item row[5];
When you do item i = row[1]; then you first point your hand at the first item (get a pointer to the first one), and then you advance till you are at the second item. Then you take your hand with the item back to you :) So, the row[1] syntax is not something special to arrays, but rather special to pointers - it's equivalent to *(row + 1), and a temporary pointer is made up when you use an array like that.
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
typedef struct car
{
...
};
That doesn't do what you want. You basically said "define the type struct car { ... } to be referable by the following ordinary identifier" but you forgot to give it the identifier, so no typedef name is introduced (compilers typically only warn about this). The two following snippets are equivalent instead, as far as I can see
1)
struct car
{
...
};
typedef struct car car;
2)
typedef struct car
{
...
} car;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
In our case, the identifier car was declared two times in the same scope. But the declarations won't conflict because each of the identifiers is in a different namespace. The two namespaces involved are the ordinary namespace and the tag namespace. A tag identifier needs to be used after a struct, union or enum keyword, while an ordinary identifier doesn't need anything around it. You may have heard of the POSIX function stat, whose interface looks like the following
struct stat {
...
};
int stat(const char *path, struct stat *buf);
In that code snippet, stat is registered in both of the aforementioned namespaces. struct stat will refer to the struct, and plain stat will refer to the function. Some people don't like to always precede identifiers with struct, union or enum. They use typedef to introduce an ordinary identifier that refers to the struct too. The identifiers can of course be the same (both times car), or they can differ (one time A, the other time B). It doesn't matter.
3) It's bad style to use two different names A and B:
typedef struct A
{
...
} B;
With that definition, you can say
struct A a;
B b;
b.field = 42;
a.field = b.field;
because the variables a and b have the same type. C programmers usually say
typedef struct A
{
...
} A;
so that you can use "A" as a type name, equivalent to "struct A" but it saves you a lot of typing.
Use them when you need to. Read some more examples and tutorials until you understand what pointers are, and this ought to be a lot clearer :)
The second case creates an array in memory, with space for four bytes. When you use that array's name, you magically get back a pointer to the first (index 0) element. The [] operator then actually works on a pointer, not an array - x[y] is equivalent to *(x + y). And yes, this means x[y] is the same as y[x]. Sorry.
Note also that when you add an integer to a pointer, it's multiplied by the size of the pointed-to elements, so if you do someIntArray[1], you get the second (index 1) element, not somewhere in between starting at the first byte.
Also, as a final gotcha - array types in function argument lists - eg, void foo(int bar[4]) - secretly get turned into pointer types - that is, void foo(int *bar). This is only the case in function arguments.
Your third example declares a struct type with two names - struct A and B. In pure C, the struct is mandatory for A - in C++, you can just refer to it as either A or B. Apart from the name change, the two types are completely equivalent, and you can substitute one for the other anywhere, anytime without any change in behavior.
C has three places things can be stored:
The stack - local variables in functions go here. For example:
void foo() {
    int x; // on the stack
}
The heap - things go here when you allocate them explicitly with malloc, calloc, or realloc.
void foo() {
    int *x; // on the stack
    x = malloc(sizeof(*x)); // the value pointed to by x is on the heap
}
Static storage - global variables and static variables, allocated once at program startup.
int x; // static
void foo() {
    static int y; // essentially a global that can only be used in foo()
}
No idea. I wish I didn't need to answer all questions at once - this is why you should split them up :)
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
The first one can be set to any size you want at runtime, and be resized later - you can also free the memory when you are done.
The second one is an array, but in most expressions its name 'fun' turns into a pointer to its first element, as if you had written char *ptr = &fun[0].
I understand what pointers do, but I don't understand what the point of them is (no pun intended). And when does something get allocated on the stack vs. the heap? How do I know where it gets allocated? Do pointers have something to do with it?
When you define something in a function like "char fun[4]" it is defined on the stack and the memory isn't available outside the function.
Using malloc (or new in C++) reserves memory on the heap - you can make this data available anywhere in the program by passing the pointer around. This also lets you decide the size of the memory at runtime. Finally, the size of the stack is limited (typically around 1 MB), while on the heap you can reserve all the memory you have available.
edit 5. Not really - I would say pure C. C++ is (almost) a superset of C so unless you are working on a very limited embedded system it's usually OK to use C++.
5. Chipmunk
Fast and lightweight 2D rigid body physics library in C.
Designed with 2D video games in mind.
Lightweight C99 implementation with no external dependencies outside of the Std. C library.
Many language bindings available.
Simple, read the documentation and see!
Unrestrictive MIT license.
Makes you smarter, stronger and more attractive to the opposite gender!
...
In your second question:
char *fun = malloc(sizeof(char) * 4);
vs
char fun[4];
vs
char *fun = "fun";
These all involve an array of 4 chars, but that's where the similarity ends. Where they differ is in the lifetime, modifiability and initialisation of those chars.
The first one creates a single pointer to char object called fun - this pointer variable will live only from when this function starts until the function returns. It also calls the C standard library and asks it to dynamically create a memory block the size of an array of 4 chars, and assigns the location of the first char in the block to fun. This memory block (which you can treat as an array of 4 chars) has a flexible lifetime that's entirely up to the programmer - it lives until you pass that memory location to free(). Note that this means that the memory block created by malloc can live for a longer or shorter time than the pointer variable fun itself does. Note also that the association between fun and that memory block is not fixed - you can change fun so it points to a different memory block, or make a different pointer point to that memory block.
One more thing - the array of 4 chars created by malloc is not initialised - it contains garbage values.
The second example creates only one object - an array of 4 chars, called fun. (To test this, change the 4 to 40 and print out sizeof(fun)). This array lives only until the function it's declared in returns (unless it's declared outside of a function, when it lives for as long as the entire program is running). This array of 4 chars isn't initialised either.
The third example creates two objects. The first is a pointer-to-char variable called fun, just like in the first example (and as usual, it lives from the start of this function until it returns). The other object is a bit strange - it's an array of 4 chars, initialised to { 'f', 'u', 'n', 0 }, which has no name and that lives for as long as the entire program is running. It's also not guaranteed to be modifiable (although what happens if you try to modify it is left entirely undefined - it might crash your program, or it might not). The variable fun is initialised with the location of this strange unnamed, unmodifiable, long-lived array (but just like in the first example, this association isn't permanent - you can make fun point to something else).
The reason why there are so many confusing similarities and differences between arrays and pointers is down to two things:
The "array syntax" in C (the [] operator) actually works on pointers, not arrays!
Trying to pin down an array is a bit like catching fog - in almost all cases the array evaporates and is replaced by a pointer to its first element instead.
