SystemVerilog: Are dynamic arrays (inside classes) guaranteed to be garbage-collected once the class object is not referenced anymore?

My question is the one in the title: what are the garbage collection rules for dynamic arrays in SystemVerilog?
Context:
In my program, I found a bug where you can instantiate a dynamic array locally in a function and add elements to it within that function, but if you don't delete the array, the entries remain (i.e. the memory and the reference are preserved). So when you call the function again, all previously added entries can still be accessed. The solution is simply to delete the dynamic array before exiting the function. I assume the array isn't deleted because it is allocated on the heap rather than the stack, and the compiler doesn't know when to garbage-collect it because it could be a returned reference (please correct me if I am wrong; I am not familiar with the garbage collection rules for dynamic arrays).
However, what happens if the dynamic array is instantiated within a class (as a member variable)? How do you know if the dynamic array is deleted (i.e. reference and memory is removed)? What are the garbage collection rules for that case?
I have example code to demonstrate the issue if it is helpful but I don't think it is necessary to include it (let me know if you'd like an example).
P.S. The same thing happens for associative-arrays as well (because I think it is a form of dynamic array type in SystemVerilog).
Thanks!

SystemVerilog has three different kinds of variable lifetimes:
static -- exists for the entire life of the simulation. Initialized once at time 0. Can be referenced from outside the scope where it is declared.
automatic -- a new instance is created and initialized on each entry to the scope where it is declared (which must be a procedural scope). Its lifetime ends when that scope, and all scopes nested in it, have exited (this handles fork/join_none). Can only be referenced from within the scope where it is declared.
dynamic -- created by the execution of a procedural statement. Its lifetime can end in a number of ways, but normally by executing a procedural statement.
Dynamically sized arrays have a compound concept of lifetimes. Individual elements have dynamic lifetimes, but the array as an aggregate can have any of the above lifetimes. For the purposes of your question, I think we can just consider the array as an aggregate. That means whenever the lifetime of an array variable ends, all of its dynamically allocated elements are reclaimed.
Class objects have dynamic lifetimes, but the class variables that hold handles referencing class objects can have any of the above lifetimes. Since more than one class variable can reference the same class object, the lifetime of a class object ends when there are no more class variables referencing it. So if that class object contains dynamic array variables, those variables' lifetimes end when the object's lifetime ends.
SystemVerilog doesn't specify how garbage collection works. When the lifetime of something ends, you can no longer access it. There is no way to know when the memory actually gets reclaimed.
It sounds like you have a statically declared dynamic array inside a function, or a static function argument. In Verilog, all non-class functions have static lifetimes by default, so their local variables keep their contents between calls unless the function is declared automatic. Class methods can only have automatic lifetimes. If this explanation does not answer your question, you'll need to post some code.
BTW, this became the subject of my DVCon 2021 Paper and Presentation

Related

In the java visualizer, why do String items of an array have a pointer to them, whereas variable assignments to them don't?

I am trying to understand why the java visualizer draws pointers to array items assigned to Strings, but doesn't draw pointers from variables to a String when they were assigned to one. Here is:
array with pointers,
variable without pointers
Does this dichotomy in how the Java visualizer draws assignments to Strings have any implications for our programs? I am wondering if Strings are still immutable even in the pointer situation. Conceptually, is anything different happening in how values are passed between these two diagrams?
I have tried seeing if this is consistent behaviour in the context of arrays, and that seems to be the case.
Arrays are Java objects that live on the heap and are accessed through references.
When an array is assigned to a variable, the variable stores a reference to the array object, not a copy of it.
Each element of a String[] is itself a reference to a String object, which is why the Java visualizer draws a pointer from each array slot to its String.
Assigning a String to a variable works the same way: no new object is created, and the variable simply stores a reference to the existing String object.
Because Strings are immutable, the contents of the object cannot be changed once it is created. Drawing the String's value directly next to the variable, instead of as a pointer, therefore appears to be a harmless display simplification by the visualizer; it does not reflect a difference in how the value is stored.
The way in which the Java visualizer shows arrays and Strings differently shouldn't have any implications for how your program works. Whether pointers are displayed or not, the Strings remain immutable.
In short, both a String and an array are objects stored on the heap and accessed by reference; a String is additionally immutable.
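A small Java sketch may make this concrete. The class and method names are made up for illustration; the point is only that an array slot and a plain variable hold the same kind of reference, and that String operations return new objects rather than modifying the original:

```java
class StringReferenceDemo {
    // True when both arguments refer to the very same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    // Stores s in an array slot and reads it back into a plain variable.
    static String roundTripThroughArray(String s) {
        String[] items = { s, "other" };
        return items[0];
    }

    public static void main(String[] args) {
        String original = "hello";
        String fromArray = roundTripThroughArray(original);
        // The array slot held a reference to the same String; nothing was copied.
        System.out.println(sameObject(original, fromArray));

        // toUpperCase() builds a new String; the original is left untouched.
        String upper = original.toUpperCase();
        System.out.println(original + " " + upper);
    }
}
```

Whether a diagram draws an arrow or an inline value, the program above behaves the same way.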

Is it bad programming practice to store objects of type Foo into a static array of type Foo belonging to Foo in their construction?

Say I wanted to store objects statically inside their own class. Like this:
public class Foo
{
    private static int instance_id = 0;
    public static List<Foo> instances = new List<Foo>();
    public Foo()
    {
        // Add() appends; indexing into an empty List<Foo> would throw.
        instances.Add(this);
        instance_id++;
    }
}
Why?
I don't need to create unique array structures outside the class (one will do).
I want to map each object to a unique id according to their time of birth.
I will only have one thread with the class in use. Foo will only exist as one set in the program.
I searched, but could find no mention of this data structure. Is this bad practice? If so, why? Thank you.
{please note, this question is not specific to any language}
There are a couple of potential problems I can see with this setup.
First, since you only have a single array of objects, if you need to update the code so that you have lots of different groups of objects in different contexts, you'll need to do a significant rewrite so that each object ends up getting associated with a different context. Depending on your setup this may not be a problem, but I suspect that in the long term this decision may come back to haunt you.
Second, this approach assumes that you never need to dispose of any objects. Imagine that you want to update your code so that you do a number of different simulations and aggregate the results. If you do this, then you'll end up having your giant array storing pointers to objects you're not using. This means that you'll (1) have a memory leak and (2) have to update all your looping code to skip over objects you no longer care about.
Third, this approach makes it the responsibility of the class, rather than the client, to keep track of all the instances. In some sense, if the purpose of what you're doing is to make it easier for clients to have access to a global list of all the objects that exist, you may want to consider just putting a different list somewhere else that's globally accessible so that the objects themselves aren't the ones responsible for keeping track of themselves.
I would recommend using one of a number of alternate approaches:
Just have the client do this. If the client needs to keep track of all the instances, just have them always create the array they need and populate it. That way, if multiple clients need different arrays, they can do so. You also avoid the memory leak issues if you do this properly.
Have each object take, as part of its constructor, a context in which to be constructed. For example, if all of these objects are nodes in a quadtree, have them take a pointer to the quadtree in which they'll live as a constructor parameter, then have the quadtree object store the list of the nodes in it. After all, it seems like it's really the quadtree's responsibility to keep track of everything.
Keep doing what you're doing, but use something with weak references. For example, you might consider using some variation on a WeakHashMap so that you do store everything, but if the objects are no longer needed, you at least don't have a memory leak.
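As a rough illustration of the weak-reference option, here is a minimal Java sketch. The question is language-agnostic, so the choice of Java, the class name TrackedFoo, and its methods are all assumptions made for the example; it also assumes single-threaded use, as the question states:

```java
import java.util.Map;
import java.util.WeakHashMap;

class TrackedFoo {
    private static int nextId = 0;
    // Keys are held weakly: once nothing else references an instance,
    // the garbage collector may drop its entry from the map.
    private static final Map<TrackedFoo, Integer> INSTANCES = new WeakHashMap<>();

    private final int id;

    TrackedFoo() {
        id = nextId++;           // ids follow order of construction
        INSTANCES.put(this, id); // register without keeping the object alive
    }

    int id() {
        return id;
    }

    // Number of instances the registry currently knows about.
    // Entries for unreachable instances disappear at the collector's discretion.
    static int liveCount() {
        return INSTANCES.size();
    }

    public static void main(String[] args) {
        TrackedFoo a = new TrackedFoo();
        TrackedFoo b = new TrackedFoo();
        System.out.println(a.id() + " " + b.id() + " " + liveCount());
    }
}
```

The registry still lives inside the class, so the first and third concerns above remain; only the leak is addressed.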

Verifying data types/structs in a parser

I'm writing a recursive descent parser, and I'm at the point where I'm unsure how to validate everything. I'm not even sure if I should be doing this at the stage of the parser. What I mean is, I could have some syntax i.e:
int x = 5
int x = 5
And that would be valid, so would the parser check if x has already been defined? If so, would I use a hashmap? And what kind of information would I need to store, like how can I handle the scope of a variable, since x could be defined in a function in a local and global scope:
int x = 5;
void main() {
int x = 2;
}
And finally, when I store to the hashmap, how can I differentiate the types? For example, I could have a variable called foo and a struct also called foo, so putting both in one hashmap would probably cause a collision. I'm thinking I could prefix the key, e.g. struct_xyz for a struct named xyz and int_xyz for a variable?
Thanks :)
I'm going to assume that regardless of which approach you choose, your parser will be constructing some kind of abstract syntax tree. You now have two options. Either, the parser could populate the tree with identifier nodes that store the name of the variable or function that they are referencing. This leaves the issue of scope resolution to a later pass, as advocated in many compiler textbooks.
The other option is to have the parser immediately look the identifier up in a symbol table that it builds as it goes, and store a pointer to the symbol in the abstract syntax tree node instead. This approach tends to work well if your language doesn't allow implicit forward-references to names that haven't been declared yet.
I recently implemented the latter approach in a compiler that I'm working on, and I've been very pleased with the result so far. I will briefly describe my solution below.
Symbols are stored in a structure that looks something like this:
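typedef struct type Type;   // Assumed to be defined elsewhere.
typedef struct scope Scope; // Forward declaration; defined below.

typedef struct symbol {
    char *name;
    Type *type;
    Scope *scope; // Points to the scope in which the symbol was defined.
} Symbol;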
So what is this Scope thing? The language I'm compiling is lexically scoped, and each function definition, block, etc, introduces a new scope. Scopes form a stack where the bottom element is the global scope. Here's the structure:
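typedef struct scope {
    struct scope *parent; // Enclosing scope; NULL for the global scope.
    Symbol *buckets;      // Hash table storage.
    size_t nbuckets;      // Number of buckets in the table.
} Scope;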
The buckets and nbuckets fields are a hash map of identifiers (strings) to Symbol pointers. By following the parent pointers, one can walk the scope stack while searching for an identifier.
With the data structures in place, it's easy to write a parser that resolves names in accordance with the rules of lexical scoping.
Upon encountering a statement or declaration that introduces a new scope (such as a function declaration or a block statement), the parser pushes a new Scope onto the stack. The new scope's parent field points to the old scope.
When the parser sees an identifier, it tries to look it up in the current scope. If the lookup fails in the current scope, it continues recursively in the parent scope, etc. If no corresponding Symbol can be found, an error is raised. If the lookup is successful, the parser creates an AST node with a pointer to the symbol.
Finally, when a variable or function declaration is encountered, it is bound in the current scope.
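The three steps above can be sketched in a few lines of Java. This is not the compiler's actual code: the class name LexicalScope is hypothetical, and types are represented as plain strings to keep the example short:

```java
import java.util.HashMap;
import java.util.Map;

class LexicalScope {
    final LexicalScope parent; // null for the global scope
    private final Map<String, String> symbols = new HashMap<>(); // name -> type name

    LexicalScope(LexicalScope parent) {
        this.parent = parent;
    }

    // Binds a name in this scope. Returns false if this scope already
    // defines it, which a parser would report as a redeclaration error.
    boolean declare(String name, String type) {
        return symbols.putIfAbsent(name, type) == null;
    }

    // Searches this scope first, then each enclosing scope in turn.
    // Returns null when the name is not defined anywhere.
    String lookup(String name) {
        for (LexicalScope s = this; s != null; s = s.parent) {
            String type = s.symbols.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        LexicalScope global = new LexicalScope(null);
        global.declare("x", "int");
        LexicalScope body = new LexicalScope(global); // entering a function body
        body.declare("x", "float");                   // shadows the global x
        System.out.println(body.lookup("x") + " " + global.lookup("x"));
    }
}
```

Leaving a scope is just a matter of making the parent the current scope again; the inner bindings become unreachable.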
Some languages use more than one namespace. For instance, in Erlang, functions and variables occupy different namespaces, requiring awkward syntax like fun foo:bar/1 to get at the value of a function. This is easily implemented in the model I outlined above by keeping several Scope stacks - one for each namespace.
If we define a "scope" or "context" as a mapping from variable names to types (and possibly some more information, such as scope depth), then its natural implementation is either a hashmap or some sort of search tree. Upon reaching any variable definition, the compiler should insert the name with its corresponding type into this data structure. When some sort of 'end scope' operator is encountered, we must already have enough information to 'backtrack' the changes in this mapping to its previous state.
For a hashmap implementation, for each variable definition we can store the previous mapping for that name, and restore it when we reach the 'end of scope' operator. We should keep a stack of stacks of these changes (one stack for each currently open scope), and backtrack the topmost stack of changes at the end of each scope.
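A minimal Java sketch of the single-hashmap-plus-undo-stacks idea follows. The class name UndoSymbolTable and its methods are invented for the example, and types are again plain strings:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class UndoSymbolTable {
    // One recorded change: which name was redefined and what it mapped to before.
    private static final class Change {
        final String name;
        final String previousType; // null if the name was previously undefined

        Change(String name, String previousType) {
            this.name = name;
            this.previousType = previousType;
        }
    }

    private final Map<String, String> current = new HashMap<>();
    private final Deque<Deque<Change>> openScopes = new ArrayDeque<>();

    UndoSymbolTable() {
        openScope(); // the global scope
    }

    void openScope() {
        openScopes.push(new ArrayDeque<>());
    }

    void define(String name, String type) {
        String previous = current.put(name, type);
        openScopes.peek().push(new Change(name, previous));
    }

    // Undoes this scope's changes, most recent first.
    void closeScope() {
        Deque<Change> changes = openScopes.pop();
        while (!changes.isEmpty()) {
            Change c = changes.pop();
            if (c.previousType == null) {
                current.remove(c.name);
            } else {
                current.put(c.name, c.previousType);
            }
        }
    }

    String lookup(String name) {
        return current.get(name);
    }
}
```

Lookups are a single hash access regardless of nesting depth, at the cost of the bookkeeping done on scope exit.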
One drawback of this approach is that we must either complete compilation in one pass, or store the mapping for each identifier in the program somewhere, as we can't inspect any scope more than once, or in an order other than the order of appearance in the source file (or AST).
For a tree-based implementation, this can be easily achieved with so-called persistent trees. We just maintain a stack of trees, one for each scope, pushing as we 'open' a scope, and popping when the scope ends.
The 'depth of scope' is enough to decide what to do when a new variable name conflicts with one already in the mapping. Just check for old depth < new depth and overwrite on success, or report an error on failure.
To differentiate between function and variable names you can use separate (yet similar or identical) mappings for those objects. If some context permits only a function name or only a variable name, you already know where to look. If both are permitted in some context, perform the lookup in both structures, and report an "ambiguity error" if the name corresponds to a function and a variable at the same time.
The best way is to use a class that holds structures such as a HashMap and lets you check the type and/or the existence of a variable. This class should have static methods that interface with the grammar rules written in the parser.

Is there a way to ensure that a single reference to an object exists at a point in time?

I am not sure of the practical value of such a thing, but I am wondering whether, in Java for example, an object can be instantiated so that if a variable holds a reference to it, no other variable can do so unless the first variable no longer does. The object could only be in a single list. Intuitively, this would correspond more closely to real-life objects, which can only be in one place at a time.

What are the internal differences of a T[] and a List<T> in terms of memory?

I was reading an article about array vs list, and the author says that an array is worse than a list, because (among other things) an array is a list of variables, but to me a list is also a list of variables. I mean, I can still do list[3] = new Item().
Actually, I have always seen a List<T> as a wrapper around an array that lets me use it easily without having to manage its structure myself.
What are the internal differences between a T[] and a List<T> in terms of heap/stack memory usage?
An array is a fixed-size structure: once it is initialized, it holds exactly the memory you asked for.
int arr[5];
For example, here five ints are created in memory. A list, depending on its implementation, instead starts with a backing array of some predefined capacity. As you add elements, it grows once you exceed that capacity. Some implementations simply double their size, while others enlarge themselves when the granted capacity is half full.
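To make the growth behaviour concrete, here is a deliberately simplified Java sketch of a list backed by an array that doubles when full. GrowableIntList is a made-up class, not the implementation of List<T> or of any real library, and the doubling factor is just one of the strategies mentioned above:

```java
import java.util.Arrays;

class GrowableIntList {
    private int[] data; // backing array; its length is the capacity
    private int size = 0;

    GrowableIntList(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    void add(int value) {
        if (size == data.length) {
            // Full: allocate a backing array twice as large and copy the elements over.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return data.length;
    }
}
```

Starting from a capacity of 4, the fifth add triggers one reallocation, so the list then holds 5 elements in a backing array of 8.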
The author's point about a "list of variables" wasn't about memory. It's that an array contains your internal variables, and returning it allows them to be reassigned by the caller. It comes down to this:
Only pass out an array if it is wrapped up by a read-only object.
If you pass out an internal List<T>, you have the same problem, but here's the key:
We have an extensibility model for lists because lists are classes. We
have no ability to make an “immutable array”. Arrays are what they are
and they’re never going to change.
At the time the article was written, the IReadOnlyList<T> interface (added in .NET 4.5) didn't exist yet, though he probably would have mentioned it if it had. I believe he was advocating implementing an IList<T> that would simply throw an exception if you tried to use the setter. Of course, if the user doesn't need to access elements by index, you don't need a list interface at all: you can just wrap the list in a ReadOnlyCollection<T> and return it as an IEnumerable<T>.
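The difference between handing out a raw array and handing out a read-only wrapper can be sketched as follows. This example is in Java rather than C#, using Collections.unmodifiableList as a rough analog of ReadOnlyCollection<T>; the Inventory class and its members are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Inventory {
    private final String[] rawItems = { "sword", "shield" };
    private final List<String> items = new ArrayList<>(Arrays.asList("sword", "shield"));

    // Risky: the caller receives the internal array and can reassign its slots.
    String[] leakedArray() {
        return rawItems;
    }

    // Safer: a read-only view; mutating calls throw UnsupportedOperationException.
    List<String> readOnlyItems() {
        return Collections.unmodifiableList(items);
    }

    String firstRawItem() {
        return rawItems[0];
    }

    String firstItem() {
        return items.get(0);
    }
}
```

A caller that writes to leakedArray()[0] silently changes the object's internal state, whereas the same attempt through readOnlyItems() fails immediately.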
