How does the Length() function in Delphi work? - arrays

In other languages like C++, you have to keep track of the array length yourself - how does Delphi know the length of my array? Is there an internal, hidden integer?
Is it better, for performance-critical parts, to not use Length() but a direct integer managed by me?

There are three kinds of arrays, and Length works differently for each:
Dynamic arrays: These are implemented as pointers. The pointer points to the first array element, but "behind" that element (at a negative offset from the start of the array) are two extra integer values that represent the array's length and reference count. Length reads that value. This is the same as for the string type.
Static arrays: The compiler knows the length of the array, so Length is a compile-time constant.
Open arrays: The length of an open array parameter is passed as a separate parameter. The compiler knows where to find that parameter, so it replaces Length with that a read of that parameter's value.

Don't forget that the layout of dynamic arrays and the like would change in a 64-bit version of Delphi, so any code that relies on finding the length at a particular offset would break.
I advise just using Length(). If you're working with it in a loop, you might want to cache it, but don't forget that a for loop already caches the terminating bounds of the loop.

Yes, there are in fact two additional fields with dynamic arrays. First is the number of elements in the array at -4 bytes offset to the first element, and at -8 bytes offset there's the reference count. See Rudy's article for a detailed explanation.
For the second question, you'd have to use SetLength for sizing dynamic arrays, so the internal 'length' field would be available anyway. I don't see much use for additional size tracking.

Since Rob Kennedy gave such a good answer to the first part of your question, I'll just address the second one:
Is it better, for performance-critical parts, to not use Length() but a direct integer managed by me?
Absolutely not. First, as Rob mentioned, the compiler does it's thing to access the information extremely quickly, either by reading a fixed offset before the start of the array in the case of dynamic ones, using a compile-time constant in the case of static ones, and passing a hidden parameter in the case of open arrays, you're not going to gain any improvement in performance.
Secondly, the direct integer managed by you wouldn't be any faster, but would actually use more memory (an additional integer allocated along with the one Delphi already provides for dynamic and open arrays, and an extra integer entirely in the case of static arrays).
Even if you directly read the value Delphi stores already for dynamic arrays, you wouldn't gain any performance over Length(), and would risk your code breaking if the internal representation of that hidden header for arrays changes in the future.

Is there an internal, hidden integer
Yes.
to not use Length() but a direct integer managed by me?
Doesn't matter.
See Dynamic arrays item in Addressing pointers article by Rudy Velthuis.
P.S. You can also hit F1 button.

Related

`System.CopyArray` vs `System.Copy`?

Apart from the obvious difference that the first deals only with arrays, don't they do the same thing? I keep reading the help pages for the two functions, and can't understand when should I use one over the other and why. Internet search seems to indicate only the second is used whatsoever for array copying purposes if not written using loops.
System.Copy is really compiler magic. It's applicable both to strings and to dynamic arrays.
Compiler chooses needed version of intrinsic routine (different for short strings, long strings, dynamic arrays) and substitutes your call of Copy.
For dynamic arrays _DynArrayCopyRange prepares memory, provides reference counting, and calls System.CopyArray for deep copy of elements.
Usually you don't need to call the last procedure expicitly, it is compiler prerogative.

Determine the actual used size of an array

I have found the usual answer for determining the size on an array - 'variable = sizeof buffer/sizeof *buffer;' or similar which gives the declared size
Other than passing a counter variable around when an array is 'filled', is there a way (a command or function - something similar to the above 1 liner) to find the used number of elements in a given array?
And I don't mean searching for an end of line character or implementing a circular buffer. I only expect between 5 and 25 Bytes in any given transaction.
Edit:- for omissions in original post (as requested)
Language C (via Atmel Studio)
8bit Bytes (non characters)
I don't know the meaning of 'strong-typed' and 'associative', but eventually it will be semi random bytes from a device and not fixed numbers like in a string
Fixed size declared array (ultimately volatile)
Thanks
The C standard generally does not provide any way for tracking or determining which or how many elements of an array are used.

Advantages of null termination vs count variable

I need to use array of graph data, i.e. struct with x and y integers. This array will be passed through many functions, and I need to decide the API choice.
typedef struct {
int x;
int y;
} GraphData_t;
How should I choose whether to use NULL-termination for the array, or supply count variable?
I have three approaches for my API:
1: loadGraph(GraphData_t *data, int count); //use count variable
2: loadGraph(GraphData_t *data); // use null-termination (or any other termination value)
typedef struct {
GraphData_t *data;
int count;
} GraphArray_t;
3: loadGraph(GraphArray_t *data); //use a struct which has integrated count variable
So far these seem equal to me. Which one would be the preferable method, and why?
As a rather old dinosaur, I will use history here.
Anyway, the size + pointer idiom is the multi purpose and bullet proof way. If in doubt, use it.
The delimited way is just more common for human beings, specially when you want to initialize an array: no need to manually count the items (with the risk of a one off mistake specially if you later add or remove elements to the initialization list), you just add the delimitor as the last element. BTW, it is the way we use lines in text files... But anyway, the sizeof(array)/sizeof(array[0]) idiom allows to easily and automatically get the size...
The NULL terminated idiom comes from the begining of micro-processors, where code was close to the hardware for performance reasons: comparison to 0 was the fastest test, and memory was expensive. And programmers began to end their constant strings with a NULL character for that reason: only one byte overhead, even if the string was longer than 256 characters. You find reference to this ASCIIZ idiom in MS/DOS 2 manuals but it had been made popular by the pair Unix and K&R C language since the 70's.
It is still convenient, and still used in C strings, but many higher level tools like C++ std::string now prefere the counted idiom which does not require one forbidden value.
For daily programming, the (null) terminated idiom should only be used when an array can only be browsed forward, and when you have no special need for the size. But beware, if you simply want to copy a null terminated array, you have to scan it twice: once for its size and once for its data.
Null termination is a convention that can only be used if the null value is excluded from the set of legal values for the entries. For example the string array argv in main has a null pointer at the end because a null pointer cannot be a legal string.
In your case, the array elements are structures with 2 int coordinates. You would need to decide what values to consider invalid for these coordinates. If all values are OK, then you must pass the number of elements explicitly. Passing the array length explicitly is preferred in all cases, as it avoids unnecessary scans. Indeed main also gets the length of the argv array as a separate int argument argc.
Whether to encapsulate the array pointer and the length in a structure is a matter of style and convenience. For complex structures, it is preferable to group all characteristics in a structure, but for a simple array, it may be more convenient to pass a pointer and a size explicitly as it allows you to apply the function to a subset of the array with loadGraph(data + i, j).
While all approaches can of course get the job done, there are some differences which may or may not be relevant for your use-case.
Null-termination is very convenient if the user needs or wants to use hard-coded arrays, because then they can just add or delete entries, without needing to worry about possibly breaking the application (unless they remove the terminator of course).
Since the size is unknown, almost every function working with a null-terminated array needs to iterate over the whole thing. This might be a problem if the array is large, and many functions usually wouldn't actually need to access all entries (or not in the order they are stored).
The terminator itself obviously needs to be a value that can never occur in your actual data. So, depending on your data, there might not be an obvious candidate to use as terminator (or even none at all).
There are probably more subtle differences which might influence your decision, but these are the first ones that came to my mind.

Allocating arrays of the same size

I'd like to allocate an array B to be of the same shape and have the same lower and upper bounds as another array A. For example, I could use
allocate(B(lbound(A,1):ubound(A,1), lbound(A,2):ubound(A,2), lbound(A,3):ubound(A,3)))
But not only is this inelegant, it also gets very annoying for arrays of (even) higher dimensions.
I was hoping for something more like
allocate(B(shape(A)))
which doesn't work, and even if this did work, each dimension would start at 1, which is not what I want.
Does anyone know how I can easily allocate an array to have the same size and bounds as another array easily for arbitrary array dimensions?
As of Fortran 2008, there is now the MOLD optional argument:
ALLOCATE(B, MOLD=A)
The MOLD= specifier works almost in the same way as SOURCE=. If you specify MOLD= and source_expr is a variable, its value need not be defined. In addition, MOLD= does not copy the value of source_expr to the variable to be allocated.
Source: IBM Fortran Ref
You can either define it in a preprocessor directive, but that will be with a fixed dimensionality:
#define DIMS3D(my_array) lbound(my_array,1):ubound(my_array,1),lbound(my_array,2):ubound(my_array,2),lbound(my_array,3):ubound(my_array,3)
allocate(B(DIMS3D(A)))
don't forget to compile with e.g. the -cpp option (gfortran)
If using Fortran 2003 or above, you can use the source argument:
allocate(B, source=A)
but this will also copy the elements of A to B.
If you are doing this a lot and think it too ugly, you could write your own subroutine to take care of it, copy_dims (template, new_array), encapsulating the source code line you show. You could even set up a generic interface so that it could handle arrays of several ranks -- see how to write wrapper for 'allocate' for an example of that concept.

OK to use a terminator to manage fixed length arrays?

I'm working in ANSI C with lots of fixed length arrays. Rather than setting an array length variable for every array, it seems easier just to add a "NULL" terminator at the end of the array, similar to character strings. Fot my current app I'm using "999999" which would never occur in the actual arrays. I can execute loops and determine array lengths just by looking for the terminator. Is this a common approach? What are the issues with it? Thanks.
This approach is technically used by your main arguments, where the last value is a terminal NULL, but it's also accompanied by an argc that tells you the size.
Using just terminals sounds like it's more prone to mistakes in the future. What's wrong with storing the size along with an array?
Something like:
struct fixed_array {
unsigned long len;
int arr[];
};
This will also be more efficient and less error-prone.
The main problem I can think of is that keeping track of the length can be useful because there are built in functions in C that take length as a parameter, and you need it to know the length to know where to add the next element too.
In reality it depends on the size of your array, if it is a huge array than you should keep track of the length. Otherwise looping through it to determine the length every time you want to add an element to the end would be very expensive. O(n) instead of the O(1) time you normally get with arrays
The main problem with this approach is that you can't know the length in advance without looping to the end of the array - and that can affect the performance quite negatively if you only want to determine the length.
Why don't you just
Initialize it with a const int that you can use later in the code to check the size, or
Use int len = sizeof(my_array) / sizeof(the_type).
Since you're using 2-dimensional arrays to hold a ragged array, you could just use a ragged array: type *my_array[];. Or you could put the length in element 0 of each row and treat the rows as 1-indexed arrays. With some evil trickery you could even put the lengths at element -1 of each row![1]
Left as exercise ;)

Resources