PostgreSQL Array Structure - c

What is the in-memory layout of a PostgreSQL array? How can I get at the real data?
For example, for array[0.1, 0.2, 0.3]::float8[], is the real data (0.1, 0.2, 0.3) stored like a standard C array? Could I use memcpy to copy an existing array? Does the pointer we get from ARR_DATA_PTR refer to the real data?

PostgreSQL uses a variable-length C structure: the first bytes hold a fixed-size header and the bytes after it hold the data. The first four bytes hold the total length, followed by the number of dimensions, dataoffset, and the identifier of the element's data type. After the header there may be a bitmap that marks NULLs, and after the bitmap the element data is serialised.
A PostgreSQL array is not compatible with a C array, although in some cases (for example fixed-length elements such as float8 with no NULLs) the data area is laid out like a C array. ARR_DATA_PTR may or may not refer to usable data. It depends on the current state of the value: it can be toasted (compressed or stored out of line) and then has to be detoasted first.
People usually use the macros and support functions when working with pg arrays. There are ways to unpack a pg array into C arrays or to iterate over it.
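A simplified, self-contained model of the layout described above may help. It is not PostgreSQL's real ArrayType (that is declared in utils/array.h): the names ModelArray, model_overhead, model_data_ptr and model_from_doubles are made up for illustration, and the model ignores the NULL bitmap and TOAST entirely. Under those assumptions it shows why the data of a float8 array without NULLs behaves like a C array that starts at an offset inside the value, not at the start of it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's array header; NOT the real ArrayType. */
typedef struct {
    int32_t  total_len;   /* total size in bytes, header included */
    int32_t  ndim;        /* number of dimensions */
    int32_t  dataoffset;  /* 0 means "no NULL bitmap" */
    uint32_t elemtype;    /* identifier (OID) of the element type */
    /* followed by: int32 dims[ndim], int32 lbounds[ndim], then the data */
} ModelArray;

/* Header + dims + lbounds, rounded up to 8 bytes so doubles stay aligned. */
static size_t model_overhead(int ndim) {
    size_t raw = sizeof(ModelArray) + 2 * sizeof(int32_t) * (size_t) ndim;
    return (raw + 7u) & ~(size_t) 7u;
}

/* Pointer to the first element; only meaningful without a NULL bitmap. */
static double *model_data_ptr(ModelArray *a) {
    assert(a->dataoffset == 0);
    return (double *) ((char *) a + model_overhead(a->ndim));
}

/* Build a one-dimensional model array from a plain C array of doubles. */
static ModelArray *model_from_doubles(const double *src, int n) {
    assert(src != NULL && n >= 0);
    size_t total = model_overhead(1) + sizeof(double) * (size_t) n;
    ModelArray *a = calloc(1, total);
    if (a == NULL) {
        return NULL;
    }
    a->total_len = (int32_t) total;
    a->ndim = 1;
    a->dataoffset = 0;
    a->elemtype = 701;                 /* OID of float8 */
    int32_t *dims = (int32_t *) (a + 1);
    dims[0] = n;                       /* dims[0]: number of elements */
    dims[1] = 1;                       /* lbounds[0]: lower bound is 1 */
    /* The data area is contiguous, so a single memcpy fills it. */
    memcpy(model_data_ptr(a), src, sizeof(double) * (size_t) n);
    return a;
}
```

In real extension code the usual route is PG_GETARG_ARRAYTYPE_P (which detoasts the argument) followed by deconstruct_array or the ARR_* macros, rather than hand-computed offsets like the ones in this model.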

Related

Difference between list and arrays

Are lists and arrays different in python?
Many articles refer to the following as an array: ar = [0]*N, with N elements. But that is a list. I'm not sure whether these words are used interchangeably in Python. And then there is a module called array.
The terms "list" and "array" are not interchangeable in either Python or computer science (CS).
In CS an array data structure is defined, as you've already noted, as a contiguous block of data elements all of which are the same size. Python's list does not conform to this definition as each element of a Python list can not only be a completely different data type, but can even be another data structure type such as list, dictionary, or set.
Python stores each element of a list as a separate data object, and the references to those objects are stored in Python's list data type. My past reading indicates that the list is stored as an array, and in CPython that is the case: the list is a dynamically resized array of references, not a hash table, which is why indexing takes constant time.
Python's lists are ordered, and you can use and access a Python list as if it were an array, as I frequently do, but don't be confused by Python's use of square brackets: it's not an array in the CS sense.
Python's array module implements an actual array data type for storing numerical data (actually, I think you can store character data as well).

Are there Erlang arrays "with a defined representation"?

Context:
Erlang programs running on heterogeneous nodes, retrieving and storing data
from Mnesia databases. These database entries are meant to be used for a long
time (e.g. across multiple Erlang version releases) and remain in the form of
Erlang objects (i.e. no serialization). Among the information stored, there are
currently two uses for arrays:
Large (up to 16384 elements) arrays. Fast access to an element
using its index was the basis for choosing this type of collection.
Once the array has been created, the elements are never modified.
Small (up to 64 elements) arrays. Accesses are mostly done using indices, but there are also some iterations (foldl/foldr). Both reading and replacement of the elements is done frequently. The size of the collection remains constant.
Problem:
Erlang's documentation on arrays states that "The representation is not
documented and is subject to change without notice." Clearly, arrays should not be used in my context: database entries containing arrays may be
interpreted differently depending on the node executing the program and
unannounced changes to how arrays are implemented would make them unusable.
I have noticed that Erlang features "ordsets"/"orddict" to address a similar
issue with "sets"/"dict", and am thus looking for the "array" equivalent. Do you know of any? If none exists, my strategy is likely going to be using lists of lists to replace my large arrays, and orddict (with the index as key) to replace the smaller ones. Is there a better solution?
An array is a tuple of nested tuples and integers, with each tuple having a fixed size of 10 and representing a segment of cells. Where a segment is not currently used, an integer (10) acts as a placeholder. Without the abstraction, this is, I suppose, the closest equivalent. You could indeed copy the array module from OTP into your own app, and it would then be a stable representation.
What you should use instead of array depends on the data and what you will do with it. If the data that would be in your array is fixed, then a tuple makes sense: it has constant access time for reads/lookups. Otherwise a list sounds like a winner, be it a list of lists, a list of tuples, etc. However, once again, that's a shot in the dark, because I don't know your data or how you use it.
See the implementation here: https://github.com/erlang/otp/blob/master/lib/stdlib/src/array.erl
Also see Robert Virding's answer on the implementation of array here: Arrays implementation in erlang
And what Fred Hebert says about the array in A Short Visit to Common Data Structures
An example showing the structure of an array:
1> A1 = array:new(30).
{array,30,0,undefined,100}
2> A2 = array:set(0, true, A1).
{array,30,0,undefined,
{{true,undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined},
10,10,10,10,10,10,10,10,10,10}}
3> A3 = array:set(19, true, A2).
{array,30,0,undefined,
{{true,undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined},
{undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,true},
10,10,10,10,10,10,10,10,10}}
4>

Data structure - Array

Here it says:
Arrays are useful mostly because the element indices can be computed
at run time. Among other things, this feature allows a single
iterative statement to process arbitrarily many elements of an array.
For that reason, the elements of an array data structure are required
to have the same size and should use the same data representation.
Is this still true for modern languages?
For example, in Java you can have an array of Objects or Strings, right? Each object or string can have a different length. Do I misunderstand the above quote, or do languages like Java implement arrays differently? How?
In Java, all types except primitives are reference types, meaning that a variable or array slot holds a reference to a memory location managed by the JVM.
There are mainly two kinds of programming languages: statically typed ones like Java and C++, and dynamically typed ones like Python and PHP. In statically typed languages an array must consist of elements of the same type, whether String, Object, or something else.
In dynamically typed languages there is a bit more abstraction, and you can have different data types in one array (I don't know the actual implementation, though).
An array is a regular arrangement of data in memory. Think of an array of soldiers, all in a line, with exactly equal spacing between each man.
So they can be indexed by lookup from a base address. But all items have to be the same size. So if they are not, you store pointers or references to make them the same size. All languages use that underlying structure, except for what are sometimes called "associative arrays", indexed by key (strings usually), where you have what is called a hash table. Essentially the hash function converts the key into an array index, with a fix-up to resolve collisions.
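The point about storing pointers to make every slot the same size can be sketched in C. The array below holds strings of different lengths, yet each slot is exactly one pointer wide, so slot i still sits at a fixed multiple of the slot size from the base. The names names, names_count, slot_offset and name_length are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strings of different lengths; the array itself only stores pointers. */
static const char *const names[] = { "a", "soldier", "a much longer string" };

/* Number of slots in the array. */
static size_t names_count(void) {
    return sizeof names / sizeof names[0];
}

/* Byte distance from slot 0 to slot i: always i * sizeof(char *). */
static size_t slot_offset(size_t i) {
    assert(i < names_count());
    return (size_t) ((const char *) &names[i] - (const char *) &names[0]);
}

/* Length of the string that slot i points to; this varies per element. */
static size_t name_length(size_t i) {
    assert(i < names_count());
    return strlen(names[i]);
}
```

The slots are evenly spaced like the soldiers in the analogy, while the variable-sized payloads live elsewhere in memory.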

What is the difference between an Array Data Structure and an Array Data-type in the context of a programming language like C?

Wikipedia differentiates an Array Data Structure and an Array Data-type.
What is the difference between an Array Data Structure and an Array Data-type in the context of a programming language like C?
What is this : int array[]={1, 2, 3, 4, 5}; ?
Is it an Array Data Structure or an Array Data-type? Why?
Short answer: Do yourself a favor and just ignore both articles. I don't doubt the good intentions of the authors, but the articles are confusing at best.
What is this : int array[]={1, 2, 3, 4, 5}; ?
Is it an Array Data Structure or an Array Data-type? Why?
It's both. The array data structure discussed in the article by that name is supposed to relate specifically to arrays as implemented in C. The array data type concept is supposed to be more abstract, but C arrays certainly are one implementation of array data type.
Long answer: The difference those two articles consider is the difference between behavior and implementation. As used in the articles, array data structure refers to elements stored sequentially in memory, so that you can calculate the address of any element by:
address = (base address) + (element index * size of a single element)
where 'base address' is the address of the element at index 0.
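As a small check of that formula, the following C sketch computes an element's address by hand and reads through it. The helper names element_address and element_at are hypothetical; byte-wise arithmetic through a char pointer stands in for the "base address plus offset" of the formula.

```c
#include <assert.h>
#include <stddef.h>

/* address = base address + (element index * size of a single element) */
static const int *element_address(const int *base, size_t index) {
    const char *bytes = (const char *) base;      /* byte-wise view of the array */
    return (const int *) (bytes + index * sizeof(int));
}

/* Read an element through the hand-computed address. */
static int element_at(const int *base, size_t len, size_t index) {
    assert(index < len);                          /* stay inside the array */
    return *element_address(base, index);
}
```

For an array declared as int array[] = {1, 2, 3, 4, 5}, element_address(array, 3) yields the same address as &array[3], which is what the compiler computes for array[3] anyway.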
Array data type, on the other hand, refers to any data type that provides a logical sequence of elements accessed by index. For example, C++ provides std::vector (which does guarantee contiguous storage), and Objective-C provides NSArray and NSMutableArray, whose storage layout is an implementation detail and need not be a single contiguous sequence of elements in memory.
The terminology used in the articles isn't very helpful. The definition given at the top of the array data structure article is:
an array data structure or simply array is a data structure consisting
of a collection of elements (values or variables), each identified by
at least one index
while the definition given for array data type is:
an array type is a data type that is meant to describe a collection of
elements (values or variables), each selected by one or more indices
that can be computed at run time
It doesn't help that the array data structure article, which is apparently supposed to be about the C-style implementation of arrays, includes discussion of associative arrays and other material that would be far more appropriate in the array data type article. You can learn why this is by reading the discussion page, particularly Proposal to split the article and Array structure. The only thing that's clear about these articles is that the various authors can't make up their collective mind about how 'array' should be defined and explained.
A type is something that the programmer sees; a data structure is how something is implemented behind the scenes. It's conceivable that an array type is implemented behind the scenes with e.g. a hashtable (this is the case for PHP, I think).
In C, there is no distinction; an array type must be implemented with a contiguous block of memory.
The structure of your array determines how the array is implemented (storage and access); the data type refers to the type of the data contained within the array.
Brackets [] are how you designate an Array Data Type in C
Similarly, * is how you designate a Pointer Data Type in C
int array[]={1, 2, 3, 4, 5}; is an example of an Array Data Structure in C
Specifically, you have defined a data structure which has 5 integers arranged contiguously, you have allocated sufficient memory on the stack for that data structure, and you have initialized that data structure with values 1, 2, 3, 4, 5.
A Data Structure in C has a non-zero size, which can be found by applying the sizeof operator to an instance of that structure.
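A minimal sketch of the sizeof point, using the array from the question. Here the array is declared at file scope, so it has static storage rather than living on the stack; the macro ARRAY_LEN and the function sum are illustrative names, not standard ones.

```c
#include <assert.h>
#include <stddef.h>

/* The array from the question; its type is int[5]. */
static int array[] = {1, 2, 3, 4, 5};

/* Element count derived from sizeof; only valid while `a` is still an array. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Passed to a function, the array decays to a pointer, so the length
 * has to be passed alongside it. */
static long sum(const int *p, size_t n) {
    assert(p != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}
```

Inside sum, sizeof p would give the size of a pointer, not of the five integers, which is one practical place where the array type and the pointer type differ.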

How can I efficiently copy 2-dimensional arrays of bytes into a larger 2D array?

I have a structure called Patch that represents a 2D array of data.
import qualified Data.ByteString as Strict
type Size = (Int, Int)
data Patch = Patch Size Strict.ByteString
I want to construct a larger Patch from a set of smaller Patches and their assigned positions. (The Patches do not overlap.) The function looks like this:
type Position = (Int, Int)
combinePatches :: [(Position, Patch)] -> Patch
combinePatches plan = undefined
I see two sub-problems. First, I must define a function to translate 2D array copies into a set of 1D array copies. Second, I must construct the final Patch from all those copies.
Note that the final Patch will be around 4 MB of data. This is why I want to avoid a naive approach.
I'm fairly confident that I could do this horribly inefficiently, but I would like some advice on how to efficiently manipulate large 2D arrays in Haskell. I have been looking at the "vector" library, but I have never used it before.
Thanks for your time.
If the spec is really just a one-time creation of a new Patch from a set of previous ones and their positions, then this is a straightforward single-pass algorithm. Conceptually, I'd think of it as two steps: first, combine the existing patches into a data structure with reasonable lookup for any given position. Next, write your new structure lazily by querying the compound structure. This should be roughly O(n log(m)), with n being the size of the new array you're writing and m being the number of patches.
This is conceptually much simpler if you use the Vector library instead of a raw ByteString. But it is simpler still if you simply use Data.Array.Unboxed. If you need arrays that can interop with C, then use Data.Array.Storable instead.
If you ditch purity, at least locally, and work with an ST array, you should be able to trivially do this in O(n) time. Of course, the constant factors will still be worse than using fast copying of chunks of memory at a time, but there's no way to keep that code from looking low-level.
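To make the "copying chunks of memory at a time" idea concrete, here is a sketch in C rather than Haskell, meant only to show the index arithmetic that turns one 2D copy into a series of 1D row copies. The Patch struct and blit_patch function are hypothetical, and the sketch assumes row-major storage with one byte per cell and treats the question's Size as (width, height).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C counterpart of the question's Patch: row-major bytes. */
typedef struct {
    size_t width;
    size_t height;
    unsigned char *bytes;   /* width * height bytes, one row after another */
} Patch;

/* Copy src into dst with its top-left corner at (x, y), one memcpy per row.
 * Returns 0 on success, -1 if src would not fit inside dst. */
static int blit_patch(Patch *dst, const Patch *src, size_t x, size_t y) {
    assert(dst != NULL && src != NULL);
    if (x > dst->width || src->width > dst->width - x ||
        y > dst->height || src->height > dst->height - y) {
        return -1;
    }
    for (size_t row = 0; row < src->height; row++) {
        /* Row `row` of src lands at row (y + row), column x of dst. */
        memcpy(dst->bytes + (y + row) * dst->width + x,
               src->bytes + row * src->width,
               src->width);
    }
    return 0;
}
```

Combining a whole plan then amounts to allocating the destination once and calling blit_patch for each (position, patch) pair, which touches every output byte at most once when the patches do not overlap.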
