How to return an array, without allocating it, if size is unknown to caller?


Consider the following function:
int get_something (int* array, int size);
Its purpose is to fill the passed array[size] with data from an external resource (queries to the resource are expensive). The question is what to do if the resource has more elements than the provided array can hold. What is the best approach?
Edit: Current solution added:
Our approach at the moment is the following:
When the user calls get_something() for the first time, with a null argument, we perform a full query, store the data in a cache (which is just a key-value store), and return the number of items.
When the user calls get_something() the next time, with a properly initialized buffer, we return the data from the cache and clear the cache entry.
If the user does not call get_something() again, a timeout occurs and the cache entry for that item is freed.
If the user calls get_something() too late, after the data has been cleared, we report an error state so the user knows the request has to be repeated.

One option is to not modify the array at all and instead return the needed size as the return result. The caller must then call your function again with an array of at least this size.
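A minimal sketch of that size-query pattern, keeping the question's signature. The static `resource` array stands in for the expensive external resource and `fetch_all()` is a hypothetical caller-side helper; neither appears in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the expensive external resource. */
static const int resource[] = {10, 20, 30, 40, 50};
static const int resource_count = (int)(sizeof resource / sizeof resource[0]);

/*
 * Always returns the number of elements the resource holds. The array is
 * written only when it is non-NULL and large enough; otherwise it is left
 * untouched and the caller is expected to retry with a bigger buffer.
 */
int get_something(int *array, int size)
{
    if (array != NULL && size >= resource_count) {
        memcpy(array, resource, sizeof resource);
    }
    return resource_count;
}

/*
 * Caller side: ask for the size, allocate, then fetch.
 * Returns NULL on allocation failure; the caller frees the result.
 */
int *fetch_all(int *count_out)
{
    int needed = get_something(NULL, 0);            /* first call: size only */
    int *buf = malloc((size_t)needed * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    int got = get_something(buf, needed);           /* second call: data */
    assert(got == needed);  /* holds here because the stand-in data never changes */
    *count_out = got;
    return buf;
}
```

With a real resource the two calls may each trigger a query, so this pattern usually goes together with some form of caching, as in the question's current solution.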

OK, your basic requirement is to query a resource and cache the returned data in memory, to avoid multiple accesses.
That means you will have to allocate memory within your program to store all of the data.
Problem #1 is to populate that cache. I will assume that you have that figured out, and that there is some function get_resource().
Problem #2 is how to design an API that allows client/user code to interact with that data.
In your example you are using an array allocated by the client as the cache, hoping to solve both problems with one buffer, but this doesn't solve the problem in all cases (hence your post). So you really need to separate the two problems.
Model #1 is to provide iterator/cursor functionality:
iterator = get_something(); // Triggers caching of data from Resource
data = get_next_single_something( iterator );
status = release_something( iterator );
// The logic to release the data could be done automagically in get_next,
// after returning the last single_something, if that is always the use case.
Model #2 is to return the whole object in a malloc()ed buffer and let the client manage the whole thing:
data_type *pData=NULL;
unsigned size = get_something( &pData ); // Triggers caching of data from Resource
process( size, pData );
free( pData );
pData=NULL;
Model #3. If you are married to the client array, you can use Model #1 to return multiple values at once, but if there are more values, then get_something() will have to build a cache, and the client will still have to iterate.

Use realloc().

My choice would be to use the same model as fread() and turn the interface into a stream of sorts.
i.e.
either fill the buffer up or put all the items in it and return the number of items actually read
maintain some sort of state so that subsequent calls only get unread items
return 0 once all the items have been read
return a negative number if an error occurs.
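The fread()-like contract listed above could be sketched as follows. The `something_reader` struct, `read_something()` and `drain()` are hypothetical names for illustration; the reader is assumed to wrap data that has already been fetched and cached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader state; a real one would wrap the cached query result. */
struct something_reader {
    const int *items;   /* data already fetched from the resource */
    int count;          /* total number of items */
    int next;           /* index of the first unread item */
};

/*
 * Copies up to `size` unread items into `array`.
 * Returns the number of items copied, 0 once everything has been read,
 * or -1 on invalid arguments.
 */
int read_something(struct something_reader *r, int *array, int size)
{
    if (r == NULL || array == NULL || size < 0) {
        return -1;
    }
    int remaining = r->count - r->next;
    int n = remaining < size ? remaining : size;
    for (int i = 0; i < n; i++) {
        array[i] = r->items[r->next + i];
    }
    r->next += n;       /* remember the position for the next call */
    return n;
}

/*
 * Caller side: drains the reader through a small fixed buffer.
 * Returns the number of items seen (or -1 on error) and stores their sum.
 */
int drain(struct something_reader *r, long *sum_out)
{
    int buf[4];
    int total = 0;
    int n;
    long sum = 0;
    while ((n = read_something(r, buf, 4)) > 0) {
        assert(n <= 4);             /* the callee never overfills the buffer */
        for (int i = 0; i < n; i++) {
            sum += buf[i];
        }
        total += n;
    }
    if (n < 0) {
        return -1;
    }
    *sum_out = sum;
    return total;
}
```

The caller never needs to know the total size up front; it only needs a buffer of whatever size is convenient.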

Allocate the array dynamically, i.e. using malloc(), and then, in the function, either use realloc() or free the previous list, allocate another, fill it, and return the new size. For the second approach you can use the return value to return the new size, but to update the caller's array address you will need to change the function to accept int** instead of int*.
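A sketch of the realloc() variant with an int** parameter. Passing the capacity as an in/out `int *size` is an assumption of this sketch, as are the static `resource` array and the `demo_grow()` helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the external resource. */
static const int resource[] = {1, 2, 3, 4, 5, 6, 7};
static const int resource_count = (int)(sizeof resource / sizeof resource[0]);

/*
 * Fills *array with all items, growing it with realloc() if *size is too
 * small. *array must be NULL or a pointer obtained from malloc()/realloc().
 * On success returns the item count and updates *array and *size;
 * on allocation failure returns -1 and leaves both unchanged.
 */
int get_something(int **array, int *size)
{
    if (*array == NULL || *size < resource_count) {
        int *grown = realloc(*array, (size_t)resource_count * sizeof **array);
        if (grown == NULL) {
            return -1;  /* the old block is still valid and owned by the caller */
        }
        *array = grown;
        *size = resource_count;
    }
    memcpy(*array, resource, (size_t)resource_count * sizeof **array);
    return resource_count;
}

/* Caller side: start with a buffer that is too small and let it grow. */
int demo_grow(void)
{
    int size = 2;
    int *buf = malloc((size_t)size * sizeof *buf);
    if (buf == NULL) {
        return -1;
    }
    int n = get_something(&buf, &size);
    assert(n < 0 || size >= n);     /* on success the buffer holds all items */
    free(buf);
    return n;
}
```

The caller must not keep copies of the old pointer across the call, because realloc() may move the block.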

Can you check how many elements the resource has? If so, I'd do that and then copy the array to an array as large as the resource.
Or perhaps copy the array to an array double its size when you're nearing the end?
http://www.devx.com/tips/Tip/13291

That depends on how your program should handle that situation, I guess.
One approach could be to fill the array to its maximum and return the total number of elements that are available. That way the caller can check whether the function needs to be called again (see Mark Byers' answer).
Logic behind that:
- Creates an array with 100 items
- Calls your function and gets 150 returned
- Increases the array size or creates a second one and calls your function again with that array
- Repeats that until the returned item count is less than or equal to the array size

Related

Why does `append(x[:0:0], x...)` copy a slice into a new backing array in Go?

On Go's slice tricks wiki and Go libraries (e.g., this example), you sometimes see code like the following to copy a slice into a new backing array.
// In a library at the end of a function perhaps...
return append(whateverSlice[:0:0], whateverSlice...)
// In an assignment, as in the wiki example...
b = append(a[:0:0], a...)
Here's what I think I understand:
All of the items in the slice that is the second parameter to append are copied over to a new backing array.
In the first parameter to append, the code uses a full slice expression. (We can rewrite the first parameter as a[0:0:0], but the first 0 will be supplied if omitted. I assume that's not relevant to the larger meaning here.)
Based on the spec, the resulting slice should have the same type as the original, and it should have a length and capacity of zero.
(Again, not directly relevant, but I know that you can use copy instead of append, and it's a lot clearer to read.)
However, I still can't fully understand why the syntax append(someSlice[:0:0], someSlice...) creates a new backing array. I was also initially confused why the append operation didn't mess with (or truncate) the original slice.
Now for my guesses:
I'm assuming that all of this is necessary and useful because if you just assign newSlice := oldSlice, then changes to the one will be reflected in the other. Often, you won't want that.
Because we don't assign the result of the append to the original slice (as is normal in Go), nothing happens to the original slice. It isn't truncated or changed in any way.
Because the length and capacity of anySlice[:0:0] are both zero, Go must create a new backing array if it's going to assign the elements of anySlice to the result. Is this why a new backing array is created?
What would happen if anySlice... had no elements? A snippet on the Go Playground suggests that if you use this append trick on an empty slice, the copy and the original initially have the same backing array. (Edit: as a commenter explains, I misunderstood this snippet. The snippet shows that the two items are initially the same, but neither has a backing array yet. They both point initially to a generic zero value.) Since the two slices both have a length and capacity of zero, the minute you add anything to one of them, that one gets a new backing array. Therefore, I guess, the effect is still the same. Namely, the two slices cannot affect each other after the copy is made by append.
This other playground snippet suggests that if a slice has more than zero elements, the append copy method leads immediately to a new backing array. In this case, the two resulting slices come apart, so to speak, immediately.
I am probably worrying way too much about this, but I'd love a fuller explanation of why the append(a[:0:0], a...) trick works the way it does.

Because the length and capacity of anySlice[:0:0] are both zero, Go must create a new backing array if it's going to assign the elements of anySlice to the result. Is this why a new backing array is created?
Because capacity is 0, yes.
https://pkg.go.dev/builtin#go1.19.3#append
If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated.
cap=0 is not sufficient for a non-empty slice, so allocating a new array is necessary.

How to avoid freeing objects that are stored in containers with the same reference count

I have been working on some features of a custom programming language written in C. Currently I'm working on a system that does reference counting for objects in the language, which in C are represented as structs with, among other things, a reference count.
There also is a feature which can free all currently allocated objects (say before the exit of the program to clean up all memory). Now here lies the problem exactly.
I have been thinking about how best to do it, but I'm running into some problems. Let me sketch out the situation a bit:
2 new integers are allocated. Both have a reference count of 1.
1 new list is allocated, also with a reference count of 1.
Now both integers go in the list, which gives them a reference count of 2.
After these actions both integers go out of scope for some reason, so their reference count drops to 1, as they are still in the list.
Now I'm done with these objects, so I run the function to delete all tracked objects. However, as you might have noticed, both the list and the objects in the list have the same reference count (1). This means there is no way to decide which object to free first.
If I free the integers before the list, the list will try to decrement the reference count on integers that have already been freed, which will segfault.
If the list is freed before the integers, it decrements the reference count of the integers to 0, which automatically frees them too, and no further steps need to be taken to free the integers. They aren't tracked anymore.
Currently I have a system that works most of the time, but not for the example above: I free the objects in order of their reference count, highest count last. This obviously only works as long as the integers have a higher reference count than the list, which, as the example shows, is not always the case. (It only works if the integers have not dropped out of scope, so that they still have a higher reference count than the list.)
Note: I have already found one way, which I really don't like: adding a flag to every object indicating that it is in a container and so can't be freed. I don't like this because it adds some memory overhead to every allocated object, and when there is a circular dependency no object would be freed. Of course a cycle detector could fix this, but preferably I'd like to do this with reference counting only.
Let me give a concrete example of the described steps above:
//this initializes and sets a garbage collector object.
//Basically it's a datastructure which records every allocated object,
//and is able to free them all or in the future
//run some cycle detection on all objects.
//It has to be set before allocating objects
garbagecollector *gc = init_garbagecollector();
set_garbagecollector(gc);
//initialize a tracked object from the C integer value 10
myobject * a = myinteger_from_cint(10);
myobject * b = myinteger_from_cint(10);
myobject * somelist = mylist_init();
mylist_append(somelist,a);
mylist_append(somelist,b);
// Simulate the integers going out of scope.
// There are no functions yet so I can't actually do it, but this
// is a situation which can happen and has happened a couple of times
DECREF(a);
DECREF(b);
//now the program is done. all objects have a refcount of 1
//delete the garbagecollector and with that all tracked objects
//there is no way to prevent the integers being freed before the list
delete_garbagecollector(gc);
What should happen, of course, is that the list is freed before the integers 100% of the time.
What would be a smarter way of freeing all existing objects, in a way such that objects stored in containers aren't freed before the containers they're in?

It depends on your intention with:
There also is a feature which can free all currently allocated objects (say before the exit of the program to clean up all memory).
If the goal is to forcibly deallocate every single object regardless of its ref count, then I would have a separate chunk of code that walks the object graph and frees each object without touching its ref count. The ref count itself is going to end up freed too, so there's little point in updating it.
If the goal is to just tell the system "We don't need the objects anymore" then another option is to simply walk the roots and decrement their ref counts. If there are no other references to them, they'll hit zero. They will then decrement the ref counts of everything they refer to before being deallocated. That in turn percolates through the object graph. If the roots are the only thing holding onto references at the point that you call this, it will effectively free everything.
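The second option might look like the following sketch. The `object` layout and every function name here are hypothetical and much simpler than the question's `myobject`; the `live_objects` counter exists only so the result can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object layout: a count, plus owned items for containers. */
typedef struct object {
    int refcount;
    struct object **items;  /* NULL for non-container objects */
    int nitems;
} object;

static int live_objects = 0;    /* only here so the example can be checked */

object *object_new(void)
{
    object *o = calloc(1, sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        live_objects++;
    }
    return o;
}

/* Drops one reference; at zero, releases the children first, then the object. */
void object_decref(object *o)
{
    assert(o->refcount > 0);
    if (--o->refcount > 0) {
        return;
    }
    for (int i = 0; i < o->nitems; i++) {
        object_decref(o->items[i]);
    }
    free(o->items);
    free(o);
    live_objects--;
}

/* Appends item to list and takes a new reference to it. Returns 0 on success. */
int list_append(object *list, object *item)
{
    object **grown = realloc(list->items,
                             (size_t)(list->nitems + 1) * sizeof *grown);
    if (grown == NULL) {
        return -1;
    }
    list->items = grown;
    list->items[list->nitems++] = item;
    item->refcount++;
    return 0;
}

/* "Free everything" means dropping the references held by the roots. */
void release_roots(object **roots, int nroots)
{
    for (int i = 0; i < nroots; i++) {
        object_decref(roots[i]);
    }
}

/* The scenario from the question; returns the number of objects still alive. */
int scenario(void)
{
    object *a = object_new();
    object *b = object_new();
    object *list = object_new();
    if (a == NULL || b == NULL || list == NULL) {
        return -1;
    }
    if (list_append(list, a) != 0 || list_append(list, b) != 0) {
        return -1;
    }
    object_decref(a);               /* the integers go out of scope */
    object_decref(b);
    object *roots[] = { list };     /* the list is the only root left */
    release_roots(roots, 1);
    return live_objects;
}
```

Because the integers are only ever released by the list that holds them, the ordering problem does not arise: nothing is freed while something else still holds a reference to it.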

You should not free anything until the reference count for somelist is zero.

How to prevent buffer overflow

I am being passed an array from a C program that does not include the size of the array; that is, it just passes a pointer to the array. The array is a generic type <Item>. How can I determine the end of the array in order to detect a buffer overflow?
I tried iterating through the array until I received something that wasn't an <Item>. That worked most of the time, but sometimes the nonsense at the end would be of type <Item>. I am using C and calling a function from an external class I had no part in developing. <Item> is a struct with multiple references to other arrays (sort of like a linked list).
EDIT:
The API stated that the array was intended to be a read-only version. The problem is I cannot read it if I do not know the size. There doesn't appear to be a sentinel value. There is a random comment stating that if the size is needed, use sizeOf (array)/sizeOf (Item), which doesn't work. It was developed by a team that no longer works here. The problem is that other code already relies on this C code, and I cannot change it without fear of breaking that code.

It is not possible to determine the end of an array based on just a pointer to an element of that array.
I tried iterating through the array until I received something that wasn't an <Item>
It's also not possible to determine whether a particular memory location contains an object of a particular type, or whether it contains any object at all. Even if you could, how would you determine whether the object that you find is really part of the array and not just a separate <Item> object that happens to be there?
A possible solution is to use a sentinel value to represent the end of an array. For example, you could define the interface such that <Item>.member == 0 if and only if that is the last element of the array. This is similar to how null-terminated strings work.
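A sketch of such a sentinel convention. The `Item` layout is hypothetical (the real struct is not shown in the question), and this only works if whoever produces the array writes the terminating element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Item; the real struct in the question is not shown. */
typedef struct {
    int member;     /* 0 marks the terminating element, by convention */
    int payload;
} Item;

/*
 * Counts the elements before the sentinel, much like strlen() counts the
 * characters before the terminating '\0'. The array must be terminated.
 */
size_t item_count(const Item *items)
{
    assert(items != NULL);
    size_t n = 0;
    while (items[n].member != 0) {
        n++;
    }
    return n;
}
```

If the producer does not write a sentinel, as appears to be the case in the question, this loop runs past the end of the array just like the original attempt did.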

If all you have is a pointer and no size or known "end-of-array" marker (sentinel) in the data, then you have an impossible situation. There is no way in that case to determine the size/end of the passed array.

Updating string array using java stream

I know that for objects we can forEach over the collection and update each object as we like, but for immutable objects like Strings, how can we update the array with new objects without converting the result into an array again?
For example, I have an array of strings. I want to iterate through each string and trim it. Otherwise I would have to do something like this:
Arrays.stream(str).map(c -> c.trim()).collect(Collectors.toList())
In the end, I will get a List rather than the String[] that I initially gave. That's a whole lot of processing. Is there any way I can do something similar to:
for(int i = 0; i < str.length; i++) {
str[i] = str[i].trim();
}
using java streams?

Streams are not intended for manipulating other data structures, especially not for updating their source. But the Java API consists of more than the Stream API.
As Alexis C. has shown in a comment, you could use Arrays.setAll(arr, i -> arr[i].trim());
There’s even a parallelSetAll that you could use when you have a really large array.
However, it might be easier to use just Arrays.asList(arr).replaceAll(String::trim);.
Keep in mind that the wrapper returned by Arrays.asList allows modifications of the wrapped array through the List interface. Only adding and removing is not supported.

Use toArray :
str = Arrays.stream(str).map(c -> c.trim()).toArray(String[]::new);
The disadvantage here (over your original Java 7 loop) is that a new array is created to store the result.
To update the original array, you can rewrite your loop with Streams, though I'm not sure what the point would be:
IntStream.range (0, str.length).forEach (i -> {str[i] = str[i].trim();});

It's not as much processing as you might think: the array has a known size and the spliterator from it will be SIZED, so the resulting collection size will be known before processing and the space for it can be allocated ahead of time, without having to resize the collection.
It's also always interesting that, in the absence of actual tests, we almost always assume that this is slow or memory hungry.
Of course, if you want an array as the result, there is a method for that:
.toArray(String[]::new);

Can one use SetWindowLongPtr + GWLP_USERDATA to store data (not pointer)

I know one can use SetWindowLongPtr + GWLP_USERDATA to store a pointer which points to some data.
But could one store the data directly, for example "a handle", "a bool", "an int" or other larger data?
From http://msdn.microsoft.com/zh-tw/library/windows/desktop/ms644898%28v=vs.85%29.aspx, it says:
Sets new extra information that is private to the application, such as handles or pointers.
So I guess storing a handle is OK. I also used this method to store an RGB value without problems.
But I don't know whether it is a good idea to do things like this. And can we store other data that is large (for example, a structure)?
p.s: The motivation of this question is: When I create a dialog window, I want to store data for each of its controls. Of course I can use static variables in the window procedure and pass pointer (to them) to SetWindowLongPtr function. But this is not "perfect" in theory, because when the dialog window is closed, I don't need these data anymore. Of course, in practice, the data I need to use is very small, and I should not care about the usage of memory. But I still like to know if there is a better way.

You only need one pointer to store anything you want. Declare a struct with the data you want to store. Allocate it before the CreateWindowEx() call and pass the pointer as the last argument. You get it back in your window procedure for the WM_CREATE message, in the CREATESTRUCT.lpCreateParams field. Now call SetWindowLongPtr to store that pointer.
Any time you need it back, use GetWindowLongPtr to recover the pointer to the struct. You'll need to clean up again; use the WM_NCDESTROY message to release the pointer.
Note that this is a standard technique used in C++ class libraries that wrap the winapi. Do consider using one of them instead of spinning this yourself.

The SetWindowLongPtr function can store a piece of data that has the same size as LONG_PTR (most likely 32 or 64 bits). If your data fits in that size, you're fine. For example, a bool would be fine, and so would most handles (since handles tend to be pointers, too).
A typical RGB value would work as well since it's stored as three bytes (one byte per color component) or four bytes (an extra byte for the alpha channel).
If you need more space than this, you should allocate a structure somewhere else and store a pointer to that structure.
