Set per_process_gpu_memory_fraction in tensorflow.js tfjs-node-gpu - tensorflow.js

Is it possible to set the maximum GPU memory that tfjs-node-gpu may allocate? By default it takes 100%, and I haven't found any information in the API docs.
thanks

For now it is not possible to set a memory limit on the GPU: neither Node nor tfjs-node-gpu itself offers control over how much GPU memory is used.
However, you can check the allocated memory footprint manually with tf.memory().

Related

Flink rocksdb per slot memory configuration issue

I have 32GB of managed memory and 8 task slots.
As state.backend.rocksdb.memory.managed is set to true, the RocksDB instance in each task slot uses 4GB of memory.
Some of my tasks do not require a rocksdb backend so I want to increase this 4GB to 6GB by setting state.backend.rocksdb.memory.fixed-per-slot: 6000m
The problem is that when I set state.backend.rocksdb.memory.fixed-per-slot: 6000m, the task manager page of the Flink UI no longer shows the allocated managed memory.
As you can see when state.backend.rocksdb.memory.fixed-per-slot is not set and state.backend.rocksdb.memory.managed: true, 4GB usage appears on managed memory for each running task which uses rocksdb backend.
But after setting state.backend.rocksdb.memory.fixed-per-slot: 6000m , Managed Memory always shows zero!
1- How can I watch the managed memory allocation after setting state.backend.rocksdb.memory.fixed-per-slot: 6000m?
2- Should state.backend.rocksdb.memory.managed be set to true even if I set fixed-per-slot?
Another reply we got from the hive:
"Fixed-per-slot overrides managed memory setting, so managed zero is expected (it's either fixed-per-slot or managed). As Yuval wrote you can see the memory instances by checking the LRU caches.
One more thing to check is the write_buffer_manager pointer in the RocksDB log file. It will be different for each operator if neither fixed-per-slot nor managed memory is used, and shared between instances otherwise."
Let us know if this is useful
Shared your question with the Speedb hive on Discord and here's the "honey" we got for you:
We don't have much experience with Flink setups regarding how to configure the memory limits and their different parameters. However, RocksDB uses a shared Block Cache to control the memory limits of your state. So for question 1, you could grep "block_cache:" and "capacity :" from the LOG files of all the DBs (operators). The total memory limit allocated to RocksDB through the block cache would be the sum of the capacities over all the unique pointers, since the same block cache (memory) can be shared across DBs.
Do note that RocksDB might use more memory than the block cache capacity.
Hope this helps. If you have follow-up questions or want more help with this, send us a message on Discord.

Looking for a custom memory allocator which allocates from within a large pre-allocated block of memory

I have a memory-heavy application which is supposed to run with low latency and at constant speed, but in practice it has poor performance during the first few seconds after startup. This appears to be because the initial memory accesses trigger page faults, which have significant performance implications.
I would like to try preallocating a single large block of memory, paging it all in (via mlock() or just by touching each byte), and then using a custom malloc()/free() implementation to ensure that all further allocations are done from within this block.
I am aware of numerous custom memory allocators (TCMalloc, Hoard, jemalloc, etc) but it is not clear to me whether they can be backed by user-provided memory, or whether they always perform their internal allocations from the OS. Does anyone have any insight or recommendations here?
To be clear, I am not looking for a memory pooling system (which would be for reusing small objects). The custom implementation of malloc()/free() should be able to perform any size allocation while limiting fragmentation of its backing store and following other best practices.
Edit based on comments: I do not expect to make the system faster - I just want to move the slow part (allocation, initial page faults) to the start of the process, and then do the real computation work once the system is 'primed'.
Thanks!
A bit late to the party.
dlmalloc is one choice that can be backed by pre-allocated memory. You can find it here. You may just need to add some extra definitions at the beginning to force it to use your pre-allocated memory rather than call the system mmap; refer to the nice documentation at the beginning of the file.
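To make the "preallocate and page it all in" step from the question concrete, here is a minimal sketch for Linux/POSIX. It maps one block, writes to every page so the faults happen up front, and tries mlock() on a best-effort basis. The arena_t type and the arena_* functions are hypothetical names invented for this illustration, and the bump allocator is only a stand-in: it never frees, so it is not the general-purpose malloc()/free() replacement the question asks for. In a real setup you would hand the pre-faulted block to an allocator such as dlmalloc instead.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical arena type: a pre-faulted block plus a bump pointer. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arena_t;

/* Map `capacity` bytes and touch every page so later accesses do not fault. */
static int arena_init(arena_t *a, size_t capacity)
{
    void *p = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;                 /* assumption: fall back to 4 KiB pages */

    volatile unsigned char *bytes = p;
    for (size_t off = 0; off < capacity; off += (size_t)page)
        bytes[off] = 0;              /* the write forces the page to be backed */

    /* Best effort: mlock can fail under RLIMIT_MEMLOCK, so ignore the result. */
    (void)mlock(p, capacity);

    a->base = p;
    a->capacity = capacity;
    a->used = 0;
    return 0;
}

/* Bump allocation, 16-byte aligned; returns NULL when the block is exhausted. */
static void *arena_alloc(arena_t *a, size_t n)
{
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (start > a->capacity || n > a->capacity - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Unmap the whole block; every pointer handed out becomes invalid. */
static void arena_destroy(arena_t *a)
{
    munmap(a->base, a->capacity);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

Call arena_init() once during startup, before the latency-sensitive work begins, so that the page faults are paid there rather than during the real computation.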

Can I force memory usage of vespa-proton-bin?

I found that vespa-proton-bin is already using 68GB of my system's memory. I tried to limit memory at the Docker level and found that it randomly kills the process, which can be a huge problem.
Is there any Vespa setting to force vespa-proton-bin to use only a certain amount of memory? Thanks.
Great question!
There is no explicit way to tell Vespa to use only x GB of memory, but by default Vespa will block feeding if 80% of the memory is already in use; see https://docs.vespa.ai/documentation/writing-to-vespa.html#feed-block. Using Docker limits is only going to cause random OOM kills, which is not what you want.
I'm guessing that you have a lot of attribute fields, which are in-memory structures; see https://docs.vespa.ai/documentation/performance/attribute-memory-usage.html.

set stack size for threads using setrlimit

I'm using a library which creates a pthread using the default stack size of 8MB. Is it possible to programmatically reduce the stack size of the thread the library creates? I tried using setrlimit(RLIMIT_STACK...) inside my main() function, but that doesn't seem to have any effect. ulimit -s seems to do the job, but I don't want to have to set the stack size before my program is executed.
Any ideas what I can do? Thanks
Update 1:
Seems like I'm going to give up on being able to set the stack size using setrlimit(RLIMIT_STACK,...). I checked the resident memory and found it's a lot less than the virtual memory. That's a good enough reason for me to give up on trying to limit the stack size.
I think you are out of luck. If the library you are using does not provide a way to set the stack limit, then you can't change it after the thread has been created. setrlimit and shell limits affect the main thread's stack.
Threads are created within the process's memory space, so their stacks are allocated when the threads are created. On Unix I believe the stack will be mapped to RAM on demand, so you may not actually use 8 MB of RAM if you don't need it (virtual vs resident memory).
There are a couple aspects to answering this question.
First, as stated in the comments, pthread_attr_setstacksize is The Right Way to do this. If the library calling pthread_create doesn't have a way to let you do this, fixing the library would be the ideal solution. If the thread is purely internal to the library (not calling code from the calling application) it really should set its own preference for the stack size based on something like PTHREAD_STACK_MIN + ITS_OWN_NEEDS. If it's calling back to your code, it should let you request how much stack space you need.
Second, as an implementation detail, glibc uses the stack limit from setrlimit/ulimit to derive the stack size for threads created by pthread_create. You can perhaps influence the size this way, but it's not portable, and as you've found, not reliable even there (it's not working when you call setrlimit from within the process itself). It's possible that glibc only probes the limit once when the relevant code is first initialized, so I would try moving the setrlimit call as early as possible in main to see if this helps.
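If you are on glibc specifically, you can at least inspect the default that pthread_create(..., NULL, ...) will use and compare it with the current RLIMIT_STACK. The sketch below relies on pthread_getattr_default_np and pthread_setattr_default_np, which are non-portable GNU extensions (glibc 2.18 or later, as far as I know), so treat it as an assumption about your platform rather than a general answer. The three helper function names are hypothetical. I have not verified that changing the default this way affects every library's threads, only threads created with a NULL attribute after the call.

```c
#define _GNU_SOURCE            /* exposes the glibc-specific *_np calls */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Current soft RLIMIT_STACK in bytes, or 0 if it is unlimited or unknown. */
static size_t stack_rlimit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t)rl.rlim_cur;
}

/* Stack size currently recorded in the process-wide default attributes. */
static size_t default_thread_stack_bytes(void)
{
    pthread_attr_t attr;
    size_t size = 0;
    if (pthread_getattr_default_np(&attr) != 0)
        return 0;
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return size;
}

/* Change the default stack size for threads created after this call. */
static int set_default_thread_stack_bytes(size_t bytes)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, bytes);
    if (rc == 0)
        rc = pthread_setattr_default_np(&attr);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Printing stack_rlimit_bytes() next to default_thread_stack_bytes() at the top of main is a quick way to check whether the two actually agree on your system before and after a setrlimit call.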
Finally, the stack size for threads may not even be relevant to your application. Even if the stack size is 8MB, only the pages which have actually been modified (probably 4k or at most 8k unless you have big arrays on the stack) are actually using physical memory. The rest is just tying up virtual address space (of which you always have at least 2-3 GB) and possibly commit charge. By default, Linux enables overcommit, so commit charge will not be strictly enforced, and therefore the fact that glibc is requesting too much may not even matter. You could make the overcommit checking even less strict by writing a 1 to /proc/sys/vm/overcommit_memory, but this will cause you to lose information about when you're "running out of memory" and make your program crash instead. On such a constrained system you may prefer even stricter overcommit accounting, but then you have to fix the thread stack size problem...

How to unload a ByteArray using Actionscript 3?

How do I forcefully unload a ByteArray from memory using ActionScript 3?
I have tried the following:
// First non-working solution
byteArray.length = 0;
byteArray = new ByteArray();
// Second non-working solution
for ( var i:int=0; i < byteArray.length; i++ ) {
    byteArray[i] = null;
}
I don't think you have anything to worry about. If System.totalMemory goes down you can relax. It may very well be the OS that doesn't reclaim the newly freed memory (in anticipation of the next time Flash Player will ask for more memory).
Try doing something else that is very memory intensive and I'm sure that you'll notice that the memory allocated to Flash Player will decrease and be used for the other process instead.
As I've understood it, memory management in modern OS's isn't intuitive from the perspective of looking at the amounts allocated to each process, or even the total amount allocated.
When I've used my Mac for 5 minutes 95% of my 3 GB RAM is used, and it will stay that way, it never goes down. That's just the way the OS handles memory.
As long as it's not needed elsewhere even processes that have quit still have memory assigned to them (this can make them launch quicker the next time, for example).
(I'm not positive about this, but...)
AS3 uses a non-deterministic garbage collection which means that dereferenced memory will be freed up whenever the runtime feels like it (typically not unless there's a reason to run, since it's an expensive operation to execute). This is the same approach used by most modern garbage collecting languages (like C# and Java as well).
Assuming there are no other references to the memory pointed to by byteArray or the items within the array itself, the memory will be freed at some point after you exit the scope where byteArray is declared.
You can force a garbage collection, though you really shouldn't. If you do, do it only for testing. If you do it in production, you'll hurt performance much more than help it.
To force a GC, try (yes, twice):
flash.system.System.gc();
flash.system.System.gc();
You can read more here.
Have a look at this article
http://www.gskinner.com/blog/archives/2006/06/as3_resource_ma.html
I am not an ActionScript programmer, but the feeling I'm getting is that the garbage collector might not run when you want it to.
Hence
http://www.craftymind.com/2008/04/09/kick-starting-the-garbage-collector-in-actionscript-3-with-air/
So I'd recommend trying out their collection code and see if it helps
// Calls System.gc() on two consecutive frames, then once more after a short delay.
private var gcCount:int;
private function startGCCycle():void {
    gcCount = 0;
    addEventListener(Event.ENTER_FRAME, doGC);
}
private function doGC(evt:Event):void {
    flash.system.System.gc();
    if (++gcCount > 1) {
        // Second frame reached: stop listening and schedule the final pass.
        removeEventListener(Event.ENTER_FRAME, doGC);
        setTimeout(lastGC, 40);
    }
}
private function lastGC():void {
    flash.system.System.gc();
}
Unfortunately, when it comes to memory management in Flash/ActionScript there isn't a whole lot you can do. ActionScript was designed to be easy to use (so they didn't want people to have to worry about memory management).
The following is a workaround: instead of creating a ByteArray variable, try this.
var byteObject:Object = new Object();
byteObject.byteArray = new ByteArray();
...
//Then when you are finished delete the variable from byteObject
delete byteObject.byteArray;
Because byteArray is a dynamic property of byteObject, deleting it lets you free the memory that was allocated for it.
I believe you have answered your own question.
System.totalMemory gives you the total amount of memory being "used", not allocated. It is accurate that your application may only be using 20 MB, but it has 5 MB that is free for future allocations.
I'm not sure whether the Adobe docs would shed light on the way that it manages memory.
So, if I load say 20MB from MySQL, in the Task Manager the RAM for the application goes up by about 25MB. Then when I close the connection and try to dispose the ByteArray, the RAM never frees up. However, if I use System.totalMemory, flash player shows that the memory is being released, which is not the case.
Is the flash player doing something like Java and reserving heap space and not releasing it until the app quits?
Well, yes and no. As you might have read in countless blog posts, the GC in AVM2 is optimistic and works in its own mysterious ways. So it does work a bit like Java and tries to reserve heap space. However, if you leave it long enough and start doing other operations that consume significant memory, it will free that previous space. You can see this by running the profiler overnight with some tests running on top of your app.
So, if I load say 20MB from MySQL, in the Task Manager the RAM for the application goes up by about 25MB. Then when I close the connection and try to dispose the ByteArray, the RAM never frees up. However, if I use System.totalMemory, flash player shows that the memory is being released, which is not the case.
The player is "releasing" the memory. If you minimize the window and restore it, you should see that the memory is now much closer to what System.totalMemory shows.
You might also be interested in using FlexBuilder's profiling tools which can show you if you really have memory leaks.
Use bytearray.clear()
As per the Language Reference:
Clears the contents of the byte array and resets the length and position properties to 0. Calling this method explicitly frees up the memory used by the ByteArray instance.

Resources