Pushing the limit of a content node - vespa

I have a question regarding how far can we push the limit of a content node.
My setup is one machine being stateless, the other being a content node. I noticed that when pushing a lot of documents (around 50k characters), the node will fail around 80 Millions docs, which is about 1 Terabyte of data.
The content node has 4Tb of storage, for 115Go of memory. I do not save anything as attribute, only summary and index.
The thing is that I can't properly manage to identify what is the cause of the failure of the content node, for example, which metrics to look at to identify the problem.
I thoroughly read the sizing documentation, but I did not found my answer.
Maybe do you have some hints on where to look ?

Did you check the vespa.log file on the content node? You might get some hints there.
Also, depending on your system configuration, you might be running out of file descriptors on the content node.

Could you please define "the node will fail"? How does it fail? If you manage to run out of memory the OOM killer might be coming for your proton-bin process (https://linux-mm.org/OOM_Killer). What is the resource utilization prior to the failure?

Related

memory increment of the application in the browser while running it for several hour

I am stuck with the memory increment of my application and as it is single page I can't even reload it. After running my application for around 5-6 hour memory size is reaching around 600mb from initial loading i.e 120mb and we did some fixed for this like making the ref to null in the componentWillunMount() and memory has reduced to 400 mb after the same testing for same time but still I can see there are lot many detached element, definitely it caused by some other parts of the code, in the snapshot file which we can take from chrome inbuilt functionality. So is there any way that I can remove all the detached-element while leaving the certain page or why don't browser removed this from memory as the detached-element is retaining some size of the memory ?
DOM node can only be garbage collected when there are no references to it from either the page's DOM tree or JavaScript code.
I suggest you take a look at your code and see if there are functions running not when you want it. If you use react or similar frameworks, you have to be careful with their lifecycle (important!).
Also here https://developers.google.com/web/tools/chrome-devtools/memory-problems/
There are many useful information, such as
- Investigate memory allocation by function
- Spot frequent garbage collections
So is there any way that I can remove all the detached-element while leaving the certain page or why don't browser removed this from memory
I cant offer any more accurate assumption or suggestion if what we have is I use javascript information. Countless consequences from countless combination of libraries, stacks and techniques make this impossible to guess.

Drupal 7- blocks assigned to custom regions are randomly unassigned

I set up a number of custom regions for my home page, which uses a custom template, and was able to position them and assign blocks to them as I wanted.
However, at random, any blocks I assign to these regions will disappear. When I look in the _block table of the database, the value set to the region field is -1 (ie., not assigned), and I have to go in and reassign the blocks for these regions.
This happens whether I cache my blocks or not.
It does not happen when I run a cron or clear my cache. As this only happens to my custom fields, I feel like there is something else I should be doing in addition to setting up my regions in the theme's and subtheme's .info files, that would make block assignments to these regions as persistent as they would be to a standard region.
A number of people have reported this over at Drupal.org, but no one has been able to come up with an answer. I've researched this for the last several weeks, and looked through any scripts that might store or rewrite my database to those values.
It always happens overnight, but not at any reliable interval: I can go a week without it happening, or I can go a day.
It does not happen on the offline XAMPP stack I use as my development server. There may be something going on with my web host, but as this is a very specific problem, I am thinking it has to do with how my custom regions are set up.
I used the directions at this blog: http://megadrupal.com/blog/add-new-regions-in-drupal-7-themes.
I have posted a question about it there, but have not received a reply yet.

Are there any good software development "patterns" for memory intensive .net programs?

Basically I'm working on a program that processes a lot of large video and image files, and I'm struggling with the memory management side of it because I've never dealt with anything quite like this before.
For instance, it stores all these images in a database, and loads a list of videos, and then you can switch between the videos and view images from the video. Right now, it's keeping all of those images in memory all the time, which is eating up a lot of space. I know I can lazy load the images, but once you've switched back and forth you get all of them stuck in memory.
I want to take advantage of the WPF databinding functionality and MVVM as much as possible, but if I need to look at a different architecture I will.
I'm just looking for general advice, tips, links to articles, or anything that could help.
One of the things you could look at is data virtualization, which is not provided in WPF by default (they provide UI virtualization instead). Data virtualization can say 'load and bind the data for an item / range of items while visible, then unload when not visible'.
Here's a great article that describes a concrete implementation that you may be able to use as-is or adapt:
http://www.codeproject.com/KB/WPF/WpfDataVirtualization.aspx
It sounds like the main problem you're having is not so much the performance-intensiveness of the application (which things like fixed-size buffers and static allocation will help with) but its overall memory footprint. The way to control that is with virtualization.
Lazy loading gets you halfway there: you don't actually create the object until something needs it. That's fine, but the longer the user works with the application and the more objects he visits in the UI, the more objects get created, and eventually the application runs out of memory.
So you want to throw away objects that the user doesn't need anymore. Figuring out which objects the user doesn't need can be a hard problem, but it can also be as easy as assuming that the user doesn't need the object that he used least recently. You use a least-recently-used (LRU) cache to do this.
This is totally consistent with the MVVM pattern. In your view class, you make your property getter for the object use this pseudocode:
if object hasn't been loaded
load object
add object to the LRU cache (whether you loaded it or not)
return object
The LRU cache I wrote keeps a simple queue of the objects it contains. When you add an object to the cache, if it's not already in the queue it gets added to the back, and if it is already in the queue it gets moved to the back.
If the queue's at its capacity when you add an object, it pops off whatever is at the front of the queue (which is the one that was used least recently) and raises the DiscardingOldestItem event.
This event is the object's chance to tell anything that holds a reference to it (i.e. the view object that it's a property of) that it needs to be discarded (probably by raising an event of its own). The view object's event handler should first raise the PropertyChanged event. If the property getter gets called when it does this, there's a binding somewhere that's still looking at the property, so it shouldn't be discarded yet. (Also, since the getter was called, the object just got moved to the back of the queue.) Otherwise, it can be thrown away.
(Note that if you have more objects visible in the UI than the cache can hold, this little dance becomes an infinite loop and you'll get a stack overflow.)
A more sophisticated approach would have the LRU cache start discarding old items when the application started running low on memory (it uses a fixed capacity right now). That's a straightforward change, but if you make that change, the scenario described in the previous paragraph is something you need to give more thought to; one very large object could result in the whole UI going kablooey.
It seems that to increase raw performance you would actually want to avoid patterns. They have their uses, don't get me wrong, but if you're trying to blast video at the highest performance possible the last thing you need to do it introduce abstraction layers that are designed to write higher quality code, not increase application performance.
this article on informIt has a lot of good info on the subject although it is more c and c++.
Static Allocation Pattern: Allocates memory up front
It suggests,
Pool Allocation Pattern: Preallocates pools of needed objects
Fixed Sized Buffer Pattern: Allocates memory in same-sized blocks
Smart Pointer Pattern: Makes pointers reliable
Garbage Collection Pattern: Automatically reclaims lost memory
Garbage Compactor Pattern: Automatically defragments and reclaims memory
"I know I can lazy load the images,
but once you've switched back and
forth you get all of them stuck in
memory."
This is not true to my understanding. The images can get garbage collected just like anything else, by removing all references. Are you sure you dont have a reference to them somewhere? Try a memory profiler like memprofiler or ANTS to see whats happening.
To those who have found this question looking for general patterns (not WPF) to reduce memory, the famous one (which I have never seen used!) is The Flyweight pattern

HCI: make the user wait through everything up front, or amortize?

I'm writing a silverlight app that queries a web service to populate a tree control. Each element will have at least 2 levels of children, so something like this:
a
+-b
+-c
d
+-g
+-h
e
+-i
+-j
f
+-k
+-l
The web service API is such that I can only get one level of child nodes at a time, so the first trip, I can get a,d,e,f. To get b,g,i,k, I have to make 4 trips. Similarly, I have to make 4 more trips to get c,h,j,l. (The service does actually allow me to get all the nodes in one trip, but it doesn't give me parent-child relationships along with it :-()
My question is this: should I make the user wait for a while up front while I get all the nodes for the tree view, or should I just get the top few nodes, and get the other nodes on-demand, or in a background task? Also, the nodes can change asynchronously, so if I get all the nodes up front, I'll need a "refresh" button for the treeview, and if I do it on demand, I'll have to have a caching strategy.
Which is best for the user?
A compromise where you load the first level up front and then load the remaining items in the background overridden by on-demand as required. If you load the nodes breadth first (e.g. a,d,e,f then b,g,i,k) rather than depth first (e.g. a,d,e,f followed by b,c) you can redirect your loading to be focused on the most recently expanded node.
Personally, as a user, I would prefer all the data to be loaded up front so that once the application finishes loading I can trust that I won't have to wait anymore (or at least very little)
But, I suppose it depends on several traits of your application / data:
How dynamic is the data? Does it update more often then the rate at which the user explores the nodes? If it does, then you will have to read the data as the user explores it, otherwise you can probably get away with only updating it occasionally and checking for the freshest data before performing important operations.
How much of the data will the user explore during normal use? If they are constantly exploring throughout the entire tree, then having the entire tree loaded is important. On the other hand, if most users will usually only expand a small portion of the tree, then maybe loading on demand is better so you don't waste thier time loading data they will never see anyway.
How much affect with this have on performance? Does it really take a long time to load all the data? If the data is not too much, maybe the whole thing can be loaded in a matter of seconds, in which case the amount of work to implement the optimization will not be significant to the end user and in turn will not have a good return on investment.
Most likely you don't have clear cut answers to these questions, but they're probably good to consider when you're attacking this interesting problem.
Short answer is to make the user wait for as little as possible. They will curse your name if they have to wait 10-20 seconds on application load, but not notice 0.1-0.2 seconds for a tree node to expand.
I have an app in production with a similar structure. I cannot load up-front because it'd be effectively loading the entire database. Here's my strategy:
The tree control starts with 1 level expanded below the root.
Each unexpanded node has a dummy child node in order to get the [+] expansion icon to show
When a node is expanded, it fires an event which is trapped by the app. If the only child node is the dummy one, the dummy is deleted and the children are loaded from the database.
Changes in the data are not reflected automatically by visible nodes, however the context menu for the tree has a Refresh item that can be used to refresh a node.
I have considered showing updates asynchronously, but have tended to avoid it because large amounts of data can be shown in the tree and I'm wary of DB load if I'm checking them all for changes.
The app is WinForms, written in C# using .NET 2.0.

db4o concerns

I'm interested in using db4o as my persistence mechanism in my Desktop application but I'm concerned about a couple things.
1st concern: Accidentally clipping very complex object graphs.
Say I have a tree with a height of 10 and I fetch the root, how does it handle me storing the root object again?
From my understanding, it doesn't fetch the entire tree it fetches the first 5 referenced layers.
So.. If I make a trivial change to the root and then store it, will it clip away the nodes further down the tree, in essence deleting them.
If not.. how does it handle this?
2nd concern: Extracting subgraphs in a larger object graph
Using my tree example from above... If the database contains 1 massive tree can I query for a single node within it? Since .store was called only once, does my database think it contains only 1 "record"?
Thank you.
You have to be very careful, because two things can happen: you can pull whole db into memory, or just partial graph (rest of objects will be null).
In db4o there's notion of Activator and Update depth, which can be configured upon dbv40 configuration, or when objects are fetched. Its the way you tell db40 how deep you want him to go when fetching referenced objects. Check db4o web site, there's documentation about it:
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Activation
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Update_Depth
DB4O's Transparent Activation should resolve most of the fears you've expressed here.

Resources