I need to store a graph for the map of a game inside a game server written in C.
The graph has ~200 nodes and 3 kinds of edges that can connect two nodes (these three kinds can also overlap: for example, two nodes can be connected by 2 edges of two different types). The maximum degree of a node is around 5-6.
What I would like is to have this static structure implemented efficiently, so that I can do simple operations like:
is n1 connected to n2? (and if so, by which kinds of edges?)
what is n1 connected to? (by all kinds of edges or by a specific one)
but in a multi-threaded environment, since there will be many instances of the game that rely on the same static graph.
Since the graph can't be modified and its structure is well known, I'm sure there are some tricks to implement it neatly while using as few resources as possible.
I would like to avoid using STL or Boost for now. Do you have any suggestions for a data structure that would suit this well?
(It's not a premature optimization: it will run on a VPS and I don't have much RAM or CPU power, so I need to keep it tight.)
EDIT: just because I forgot (and thanks for making me realize it): the graph is undirected, so every edge is symmetric.
Thanks in advance
Many answers are possible. This one relies on the fact that you have relatively few nodes. The advantage of this approach is probably unbeatable performance.
The idea is to represent your graph as a 200x200 matrix of bytes, each entry representing the connection between two nodes. A 0 will obviously mean "no connection", and each of the byte's 8 bits can flag one edge type, so a non-zero value can record any combination of up to 8 different edge types.
Let the "row" of this matrix be the starting node and the "column" be the destination. Initialize the structure so that for every edge connecting one node with another, there's a value at the intersection of start and end; that value can be a combination of bits representing edge types. Since your graph is undirected, set both [a][b] and [b][a].
To find out whether one node connects to another, simply query the byte at the intersection of one node and the other: If there's a nonzero value there, then there is a connection, and the value will tell you what kind.
For 200 nodes, this data structure will take up 40,000 bytes (about 40 KB), which is pretty moderate. It won't scale well beyond, say, 1000 nodes, because memory grows with the square of the node count (1000 nodes already need 1 MB).
As long as nothing (apart from one-time initialization) ever writes to this structure, it will be naturally thread safe, as its state never changes.
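A minimal sketch of the byte-matrix idea in C, assuming at most 8 edge types and 200 nodes. The edge-type names (EDGE_ROAD, EDGE_RIVER, EDGE_PORTAL) and the helper functions are hypothetical; the matrix is filled once at start-up and only read afterwards.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_NODES 200

/* One bit per edge type (hypothetical names); up to 8 types fit in a byte. */
enum {
    EDGE_ROAD   = 1u << 0,
    EDGE_RIVER  = 1u << 1,
    EDGE_PORTAL = 1u << 2
};

/* 200 x 200 bytes = 40,000 bytes. Written only during initialization. */
static uint8_t adj[NUM_NODES][NUM_NODES];

/* Undirected graph: record the edge in both directions. */
static void add_edge(int a, int b, uint8_t type)
{
    adj[a][b] |= type;
    adj[b][a] |= type;
}

/* 0 means "not connected"; otherwise the bitmask of edge types. */
static uint8_t edge_types(int a, int b)
{
    return adj[a][b];
}

/* Writes into out[] the neighbours of n reachable by any type in mask
   (pass 0xFF for "any type"); returns how many were written. */
static size_t neighbours(int n, uint8_t mask, int *out, size_t cap)
{
    size_t count = 0;
    for (int m = 0; m < NUM_NODES && count < cap; m++)
        if (adj[n][m] & mask)
            out[count++] = m;
    return count;
}
```

Answering "is n1 connected to n2?" is a single array read; listing neighbours scans one 200-byte row, which is cheap at this size.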
Since degrees are limited, you can get very good performance by just representing a node by a struct with arrays of pointers to other nodes (one array for each edge type).
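A sketch of the per-node struct approach, assuming 3 edge types and a hard cap of 6 neighbours per type (taken from the question's "5-6"). The struct and function names are illustrative. Links are made in both directions because the graph is undirected.

```c
#include <stddef.h>

#define NUM_EDGE_TYPES 3
#define MAX_DEGREE     6   /* assumption based on the question */

struct node {
    int id;
    /* For each edge type: up to MAX_DEGREE neighbours plus a count. */
    const struct node *adj[NUM_EDGE_TYPES][MAX_DEGREE];
    unsigned char degree[NUM_EDGE_TYPES];
};

/* Undirected: link both ends. Returns 0 on success, -1 if a list is full. */
static int link_nodes(struct node *a, struct node *b, int type)
{
    if (a->degree[type] >= MAX_DEGREE || b->degree[type] >= MAX_DEGREE)
        return -1;
    a->adj[type][a->degree[type]++] = b;
    b->adj[type][b->degree[type]++] = a;
    return 0;
}

/* Is a connected to b by an edge of this type? Scans at most MAX_DEGREE slots. */
static int connected(const struct node *a, const struct node *b, int type)
{
    for (int i = 0; i < a->degree[type]; i++)
        if (a->adj[type][i] == b)
            return 1;
    return 0;
}
```

"What is n1 connected to?" is then just a walk over adj[type][0 .. degree[type]), or over all three arrays for any type.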
Regardless of the data structure you pick, you can avoid worrying about multithreading if your graph is read-only: it is fine for multiple threads to access it without synchronization.
I was wondering why we should even use a stack, since an array or linked list can do everything a stack can do. Why do we bother to name it a "data structure" separately? In the real world, just using an array would be sufficient to solve the problem, so why would one bother to implement a stack, which restricts you to pushing and popping only at the top of the collection?
I think it is better to use the term data type to refer to things whose behavior is defined by some interface, or algebra, or collection of operations. Things like
stacks
queues
priority queues
deques
lists
sets
hierarchies
maps (dictionaries)
are types because for each, we just list their behaviors. Stacks are stacks because they can only be pushed and popped. Queues are queues because ... (you get the picture).
On the other hand, a data structure is an implementation of a data type defined by the way it arranges its components. Examples include
array (constant time access by index)
linked list
bitmap
BSTs
Red-black tree
(a,b)-trees (e.g. 2-3 trees)
Skip lists
hash tables (many variants)
adjacency matrices
Bloom filters
Judy arrays
Tries
A lot of people confuse the terms data structure and data type. Strictly speaking, stacks are data types, not data structures, but it's best not to be too pedantic about it.
To answer your specific question, we use stacks as data types in situations where we want to ensure our data is modified only by pushing and popping, and that we never violate this access pattern (even by accident).
Under the hood, we may use a linked list to implement our stack. But most programming languages provide a way to restrict the interface, so that our code can readably, and securely, use our data in a LIFO fashion.
TL;DR: Readability. Security.
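To illustrate the interface argument, here is a small C sketch of an int stack backed by a linked list: callers get only push, pop and an emptiness test. In a real project you would put just `struct stack;` and the function prototypes in a header and keep the definitions in the .c file, so other code cannot reach into the list; everything is in one file here for brevity, and all names are illustrative.

```c
#include <stdlib.h>

struct stack_node {
    int value;
    struct stack_node *next;
};

struct stack {
    struct stack_node *top;   /* NULL when empty */
};

/* Returns 0 on success, -1 if allocation fails. */
static int stack_push(struct stack *s, int value)
{
    struct stack_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = s->top;
    s->top = n;
    return 0;
}

static int stack_is_empty(const struct stack *s)
{
    return s->top == NULL;
}

/* Pops the top value into *out; returns 0 on success, -1 if empty. */
static int stack_pop(struct stack *s, int *out)
{
    struct stack_node *n = s->top;
    if (n == NULL)
        return -1;
    *out = n->value;
    s->top = n->next;
    free(n);
    return 0;
}
```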
Stacks can be, and usually are, implemented using an array or a linked list as the underlying structure. There are many benefits to using a stack.
One of the main benefits is the LIFO ordering that is maintained by the push/pop operations. It gives the stack elements a natural order: they are removed in the reverse of the order in which they were added. Such a data structure can be very useful in many applications where using just an array or a linked list would actually be less efficient and/or more complicated. Some of those applications are:
Balancing of symbols (such as parentheses)
Infix-to-postfix conversion
Evaluation of postfix expression
Implementing function calls (including recursion)
Finding of spans (finding spans in stock markets)
Page-visited history in a Web browser (Back button)
Undo sequence in a text editor
Matching Tags in HTML and XML
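As an example of the first application, a bracket-matching check needs nothing more than push and pop. This sketch uses a fixed-size char array as the stack; the MAX_NESTING limit is an arbitrary assumption.

```c
#include <stddef.h>

#define MAX_NESTING 256   /* arbitrary limit for this sketch */

/* Returns 1 if every (, [ and { in s is closed in the right order, else 0. */
static int balanced(const char *s)
{
    char stack[MAX_NESTING];
    size_t top = 0;

    for (; *s != '\0'; s++) {
        char c = *s;
        if (c == '(' || c == '[' || c == '{') {
            if (top == MAX_NESTING)
                return 0;              /* nested too deeply for this sketch */
            stack[top++] = c;          /* push the opener */
        } else if (c == ')' || c == ']' || c == '}') {
            char want = (c == ')') ? '(' : (c == ']') ? '[' : '{';
            if (top == 0 || stack[top - 1] != want)
                return 0;              /* closer without a matching opener */
            top--;                     /* pop */
        }
    }
    return top == 0;                   /* leftover openers mean unbalanced */
}
```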
The two underlying implementations, an array or a linked list, give the stack different capabilities and features. Here is a simple comparison:
(Dynamic) Array Implementation:
1. All operations take constant time except push(), which occasionally has to grow the array.
2. Expensive doubling operation every once in a while.
3. Any sequence of n operations (starting from an empty stack) takes time proportional to n in total, i.e. push() is O(1) "amortized".
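A sketch of the dynamic-array variant in C, assuming the capacity starts at 4 and doubles when full (names are illustrative):

```c
#include <stdlib.h>

struct int_stack {
    int *data;
    size_t size;      /* elements in use */
    size_t capacity;  /* elements allocated */
};

/* Usually just a store; occasionally the buffer doubles. */
static int istack_push(struct int_stack *s, int value)
{
    if (s->size == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 4;
        int *p = realloc(s->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;            /* old buffer is still valid */
        s->data = p;
        s->capacity = new_cap;
    }
    s->data[s->size++] = value;
    return 0;
}

/* O(1) worst case. Returns 0 on success, -1 if the stack is empty. */
static int istack_pop(struct int_stack *s, int *out)
{
    if (s->size == 0)
        return -1;
    *out = s->data[--s->size];
    return 0;
}

static void istack_free(struct int_stack *s)
{
    free(s->data);
    s->data = NULL;
    s->size = s->capacity = 0;
}
```

Each doubling copies the current elements, and those copy counts (4, 8, 16, ...) add up to less than the final capacity, so n pushes cost O(n) in total even though a single push can cost O(n).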
Linked List Implementation:
1. Grows and shrinks easily.
2. Every operation takes constant time O(1).
3. Every operation uses extra space and time to deal with references.
I am trying to develop a network resource manager component in C which keeps track of various network elements over TCP/UDP sockets. For this, I use three values:
Hardware Location Number
Service Group Number
Node Number
The rule is that no two elements on a network may have the same set of these three numbers. Thus, each location's identity will be unique on the network. This information needs to be saved in the program (non-persistently) in a way so that given any of the parameters (could be just a single number, or a combination of any two, or all three) the program returns the eligible candidates by performing a quick search.
Addition and deletion should also be efficient, but since there will be few insertions or deletions after the initial transient phase, it is OK if they are a bit slower than search. Using trees is one option, but the question of which one to use still eludes me (I don't know many, but I look forward to learning new ones if they serve my purpose).
To do this, I could have three different trees maintained separately with similar nodes pointing to a same structure in memory, but I feel that is inefficient and not compact. I am looking for a unified data set which can handle these variations like multiple keys.
Or I could have a single AVL tree with multiple keys (if that is allowed).
The number of elements in the network is dynamic, so a 3D array is out of the question.
A friend also suggested hashing, but I am not too sure.
Please help.
Hashing seems like a silly choice for this. Perhaps the most significant reason is that you seem interested in approximate lookups. Hashing your values will likely mean iterating through the entire collection to find a group of nodes that have a common prefix, or a similar prefix.
PATRICIA is commonly used in routing tables, and makes itself quite amenable to searching for items that have similar keys. Note that I have found much misleading information about PATRICIA tries, which I've written about here. I found this resource to be particularly helpful.
Similarly to an AVL tree, you'll need to combine the three keys to form one (without hashing, preferably).
unsigned int key[3] = { hardware_location_number, service_group_number, node_number };
/* ^------- Use something like this as your key */
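A full PATRICIA trie is too long to show here, so this sketch uses a sorted array as a stand-in for any ordered structure (AVL tree, trie) to show why combining the keys most-significant-first helps: every element sharing a leading prefix (hardware location, or hardware location plus service group) sits in one contiguous range, found by binary search. The struct, field and function names are assumptions. Note that a query on the node number alone is not a prefix query, so it still needs a scan or a second index.

```c
#include <stddef.h>

struct net_key {
    unsigned int hw_location;
    unsigned int service_group;
    unsigned int node;
};

/* Compare only the first `fields` components (1, 2 or 3), like strcmp. */
static int key_cmp(const struct net_key *a, const struct net_key *b, int fields)
{
    unsigned int av[3] = { a->hw_location, a->service_group, a->node };
    unsigned int bv[3] = { b->hw_location, b->service_group, b->node };
    for (int i = 0; i < fields; i++)
        if (av[i] != bv[i])
            return av[i] < bv[i] ? -1 : 1;
    return 0;
}

/* First index in sorted keys[0..n) whose prefix is >= the given prefix. */
static size_t lower_bound(const struct net_key *keys, size_t n,
                          const struct net_key *prefix, int fields)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (key_cmp(&keys[mid], prefix, fields) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Number of keys sharing the prefix; the matches start at *first. */
static size_t prefix_range(const struct net_key *keys, size_t n,
                           const struct net_key *prefix, int fields,
                           size_t *first)
{
    size_t i = lower_bound(keys, n, prefix, fields);
    size_t j = i;
    while (j < n && key_cmp(&keys[j], prefix, fields) == 0)
        j++;
    *first = i;
    return j - i;
}
```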
I am working on a simulation where I need to be able to handle thousands, potentially millions, of objects updating every loop.
All of the objects need to have their logic function called (AI).
But the location of the object determines how detailed the logic will be. For example:
[working with 100 objects to keep it simple]
All objects have a location (x,y).
20 objects are 500 points away from a 'point of interest' location.
50 objects are 500 points from the 20 objects (1000 points away).
30 objects are within 100 points of the point of interest.
Now say this was a detailed city simulation with the objects being virtual citizens.
At 6pm it's time for everyone to go home from their jobs and sleep.
So we iterate through all citizens, but I want them to do different things:
The furthest objects (50) go home from their job and sleep until morning.
The closer objects (20) go home from their job, have a bite to eat, then sleep until morning.
The closest objects (30) go home from their job, have a bite to eat, brush their teeth, then sleep until morning.
As you can see the closer they are to the point of interest the more detailed the logic becomes.
I am trying to work out the best and most efficient way to iterate through all objects.
This would be relatively easy with a handful of objects, but as this needs to handle at least 500,000 objects efficiently, I need some advice.
Also, I'm not sure whether I should iterate through all objects every loop, or whether it would be better to iterate through the closest objects every loop but only iterate through further-away objects every 10 loops?
With the additional requirement that objects interact with other objects close to them, I have been thinking the best way to do this might be to organise them in a quadtree, but I'm not sure. It seems as though quadtrees are more for static content, but the objects I'm dealing with, as mentioned, have a location and are required to move to other locations.
Am I going down the right track? Or is there a 'better' way?
I am also working in C++, if anyone thinks it's relevant.
Any advice would be greatly appreciated.
NOTE:
The point of interest changes regularly; think of it as a camera view.
Objects are created and destroyed dynamically.
If you want to quickly select objects within a certain radius of a particular point, then a quadtree, or just a simple square grid, will help.
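A simple square grid is often the easier of the two to keep up to date with moving objects. The sketch below assumes a 10,000 x 10,000 world split into 100 x 100 cells and at most 1024 objects; all names and sizes are hypothetical. Each cell keeps a singly linked list of object indices, and a radius query only visits the cells overlapping the query's bounding square, then checks squared distances.

```c
#include <stddef.h>

#define WORLD_SIZE 10000
#define GRID_DIM   100
#define CELL_SIZE  (WORLD_SIZE / GRID_DIM)
#define MAX_OBJS   1024

static int obj_x[MAX_OBJS], obj_y[MAX_OBJS];
static int next_in_cell[MAX_OBJS];            /* -1 ends a cell's list */
static int cell_head[GRID_DIM][GRID_DIM];

/* World coordinate -> cell coordinate, clamped to the grid. */
static int cell_of(int v)
{
    if (v < 0) return 0;
    if (v >= WORLD_SIZE) return GRID_DIM - 1;
    return v / CELL_SIZE;
}

static void grid_clear(void)
{
    for (int cy = 0; cy < GRID_DIM; cy++)
        for (int cx = 0; cx < GRID_DIM; cx++)
            cell_head[cy][cx] = -1;
}

/* Push object `id` onto the list of the cell containing (x, y). */
static void grid_insert(int id, int x, int y)
{
    int cx = cell_of(x), cy = cell_of(y);
    obj_x[id] = x;
    obj_y[id] = y;
    next_in_cell[id] = cell_head[cy][cx];
    cell_head[cy][cx] = id;
}

/* Collect ids within radius r of (px, py); returns how many were found. */
static size_t grid_query(int px, int py, int r, int *out, size_t cap)
{
    size_t n = 0;
    long long r2 = (long long)r * r;
    for (int cy = cell_of(py - r); cy <= cell_of(py + r); cy++)
        for (int cx = cell_of(px - r); cx <= cell_of(px + r); cx++)
            for (int id = cell_head[cy][cx]; id != -1; id = next_in_cell[id]) {
                long long dx = obj_x[id] - px, dy = obj_y[id] - py;
                if (dx * dx + dy * dy <= r2 && n < cap)
                    out[n++] = id;
            }
    return n;
}
```

With objects moving every tick, a common pattern is to call grid_clear() and re-insert everything once per tick, which is O(n) and avoids tracking removals.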
If your problem is how to store millions of objects so that iterating through them is efficient, then you could use a column-based technique (often called "structure of arrays"): instead of having 1 million objects each with 5 fields, you have 5 arrays of 1 million elements each. In this case each object is just an index in the range 0 .. 999999. So, for example, say you want to store 1 million objects of the following structure:
struct resident
{
int x;
int y;
int flags;
int age;
int health; // This is a computer game, right?
};
Then, instead of declaring resident residents [1000000] you declare 5 arrays:
int resident_x [1000000];
int resident_y [1000000];
int resident_flags [1000000];
int resident_age [1000000];
int resident_health [1000000];
And then, instead of, say, residents [n].x you use resident_x [n]. Storing objects this way may be faster when you need to iterate through all objects of the same type and do something with a couple of fields in each object (the same set of fields for every object), because the loop reads only the arrays it needs and wastes less cache bandwidth on the other fields.
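A small C sketch of the column-based layout in use. It repeats the answer's arrays so that it compiles on its own; the resident_add and count_weak_in_box helpers and the "health below a threshold" query are illustrative assumptions.

```c
#include <stddef.h>

#define MAX_RESIDENTS 1000000

/* Structure-of-arrays layout: resident n is simply index n. */
static int resident_x[MAX_RESIDENTS];
static int resident_y[MAX_RESIDENTS];
static int resident_flags[MAX_RESIDENTS];
static int resident_age[MAX_RESIDENTS];
static int resident_health[MAX_RESIDENTS];
static size_t resident_count;

/* Appends a resident and returns its index (no bounds check in this sketch). */
static size_t resident_add(int x, int y, int age, int health)
{
    size_t n = resident_count++;
    resident_x[n] = x;
    resident_y[n] = y;
    resident_flags[n] = 0;
    resident_age[n] = age;
    resident_health[n] = health;
    return n;
}

/* Counts residents inside [x0,x1] x [y0,y1] with health below threshold.
   Only the x, y and health arrays are read; flags and age are never touched. */
static size_t count_weak_in_box(int x0, int y0, int x1, int y1, int threshold)
{
    size_t count = 0;
    for (size_t n = 0; n < resident_count; n++)
        if (resident_x[n] >= x0 && resident_x[n] <= x1 &&
            resident_y[n] >= y0 && resident_y[n] <= y1 &&
            resident_health[n] < threshold)
            count++;
    return count;
}
```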
You need to break the problem down into "classes", just like in the real world. Each person's class is computed from the distance. So lower class people are far away and upper class are close. Or more correctly "far class", nearish class and "here class" or whatever you want to name them.
1) Make an array with one slot for each class. Each slot holds a "linked list" of the people in that class. When a person's class changes (social climbers), it is very quick to move the object to another list.
2) So put everybody into the proper classes and iterate only over the classes close to you. In a real scenario there are objects which are too far away to care about, so you can write those back to disk and only reload them when you get nearer.
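A sketch of step 1 in C, assuming three distance classes and illustrative thresholds of 100 and 500 points (taken loosely from the question's example). The lists are doubly linked through index arrays, so moving a person between classes is O(1) and needs no allocation. All names are hypothetical.

```c
#include <stddef.h>

enum { CLASS_HERE, CLASS_NEAR, CLASS_FAR, NUM_CLASSES };

#define MAX_PEOPLE 1024

/* Per-person links and class; -1 marks the end of a list. */
static int prev_of[MAX_PEOPLE], next_of[MAX_PEOPLE];
static int class_of[MAX_PEOPLE];
static int class_head[NUM_CLASSES] = { -1, -1, -1 };

static void class_push(int p, int c)
{
    prev_of[p] = -1;
    next_of[p] = class_head[c];
    if (class_head[c] != -1)
        prev_of[class_head[c]] = p;
    class_head[c] = p;
    class_of[p] = c;
}

static void class_unlink(int p)
{
    int c = class_of[p];
    if (prev_of[p] != -1)
        next_of[prev_of[p]] = next_of[p];
    else
        class_head[c] = next_of[p];
    if (next_of[p] != -1)
        prev_of[next_of[p]] = prev_of[p];
}

/* Called when a person's distance class changes. */
static void class_move(int p, int new_class)
{
    if (class_of[p] == new_class)
        return;
    class_unlink(p);
    class_push(p, new_class);
}

/* Illustrative thresholds only. */
static int class_for_distance(int d)
{
    if (d <= 100) return CLASS_HERE;
    if (d <= 500) return CLASS_NEAR;
    return CLASS_FAR;
}

static size_t class_size(int c)
{
    size_t n = 0;
    for (int p = class_head[c]; p != -1; p = next_of[p])
        n++;
    return n;
}
```

Each tick you would then iterate only class_head[CLASS_HERE] (and perhaps CLASS_NEAR) in full.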
There are a few questions embedded in there:
-How to deal with large quantities of objects? If there is a constant number of fixed objects, you may be able to simply create an array of them, as long as you have sufficient memory. If you need to dynamically create and destroy them, you put yourself at risk for memory leaks without careful handling of destroyed objects. At a certain point, you may ask yourself whether it is better to use another application, such as a database, to store your objects, and perform just the logic in your C++ code. Databases will provide additional functionality that I will highlight.
-How to find objects within a given distance of others. This is a classic problem for geographic information systems (GIS); it sounds like you are trying to operate a simple GIS to store your objects and their attributes, so it is applicable. It takes computation power to evaluate the distance formula, SQRT((X-x)^2+(Y-y)^2), for every point (comparing squared distances against the squared radius at least avoids the square root). Instead, it is common to use a 'windowing function' to extract a square containing all the points you want, and then search within this to find points that lie within the given radius. Some databases are optimized to perform a variety of GIS functions, including returning points within a given radius, or returning points within some other geometry like a polygon. Otherwise you'll have to program this functionality yourself.
-Tree storage of objects. This can improve speed, but you will hit a tradeoff if the objects are constantly moving around, wherein the tree has to be restructured often. It all depends on how often things move versus how often you want to do calculations on them.
-AI code. If you're trying to do AI on millions of objects, this may be your biggest performance cost, more than the method used to store and search the objects. You're right that simpler code for points farther away will improve performance, as will executing the logic less often for far-away points. This is sometimes handled using Monte Carlo analysis, where the logic is performed on a random subset of points during any given iteration, and you could make the probability of execution decrease as the distance from the point of interest increases.
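The Monte Carlo idea can be sketched as a per-object dice roll whose odds shrink with distance. This is one possible reading, not the answerer's code: the linear falloff, the 100 and 1000 point radii, the roughly 10% floor and the xorshift32 generator are all assumptions chosen for illustration.

```c
#include <stdint.h>

#define NEAR_R 100     /* always update inside this distance (assumed) */
#define FAR_R  1000    /* chance stops falling beyond this distance (assumed) */
#define MIN_P  6554    /* about 10% of 65536: floor for far objects (assumed) */

/* xorshift32: tiny deterministic PRNG; the state must be non-zero. */
static uint32_t rng_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Chance, in 1/65536ths, that an object at distance d runs its AI this tick:
   certain within NEAR_R, then falling linearly to MIN_P at FAR_R. */
static uint32_t update_chance(int d)
{
    if (d <= NEAR_R) return 65536u;
    if (d >= FAR_R)  return MIN_P;
    return 65536u - (uint32_t)((uint64_t)(65536u - MIN_P) * (uint32_t)(d - NEAR_R)
                               / (FAR_R - NEAR_R));
}

/* Monte Carlo step: compare 16 random bits against the chance. */
static int should_update(int d, uint32_t *rng)
{
    return (rng_next(rng) & 0xFFFFu) < update_chance(d);
}
```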
I would consider using a Linear Quadtree with Morton Encoding / Z-Order indexing. You can further optimize this structure by using a Bit Array to represent nodes that contain data and very quickly perform calculations.
I've done this extremely efficiently in the browser using JavaScript, and I can traverse 67 million nodes in under a second. Once I've narrowed it down to the region of interest, I look up the data in a different structure, all still within milliseconds. I'm using this for spatial vector animation.
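The core of a linear quadtree is the Morton (Z-order) code, which interleaves the bits of x and y so that nearby cells tend to get nearby codes. A sketch in C for 16-bit coordinates using the usual bit-spreading masks; the function names are illustrative. Because each level of the tree consumes two bits, a cell's parent is its code shifted right by 2, and a bit array indexed by code can mark which cells hold data.

```c
#include <stdint.h>

/* Spread the low 16 bits of v so a zero bit sits between each pair:
   ...dcba -> ...0d0c0b0a. */
static uint32_t part1by1(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Inverse of part1by1: gather every other bit back into the low 16 bits. */
static uint32_t compact1by1(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* x bits go to even positions, y bits to odd positions. */
static uint32_t morton_encode(uint16_t x, uint16_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}

static void morton_decode(uint32_t code, uint16_t *x, uint16_t *y)
{
    *x = (uint16_t)compact1by1(code);
    *y = (uint16_t)compact1by1(code >> 1);
}

/* Parent quadrant one level up: (x >> 1, y >> 1). */
static uint32_t quad_parent(uint32_t code)
{
    return code >> 2;
}
```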
I have seen code that creates an octree and adds and removes data from it, but how does one actually build the octree? Is there 3D voxel software that will save to some sort of array that can then be converted to an octree? Or can you save directly to an octree?
Depends on your implementation -
If you are using octrees to subdivide space, then typically you'll throw a bunch of V3s (3D points) at it, and once a node has more than a certain number of points in it, you subdivide and redistribute them.
If you are looking for a way to store Minecraft-style voxels, then you'll subdivide until you reach 1:1 with your voxel size, and store your data in the leaf nodes.
Where the data comes from is up to you - Octrees are a way of storing, manipulating and searching for data, not a file-format as such.
Each node in an octree has one point. These nodes are broken up into (you guessed it) eight child nodes, which in turn each contain a single point.
In general, you don't add all of your vertices to an octree unless you are doing some ungodly collision detection where every single vertex counts. Not that you can't make it fast then, but it's still slower than the approximation given by a smaller number of nodes (this is true for nearly everything: approximation is faster).
Again, if you are rendering at high quality, octrees should probably have as many nodes as you have points.
Now on to the answer:
Create a root node whose bounding box, described by its center and size, encloses the whole model.
Insert each point. This should subdivide the octree in the relevant directions.
As you insert these points, the data will move further down into leaf nodes that more closely encapsulate your model.
Also, as you subdivide, the bounding box is halved (along each axis) for each level down.
If you really want to save it, you can save the vertices (numbered), and then have your program write the various connections between nodes and vertices to disk, from where they can be loaded in roughly the same time it takes to build the octree from scratch in the first place.
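The steps above can be sketched in C as a bucket octree: each leaf holds up to BUCKET_CAP points and splits into 8 half-sized children when one more arrives, as in the "subdivide and redistribute" description. BUCKET_CAP, MAX_DEPTH and all names are assumptions; points are assumed to lie inside the root cube, and setting BUCKET_CAP to 1 gives the one-point-per-leaf variant.

```c
#include <stdlib.h>

#define BUCKET_CAP 4    /* assumed: split a leaf when one more point arrives */
#define MAX_DEPTH  16   /* assumed: stop splitting (e.g. duplicate points) */

struct vec3 { float x, y, z; };

struct octree {
    struct vec3 center;          /* centre of this node's cube */
    float half;                  /* half the cube's edge length */
    struct octree *child[8];     /* all NULL while this is a leaf */
    struct vec3 pts[BUCKET_CAP]; /* points stored in a leaf */
    int count;
};

static struct octree *oct_new(struct vec3 center, float half)
{
    struct octree *n = calloc(1, sizeof *n);
    if (n != NULL) {
        n->center = center;
        n->half = half;
    }
    return n;
}

/* Octant index: bit 0 = +x side, bit 1 = +y side, bit 2 = +z side. */
static int octant(const struct octree *n, struct vec3 p)
{
    return (p.x >= n->center.x) | ((p.y >= n->center.y) << 1) | ((p.z >= n->center.z) << 2);
}

static int oct_insert(struct octree *n, struct vec3 p, int depth);

/* Split a full leaf into 8 half-sized children and push its points down. */
static int subdivide(struct octree *n, int depth)
{
    float h = n->half / 2.0f;
    for (int i = 0; i < 8; i++) {
        struct vec3 c = { n->center.x + ((i & 1) ? h : -h),
                          n->center.y + ((i & 2) ? h : -h),
                          n->center.z + ((i & 4) ? h : -h) };
        n->child[i] = oct_new(c, h);
        if (n->child[i] == NULL) {          /* undo: leaf stays unchanged */
            for (int j = 0; j < i; j++) {
                free(n->child[j]);
                n->child[j] = NULL;
            }
            return -1;
        }
    }
    for (int i = 0; i < n->count; i++)      /* redistribute into children */
        oct_insert(n->child[octant(n, n->pts[i])], n->pts[i], depth + 1);
    n->count = 0;
    return 0;
}

/* Returns 0 on success, -1 if out of memory or a MAX_DEPTH leaf is full. */
static int oct_insert(struct octree *n, struct vec3 p, int depth)
{
    if (n->child[0] != NULL)                /* internal node: descend */
        return oct_insert(n->child[octant(n, p)], p, depth + 1);
    if (n->count < BUCKET_CAP) {            /* leaf with room */
        n->pts[n->count++] = p;
        return 0;
    }
    if (depth >= MAX_DEPTH || subdivide(n, depth) != 0)
        return -1;
    return oct_insert(n->child[octant(n, p)], p, depth + 1);
}

static int oct_count(const struct octree *n)
{
    int c = n->count;
    if (n->child[0] != NULL)
        for (int i = 0; i < 8; i++)
            c += oct_count(n->child[i]);
    return c;
}

static void oct_free(struct octree *n)
{
    if (n == NULL)
        return;
    for (int i = 0; i < 8; i++)
        oct_free(n->child[i]);
    free(n);
}
```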
Anyway, I hope I answered your question.
This program I'm doing is about a social network, which means there are users and their profiles. The profile structure is UserProfile.
Now, there are various possible Graph implementations and I don't think I'm using the best one. I have a Graph structure and inside, there's a pointer to a linked list of type Vertex. Each Vertex element has a value, a pointer to the next Vertex and a pointer to a linked list of type Edge. Each Edge element has a value (so I can define weights and whatever it's needed), a pointer to the next Edge and a pointer to the Vertex owner.
I have 2 sample files with data to process (in CSV style) and insert into the Graph. The first one is the user data (one user per line); the second one is the user relations (for the graph). The first file is inserted into the graph quickly because I always insert at the head and there are only ~18000 users. The second file takes ages, even though I still insert the edges at the head. The file has about ~520000 lines of user relations and takes 13-15 minutes to insert into the Graph. I made a quick test, and reading the data is pretty quick, practically instantaneous. The problem is the insertion.
This problem exists because I have a Graph implemented with a linked list for the vertices. Every time I need to insert a relation, I need to look up 2 vertices so I can link them together. This is the problem: doing this for ~520000 relations takes a while.
How should I solve this?
Solution 1) Some people recommended that I implement the Graph (the vertices part) as an array instead of a linked list. This way I have direct access to every vertex and the insertion time is probably going to drop considerably. But I don't like the idea of allocating an array with [18000] elements. How practical is this? My sample data has ~18000, but what if I need much less or much more? The linked list approach has that flexibility: I can have whatever size I want as long as there's memory for it. But the array doesn't, so how am I going to handle such a situation? What are your suggestions?
Using linked lists is good for space complexity but bad for time complexity. And using an array is good for time complexity but bad for space complexity.
Any thoughts about this solution?
Solution 2) This project also demands that I have some sort of data structure that allows quick lookup based on a name index and an ID index. For this I decided to use Hash Tables. My tables are implemented with separate chaining as collision resolution, and when a load factor of 0.70 is reached, I normally recreate the table. I base the next table size on this link.
Currently, both Hash Tables hold a pointer to the UserProfile instead of duplicating the user profile itself. That would be stupid: changing data would require 3 changes, and it's really dumb to do it that way. So I just save the pointer to the UserProfile. The same user profile pointer is also saved as the value in each Graph Vertex.
So, I have 3 data structures, one Graph and two Hash Tables and every single one of them point to the same exact UserProfile. The Graph structure will serve the purpose of finding the shortest path and stuff like that while the Hash Tables serve as quick index by name and ID.
What I'm thinking to solve my Graph problem is to, instead of having the Hash Tables value point to the UserProfile, I point it to the corresponding Vertex. It's still a pointer, no more and no less space is used, I just change what I point to.
Like this, I can easily and quickly lookup for each Vertex I need and link them together. This will insert the ~520000 relations pretty quickly.
I thought of this solution because I already have the Hash Tables and I need to have them, then, why not take advantage of them for indexing the Graph vertices instead of the user profile? It's basically the same thing, I can still access the UserProfile pretty quickly, just go to the Vertex and then to the UserProfile.
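Solution 2 could look roughly like this: the ID index stores Vertex pointers, so inserting a relation is two hash lookups instead of two walks over the vertex list. The struct layouts, the fixed bucket count and the function names are hypothetical simplifications of the structures described in the question (there is no rehashing here, and only one direction of the relation is stored).

```c
#include <stdlib.h>

struct UserProfile;                  /* the asker's profile type (opaque here) */
struct Edge;

struct Vertex {
    struct UserProfile *profile;     /* profile is still one hop away */
    struct Edge *edges;              /* adjacency list, insert at head */
    struct Vertex *next;             /* existing vertex list, unchanged */
};

struct Edge {
    int weight;
    struct Vertex *to;
    struct Edge *next;
};

#define ID_BUCKETS 4096              /* hypothetical fixed size */

struct id_entry {
    int id;
    struct Vertex *vertex;           /* was: a UserProfile pointer */
    struct id_entry *next;           /* separate chaining */
};

static struct id_entry *id_index[ID_BUCKETS];

static size_t bucket_of(int id)
{
    return (unsigned int)id % ID_BUCKETS;
}

static int index_put(int id, struct Vertex *v)
{
    struct id_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->id = id;
    e->vertex = v;
    e->next = id_index[bucket_of(id)];
    id_index[bucket_of(id)] = e;
    return 0;
}

static struct Vertex *index_get(int id)
{
    for (struct id_entry *e = id_index[bucket_of(id)]; e != NULL; e = e->next)
        if (e->id == id)
            return e->vertex;
    return NULL;
}

/* Two expected-O(1) lookups plus a head insert per relation. */
static int add_relation(int from_id, int to_id, int weight)
{
    struct Vertex *from = index_get(from_id), *to = index_get(to_id);
    if (from == NULL || to == NULL)
        return -1;
    struct Edge *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->weight = weight;
    e->to = to;
    e->next = from->edges;
    from->edges = e;
    return 0;
}
```

If relations are mutual, call add_relation a second time with the IDs swapped.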
But, do you see any cons on this second solution against the first one? Or only pros that overpower the pros and cons on the first solution?
Other Solution) If you have any other solution, I'm all ears. But please explain the pros and cons of that solution over the previous 2. I really don't have much time to waste on this right now; I need to move on with this project, so if I'm going to make such a change, I need to understand exactly what to change and whether that's really the way to go.
Hopefully no one fell asleep reading this and closed the browser, sorry for the big testament. But I really need to decide what to do about this and I really need to make a change.
P.S: When answering about my proposed solutions, please enumerate them as I did, so I know exactly what you are talking about and don't confuse myself more than I already am.
Since the main issue here is speed, I would prefer the array approach (Solution 1).
You should, of course, maintain the hash table for the name-index lookup.
If I understood correctly, you only process the data one time. So there is no dynamic data insertion.
To deal with the space allocation problem, I would recommend:
1 - Read the file once to get the number of vertices.
2 - Allocate that space.
If your data is dynamic, you could implement some simple method to grow the array in steps of 50%.
3 - In the Edges, replace your linked list with an array. This array should also grow dynamically in steps of 50%.
Even with the "extra" space allocated when you grow in steps of 50%, the total size used by the array should be only marginally larger than that of the linked list.
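A sketch of steps 1-3 in C, assuming vertex IDs can be mapped to dense indices 0..n-1 (for example through the existing hash table); the names are illustrative. The vertex array is allocated once after counting, and each vertex's edges live in an array that grows by 50%.

```c
#include <stdlib.h>

/* Step 3: per-vertex edge array that grows by 50% when full. */
struct edge_array {
    int *to;          /* indices of neighbour vertices */
    size_t len, cap;
};

static int edge_push(struct edge_array *e, int to)
{
    if (e->len == e->cap) {
        size_t new_cap = e->cap < 4 ? 4 : e->cap + e->cap / 2;
        int *p = realloc(e->to, new_cap * sizeof *p);
        if (p == NULL)
            return -1;            /* old array is still valid */
        e->to = p;
        e->cap = new_cap;
    }
    e->to[e->len++] = to;
    return 0;
}

struct graph {
    struct edge_array *adj;       /* one slot per vertex, allocated once */
    size_t n_vertices;
};

/* Steps 1-2: the caller counts vertices in a first pass, then allocates. */
static int graph_init(struct graph *g, size_t n_vertices)
{
    g->adj = calloc(n_vertices, sizeof *g->adj);
    g->n_vertices = n_vertices;
    return g->adj != NULL ? 0 : -1;
}

/* Direct indexing: no search for either endpoint. */
static int graph_add_edge(struct graph *g, size_t a, size_t b)
{
    if (a >= g->n_vertices || b >= g->n_vertices)
        return -1;
    return edge_push(&g->adj[a], (int)b);
}

static void graph_free(struct graph *g)
{
    for (size_t i = 0; i < g->n_vertices; i++)
        free(g->adj[i].to);
    free(g->adj);
    g->adj = NULL;
    g->n_vertices = 0;
}
```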
I hope I could help.