I have seen code that creates an octree and adds and removes data from it, but how does one actually build the octree? Is there 3D voxel software that will save to an array of some sort that can then be converted to an octree? Or can you save directly to an octree?
It depends on your implementation:
If you are using octrees to subdivide space, then you'll typically throw a bunch of 3D points (V3s) at it, and once a node holds more than a certain number of points, you subdivide it and redistribute the points among its children.
If you are looking for a way to store Minecraft-style voxels, then you'll subdivide until the leaf size matches your voxel size 1:1, and store your data in the leaf nodes.
Where the data comes from is up to you: octrees are a way of storing, manipulating and searching data, not a file format as such.
Each node in an octree has one point. Each node is broken up into (you guessed it) eight child nodes, and each of these in turn contains a single point.
In general, you don't add all of your vertices to an octree unless you are doing some ungodly collision detection where every single vertex counts. Not that you can't make that fast, but it is still slower than the approximation given by a smaller number of nodes (this is true for nearly everything: approximation is faster).
Then again, if you are rendering at high quality, the octree should probably have as many nodes as you have points.
Now on to the answer:
Create a root node whose bounding box, defined by its center, encloses the whole model.
Insert each point. This should subdivide the octree in the relevant directions.
As you insert these points, the data will move further down into leaf nodes that more closely encapsulate your model.
Also, as you subdivide, the bounding box is halved along each axis for each level down.
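The steps above can be sketched in C as follows. This is a minimal point-region octree, assuming a hypothetical leaf capacity `OCT_CAPACITY`: a leaf that overflows splits into eight children whose boxes are half the parent's size, and its points are redistributed. All names (`Octree`, `oct_insert`, `oct_count`, `oct_free`) are illustrative, not from any library.

```c
#include <stdlib.h>

#define OCT_CAPACITY  4    /* hypothetical: max points in a leaf before it splits */
#define OCT_MAX_DEPTH 16   /* stops endless splitting when many points coincide */

typedef struct { float x, y, z; } Vec3;

typedef struct Octree {
    Vec3 center;               /* center of this node's cubic bounding box */
    float half;                /* half of the box's edge length */
    int depth;
    int count;                 /* points stored here (leaves only) */
    Vec3 points[OCT_CAPACITY];
    struct Octree *children;   /* array of 8 children, or NULL for a leaf */
} Octree;

/* Child index: bit 0 = +x half, bit 1 = +y half, bit 2 = +z half. */
static int octant(const Octree *n, Vec3 p) {
    return (p.x >= n->center.x) | ((p.y >= n->center.y) << 1) | ((p.z >= n->center.z) << 2);
}

/* Insert p, assumed to lie inside the root's box. Returns 0 if p cannot be
   stored (out of memory, or a full leaf at OCT_MAX_DEPTH). */
int oct_insert(Octree *n, Vec3 p) {
    while (n->children)                          /* descend to the leaf containing p */
        n = &n->children[octant(n, p)];
    if (n->count < OCT_CAPACITY) {
        n->points[n->count++] = p;
        return 1;
    }
    if (n->depth >= OCT_MAX_DEPTH) return 0;
    Octree *kids = calloc(8, sizeof *kids);      /* split: 8 boxes of half the size */
    if (!kids) return 0;
    float h = n->half / 2;
    for (int i = 0; i < 8; i++) {
        kids[i].center = (Vec3){ n->center.x + ((i & 1) ? h : -h),
                                 n->center.y + ((i & 2) ? h : -h),
                                 n->center.z + ((i & 4) ? h : -h) };
        kids[i].half = h;
        kids[i].depth = n->depth + 1;
    }
    for (int i = 0; i < n->count; i++) {         /* redistribute the stored points */
        Octree *k = &kids[octant(n, n->points[i])];
        k->points[k->count++] = n->points[i];
    }
    n->count = 0;
    n->children = kids;
    return oct_insert(n, p);                     /* retry; may split again deeper */
}

int oct_count(const Octree *n) {
    if (!n->children) return n->count;
    int total = 0;
    for (int i = 0; i < 8; i++) total += oct_count(&n->children[i]);
    return total;
}

void oct_free(Octree *n) {
    if (!n->children) return;
    for (int i = 0; i < 8; i++) oct_free(&n->children[i]);
    free(n->children);
    n->children = NULL;
}
```

Building then amounts to creating a root such as `Octree root = { .center = {0, 0, 0}, .half = 1.0f };` and calling `oct_insert` for every point.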
If you really want to save it, you can save the (numbered) vertices and then have your program write the connections between nodes and vertices to disk. Loading them back takes roughly the same time as building the octree from scratch in the first place.
Anyway, I hope I answered your question.
Related
I am working on an Android application that uses the Overpass API at [1]. My goal is to get all circular ways that enclose a certain lat-long point.
In order to do so, I build a request for a rectangle that contains my location, then parse the response XML and run a ray-casting algorithm to filter the ways that enclose the given lat-long position. This is too slow for my application because the response is sometimes tens or hundreds of MB.
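For reference, the even-odd ray-casting test mentioned above looks roughly like this in C. It is a sketch assuming the way's nodes are passed as separate x (longitude) and y (latitude) arrays and treated as planar coordinates, which is a reasonable approximation for small areas away from the antimeridian. The function name is illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Even-odd ray casting: cast a ray from (px, py) towards +x and count how many
   polygon edges it crosses; an odd count means the point is inside.
   xs/ys hold the n nodes of a closed ring. A repeated closing node (as in an
   OSM closed way) is harmless: the degenerate edge never straddles py. */
bool point_in_polygon(const double *xs, const double *ys, size_t n,
                      double px, double py) {
    bool inside = false;
    if (n < 3) return false;
    for (size_t i = 0, j = n - 1; i < n; j = i++) {
        /* Does edge (j -> i) straddle the horizontal line y = py? */
        if ((ys[i] > py) != (ys[j] > py)) {
            double x_cross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
            if (px < x_cross) inside = !inside;   /* crossing lies to the right */
        }
    }
    return inside;
}
```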
Is there any OSM API that I can call to get all ways that enclose a certain location? Otherwise, how could I optimize the process?
Thanks!
[1] http://overpass-api.de/
To my knowledge, there is no standard API in OSM to do this (it is indeed a very uncommon use case).
I assume you define "enclose" as: the point representing the current location is inside the inner area of the polygon. Furthermore, I assume optimizing the process might include changing the entire concept of the algorithm.
First of all, you need to define the rectangle to fetch data for. Keep in mind that querying too large a rectangle would yield too much data. As far as I know, there is no specific API to query circular ways only, and even if there were, a request for too large a rectangle would probably be denied by the server, because the server load would be enormous.
Server-side precomputation / prefiltering
Therefore I suggest the first optimization: instead of querying an API that is not specifically suited for your purpose, use an offline database saved on the Android device. OsmAnd and others save the whole database for a country offline, but in your specific use case you only need to save a pre-filtered database of circular ways.
As far as I know, only a small fraction of the ways in OSM are circular. Therefore I suggest writing a script that regularly downloads OSM dumps (e.g. from Geofabrik) and removes non-circular ways (e.g. you could check whether the last node ID in a way is equal to the first node ID, but you'd need to verify that this captures every way you would define as circular). How often you run it depends on your use case.
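The first-ID-equals-last-ID filter suggested above could look like this in the preprocessing script (sketched in C). Requiring at least four node references (three distinct nodes plus the repeated first one) is an added assumption, to skip degenerate ways; the function name is illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Treat a way as circular (closed) when its first and last node IDs match.
   Assumption: a real ring needs >= 3 distinct nodes, i.e. >= 4 node refs. */
bool is_closed_way(const int64_t *node_ids, size_t count) {
    return count >= 4 && node_ids[0] == node_ids[count - 1];
}
```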
This optimization solves:
The issue of downloading a large amount of data
The issue of overloading the API with large requests
The issue of not being able to request large chunks of data
If that is not suitable for your use case, I suggest building a simple API for that on your server.
Re-chunking the data into appropriate grids
However, you would still need to filter a large amount of data. To partially solve this, I suggest the second optimization: re-chunk your data. For example, if your current location is in Virginia, you would not need to filter circular ways that lie entirely within Texas. Because filtering by state etc. would be highly country-dependent and difficult (CPU-intensive), I suggest choosing a grid, e.g. 0.05 degrees of latitude/longitude (I'd choose an equirectangular projection because it's easy to calculate if you already have lat/lon coordinates).
The script that preprocesses the data should then create one chunk of data (that could be a file, but we don't know enough about your use case to talk about specific data structures) for every rectangle in the area you want to cover. A circular way is included in a chunk if and only if it has at least one node inside the chunk area.
You would then only request / filter the specific chunk your position is currently in. Choose the chunk size appropriately for your application (preferably rather small, but that depends on numerous factors!).
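A sketch of the chunk indexing in C, assuming the 0.05-degree equirectangular grid from the example above, numbered row by row from (-90, -180). `chunk_id` and `way_in_chunk` are illustrative names. Note that the node-based inclusion rule, as stated, does not assign a way to chunks it covers without having a node in them; a bounding-box overlap rule would be needed to catch large ways.

```c
#include <stdbool.h>
#include <stddef.h>

#define CELL_DEG  0.05                                /* grid spacing from the example */
#define GRID_COLS ((long)(360.0 / CELL_DEG + 0.5))   /* 7200 columns of longitude */
#define GRID_ROWS ((long)(180.0 / CELL_DEG + 0.5))   /* 3600 rows of latitude */

/* Map a lat/lon position to a single chunk ID on the equirectangular grid. */
long chunk_id(double lat, double lon) {
    double fx = (lon + 180.0) / CELL_DEG, fy = (lat + 90.0) / CELL_DEG;
    long col = fx < 0 ? 0 : (long)fx;           /* truncation == floor for fx >= 0 */
    long row = fy < 0 ? 0 : (long)fy;
    if (col >= GRID_COLS) col = GRID_COLS - 1;  /* lon == +180 */
    if (row >= GRID_ROWS) row = GRID_ROWS - 1;  /* lat == +90 */
    return row * GRID_COLS + col;
}

/* Inclusion rule from the answer: a way belongs to a chunk iff at least
   one of its nodes lies inside that chunk. */
bool way_in_chunk(const double *lats, const double *lons, size_t n, long chunk) {
    for (size_t i = 0; i < n; i++)
        if (chunk_id(lats[i], lons[i]) == chunk) return true;
    return false;
}
```

At runtime the app would compute `chunk_id` for the current position and load only that chunk's ways.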
This optimization solves:
Assuming most of the circular ways are quite small in terms of their bounding rectangles, you only need to filter a tiny fraction of the overall ways
IO is minimized, especially if you
Hysteretic heuristics
If the aforementioned optimizations do not sufficiently reduce your computation time, I'd suggest a third optimization that depends on how many circular ways you want to find (if you really need to find all of them, it won't help at all): use hysteresis. Save the circular ways you were inside of during the last computation (assuming the new location is near the last one) and check them first. If your location didn't change much, you have a high chance of hitting a way you're inside of during the first few ray casts.
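A minimal C sketch of this hysteresis heuristic, assuming ways are held as planar rings and the caller only needs up to `want` enclosing ways. The cache of the previous call's hits is re-tested first; `Ring`, `find_enclosing` and the cache arrays are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { const double *xs, *ys; size_t n; } Ring;   /* one closed way */

/* Even-odd ray casting on one ring. */
static bool ring_contains(const Ring *r, double px, double py) {
    bool inside = false;
    for (size_t i = 0, j = r->n - 1; i < r->n; j = i++)
        if ((r->ys[i] > py) != (r->ys[j] > py) &&
            px < r->xs[j] + (py - r->ys[j]) * (r->xs[i] - r->xs[j]) / (r->ys[i] - r->ys[j]))
            inside = !inside;
    return inside;
}

/* Find up to `want` ways enclosing (px, py). cache[0 .. *cache_len) holds the
   previous call's hits and must have capacity >= want; out needs room for
   `want` IDs. Returns the number of ways found. */
size_t find_enclosing(const Ring *ways, size_t nways, double px, double py,
                      size_t *cache, size_t *cache_len, size_t *out, size_t want) {
    size_t found = 0;
    /* 1) Ways we were inside last time: likely hits if we barely moved. */
    for (size_t i = 0; i < *cache_len && found < want; i++)
        if (ring_contains(&ways[cache[i]], px, py)) out[found++] = cache[i];
    /* 2) Everything else, skipping the ways already tested in step 1. */
    for (size_t id = 0; id < nways && found < want; id++) {
        bool cached = false;
        for (size_t i = 0; i < *cache_len; i++)
            if (cache[i] == id) { cached = true; break; }
        if (!cached && ring_contains(&ways[id], px, py)) out[found++] = id;
    }
    /* 3) Remember this call's hits for the next call. */
    for (size_t i = 0; i < found; i++) cache[i] = out[i];
    *cache_len = found;
    return found;
}
```

With `want == nways` every way is tested anyway, which matches the caveat above that hysteresis does not help when you need all of them.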
Leveraging relations between different circular ways
Also, a fourth optimization is possible: there will be some circular ways that are fully enclosed in another circular way. You could code your program so that it knows about that relation and checks the inner circular way first. If this check succeeds, you automatically know that the current position is also contained in the outer circular way. I think computing this information (server-side) could be incredibly CPU-intensive, and implementing it might also be a hard task, so I'd suggest using this optimization only if it is unavoidable.
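Assuming the server has already precomputed, for each way, the index of one way that fully encloses it (`parent`, -1 if none), the client side of this idea is short. The sketch below (hypothetical names throughout) tests ways innermost-first and marks every enclosing ancestor as INSIDE without ray casting it.

```c
#include <stdbool.h>
#include <stddef.h>

enum { OUTSIDE = 0, INSIDE = 1, UNKNOWN = -1 };

/* A closed way plus the index of a way fully enclosing it (-1 if none).
   Computing `parent` is the expensive server-side step. */
typedef struct { const double *xs, *ys; size_t n; int parent; } Way;

/* Even-odd ray casting on one ring. */
static bool ring_contains(const Way *w, double px, double py) {
    bool inside = false;
    for (size_t i = 0, j = w->n - 1; i < w->n; j = i++)
        if ((w->ys[i] > py) != (w->ys[j] > py) &&
            px < w->xs[j] + (py - w->ys[j]) * (w->xs[i] - w->xs[j]) / (w->ys[i] - w->ys[j]))
            inside = !inside;
    return inside;
}

/* order[] lists way indices innermost first (e.g. deepest nesting first).
   Fills state[] with INSIDE/OUTSIDE; returns how many ray casts were run. */
size_t locate_nested(const Way *ways, const size_t *order, size_t n,
                     double px, double py, int *state) {
    size_t tests = 0;
    for (size_t i = 0; i < n; i++) state[i] = UNKNOWN;
    for (size_t k = 0; k < n; k++) {
        size_t i = order[k];
        if (state[i] != UNKNOWN) continue;        /* already implied by an inner way */
        tests++;
        if (ring_contains(&ways[i], px, py)) {
            /* Inside way i implies inside every way enclosing it. */
            for (int a = (int)i; a != -1 && state[a] != INSIDE; a = ways[a].parent)
                state[a] = INSIDE;
        } else {
            state[i] = OUTSIDE;
        }
    }
    return tests;
}
```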
Tuning the parameters of these optimizations should be sufficient to decrease the CPU time needed for your computation significantly. Please feel free to comment/ask if you have further questions regarding these suggestions.
I am working on a simulation where I need to be able to handle thousands, potentially millions, of objects updating every loop.
All of the objects need to have their logic function called (AI).
But the location of each object determines how detailed its logic will be. For example:
[working with 100 objects to keep it simple]
All objects have a location (x,y).
20 objects are 500 points away from a 'point of interest' location.
50 objects are 500 points from the 20 objects (1000 points away).
30 objects are within 100 points of the point of interest.
Now say this was a detailed city simulation with the objects being virtual citizens.
At 6pm it's time for everyone to go home from their jobs and sleep.
So we iterate through all citizens, but I want them to do different things:
The furthest objects (50) go home from their job and sleep until morning.
The closer objects (20) go home from their job, have a bite to eat, then sleep until morning.
The closest objects (30) go home from their job, have a bite to eat, brush their teeth, then sleep until morning.
As you can see the closer they are to the point of interest the more detailed the logic becomes.
I am trying to work out the most performance-efficient way to iterate through all objects.
This would be relatively easy with a handful of objects, but as this needs to handle at least 500,000 objects efficiently, I need some advice.
Also, I'm not sure whether I should iterate through all objects every loop, or whether it would be better to iterate through the closest objects every loop but only iterate through further-away objects every 10 loops.
With the additional requirement that objects interact with other objects close to them, I have been thinking the best way to do this might be to organise them in a quadtree, but I'm not sure. It seems as though quadtrees are more for static content, but the objects I'm dealing with, as mentioned, have a location and are required to move to other locations.
Am I going down the right track of thinking? or is there a 'better' way?
I am also working in C++ if anyone thinks it's relevant.
Any advice would be greatly appreciated.
NOTE:
The point of interest changes regularly; think of it as a camera view.
Objects are created and destroyed dynamically
If you want to quickly select objects within a certain radius of a particular point, then a quadtree, or just a simple square grid, will help.
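A minimal sketch of the "simple square grid" option in C, assuming integer positions in a fixed world `[0, WORLD)` on both axes and a hypothetical cell size. The grid is rebuilt from scratch each tick with a counting sort, which copes with moving, created and destroyed objects without incremental bookkeeping; a radius query only visits the cells overlapping the query square and compares squared distances.

```c
#include <stdlib.h>

#define WORLD 4096             /* assumed world extent: positions in [0, WORLD) */
#define CELL  128              /* hypothetical cell size, roughly the query radius */
#define GRID  (WORLD / CELL)   /* cells per side */

typedef struct {
    int *cell_start;   /* GRID*GRID+1 offsets into items */
    int *items;        /* object indices grouped by cell */
} Grid;

static int cell_of(int x, int y) { return (y / CELL) * GRID + (x / CELL); }
static int clampi(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

/* Counting sort of object indices by cell. Returns 0 on allocation failure. */
int grid_build(Grid *g, const int *xs, const int *ys, int n) {
    g->cell_start = calloc(GRID * GRID + 1, sizeof(int));
    g->items = malloc((size_t)(n > 0 ? n : 1) * sizeof(int));
    if (!g->cell_start || !g->items) { free(g->cell_start); free(g->items); return 0; }
    for (int i = 0; i < n; i++) g->cell_start[cell_of(xs[i], ys[i])]++;       /* counts */
    for (int c = 1; c < GRID * GRID; c++) g->cell_start[c] += g->cell_start[c - 1];
    g->cell_start[GRID * GRID] = n;                                         /* end offsets */
    for (int i = n - 1; i >= 0; i--) g->items[--g->cell_start[cell_of(xs[i], ys[i])]] = i;
    return 1;   /* cell c now holds items[cell_start[c] .. cell_start[c + 1]) */
}

/* Write up to max_out indices of objects within r of (qx, qy); return count. */
int grid_query(const Grid *g, const int *xs, const int *ys,
               int qx, int qy, int r, int *out, int max_out) {
    int cx0 = clampi((qx - r) / CELL, 0, GRID - 1), cx1 = clampi((qx + r) / CELL, 0, GRID - 1);
    int cy0 = clampi((qy - r) / CELL, 0, GRID - 1), cy1 = clampi((qy + r) / CELL, 0, GRID - 1);
    long r2 = (long)r * r;
    int found = 0;
    for (int cy = cy0; cy <= cy1; cy++)
        for (int cx = cx0; cx <= cx1; cx++) {
            int c = cy * GRID + cx;
            for (int k = g->cell_start[c]; k < g->cell_start[c + 1]; k++) {
                int i = g->items[k];
                long dx = xs[i] - qx, dy = ys[i] - qy;
                if (dx * dx + dy * dy <= r2 && found < max_out) out[found++] = i;
            }
        }
    return found;
}

void grid_free(Grid *g) { free(g->cell_start); free(g->items); }
```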
If your problem is how to store millions of objects so that iterating through them is efficient, then you could probably use a column-based technique: instead of having 1 million objects each having 5 fields, you have 5 arrays of 1 million elements each. In this case each object is just an index in the range 0 .. 999999. So, for example, say you want to store 1 million objects of the following structure:
```cpp
struct resident
{
    int x;
    int y;
    int flags;
    int age;
    int health; // This is a computer game, right?
};
```
Then, instead of declaring resident residents [1000000] you declare 5 arrays:
```cpp
int resident_x [1000000];
int resident_y [1000000];
int resident_flags [1000000];
int resident_age [1000000];
int resident_health [1000000];
```
And then, instead of, say, residents [n].x you use resident_x [n]. This way of storing objects may be faster when you need to iterate through all objects of the same type and do something with a couple of fields in each object (the same set of fields in each object).
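As an illustration of where the column layout pays off, here is a loop (in C) that only touches two of the five columns. The flag bit `FLAG_SICK` and the function name are hypothetical; the point is that the x, y and age arrays are never pulled into the cache while this runs.

```c
#include <stddef.h>

#define FLAG_SICK 0x1   /* hypothetical flag bit stored in resident_flags */

/* Reads resident_flags and updates resident_health only; the other columns
   stay untouched, so each cache line fetched is full of useful data.
   Returns how many residents were affected. */
size_t apply_sickness(const int *resident_flags, int *resident_health, size_t n) {
    size_t affected = 0;
    for (size_t i = 0; i < n; i++) {
        if (resident_flags[i] & FLAG_SICK) {
            resident_health[i] -= 1;
            affected++;
        }
    }
    return affected;
}
```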
You need to break the problem down into "classes", just like in the real world. Each person's class is computed from their distance, so lower-class people are far away and upper-class people are close. Or, more correctly, "far class", "nearish class" and "here class", or whatever you want to name them.
1) Make an array with one slot for each class. This slot will hold a "linked list" of each person in that class. When a person's class changes (social climbers), it is very fast to move the object to another list.
2) So put everybody into the proper classes and iterate only over the classes close to you. In a proper scenario there are objects which are too far away to care about, so you can put those back to disk and only reload them when you get nearer.
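One way to get the constant-time "move to another list" is an intrusive doubly linked list stored as index arrays, sketched here in C. The class names and the distance thresholds (100 and 500, taken from the question's example) are illustrative assumptions, as are all function names.

```c
#include <stdlib.h>

/* Distance classes; thresholds follow the question's example and are illustrative. */
enum { CLASS_HERE, CLASS_NEAR, CLASS_FAR, NUM_CLASSES };
#define NIL (-1)

typedef struct {
    int head[NUM_CLASSES];  /* first object of each class list, or NIL */
    int *next, *prev;       /* doubly linked list links, one pair per object */
    int *cls;               /* current class of each object, or NIL if unlisted */
} ClassLists;

int class_for_distance(int dist) {
    return dist <= 100 ? CLASS_HERE : dist <= 500 ? CLASS_NEAR : CLASS_FAR;
}

int cl_init(ClassLists *cl, int n) {
    cl->next = malloc(n * sizeof(int));
    cl->prev = malloc(n * sizeof(int));
    cl->cls  = malloc(n * sizeof(int));
    if (!cl->next || !cl->prev || !cl->cls) {
        free(cl->next); free(cl->prev); free(cl->cls);
        return 0;
    }
    for (int c = 0; c < NUM_CLASSES; c++) cl->head[c] = NIL;
    for (int i = 0; i < n; i++) cl->cls[i] = NIL;
    return 1;
}

/* Put object `id` into class `c`: O(1) whether it is new or changing class. */
void cl_set_class(ClassLists *cl, int id, int c) {
    int old = cl->cls[id];
    if (old == c) return;
    if (old != NIL) {                               /* unlink from the old list */
        int p = cl->prev[id], nx = cl->next[id];
        if (p != NIL) cl->next[p] = nx; else cl->head[old] = nx;
        if (nx != NIL) cl->prev[nx] = p;
    }
    cl->cls[id] = c;                                /* push onto the new list */
    cl->prev[id] = NIL;
    cl->next[id] = cl->head[c];
    if (cl->head[c] != NIL) cl->prev[cl->head[c]] = id;
    cl->head[c] = id;
}

int cl_count(const ClassLists *cl, int c) {
    int k = 0;
    for (int id = cl->head[c]; id != NIL; id = cl->next[id]) k++;
    return k;
}

void cl_free(ClassLists *cl) { free(cl->next); free(cl->prev); free(cl->cls); }
```

Each tick you would then walk only the lists you care about, e.g. `for (int id = cl.head[CLASS_HERE]; id != NIL; id = cl.next[id])`.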
There's a few questions embedded in there:
-How to deal with large quantities of objects? If there is a constant number of fixed objects, you may be able to simply create an array of them, as long as you have sufficient memory. If you need to dynamically create and destroy them, you put yourself at risk for memory leaks without careful handling of destroyed objects. At a certain point, you may ask yourself whether it is better to use another application, such as a database, to store your objects, and perform just the logic in your C++ code. Databases will provide additional functionality that I will highlight.
-How to find objects within a given distance of others. This is a classic problem for geographic information systems (GIS); it sounds like you are trying to operate a simple GIS to store your objects and their attributes, so it is applicable. It takes computation power to test SQRT((X-x)^2+(Y-y)^2), the distance formula, on every point. Instead, it is common to use a 'windowing function' to extract a square containing all the points you want, and then search within this to find points that lie specifically within a given radius. Some databases are optimized to perform a variety of GIS functions, including returning points within a given radius, or returning points within some other geometry like a polygon. Otherwise you'll have to program this functionality yourself.
-Tree storage of objects. This can improve speed, but you will hit a tradeoff if the objects are constantly moving around, wherein the tree has to be restructured often. It all depends on how often things move versus how often you want to do calculations on them.
-AI code. If you're trying to do AI on millions of objects, this may be your biggest performance cost, rather than the method used to store and search the objects. You're right that simpler code for points farther away will increase performance, as will executing the logic less often for far-away points. This is sometimes handled using Monte Carlo analysis, where the logic is performed on a random subset of points during any given iteration, and you could have the probability of execution decrease as distance from the point of interest increases.
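Two items from the list above, sketched in C: the windowing function followed by an exact radius test (comparing squared distances, so no square root is needed), and the distance-dependent random update. The falloff `full_detail_radius / d` is a hypothetical choice, and `rand()` is used only for brevity. In a brute-force loop like this the window mainly saves multiplications; its real benefit comes when a spatial index or database answers the window query directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Windowing, then exact test: the cheap box test rejects most points;
   survivors are checked with squared distances. Returns the count and
   writes matching indices to out (room for n entries). */
size_t points_within(const double *xs, const double *ys, size_t n,
                     double cx, double cy, double r, size_t *out) {
    size_t found = 0;
    double r2 = r * r;
    for (size_t i = 0; i < n; i++) {
        double dx = xs[i] - cx, dy = ys[i] - cy;
        if (dx < -r || dx > r || dy < -r || dy > r) continue;   /* outside window */
        if (dx * dx + dy * dy <= r2) out[found++] = i;          /* inside circle */
    }
    return found;
}

/* Chance that an object at distance d runs its AI logic this tick.
   Hypothetical falloff: always within full_detail_radius, then radius / d. */
double update_probability(double d, double full_detail_radius) {
    return d <= full_detail_radius ? 1.0 : full_detail_radius / d;
}

bool should_update(double d, double full_detail_radius) {
    /* rand() keeps the sketch short; a simulation would want a better PRNG. */
    double u = (double)rand() / ((double)RAND_MAX + 1.0);   /* uniform in [0, 1) */
    return u < update_probability(d, full_detail_radius);
}
```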
I would consider using a Linear Quadtree with Morton Encoding / Z-Order indexing. You can further optimize this structure by using a Bit Array to represent nodes that contain data and very quickly perform calculations.
I've done this extremely efficiently in the browser using JavaScript, and I can traverse 67 million nodes in under a second. Once I've narrowed it down to the region of interest, I look up the data in a different structure. All of it still takes milliseconds. I'm using this for spatial vector animation.
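The core pieces of a linear quadtree with Morton (Z-order) encoding and an occupancy bit array are small. The following C sketch assumes 16-bit cell coordinates; the helper names are illustrative. Interleaving the bits means a cell's parent is its code shifted right by 2, and its four children are `(code << 2) | 0..3`.

```c
#include <stdint.h>

/* Spread the lower 16 bits of v so that a zero bit sits between each pair. */
static uint32_t part1by1(uint32_t v) {
    v &= 0x0000ffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Morton / Z-order code: x bits in even positions, y bits in odd positions. */
uint32_t morton2(uint32_t x, uint32_t y) { return part1by1(x) | (part1by1(y) << 1); }

/* In a linear quadtree the parent cell one level up is code >> 2. */
static inline uint32_t morton_parent(uint32_t code) { return code >> 2; }

/* Occupancy bit array: one bit per cell at the finest level. */
static inline void occ_set(uint64_t *bits, uint32_t code) {
    bits[code >> 6] |= (uint64_t)1 << (code & 63);
}
static inline int occ_test(const uint64_t *bits, uint32_t code) {
    return (int)((bits[code >> 6] >> (code & 63)) & 1);
}
```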
The Bentley-Ottmann algorithm works for finding intersections in a set of straight line segments. But I have a lot of polylines.
Is there a way to find the intersections in a set of polylines?
I'm figuring it out, but in the meantime, if someone can give some pointers or ideas, that would be helpful. Thanks for reading. By the way, I'm using WPF/C# and all the polylines are PathGeometry.
Source of the Image: http://www.sitepen.com/blog/wp-content/uploads/2007/07/gfx-curve-1.png
The sweep-line algorithm has a nice theory but is hard to implement robustly. You need to handle vertical segments, and there might be cases where more than two line segments intersect in a single point (or even share a common line segment).
I'd use an R-Tree to store the bounding boxes of the polylines' line segments and then use the R-Tree to find possibly intersecting elements. Only these need to be tested for intersection. The advantage is that you can use a separate R-Tree for each polyline and thus avoid detecting self-intersections, if needed.
Consider using CGAL's exact predicates kernel to get reliable results.
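The per-pair test that runs on the R-Tree's candidates is a bounding-box check followed by orientation (cross product) signs. Here is a sketch in C; the brute-force double loop in `count_crossings` only stands in for the R-Tree query, which would hand you just the segment pairs whose boxes overlap. Plain `double` orientation tests are not exact, which is exactly why CGAL's exact predicates are suggested above. All names are illustrative.

```c
#include <stdbool.h>

typedef struct { double x, y; } Pt;

/* Sign of the cross product (b - a) x (c - a): >0 left turn, <0 right, 0 collinear. */
static int orient(Pt a, Pt b, Pt c) {
    double v = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return (v > 0) - (v < 0);
}

static bool ranges_overlap(double a0, double a1, double b0, double b1) {
    double alo = a0 < a1 ? a0 : a1, ahi = a0 < a1 ? a1 : a0;
    double blo = b0 < b1 ? b0 : b1, bhi = b0 < b1 ? b1 : b0;
    return alo <= bhi && blo <= ahi;
}

/* True if closed segments p1p2 and q1q2 share at least one point
   (touching and collinear-overlap cases included). */
bool segments_intersect(Pt p1, Pt p2, Pt q1, Pt q2) {
    if (!ranges_overlap(p1.x, p2.x, q1.x, q2.x) ||     /* bounding boxes first */
        !ranges_overlap(p1.y, p2.y, q1.y, q2.y))
        return false;
    return orient(p1, p2, q1) * orient(p1, p2, q2) <= 0 &&
           orient(q1, q2, p1) * orient(q1, q2, p2) <= 0;
}

/* Stand-in for the R-Tree query: tries every segment pair of polylines a and b.
   A crossing exactly at a shared vertex is reported by both adjacent segments. */
int count_crossings(const Pt *a, int na, const Pt *b, int nb) {
    int hits = 0;
    for (int i = 0; i + 1 < na; i++)
        for (int j = 0; j + 1 < nb; j++)
            if (segments_intersect(a[i], a[i + 1], b[j], b[j + 1])) hits++;
    return hits;
}
```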
Adding to Geom's suggestion (an R-Tree is the way to go), further performance improvements can be gained by doing the following:
1. Simplify the polyline - The number of points in the polyline can be reduced while keeping the polyline's general shape. This can be done using an angle threshold and processing each point or by using the Ramer-Douglas-Peucker algorithm. Depending on what you're doing, you may need to keep track of which points from the original polyline were used as the start/end points for each segment of the simplified polyline (the indices of the original polyline's points will need to be stored somewhere).
In this example you can see how a polyline's number of points can be reduced. The red points indicate points which were not used from the original polyline, and the green points indicate points which were kept to build the simplified polyline.
2. Store the simplified polylines in an R-Tree, and determine the intersections between each segment of each polyline (comparing bounds of segments to reduce calculations is beneficial to performance). As this is being done, the old indices from the original polyline's segments are stored as information relating to each detected intersection, along with which polylines intersected (some sort of identifier can be used). This essentially gives you the start and end bounds of each segment in the original polylines where the intersections exist with each other polyline.
3. This step is performed only if the intersection location must match the exact location of the intersections of the original polylines. You will need to go back and use the original non-simplified polylines, along with the intersection information obtained in step 2. Each intersection should have a start and end index associated with it, and these indices can be used to determine which specific segments of the original polyline need to be processed. This allows you to process only the necessary segments (given by the start and end indices stored with the intersection information). An alternative would be to take the intersection point itself, extend a bounding box outward from it, and then process the segments of the original polylines which intersect that bounding box (although this will likely take longer).
4. It may be necessary to take an additional step to check the endpoints of each polyline against every other polyline's segments, since the simplification process can knock out some endpoint intersections (this is generally pretty fast).
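Step 1 could use Ramer-Douglas-Peucker while recording the original indices of the kept points, as suggested above. A recursive C sketch follows, measuring the distance to the infinite line through each sub-range's endpoints (a common RDP variant) and comparing squared distances to avoid `sqrt`. Names are illustrative, and recursion depth can reach O(n) on adversarial input.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x, y; } Pt;

/* Squared distance from p to the line through a and b (or to a if a == b). */
static double line_dist2(Pt p, Pt a, Pt b) {
    double dx = b.x - a.x, dy = b.y - a.y, px = p.x - a.x, py = p.y - a.y;
    double len2 = dx * dx + dy * dy;
    if (len2 == 0.0) return px * px + py * py;
    double cross = dx * py - dy * px;
    return cross * cross / len2;
}

/* Mark the farthest point between first and last if it exceeds the
   tolerance, then recurse on both halves. */
static void rdp_mark(const Pt *pts, size_t first, size_t last, double eps2,
                     unsigned char *keep) {
    double max_d2 = -1.0;
    size_t idx = first;
    for (size_t i = first + 1; i < last; i++) {
        double d2 = line_dist2(pts[i], pts[first], pts[last]);
        if (d2 > max_d2) { max_d2 = d2; idx = i; }
    }
    if (max_d2 > eps2) {
        keep[idx] = 1;
        rdp_mark(pts, first, idx, eps2, keep);
        rdp_mark(pts, idx, last, eps2, keep);
    }
}

/* Simplify pts[0..n) with tolerance eps. Writes the ORIGINAL indices of the
   kept points to out_idx (room for n entries) and returns how many were kept
   (0 on allocation failure). */
size_t rdp_simplify(const Pt *pts, size_t n, double eps, size_t *out_idx) {
    if (n <= 2) { for (size_t i = 0; i < n; i++) out_idx[i] = i; return n; }
    unsigned char *keep = calloc(n, 1);
    if (!keep) return 0;
    keep[0] = keep[n - 1] = 1;                 /* endpoints are always kept */
    rdp_mark(pts, 0, n - 1, eps * eps, keep);
    size_t m = 0;
    for (size_t i = 0; i < n; i++) if (keep[i]) out_idx[m++] = i;
    free(keep);
    return m;
}
```

Consecutive entries of `out_idx` give, for each simplified segment, the start and end indices into the original polyline that steps 2 and 3 rely on.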
This algorithm can be further improved by using the Bentley-Ottmann algorithm (this is the sweep-line algorithm Geom was referring to). Also note that depending on the simplification algorithm used and the parameters used for such algorithms (angular tolerance for example), there may be a trade-off between performance and accuracy (some intersection results can be lost depending on how simple the polylines are).
Obviously, there are libraries out there which may be viable, but if you're limited by license terms due to the company you work for or the product you're working on, third party libraries may not be an option. Additionally, other libraries may not perform as well as may be required.
I need to store a graph for the map of a game inside a game server written in C.
The graph has ~200 nodes and 3 kinds of edges that can connect two nodes (these three kinds can also overlap: a node can be connected by 2 edges of two different types, for example). The maximum degree of a node is something like 5-6 nodes.
What I would like to have is to have this static structure implemented in an efficient way to allow me to do simple operations like
is n1 connected to n2? (with all kinds of edges in case of an affirmative response)
what is n1 connected to? (with all kinds of edges or a specific one)
but in a multi-threaded environment since there will be many instances of the game that relies on the same static graph.
Since the graph can't be modified and the structure is well known I'm sure there are some tricks to implement it in a cool fashion to use least resources possible.
I would like to avoid using STL or Boost for now.. do you have any clues on a data structure that could suit well?
(it's not a premature optimization; the fact is that it will run on a VPS and I don't have much RAM or CPU power, so I need to keep it tight)
EDIT: just because I forgot (and thanks for making me realize it): the graph is undirected, so every edge is symmetric.
Thanks in advance
Many answers are possible. This one relies on the fact that you have relatively few nodes. The advantage of this approach is probably unbeatable performance.
The idea is to represent your graph as a 200x200 matrix of bytes, each entry representing an edge. The byte gives you 256 different possible values, where a 0 will obviously mean "no connection" and any non-zero combination of bits can represent up to 8 different edge types.
Let the "row" of this matrix be the starting node and the "column" be the destination. Initialize the structure such that for every edge connecting one node with another, there's a value at the intersection of starting / ending. That value can be a combination of bits representing edge types.
To find out whether one node connects to another, simply query the byte at the intersection of one node and the other: If there's a nonzero value there, then there is a connection, and the value will tell you what kind.
For 200 nodes, this data structure will eat up 40 KB, which is pretty moderate. It won't scale too well once you get beyond, say, 1000 nodes.
As long as nothing (apart from one-time initialization) ever writes to this structure, it will be naturally thread safe, as its state never changes.
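A sketch of this matrix in C, assuming up to 200 nodes and three hypothetical edge-type bits. `add_edge` sets both directions because the graph is undirected, and is meant to be called only during the one-time initialization; after that, all readers only ever read the array.

```c
#include <stdint.h>

#define MAX_NODES 200
/* Hypothetical names for the three edge kinds, one bit each. */
enum { EDGE_TYPE_1 = 1 << 0, EDGE_TYPE_2 = 1 << 1, EDGE_TYPE_3 = 1 << 2 };

static uint8_t adj[MAX_NODES][MAX_NODES];   /* 40 KB; written only at init */

/* Init-time only: undirected, so set the bit in both directions. */
static void add_edge(int a, int b, uint8_t type) {
    adj[a][b] |= type;
    adj[b][a] |= type;
}

/* Bitmask of edge types connecting a and b; 0 means "not connected". */
static inline uint8_t edge_types(int a, int b) { return adj[a][b]; }

/* Neighbours of n via any edge type in `mask`; out needs MAX_NODES slots. */
static int neighbours(int n, uint8_t mask, int *out) {
    int k = 0;
    for (int m = 0; m < MAX_NODES; m++)
        if (adj[n][m] & mask) out[k++] = m;
    return k;
}
```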
Since degrees are limited, you can get very good performance by just representing a node by a struct with arrays of pointers to other nodes (one array for each edge type).
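A sketch of that node layout in C, assuming three edge types and at most 6 neighbours per type (the question's rough maximum degree). `node_link` is an init-time helper that records the undirected edge on both nodes; names are illustrative. Storing `uint8_t` node indices instead of pointers would shrink each node further, since 200 nodes fit in a byte.

```c
#define NUM_EDGE_TYPES 3   /* from the question: three kinds of edges */
#define MAX_DEGREE     6   /* from the question: max degree of about 5-6 */

typedef struct Node {
    unsigned char degree[NUM_EDGE_TYPES];                /* used slots per type */
    const struct Node *adj[NUM_EDGE_TYPES][MAX_DEGREE];  /* neighbours per type */
} Node;

/* Init-time only: connect a and b with an edge of `type` (0..2), undirected.
   Returns 0 if either node has no free slot for that type. */
int node_link(Node *a, Node *b, int type) {
    if (a->degree[type] >= MAX_DEGREE || b->degree[type] >= MAX_DEGREE) return 0;
    a->adj[type][a->degree[type]++] = b;
    b->adj[type][b->degree[type]++] = a;
    return 1;
}

/* Is a connected to b by an edge of `type`? Scans at most MAX_DEGREE pointers. */
int node_connected(const Node *a, const Node *b, int type) {
    for (int i = 0; i < a->degree[type]; i++)
        if (a->adj[type][i] == b) return 1;
    return 0;
}
```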
Regardless of the data structure you pick, you can avoid worrying about multithreading if your graph is read-only (it's OK for multiple threads to access it without synchronization).
This program I'm doing is about a social network, which means there are users and their profiles. The profiles structure is UserProfile.
Now, there are various possible Graph implementations and I don't think I'm using the best one. I have a Graph structure and inside, there's a pointer to a linked list of type Vertex. Each Vertex element has a value, a pointer to the next Vertex and a pointer to a linked list of type Edge. Each Edge element has a value (so I can define weights and whatever it's needed), a pointer to the next Edge and a pointer to the Vertex owner.
I have 2 sample files with data to process (in CSV style) and insert into the Graph. The first one is the user data (one user per line); the second one is the user relations (for the graph). The first file is inserted into the graph quickly because I always insert at the head and there are only ~18000 users. The second file takes ages, even though I still insert the edges at the head. The file has about ~520000 lines of user relations and takes 13-15 minutes to insert into the Graph. I made a quick test, and reading the data is pretty quick, instantaneous really. The problem is in the insertion.
This problem exists because I have a Graph implemented with linked lists for the vertices. Every time I need to insert a relation, I need to look up 2 vertices so I can link them together. This is the problem: doing this for ~520000 relations takes a while.
How should I solve this?
Solution 1) Some people recommended that I implement the Graph (the vertices part) as an array instead of a linked list. This way I have direct access to every vertex, and the insertion time will probably drop considerably. But I don't like the idea of allocating an array with [18000] elements. How practical is this? My sample data has ~18000, but what if I need much less or much more? The linked list approach has that flexibility: I can have whatever size I want as long as there's memory for it. But the array doesn't, so how am I going to handle such a situation? What are your suggestions?
Using linked lists is good for space complexity but bad for time complexity. And using an array is good for time complexity but bad for space complexity.
Any thoughts about this solution?
Solution 2) This project also demands that I have some sort of data structure that allows quick lookup based on a name index and an ID index. For this I decided to use Hash Tables. My tables are implemented with separate chaining as collision resolution, and when a load factor of 0.70 is reached, I recreate the table. I base the next table size on this link.
Currently, both Hash Tables hold a pointer to the UserProfile instead of duplicating the user profile itself. Duplicating would be stupid: changing data would require 3 changes. So I just save the pointer to the UserProfile. The same user profile pointer is also saved as the value in each Graph Vertex.
So, I have 3 data structures, one Graph and two Hash Tables and every single one of them point to the same exact UserProfile. The Graph structure will serve the purpose of finding the shortest path and stuff like that while the Hash Tables serve as quick index by name and ID.
What I'm thinking to solve my Graph problem is to, instead of having the Hash Tables value point to the UserProfile, I point it to the corresponding Vertex. It's still a pointer, no more and no less space is used, I just change what I point to.
Like this, I can easily and quickly lookup for each Vertex I need and link them together. This will insert the ~520000 relations pretty quickly.
I thought of this solution because I already have the Hash Tables and I need to have them, then, why not take advantage of them for indexing the Graph vertices instead of the user profile? It's basically the same thing, I can still access the UserProfile pretty quickly, just go to the Vertex and then to the UserProfile.
But, do you see any cons on this second solution against the first one? Or only pros that overpower the pros and cons on the first solution?
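To make Solution 2 concrete, here is a C sketch of a separate-chaining table keyed by user ID whose values are `Vertex` pointers. It uses a fixed bucket count for brevity (the real table resizes at load factor 0.70, as described), stores each relation as a directed edge (add the reverse edge too if relations are mutual), and all names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct UserProfile UserProfile;   /* the project's existing profile type */

typedef struct Edge   { struct Vertex *to; struct Edge *next; } Edge;
typedef struct Vertex { UserProfile *profile; Edge *edges; } Vertex;

/* Separate chaining, keyed by user ID, value = Vertex pointer. */
typedef struct Entry { int id; Vertex *v; struct Entry *next; } Entry;
typedef struct { Entry **buckets; size_t nbuckets; } IdIndex;

static size_t bucket_of(const IdIndex *t, int id) {
    return (size_t)(unsigned)id % t->nbuckets;
}

int idx_put(IdIndex *t, int id, Vertex *v) {
    Entry *e = malloc(sizeof *e);
    if (!e) return 0;
    size_t b = bucket_of(t, id);
    e->id = id; e->v = v; e->next = t->buckets[b];   /* insert at chain head */
    t->buckets[b] = e;
    return 1;
}

Vertex *idx_get(const IdIndex *t, int id) {
    for (Entry *e = t->buckets[bucket_of(t, id)]; e; e = e->next)
        if (e->id == id) return e->v;
    return NULL;
}

/* One relation line: two average O(1) lookups instead of two list scans,
   then head insertion on the source vertex's edge list. */
int link_users(const IdIndex *t, int id_from, int id_to) {
    Vertex *a = idx_get(t, id_from), *b = idx_get(t, id_to);
    if (!a || !b) return 0;
    Edge *e = malloc(sizeof *e);
    if (!e) return 0;
    e->to = b; e->next = a->edges; a->edges = e;
    return 1;
}
```

The profile is still one hop away (`idx_get(t, id)->profile`), which is the trade-off described in Solution 2.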
Other Solution) If you have any other solution, I'm all ears. But please explain the pros and cons of that solution over the previous 2. I really don't have much time to waste on this right now; I need to move on with this project, so if I'm going to make such a change, I need to understand exactly what to change and whether that's really the way to go.
Hopefully no one fell asleep reading this and closed the browser, sorry for the big testament. But I really need to decide what to do about this and I really need to make a change.
P.S.: When answering about my proposed solutions, please enumerate them as I did, so I know exactly what you are talking about and don't confuse myself more than I already am.
Since the main issue here is speed, I would prefer the array approach.
You should, of course, maintain the hash table for the name-index lookup.
If I understood correctly, you only process the data one time. So there is no dynamic data insertion.
To deal with the space allocation problem, I would recommend:
1 - Read the file once to get the number of vertices.
2 - allocate that space
If your data is dynamic, you could implement a simple method to increase the array size in steps of 50%.
3 - In the Edges, replace your linked list with an array. This array should be dynamically grown in steps of 50%.
Even with the "extra" space allocated when you grow the array in steps of 50%, the total size used by the array should be only marginally larger than that of the linked list.
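The three steps could be sketched in C like this, assuming user IDs have already been mapped to dense indices `0..count-1` (e.g. through the ID hash table) and that `count` came from the first pass over the file. The struct and function names are hypothetical; growth goes 4, 6, 9, 13, ... (about 50% each step).

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int *to;              /* neighbour vertex indices */
    size_t len, cap;
} EdgeArray;

typedef struct {
    void *profile;        /* would be UserProfile * in the real project */
    EdgeArray out;
} Vertex;

/* Step 2: a single allocation once the first pass has counted the vertices. */
Vertex *alloc_vertices(size_t count) { return calloc(count, sizeof(Vertex)); }

/* Step 3: append, growing capacity by 50% (starting at 4) when full. */
int edge_push(EdgeArray *ea, int target) {
    if (ea->len == ea->cap) {
        size_t new_cap = ea->cap < 4 ? 4 : ea->cap + ea->cap / 2;
        int *p = realloc(ea->to, new_cap * sizeof *p);
        if (!p) return 0;                  /* old array is still valid */
        ea->to = p;
        ea->cap = new_cap;
    }
    ea->to[ea->len++] = target;
    return 1;
}

/* Insert relation a -> b by direct indexing: no search at all. */
int add_relation(Vertex *vs, size_t count, size_t a, size_t b) {
    if (a >= count || b >= count) return 0;
    return edge_push(&vs[a].out, (int)b);
}
```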
I hope I could help.