NBody Simulation - appropriate design approach - database

I have a problem where I am going to have a bunch of bodies. The movements of each are prescribed by existing data, but when a body is in range of another one, certain properties about it change. For the sake of this question we'll just assume you have a counter per body that counts the time you were around other bodies. So basically you start with t = 0, you spend 5 seconds around body 2, so your t is now 5. I don't have the data yet, but I am wondering whether it's appropriate for me to explore something like CUDA/OpenCL or whether I should stick with optimizing this across a multi-core CPU machine. The data this will be simulated across is about 500 bodies, each of which has movements described down to the second over a 30 day period, so that's 43200 points of data per body.

Brute force nbody is definitely suited to GPUs, because it is "embarrassingly parallel". Each body-to-body interaction computation is completely independent of any other. Your variation that includes keeping track of time spent in the "presence" of other bodies would be a straightforward addition to the existing body-to-body force computation, since everything is done on a timestep basis anyway.
Here's some sample CUDA code for nbody.
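To make the per-timestep structure concrete, here is a minimal CPU sketch in plain Python. It is not the CUDA sample mentioned above. The names `tracks`, `radius` and `dt` are made up for illustration, and it assumes the counter advances once per timestep whenever at least one other body is in range. With 500 bodies that is 124,750 pair checks per timestep, each independent of the others, which is the part a GPU (or several CPU cores) would split up.

```python
import math

def time_in_proximity(tracks, radius, dt=1.0):
    """tracks[i][k] is the (x, y) position of body i at timestep k.

    Returns, per body, the total time spent within `radius` of any other body.
    """
    n = len(tracks)
    steps = len(tracks[0])
    counters = [0.0] * n
    for k in range(steps):
        near = [False] * n
        # Every pair check below is independent of every other pair check.
        for i in range(n):
            xi, yi = tracks[i][k]
            for j in range(i + 1, n):
                xj, yj = tracks[j][k]
                if math.hypot(xi - xj, yi - yj) <= radius:
                    near[i] = near[j] = True
        for i in range(n):
            if near[i]:
                counters[i] += dt
    return counters

# Three bodies, three timesteps: body 1 approaches body 0, body 2 stays far away.
tracks = [
    [(0, 0), (0, 0), (0, 0)],
    [(5, 0), (1, 0), (0.5, 0)],
    [(100, 100), (100, 100), (100, 100)],
]
print(time_in_proximity(tracks, radius=2.0))  # [2.0, 2.0, 0.0]
```

If you need a separate counter per pair of bodies rather than one per body, the inner loop stays the same and only the bookkeeping changes.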


Monte Carlo Tree Search Improvements

I'm trying to implement the MCTS algorithm on a game. I can only use around 0.33 seconds per move. In this time I can generate one or two games per child from the start state, which contains around 500 child nodes. My simulations aren't random, but of course I can't make a right choice based on 1 or 2 simulations. Further in the game the tree becomes smaller and my choices are based on more simulations.
So my problem is in the first few moves. Is there a way to improve the MCTS algorithm so it can simulate more games or should I use another algorithm?
Is it possible to come up with some heuristic evaluation function for states? I realise that one of the primary benefits of MCTS is that in theory you wouldn't need this, BUT: if you can create a somewhat reasonable evaluation function anyway, this will allow you to stop simulations early, before they reach a terminal game state. Then you can back-up the evaluation of such a non-terminal game state instead of just a win or a loss. If you stop your simulations early like this, you may be able to run more simulations (because every individual simulation takes less time).
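A rough sketch of what stopping a simulation early could look like. All names here are hypothetical, the playout policy is a uniformly random placeholder (yours isn't random, so plug your own in), and the toy counting game only exists to make the snippet runnable.

```python
import random

def truncated_rollout(state, legal_moves, apply_move, is_terminal,
                      terminal_value, evaluate, max_depth, rng=random):
    """Play at most max_depth moves, then back up a heuristic value."""
    depth = 0
    while not is_terminal(state) and depth < max_depth:
        move = rng.choice(legal_moves(state))  # placeholder playout policy
        state = apply_move(state, move)
        depth += 1
    if is_terminal(state):
        return terminal_value(state)  # real game result
    return evaluate(state)            # heuristic estimate for a non-terminal state

# Toy game: add 1 or 2 to a counter, the game ends once it reaches 10.
legal_moves = lambda s: [1, 2]
apply_move = lambda s, m: s + m
is_terminal = lambda s: s >= 10
terminal_value = lambda s: 1.0
evaluate = lambda s: s / 10.0         # hypothetical evaluation function

rng = random.Random(0)
value = truncated_rollout(0, legal_moves, apply_move, is_terminal,
                          terminal_value, evaluate, max_depth=3, rng=rng)
print(value)
```

The value returned for a cut-off simulation is backed up through the tree the same way a win/loss result would be, so the rest of the MCTS loop does not need to change.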
Apart from that, you'll want to try to find ways to generalize. If you run one simulation, you should try to see if you can also extract some useful information from that simulation for other nodes in the tree which you didn't go through. Examples of enhancements you may want to consider in this spirit are AMAF, RAVE, Progressive History, N-Gram Selection Technique.
Do you happen to know where the bottleneck is for your performance? You could investigate this using a profiler. If most of your processing time is spent in functions related to the game (move generation, advancing from one state to the next, etc.), you know for sure that you're going to be limited in the number of simulations you can do. You should then try to implement enhancements that make each individual simulation as informative as possible. This can for example mean using really good, computationally expensive evaluation functions. If the game code itself already is very well optimized and fast, moving extra computation time into things like evaluation functions will be more harmful to your simulation count and probably pay off less.
For more on this last idea, it may be interesting to have a look through some stuff I wrote on my MCTS-based agent in General Video Game AI, which is also a real-time environment with a very computationally expensive game, meaning that simulation counts are severely constrained (but the branching factor is much smaller than it seems to be in your case). Pdf files of my publications on this are also available online.

Rolling Timer Array for Calculating Averages

Language: C++
Development Environment: Microsoft Visual C++
Libraries Used: MFC
Problem: This should be fairly simple, but I can't quite wrap my head around it. I'm attempting to calculate a rolling average over a given amount of time - let's say five seconds. Every second, my program receives a data message containing some numerical information, including the CPU idle time during the process.
I want to be able to show the user an average CPU idle time over a five second period. I was thinking about using just an array and storing a value every five seconds, but I'm not sure how to do the rolling portion. Unless there is some other built-in method for doing rolling calculations?
As it turns out, it would actually be better to implement immediate feedback regarding idle percentage, which is much easier to code.
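For reference, the "rolling portion" asked about above usually comes down to a fixed-size window that drops its oldest sample whenever a new one arrives. Below is a small sketch of that idea in Python for brevity. The class name is made up, and in C++ the same logic should carry over to a `std::deque` or a fixed array with a wrapping index.

```python
from collections import deque

class RollingAverage:
    """Average of the most recent `window` samples."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)
        self.total = 0.0

    def add(self, value):
        # When the window is full, the oldest sample is about to be evicted,
        # so remove it from the running total first.
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]
        self.samples.append(value)
        self.total += value
        return self.total / len(self.samples)

avg = RollingAverage(window=5)
for idle in [10, 20, 30, 40, 50]:
    latest = avg.add(idle)
print(latest)       # 30.0
print(avg.add(60))  # 40.0 (the window is now 20, 30, 40, 50, 60)
```

Calling `add` once per incoming data message (one per second here) gives the five second average without re-summing the whole window each time.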

A.I.: How would I train a Neural Network across multiple machines?

So, for larger networks with large data sets, they take a while to train. It would be awesome if there was a way to share the computing time across multiple machines. However, the issue with that is that when a neural network is training, the weights are constantly being altered every iteration, and each iteration is more or less based on the last -- which makes the idea of distributed computing at the very least a challenge.
I've thought that for each portion of the network, the server could send maybe 1000 sets of data to train a network on... but... you'd have roughly the same computing time, as I wouldn't be able to train on different sets of data simultaneously (which is what I want to do).
But even if I could split up the network's training into blocks of different data sets to train on, how would I know when I'm done with that set of data? Especially if the amount of data sent to the client machine isn't enough to achieve the desired error?
I welcome all ideas.
Quoting http://en.wikipedia.org/wiki/Backpropagation#Multithreaded_Backpropagation:
When multicore computers are used multithreaded techniques can greatly decrease the amount of time that backpropagation takes to converge. If batching is being used, it is relatively simple to adapt the backpropagation algorithm to operate in a multithreaded manner.
The training data is broken up into equally large batches for each of the threads. Each thread executes the forward and backward propagations. The weight and threshold deltas are summed for each of the threads. At the end of each iteration all threads must pause briefly for the weight and threshold deltas to be summed and applied to the neural network.
which is essentially what other answers here describe.
Depending on your ANN model you can exploit some parallelism on multiple machines by running the same model with the same training and validation data on multiple machines but set different ANN properties (initial values, ANN parameters, noise etc.) for different runs.
I used to do this a lot to make sure I'd explored the problem space effectively and wasn't stuck in local minima etc. This is a very easy way to take advantage of multiple machines without having to recode your algorithm. Just another approach you might want to consider.
My assumption is you have more than 1 training set, and you have a gold standard. Also, I assume you have some way of storing the state of the neural network (whether it's a list of probability weights for each node, or something along those lines).
Using as many compute nodes in a cluster as you can, launch the program on a data set on each node. Save the results for each, and test on the gold standard. Whichever neural network state performs best, set it as the input for the next round of training. Repeat as much as you see fit.
If I understand correctly, you're trying to figure out a way to train an ANN on a cluster of machines? As you stated, partitioning the network isn't the right approach, and as far as I know, is seemingly unfeasible for most models. A possible approach might be to partition the training sets and run local copies of your network, and then merge the results. An intuitive way to do this and gain some validation along the way would be with cross-validation. As you stated, knowing when the network has had the right amount of training is a problem, but that variability is a problem inherent to neural nets in general, not in parallelizing the work.
As you also stated, the updates that happen during each iteration of training are dependent on the current state of the weights, but without mixing up training sets/validation, you're likely overfitting. This is why CV is nice, because your training sets will all get a chance to play a role in the training, and the validating, across multiple samples.
If you do batch training, the weights are only altered after you have been through the entire dataset. You can compute the weight update vector for each data point in the set on a separate machine/core and add them up at the end, then proceed with the next epoch.
Here is a link to a question about batch training.
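To illustrate the batch idea, here is a toy sketch. It fits a single-weight linear model y = w * x instead of a real network, and the function names and the learning rate are made up. The partial gradients are computed in a plain loop here; in a real setup each `chunk_gradient` call would run on its own machine or core.

```python
def chunk_gradient(w, chunk):
    """Sum of d/dw (w*x - y)^2 over one chunk of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in chunk)

def train(data, n_workers, epochs, lr):
    w = 0.0
    # Split the training set into one chunk per worker.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    for _ in range(epochs):
        # Every worker uses the same weights for the whole epoch...
        partial = [chunk_gradient(w, c) for c in chunks]
        # ...and the weights change only once all partial sums are in.
        w -= lr * sum(partial) / len(data)
    return w

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3
w = train(data, n_workers=4, epochs=200, lr=0.01)
print(round(w, 6))  # 3.0
```

Because the weights are frozen during an epoch, the summed result does not depend on how the data was split across workers (up to floating point rounding), which is what makes this form of training easy to distribute.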

Architecture and pattern for large scale, time series based, aggregation operation

I will try to describe my challenge and operation:
I need to calculate stock price indices over a historical period. For example, I will take 100 stocks and calculate their aggregated average price each second (or even less) for the last year.
I need to create many different indices like this where the stocks are picked dynamically out of ~30,000 different instruments.
The main consideration is speed. I need to output a few months of this kind of index as fast as I can.
For that reason, I think a traditional RDBMS is too slow, and so I am looking for a sophisticated and original solution.
Here is something I had in mind, using a NoSQL or column-oriented approach:
Distribute all stocks into some kind of key-value pairs of time:price with matching time rows on all of them. Then use some sort of a map reduce pattern to select only the required stocks and aggregate their prices while reading them line by line.
I would like some feedback on my approach, suggestions for tools and use cases, or a suggestion of a completely different design pattern. My guidelines for the solution are price (would like to use open source), ability to handle huge amounts of data and, again, fast lookup (I don't care about inserts since they are only made one time and never change).
Update: by fast lookup I don't mean real time, but a reasonably quick operation. Currently it takes me a few minutes to process each day of data, which translates to a few hours per yearly calculation. I want to achieve this within minutes or so.
In the past, I've worked on several projects that involved the storage and processing of time series using different storage techniques (files, RDBMS, NoSQL databases). In all these projects, the essential point was to make sure that the time series samples are stored sequentially on the disk. This made sure reading several thousand consecutive samples was quick.
Since you seem to have a moderate number of time series (approx. 30,000) each having a large number of samples (1 price a second), a simple yet effective approach could be to write each time series into a separate file. Within the file, the prices are ordered by time.
You then need an index for each file so that you can quickly find certain points of time within the file and don't need to read the file from the start when you just need a certain period of time.
With this approach you can take full advantage of today's operating systems which have a large file cache and are optimized for sequential reads (usually reading ahead in the file when they detect a sequential pattern).
Aggregating several time series involves reading a certain period from each of these files into memory, computing the aggregated numbers and writing them somewhere. To fully leverage the operating system, read the full required period of each time series one by one and don't try to read them in parallel. If you need to compute a long period, then don’t break it into smaller periods.
You mention that you have 25,000 prices a day when you reduce them to a single one per second. It seems to me that in such a time series, many consecutive prices would be the same as few instruments are traded (or even priced) more than once a second (unless you only process S&P 500 stocks and their derivatives). So an additional optimization could be to further condense your time series by only storing a new sample when the price has indeed changed.
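As a small illustration of that condensing step (function names are made up, and prices are assumed to arrive as one value per second), something along these lines would do:

```python
from bisect import bisect_right

def condense(prices, start_time=0):
    """prices[k] is the price at start_time + k seconds.

    Returns (time, price) pairs only for the seconds where the price changed.
    """
    changes = []
    for k, price in enumerate(prices):
        if not changes or changes[-1][1] != price:
            changes.append((start_time + k, price))
    return changes

def price_at(changes, t):
    """Price in effect at time t, i.e. the last change at or before t."""
    times = [time for time, _ in changes]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("t is before the first sample")
    return changes[i][1]

changes = condense([10, 10, 10, 11, 11, 10])
print(changes)               # [(0, 10), (3, 11), (5, 10)]
print(price_at(changes, 4))  # 11
```

The per-second series can always be rebuilt from the change points, so nothing is lost, and for thinly traded instruments the file gets a lot shorter.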
On a lower level, the time series files could be organized as binary files consisting of sample runs. Each run starts with the time stamp of the first price and the length of the run. After that, the prices for the several consecutive seconds follow. The file offset of each run could be stored in the index, which could be implemented with a relational DBMS (such as MySQL). This database would also contain all the meta data for each time series.
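A minimal sketch of such a run layout. The field widths (64-bit timestamp, 32-bit run length, 64-bit float prices) and the function names are assumptions for illustration. An in-memory buffer stands in for the file, and the returned offsets are what you would put into the index.

```python
import io
import struct

# Run header: start timestamp (int64) and number of prices (uint32),
# little-endian, no padding. These widths are an assumption.
HEADER = struct.Struct("<qI")

def write_run(f, start_ts, prices):
    """Append one run and return its file offset (to be stored in the index)."""
    offset = f.tell()
    f.write(HEADER.pack(start_ts, len(prices)))
    f.write(struct.pack(f"<{len(prices)}d", *prices))
    return offset

def read_run(f, offset):
    """Read the run that starts at `offset`."""
    f.seek(offset)
    start_ts, count = HEADER.unpack(f.read(HEADER.size))
    prices = struct.unpack(f"<{count}d", f.read(8 * count))
    return start_ts, list(prices)

f = io.BytesIO()  # stands in for one time series file
index = {
    1000: write_run(f, 1000, [1.5, 1.5, 2.0]),
    2000: write_run(f, 2000, [3.0]),
}
print(index)                    # {1000: 0, 2000: 36}
print(read_run(f, index[2000])) # (2000, [3.0])
```

To read a period, look up the first relevant run in the index, seek there once, and then read sequentially from that point on.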
(Do stay away from memory mapped files. They're slower because they aren’t optimized for sequential access.)
If the scenario you described is the ONLY requirement, then there are "low tech" simple solutions which are cheaper and easier to implement. The first that comes to mind is LogParser. In case you haven't heard of it, it is a tool which runs SQL queries on simple CSV files. It is unbelievably fast - typically around 500K rows/sec, depending on row size and the IO throughput of the HDs.
Dump the raw data into CSVs, run a simple aggregate SQL query via the command line, and you are done. Hard to believe it can be that simple, but it is.
More info about logparser:
Wikipedia
Coding Horror
What you really need is a relational database that has built-in time series functionality. IBM released one very recently: Informix 11.7 (note it must be 11.7 to get this feature). What is even better news is that, for what you are doing, the free version, Informix Innovator-C, will be more than adequate.
http://www.freeinformix.com/time-series-presentation-technical.html

Storing Signals in a Database

I'm designing an application that receives information from roughly 100k sensors that measure time-series data. Each sensor measures a single integer data point once every 15 minutes, saves a log of these values, and sends that log to my application once every 4 hours. My application should maintain about 5 years of historical data. The packet I receive once every 4 hours is of the following structure:
Date and time of the sequence start
Number of samples to arrive (assume this is fixed for the sake of simplicity, although in practice there may be partials)
The sequence of samples, each of exactly 4 bytes
My application's main usage scenario is showing graphs of composite signals at certain dates. When I say "composite" signals I mean that for example I need to show the result of adding Sensor A's signal to Sensor B's signal and subtracting Sensor C's signal.
My dilemma is how to store this time-series data in my database. I see two options, assuming I use a relational database:
1. Store every sample in a row of its own: when I receive a signal, break it to samples, and store each sample separately with its timestamp. Assume the timestamps can be normalized across signals.
2. Store every 4-hour signal as a separate row with its starting time. In this case, whenever a signal arrives, I just add it as a BLOB to the database.
There are obvious pros and cons for each of the options, including storage size, performance, and complexity of the code "above" the database.
I wondered if there are best practices for such cases.
Many thanks.
Storing each sample in its own row sounds simple and logical to me. Don't be too hasty to optimize unless there is actually a good reason for it. Maybe you should do some tests with dummy data to see if any optimization is really necessary.
I think storing the data in the form that makes it easiest to carry out your main goal is likely the least painful overall. In this case, it's likely the more efficient as well.
Since your main goal appears to be to display the information in interesting and flexible ways, I'd go with separate rows for each data point. I presume most of the effort required to write this program well is on the display side, so you should minimize the complexity on that side as much as possible.
Storing data in BLOBs is good if the content isn't relevant and you would never want to run queries against it. In this case, your data will be the contents of the database, and therefore very relevant.
I think you should:
1. Store every sample in a row of its own: when I receive a signal, break it to samples, and store each sample separately with its timestamp. Assume the timestamps can be normalized across signals.
I see two database operations here: the first is to store the data as it comes in, and the second is to retrieve the data in a (potentially large) number of ways.
As Kieveli says, since you'll be using discrete parts of the data (as opposed to all of the data all at once), storing it as a blob won't help you when it comes time to read it. So for the first task, storing the data line by line would be optimal.
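To show why one row per sample keeps the composite-signal case simple, here is a small sketch using SQLite from Python's standard library. The table and column names are made up, and timestamps are assumed to be already normalized across sensors. "A + B - C" becomes a single grouped query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples ("
    " sensor_id TEXT, ts INTEGER, value INTEGER,"
    " PRIMARY KEY (sensor_id, ts))"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("A", 0, 10), ("A", 900, 12),
     ("B", 0, 5), ("B", 900, 6),
     ("C", 0, 3), ("C", 900, 4)],
)

def composite(conn, plus, minus, t0, t1):
    """Per timestamp in [t0, t1): sum of `plus` sensors minus sum of `minus` sensors."""
    marks_plus = ",".join("?" * len(plus))
    marks_all = ",".join("?" * (len(plus) + len(minus)))
    sql = (
        "SELECT ts, SUM(CASE WHEN sensor_id IN (" + marks_plus + ")"
        " THEN value ELSE -value END)"
        " FROM samples"
        " WHERE sensor_id IN (" + marks_all + ") AND ts >= ? AND ts < ?"
        " GROUP BY ts ORDER BY ts"
    )
    # Parameters follow the order of the placeholders in the statement.
    params = [*plus, *plus, *minus, t0, t1]
    return conn.execute(sql, params).fetchall()

print(composite(conn, ["A", "B"], ["C"], 0, 1800))  # [(0, 12), (900, 14)]
```

With the BLOB layout, the same graph would mean fetching every 4-hour block for all three sensors and unpacking and aligning the samples in application code.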
This might also be "good enough" when querying the data. However, performance may become an issue given the volume [100,000 sensors x 4 samples per hour x 24 hours = 9,600,000 rows per day, x 1,826 days = 17,529,600,000 or so rows in five years]. To my mind, if you want to write flexible queries against that kind of data, you'll want some form of star schema structure (as gets used in data warehouses).
Whether you load the data directly into the warehouse, or let it build up "row by row" to be added to the warehouse every day/week/month/whatever, depends on time, effort, available resources, and so on.
A final suggestion: when you set up a test environment for your new code, load it with several years of (dummy) data, to see how it will perform.
