Alternate cut off matrix - analytics

Suppose I have an alternate confusion matrix similar to the one above. I am trying to predict bad risk.
I understand that with a .04 probability cut-off I am able to find more bad risks, but I want to know what implications this .04 has in terms of business. Why would I want a cut-off other than .5?

Related

Non-converging Neural Network in C

I wrote my first feed-forward neural network in C, using the sigmoid 1.0 / (1.0 + exp(-x)) as activation function and gradient descent to adjust the weights. I tried to approximate sin(x) to make sure my network works. However, the output of the neuron on the output layer seems to always oscillate between the extreme values 0 and 1 and the weights of the neurons grow to absurd sizes, no matter how many hidden layers there are, how many neurons are in the hidden layer(s), how many training samples I provide, or even what the target outputs are.
1) Are there any standard 'tried and tested' data sets used to proof-test neural networks for errors? If yes, what structures work best (e.g. numbers of neuron(s) in the hidden layer) to converge to the desired output?
2) Are there any common errors that generate the same symptoms? I found this thread, but the issue was because of faulty data, which I believe is not my case.
3) Is there any preferred way of training the network? In my implementation I cycle through the training sets and adjust the weights each time, then rinse and repeat ~1000 times. Is there any other order that works better?
So, to sum up:
Assuming that your gradient propagation works properly, the values of parameters like topology, learning rate, batch size or the constant for the weight penalty (L1 and L2 decay) are usually chosen using techniques called grid search or random search. Random search has been shown empirically to often perform better in this task.
The most common reason for weight divergence is a wrong learning rate. Too large a value can make learning really hard. On the other hand, when the learning rate is too small, the learning process might take a really long time. Usually you should babysit the learning phase. Specific instructions can be found e.g. here.
In your learning phase you used a technique called SGD. It can achieve good results, but it is vulnerable to variance in the data set and to large learning rates. What I advise is to use batch learning and treat the batch size as an additional parameter chosen during grid or random search. You can read about it e.g. here.
Another thing you might consider is changing your activation function to tanh or relu. The saturation regions of the sigmoid cause a lot of problems, and it usually needs proper initialization. You can read about it here.
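To make the batch-learning advice concrete, here is a minimal sketch in C. It is not your network: it fits a single linear unit y ~ w*x + b, and the function name train_linear and its parameters are made up for illustration. The point is only that the gradient is averaged over each batch before the weights move, with the batch size and learning rate exposed as tunable parameters.

```c
#include <stddef.h>

/* Hypothetical helper: fit y ~ w*x + b by mini-batch gradient descent.
   The gradient of the squared error is averaged over each batch
   before the parameters are updated. */
void train_linear(const double *x, const double *y, size_t n,
                  size_t batch, double lr, int epochs,
                  double *w, double *b)
{
    if (batch == 0)
        batch = 1;                      /* guard against an empty batch */
    for (int e = 0; e < epochs; e++) {
        for (size_t start = 0; start < n; start += batch) {
            size_t end = start + batch < n ? start + batch : n;
            double gw = 0.0, gb = 0.0;
            for (size_t i = start; i < end; i++) {
                double err = (*w * x[i] + *b) - y[i];
                gw += err * x[i];       /* d(err^2 / 2) / dw */
                gb += err;              /* d(err^2 / 2) / db */
            }
            double m = (double)(end - start);
            *w -= lr * gw / m;          /* step along the averaged gradient */
            *b -= lr * gb / m;
        }
    }
}
```

With batch set to 1 this degenerates to the per-sample updates described in the question; with batch set to n it becomes full-batch gradient descent.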

traversing numbers in an interval wisely

I want to scan the numbers in a big interval wisely until I find the one I need.
But, I don't have any clue where this number might be and I will not have any clue during searching process.
Let me give an example to make it easy to state my question
Assume I am searching a number between 100000000000000 and 999999999999999
The naive approach would be to start from 100000000000000 and count up to 99... one by one.
But this is not wise, because the number could be at the far end if I am not lucky.
So, what is the best approach to this problem? I am not looking for the mathematically best one; I need a technique which is easy to implement in the C programming language.
thanks in advance.
There is no solution to your problem, but knowledge. If you don't know anything about the number, any strategy to enumerate them is equally good (or bad).
If you suppose that you are playing against an adversary who is trying to hide the number from you, a strategy would be to make your next move unguessable: pick numbers in the range at random and ask for them. (To avoid repetitions, you'd have to use a random permutation of your numbers.) That way you'd find your number after an expected number of probes of about half the total, that is, you'd gain a factor of two over the worst case. But as said, all of that depends on the assumptions you can make.
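A rough sketch of the random-permutation idea, using a Fisher-Yates shuffle. The function name shuffled_range is hypothetical, and this only works when the range is small enough to hold in memory, which the 15-digit range from the question is not; treat it as an illustration of the principle, not a solution at that scale.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical helper: fill out[0..n-1] with lo, lo+1, ..., lo+n-1
   in random order (Fisher-Yates shuffle). Each number appears exactly
   once, so probing out[] in order never repeats a candidate. */
void shuffled_range(unsigned long long lo, size_t n, unsigned long long *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = lo + i;
    for (size_t i = n; i > 1; i--) {
        /* j in [0, i-1]; rand() % i has a slight modulo bias */
        size_t j = (size_t)rand() % i;
        unsigned long long tmp = out[i - 1];
        out[i - 1] = out[j];
        out[j] = tmp;
    }
}
```

Seed the generator once with srand() before calling it, then test the candidates in the order they appear in out.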
Use bisection search. First see if your number is above or below the middle of the range. Depending on the answer, repeat the process for the upper or lower half of the range, respectively.
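A minimal sketch of bisection in C. Note the assumption it rests on: you need some test that tells you whether a probe is below or above the target, which the question says is not available. The callback type compare_fn and the example oracle cmp_to_secret are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical oracle: returns <0 if probe is below the target,
   >0 if it is above, and 0 if it is the target. */
typedef int (*compare_fn)(uint64_t probe, void *ctx);

/* Example oracle for illustration: ctx points at the hidden number. */
int cmp_to_secret(uint64_t probe, void *ctx)
{
    uint64_t secret = *(const uint64_t *)ctx;
    return probe < secret ? -1 : (probe > secret ? 1 : 0);
}

/* Searches [lo, hi]. Returns 1 and stores the number in *found
   if the oracle reports a match, otherwise returns 0. */
int bisect(uint64_t lo, uint64_t hi, compare_fn cmp, void *ctx,
           uint64_t *found)
{
    while (lo <= hi) {
        uint64_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        int c = cmp(mid, ctx);
        if (c == 0) {
            *found = mid;
            return 1;
        }
        if (c < 0) {
            if (mid == UINT64_MAX)
                break;                        /* nothing above mid */
            lo = mid + 1;
        } else {
            if (mid == 0)
                break;                        /* nothing below mid */
            hi = mid - 1;
        }
    }
    return 0;
}
```

For the range in the question this needs about 50 probes, since the range holds fewer than 2^50 numbers.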
As you already know there is no strategy to improve search speed. All you can do is to speed up the search itself by using multithreading. So the technically best approach might be to try to implement the algorithm in OpenCL (which is fairly similar to C and which can be used through a C library) and run several hundred tests in parallel, depending on your hardware (GPU).

What is the most practical board representation for a magic bitboard move generation system?

I am rewriting a chess engine I wrote to run on magic bitboards. I have the magic functions written and they take a square of the piece and an occupancy bitboard as parameters. What I am having debates with myself is which one of these board representation schemes is faster/more practical:
Scheme 1: There is a bitboard for each type of piece (1 for white knights, 1 for black rooks, ...). In order to generate moves and push them to the move stack, I must serialize each piece bitboard to find the square of the piece and then call the magic function. Then I must serialize the resulting move bitboard and push the moves. The advantage is that the attacking and occupancy bitboards are closer at hand.
Scheme 2: A simple piece-centric array [2][16] or [32] contains the square indices of the pieces. A simple loop through the array, calling the magic functions, is all it takes to get the move bitboards. I then serialize those bitboards and push the moves to the move stack. I also have to maintain an occupancy bitboard. I guess getting an attack bitboard shouldn't be any different: I have to once again generate all the move bitboards and, instead of serializing them, combine them with bitwise operations.
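For readers unfamiliar with the term, "serializing" a bitboard in both schemes means walking its set bits and turning each into a square index. A minimal portable sketch follows; the names lsb_index and serialize_bitboard are made up here, and a real engine would most likely replace the bit-scan loop with a compiler intrinsic or a lookup.

```c
#include <stdint.h>

/* Index (0..63) of the least significant set bit; bb must be non-zero.
   Portable loop for illustration only. */
static int lsb_index(uint64_t bb)
{
    int sq = 0;
    while ((bb & 1u) == 0) {
        bb >>= 1;
        sq++;
    }
    return sq;
}

/* Writes the square index of every set bit into squares[] (room for 64
   entries), lowest square first, and returns how many were written. */
int serialize_bitboard(uint64_t bb, int *squares)
{
    int n = 0;
    while (bb) {
        squares[n++] = lsb_index(bb);
        bb &= bb - 1;            /* clear the lowest set bit */
    }
    return n;
}
```

In scheme 1 this loop runs twice per piece type (once over the piece bitboard, once over each move bitboard); in scheme 2 the piece array replaces the first pass.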
I'm leaning towards scheme 2, but for some reason I think there is some sort of implementation similar to scheme 1 that is standard. For some reason I can't find drawbacks of making a "bitboard" engine without actually using bitboards. I wouldn't even be using bitboards for king and knight data, just a quick array lookup.
I guess my question is more of whether there is a better way to do this board representation, because I remember reading that keeping a bitboard for each type of piece is standard (maybe this is only necessary with rotated bitboards?). I'm relatively new to bitboard engines but I've read a lot and I've implemented the magic method. I certainly like the piece centric array approach - it makes a lot of arbitrary stuff like printing the board to the screen easier, but if there is a better/equal/more standard way can someone please point it out? Thanks in advance - I know this is a fairly specific question and difficult to answer unless you are very familiar with chess programming.
Last-minute question: how does the speed of a lookup into a 2D array measure up against using a 1D array and adding 16 * team_side to the normal index to look up the piece?
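For reference, the two lookups being compared might look like the sketch below (the names lookup_2d, lookup_1d, SIDES and PIECES_PER_SIDE are illustrative). A C 2D array is stored row-major, so pieces[side][index] sits at element offset 16 * side + index, the same arithmetic the 1D version spells out by hand. Whether a compiler emits identical code for both is worth checking with a profiler or by reading the generated assembly.

```c
#include <stddef.h>

enum { SIDES = 2, PIECES_PER_SIDE = 16 };

/* 2D lookup: the compiler computes side * PIECES_PER_SIDE + index. */
int lookup_2d(int pieces[SIDES][PIECES_PER_SIDE], int side, int index)
{
    return pieces[side][index];
}

/* 1D lookup: the same offset, written out explicitly. */
int lookup_1d(const int *pieces, int side, int index)
{
    return pieces[PIECES_PER_SIDE * side + index];
}
```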
edit: I thought I should add that I am valuing speed over almost all else in my chess implementation. Why else would I go with magic bitboards as opposed to simply arrays with slide data?
There is no standard answer to this, sorry.
The number and types of data structures you need depends on exactly what you want to do in your program. For example, having more than one representation of the pieces on the board makes some operations faster. On the other hand, it takes more time to update your data during each move.
To get the maximum speed, it is your job to find out what works best for your program. Does maintaining an extra array of pieces result in a net speedup for a program? It depends!
Sometimes it is a net gain to maintain a data structure, sometimes you can delay the calculations and cache the result, and sometimes you just calculate it when needed (and hope it isn't needed very often).

Randomness in Artificial Intelligence & Machine Learning

This question came to my mind while working on 2 projects in AI and ML. What if I'm building a model (e.g. a classification neural network, K-NN, etc.) and this model uses some function that includes randomness? If I don't fix the seed, then I'm going to get different accuracy results every time I run the algorithm on the same training data. However, if I fix it, then some other setting might give better results.
Is averaging a set of accuracies enough to say that the accuracy of this model is xx % ?
I'm not sure If this is the right place to ask such a question/open such a discussion.
Simple answer, yes, you randomize it and use statistics to show the accuracy. However, it's not sufficient to just average a handful of runs. You need, at a minimum, some notion of the variability as well. It's important to know whether "70%" accurate means "70% accurate for each of 100 runs" or "100% accurate once and 40% accurate once".
If you're just trying to play around a bit and convince yourself that some algorithm works, then you can just run it 30 or so times and look at the mean and standard deviation and call it a day. If you're going to convince anyone else that it works, you need to look into how to do more formal hypothesis testing.
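A small sketch of the "run it 30 or so times and look at the spread" step in C. The function name mean_variance is made up; it returns the mean and the sample variance (dividing by n - 1), and the standard deviation is the square root of that variance, e.g. via sqrt() from <math.h>.

```c
#include <stddef.h>

/* Hypothetical helper: mean and sample variance of n accuracy values.
   The standard deviation is the square root of *variance. */
void mean_variance(const double *acc, size_t n,
                   double *mean, double *variance)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += acc[i];
    *mean = n > 0 ? sum / (double)n : 0.0;

    double ss = 0.0;                 /* sum of squared deviations */
    for (size_t i = 0; i < n; i++) {
        double d = acc[i] - *mean;
        ss += d * d;
    }
    *variance = n > 1 ? ss / (double)(n - 1) : 0.0;
}
```

Two runs at 100% and 40% and a hundred runs at 70% both give a mean of 70, but the variance separates them immediately.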
There are models which are naturally dependent on randomness (e.g., random forests) and models which only use randomness as part of exploring the space (e.g., initialisation of values for neural networks), but actually have a well-defined, deterministic, objective function.
For the first case, you will want to use multiple seeds and report average accuracy, std. deviation, and the minimum you obtained. It is often good if you have a way to reproduce this, so just use multiple fixed seeds.
For the second case, you can always tell, just on the training data, which run is best (although it might actually not be the one which gives you the best test accuracy!). Thus, if you have the time, it is good to do say, 10 runs, and then evaluate on the one with the best training error (or validation error, just never evaluate on testing for this decision). You can go a level up and do multiple multiple runs and get a standard deviation too. However, if you find that this is significant, it probably means you weren't trying enough initialisations or that you are not using the right model for your data.
Stochastic techniques are typically used to search very large solution spaces where exhaustive search is not feasible. So it's almost inevitable that you will be trying to iterate over a large number of sample points with as even a distribution as possible. As mentioned elsewhere, basic statistical techniques will help you determine when your sample is big enough to be representative of the space as a whole.
To test accuracy, it is a good idea to set aside a portion of your input patterns and avoid training against those patterns (assuming you are learning from a data set). Then you can use the set to test whether your algorithm is learning the underlying pattern correctly, or whether it's simply memorizing the examples.
Another thing to think about is the randomness of your random number generator. Standard random number generators (such as rand from <stdlib.h>) may not make the grade in many cases so look around for a more robust algorithm.
I'll generalize the answer from what I understand of your question.
I suppose accuracy should always be reported as the average over multiple runs together with the standard deviation. If you consider the accuracies you get using different seeds for the random generator, aren't you actually considering a greater range of inputs (which should be a good thing)? But you have to take the standard deviation into account when judging the accuracy. Or did I get your question totally wrong?
I believe cross-validation may give you what you ask about: an averaged, and therefore more reliable, estimate of classification performance. It contains no randomness, except in permuting the data set initially. The variation comes from choosing different train/test splits.

general question about solving problems with parallelisation

I have a general question about programming parallel algorithms in C. Let's assume that our task is to implement some matrix algorithms with MPI and/or OpenMP. There are some situations that cause problems, like false sharing in OpenMP, or communication in MPI that depends on the matrix dimension (columns cyclically distributed among processes). Would it be a good and common approach to solve these situations by, for example, transposing the matrix, because this would reduce the necessary communication or even avoid the false sharing problem? Afterwards you would undo the transposition. Of course, this assumes that it would lead to a much better speed-up.
I don't think that this would be very cunning and more of a lazy way to do this. But I'm curious to read some opinions about this.
Let's start with the first question first: can it make sense to transpose? The answer is, it depends, and you can estimate whether it will improve things or not.
The transposition/retransposition will impose a one-time memory bandwidth cost of 2*(going through memory the fast way) + 2*(going through memory the slow way), where those memory operations are literally memory operations in the multicore case, or network communications in the distributed memory case. You're going to be reading a matrix the fast way and putting it into memory the slow way. (You can make this, essentially, 4*(going through memory the fast way) by reading the matrix in one cache-sized block at a time, transposing in cache, and writing out in order.)
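The cache-blocked transpose mentioned in the parenthesis could be sketched like this for the single-node case. The function name transpose_blocked and the tile edge TILE = 32 are assumptions for illustration; the right tile size depends on the cache and should be tuned.

```c
#include <stddef.h>

enum { TILE = 32 };   /* assumed tile edge; tune for the target cache */

/* Writes the transpose of src (rows x cols, row-major) into
   dst (cols x rows, row-major), one TILE x TILE block at a time so
   that both the reads and the writes stay within a small working set. */
void transpose_blocked(const double *src, double *dst,
                       size_t rows, size_t cols)
{
    for (size_t ib = 0; ib < rows; ib += TILE) {
        for (size_t jb = 0; jb < cols; jb += TILE) {
            size_t imax = ib + TILE < rows ? ib + TILE : rows;
            size_t jmax = jb + TILE < cols ? jb + TILE : cols;
            for (size_t i = ib; i < imax; i++)
                for (size_t j = jb; j < jmax; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

This version is out of place (src and dst must not overlap) and handles matrices whose dimensions are not multiples of the tile size.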
Whether that's a win depends on how many times you'll be accessing the array. If you would have been hitting the entire non-transposed array 4 times with memory access in the "wrong" direction, then you will clearly win by doing the two transposes. If you'd only be going through the non-transposed array once in the wrong direction, then you almost certainly won't win by doing the transposition.
As to the larger question, @AlexandreC is absolutely right here: trying to implement your own linear algebra routines is madness. Take a look at, e.g., How To Write Fast Numerical Code, figure 3; there can be factors of 40 in performance between naive and highly-tuned (say) GEMM operations. These things are highly memory-bandwidth limited, and in parallel that means network limited. By far the best option is to use existing tools.
For multicore linear algebra, existing libraries include
ATLAS
PLASMA
FLAME
For MPI implementations, there are
BLACS
ScaLAPACK
or complete solver environments like
PETSc
Trilinos
I don't know that you'd throw the transpose away the second that you completed the operation, but yes this is a valid mechanism to increase parallelism.
I am not an expert; I've only read a little bit about this topic, and even that was for SIMD architectures, so please take my opinion lightly... but I think the usual mechanism is to lay your structures out in memory to match the machine (so you'd transpose a large matrix to line up better with your vectors and increase the dependency distance in your loops), and then you also build an indexing structure of pointers around that so that you can quickly access individual elements in the transpose differently. This gets more difficult to do as your input changes more dynamically.
I don't think that this would be very cunning and more of a lazy way to do this.
Lazy solutions are usually better than "cunning" ones, because they tend to be simpler and more straightforward. They're therefore easier to implement, document, understand and maintain. Indeed, laziness is arguably one of the greatest virtues a programmer can have. As long as the program produces correct results at acceptable speeds, nobody should care how elegantly you solved the problem (including you).
