Memristors vs neural network nodes, what's the difference? - artificial-intelligence

Knowing very little about AI, just a little puzzled by the claim that memristors may finally lead to AI that will equal, and likely pass, the power of the human brain.
So, what's the difference between memristors (hardware) vs neural network nodes (software)?
Very possible the two are complete non-related, but given that it's my understanding is neural networks are used to simulate "bio neural networks" seems to me that memristors are just the silco version of the bio version that is emulated by neural networks.
Reason I ask, is because if they're very close or the same in concept (meaning they only differ in implementation) no idea how one could make the claim that memristors will close the gap on AI.

neural networks are able to form new connections. Hardware cannot do this.
Memristors are much more useful for creating fast non-volatile memory. In the future there won't be RAM and storage, but a unified memory.

Its intended to move away from the von neumann architecture; a memory node can be a compute node. Yes, a fundamental shift. It also allows other logic operations which enables AI from a different perspective.

Memristors allow in-memory computing, e.g., they can accelerate matrix-vector multiplication (MVM), bitwise operations and search operations. Also, memristor is a high-density non-volatile memory.
Since computation pattern of convolution layer is MVM and the fully-connected layer is memory-intensive, memristors can accelerate both convolution and fully-connected layers. As such, memristors can allow efficient hardware acceleration of convolution neural networks and other variants of neural networks.
However, current implementations of memristors also have several issues, such as poor reliability, hard errors. See my survey paper for more details on the use of memristors for accelerating neural networks and the discussion on current challenges.

Related

Software SPI Implementation

I'm thinking about creating my own pure-C software SPI library because there are none available (as far as I can tell).
Which also worries me - why aren't there any software SPI libraries? Is there some hardware limitation I'm not considering?
EDIT:
I've decided to write my own library due to how buggy the SPI peripheral is in STM32. Especially in 8 bit mode, but I've also had a lot of problems with 16 bit mode. Many other issues I didn't even bother documenting.
I've now written the software implementation (it is pretty easy) and in works just fine.
why aren't there any software SPI libraries?
Because it's about 10 lines of code each for the WriteByte and ReadByte functions, and most of that is bit banging processor-specific registers. The higher level protocol depends on the device connected to the SPI. Here's what wikipedia has to say on the subject
The SPI bus is a de facto standard. However, the lack of a formal
standard is reflected in a wide variety of protocol options. Different
word sizes are common. Every device defines its own protocol,
including whether or not it supports commands at all. Some devices are
transmit-only; others are receive-only. Chip selects are sometimes
active-high rather than active-low. Some protocols send the least
significant bit first.
So there's really no point in making a library. You just write the code for each specific situation, and combination of devices.
Although the others have answered that its just bitbanging; I would argue that there is benefit to writing a small layer:
If you don't use a HAL, or the Standard Libs (like myself) you can write a layer to deal with initialization which can do things as per the initialisation sequence of the chip.
You can map all your interrupt vectors for a specific peripheral in this layer utilising callback mechanisms
Create separation between application and system domains which is a core principle for modular design
Increase of code reusability utilising techniques such as function pointers and common interfaces
Add input validation on settings/parameters that would otherwise cause code duplication if no layer was used. For example, ensuring that the HCLK does not exceed 180MHz on the stm32f429 when initialising it.
Whilst it is true that to send data generally all you need to do is set a register, but more often that not the initialisation sequence is complicated.
With the increase in power and capacity in microcontrollers along with project size, its important to implement balanced design choices that promote scalability and maintainability - especially in commercial projects.
If you are asking for microcontrollers then you can have your own SPI library.
You need to use bit-banging technique for that.
There are software SPI libraries available.
As every microcontroller have different PORT architecture and registers those are not generic and they are specific for that controller only.
e.g. For 8051 architecture you can find this.

Recommendations for artificial neural network simulation basic software

I have a single threshold unit truth table to simulate for 100 epochs, with my input of the learning curve, target, activation, bias and weight to detect errors. Is there some software which can do this?
If you have neural network toolbox on Matlab you can use it.
You can design your network on it and feed learning curve via input and set target and bias or other parameters then run learning for 100 epochs.
SciLab is an open-source software and is the best free option instead of Matlab. You can use Neural Network Toolbox of it for your purpose.

what is the principle of low latency in trading application?

It seems that all the major investment banks use C++ in Unix (Linux, Solaris) for their low latency/high frequency server applications. How do people achieve low latency in trading high frequency equity? Any book teach how to achieve this?
http://g-wan.com/
http://www.zeromq.org
http://www.quantnet.com/forum/threads/low-latency-trading-system.3163/
check these websites you may get some idea about low latency programming.
but be aware that some codes may not be standardized. Which means it will not for longer time
it may creates some bugs in it.

Raspberry Pi cluster, neuron networks and brain simulation

Since the RBPI (Raspberry Pi) has very low power consumption and very low production price, it means one could build a very big cluster with those. I'm not sure, but a cluster of 100000 RBPI would take little power and little room.
Now I think it might not be as powerful as existing supercomputers in terms of FLOPS or others sorts of computing measurements, but could it allow better neuronal network simulation ?
I'm not sure if saying "1 CPU = 1 neuron" is a reasonable statement, but it seems valid enough.
So does it mean such a cluster would more efficient for neuronal network simulation, since it's far more parallel than other classical clusters ?
Using Raspberry Pi itself doesn't solve the whole problem of building a massively parallel supercomputer: how to connect all your compute cores together efficiently is a really big problem, which is why supercomputers are specially designed, not just made of commodity parts. That said, research units are really beginning to look at ARM cores as a power-efficient way to bring compute power to bear on exactly this problem: for example, this project that aims to simulate the human brain with a million ARM cores.
http://www.zdnet.co.uk/news/emerging-tech/2011/07/08/million-core-arm-machine-aims-to-simulate-brain-40093356/ "Million-core ARM machine aims to simulate brain"
http://www.eetimes.com/electronics-news/4217840/Million-ARM-cores-brain-simulator "A million ARM cores to host brain simulator"
It's very specialist, bespoke hardware, but conceptually, it's not far from the network of Raspberry Pis you suggest. Don't forget that ARM cores have all the features that JohnB mentioned the Xeon has (Advanced SIMD instead of SSE, can do 64-bit calculations, overlap instructions, etc.), but sit at a very different MIPS-per-Watt sweet-spot: and you have different options for what features are included (if you don't want floating-point, just buy a chip without floating-point), so I can see why it's an appealing option, especially when you consider that power use is the biggest ongoing cost for a supercomputer.
Seems unlikely to be a good/cheap system to me.
Consider a modern xeon cpu. It has 8 cores running at 5 times the clock speed, so just on that basis can do 40 times as much work. Plus it has SSE which seems suited for this application and will let it calculate 4 things in parallel. So we're up to maybe 160 times as much work. Then it has multithreading, can do 64 bit calculations, overlap instructions etc. I would guess it would be at least 200 times faster for this kind of work.
Then finally, the results of at least 200 local "neurons" would be in local memory but on the raspberry pi network you'd have to communicate between 200 of them... Which would be very much slower.
I think the raspberry pi is great and certainly plan to get at least one :P But you're not going to build a cheap and fast network of them that will compete with a network of "real" computers :P
Anyway, the fastest hardware for this kind of thing is likely to be a graphics card GPU as it's designed to run many copies of a small program in parallel. Or just program an fpga with a few hundred copies of a "hardware" neuron.
GPU and FPU do this kind of think much better then a CPU, the Nvidia GPU's that support CDUA programming has in effect 100's of separate processing units. Or at least it can use the evolution of the pixel pipe lines (where the card could render mutiple pixels in parallel) to produce huge incresses in speed. CPU allows a few cores that can carry out reletivly complex steps. GPU allows 100's of threads that can carry out simply steps.
So for tasks where you haves simple threads things like a single GPU will out preform a cluster of beefy CPU. (or a stack of Raspberry pi's)
However for creating a cluster running some thing like "condor" Which cand be used for things like Disease out break modelling, where you are running the same mathematical model millions of times with varible starting points. (size of out break, wind direction, how infectious the disease is etc.. ) so thing like the Pi would be ideal. as you are general looking for a full blown CPU that can run standard code.http://research.cs.wisc.edu/condor/
some well known usage of this aproach are "Seti" or "folding at home" (search for aliens and cancer research)
A lot of universities have a cluster such as this so I can see some of them trying the approach of mutipl Raspberry Pi's
But for simulating nurons in a brain you require very low latency between the nodes they are special OS's and applications that make mutiply systems act as one. You also need special networks to link it togather to give latency between nodes in terms of < 1 millsecond.
http://en.wikipedia.org/wiki/InfiniBand
the Raspberry just will not manage this in any way.
So yes I think people will make clusters out of them and I think they will be very pleased. But I think more university's and small organisations. They arn't going to compete with the top supercomputers.
Saying that we are going to get a few and test them against our current nodes in our cluster to see how they compare with a desktop that has a duel core 3.2ghz CPU and cost £650! I reckon for that we could get 25 Raspberries and they will use much less power, so will be interesting to compare. This will be for disease outbreak modelling.
I am undertaking a large amount of neural network research in the area of chaotic time series prediction (with echo state networks).
Although I see using the raspberry PI's in this way will offer little to no benefit over say a strong cpu or a GPU, I have been using a raspberry PI to manage the distribution of simulation jobs to multiple machines. The processing power benefit of a large core will nail that possible on the raspberry PI, not just that but running multiple PI's in this configuration will generate large overheads of waiting for them to sync, data transfer etc.
Due to the low cost and robustness of the PI, i have it hosting the source of the network data, as well as mediating the jobs to the Agent machines. It can also hard reset and restart a machine if a simulation fails taking the machine down with it allowing for optimal uptime.
Neural networks are expensive to train, but very cheap to run. While I would not recommend using these (even clustered) to iterate over a learning set for endless epochs, once you have the weights, you can transfer the learning effort into them.
Used in this way, one raspberry pi should be useful for much more than a single neuron. Given the ratio of memory to cpu, it will likely be memory bound in its scale. Assuming about 300 megs of free memory to work with (which will vary according to OS/drivers/etc.) and assuming you are working with 8 byte double precision weights, you will have an upper limit on the order of 5000 "neurons" (before becoming storage bound), although so many other factors can change this and it is like asking: "How long is a piece of string?"
Some engineers at Southampton University built a Raspberry Pi supercomputer:
I have ported a spiking network (see http://www.raspberrypi.org/phpBB3/viewtopic.php?f=37&t=57385&e=0 for details) to the Raspberry Pi and it runs about 24 times slower than on my old Pentium-M notebook from 2005 with SSE and prefetch optimizations.
It all depends on the type of computing you want to do. If you are doing very numerically intensive algorithms with not much memory movement between the processor caches and RAM memory then a GPU solution is indicated. The middle ground is an Intel PC chip using the SIMD assembly language instructions - you can still easily end up being limited by rate you can transfer data to and from RAM. For nearly the same cost you can get 50 ARM boards with say 4 cores per board and 2Gb RAM per board. That's 200 cores and 100 Gb of RAM. The amount of data that can be shuffled between the CPUs and RAM per second is very high. It could be a good option for neural nets that use large weight vectors. Also the latest ARM GPU's and the new nVidea ARM based chip (used in the slate tablet) have GPU compute as well.

Continuous time-dependent signal prediction

Which type of artificial neural network would you suggest to be able to make continuous time-dependant signal predictions? It should predict smallscale steps over very few signals up to very large scale steps with very many signals, possibly with less precision (abstraction by some kind of hierarchy?).
See:
Actually the system should learn and predict simultaneously.
I think a Spiking Neural Network, which is "third generation" and most similar to the neurons in our brain would do best.
It runs in real time, although I don't think it can learn in real-time.
Instead, you can just continually examine and overhaul it, running a couple seconds behind live input so you can adjust its answers, before it becomes good enough to go real-time.

Resources