CPU Temperature for Linux OS / Intel 64 bit Architecture - c

I have come across several post to read CPU temperature ad fan speed[ 1, 2], but could not find any post for the 64-bit i7 Intel architecture (quad core) using Linux OS. Can any one point to any article and/or source code that can read individual core temperature and possibly fan speed. I have been going through the performance counters in the intel architecture, I find Chapter 14 to describe the Thermal Monitors for the thermal status informations. Any sample C code to read these information/ registers will be of great help.

One common way is to read /sys/class/thermal/thermal_zone0/temp.
You can take a look at the source code of i3status which is written in C and is able to display the CPU temperature: print_cpu_temperature.c

Related

Where to start ARM Cortex-A programming

I have experience with Cortex-M controllers (LPC series from NXP) and Keil.
I want to move for cortex-A because my logic needs some better speed.
I found from internet that these processors will come with linux in it.
How can i use my code directly rather than using linux??
I don't need IO pins.
Where should i start?? What IDE should i use??
And i found debugging of Cortex-A controllers is tough because it is involving OS. is it true?
And is there any way without going for cortex A but achieving higher speeds (around Giga Hz)
By Cortex-M series, I suppose you have experience with M0 and M3. Right?
If you plan on using A-Series, you should know that they are more designed to run operating systems (than M-Series). (For example they have virtual memory management units...) That's why you may not find much bare-metal programming guides with these processors.
Also, these devices don't usually have on-board ROMs. So, you don't have an embedded flash... Therefore, you basically use an SD-Card or eMMC to boot them.
You may use Linux (Easier for you but won't be real-time), or an RTOS (also easier). If that doesn't suit you, you may use "UBoot" from SD-Card or eMMC and do a couple non-trivial steps (dependent on architecture) to run your bare-metal software (which is loaded from SD-Card or eMMC).
I suggest you buy a beagle bone and start from there.
You can still use Cortex-A for normal bare metal application adn with this way you will have something similair to what to what you had with application running on cortex-m
However it really depends from what you want:
if you want to understand how cortex-a is working or you are bringing
up a custom platform which is not that stable so bare metal coding is
your answer and with it you will be able learn a lot bout cortex-a
functionality
If you want to use Cortex-A from user point of view so you need to
compile your linux kernel for your cortex-a based board and start
using developing on top of your running kernel

ARM Architecture Initialization

In the case of x86 the same (real mode) bootloader works on virtually any x86 device.
Is that possible on ARM or do I need to create a specific bootloader for each 'cortex'?
x86 or lets say PC compatible systems are ... pc compatible. They support the ancient bios calls so that there is massive compatibility. by design, by the chip vendor (intel) the software vendors (bios, operating system) and the motherboard vendors.
ARM is in now way shape or form like that. There are instruction sets you can choose that work almost or all the way across, but remember ARM systems you buy an ARM core and add it to your special chip, you and your special/custom stuff, then that is put on one or more different boards. There is little to no compatibility. Instruction set and arm core is a small part of the whole picture most of the code is for the non-arm stuff.
u-boot and perhaps others are fairly massive bootloaders, pretty much an operating system themselves, and have to be ported just like an operating system to each chip/board combination. The chip vendor, if this is a linux compatible system, most likely has a reference design and a BSP including a u-boot port and/or some other solution (rasberry pi is a good example). it is fairly trivial to boot linux or used to be, there is no reason for the massively overcomplicated u-boot. without a DTB you setup a few memory locations a register or two and branch to the kernel, thats it (again look at the raspberry pi), I assume with DTB you build the dtb then put it somewhere, setup a few registers and branch to the linux kernel (raspberry pi? ntc chip?)
There is a Arm open source project that can cover Armv7/v8 Cortex-A processors bootloaders.
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/
Another open source project for Cortex-M processors:
https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/

Raspberry Pi cluster, neuron networks and brain simulation

Since the RBPI (Raspberry Pi) has very low power consumption and very low production price, it means one could build a very big cluster with those. I'm not sure, but a cluster of 100000 RBPI would take little power and little room.
Now I think it might not be as powerful as existing supercomputers in terms of FLOPS or others sorts of computing measurements, but could it allow better neuronal network simulation ?
I'm not sure if saying "1 CPU = 1 neuron" is a reasonable statement, but it seems valid enough.
So does it mean such a cluster would more efficient for neuronal network simulation, since it's far more parallel than other classical clusters ?
Using Raspberry Pi itself doesn't solve the whole problem of building a massively parallel supercomputer: how to connect all your compute cores together efficiently is a really big problem, which is why supercomputers are specially designed, not just made of commodity parts. That said, research units are really beginning to look at ARM cores as a power-efficient way to bring compute power to bear on exactly this problem: for example, this project that aims to simulate the human brain with a million ARM cores.
http://www.zdnet.co.uk/news/emerging-tech/2011/07/08/million-core-arm-machine-aims-to-simulate-brain-40093356/ "Million-core ARM machine aims to simulate brain"
http://www.eetimes.com/electronics-news/4217840/Million-ARM-cores-brain-simulator "A million ARM cores to host brain simulator"
It's very specialist, bespoke hardware, but conceptually, it's not far from the network of Raspberry Pis you suggest. Don't forget that ARM cores have all the features that JohnB mentioned the Xeon has (Advanced SIMD instead of SSE, can do 64-bit calculations, overlap instructions, etc.), but sit at a very different MIPS-per-Watt sweet-spot: and you have different options for what features are included (if you don't want floating-point, just buy a chip without floating-point), so I can see why it's an appealing option, especially when you consider that power use is the biggest ongoing cost for a supercomputer.
Seems unlikely to be a good/cheap system to me.
Consider a modern xeon cpu. It has 8 cores running at 5 times the clock speed, so just on that basis can do 40 times as much work. Plus it has SSE which seems suited for this application and will let it calculate 4 things in parallel. So we're up to maybe 160 times as much work. Then it has multithreading, can do 64 bit calculations, overlap instructions etc. I would guess it would be at least 200 times faster for this kind of work.
Then finally, the results of at least 200 local "neurons" would be in local memory but on the raspberry pi network you'd have to communicate between 200 of them... Which would be very much slower.
I think the raspberry pi is great and certainly plan to get at least one :P But you're not going to build a cheap and fast network of them that will compete with a network of "real" computers :P
Anyway, the fastest hardware for this kind of thing is likely to be a graphics card GPU as it's designed to run many copies of a small program in parallel. Or just program an fpga with a few hundred copies of a "hardware" neuron.
GPU and FPU do this kind of think much better then a CPU, the Nvidia GPU's that support CDUA programming has in effect 100's of separate processing units. Or at least it can use the evolution of the pixel pipe lines (where the card could render mutiple pixels in parallel) to produce huge incresses in speed. CPU allows a few cores that can carry out reletivly complex steps. GPU allows 100's of threads that can carry out simply steps.
So for tasks where you haves simple threads things like a single GPU will out preform a cluster of beefy CPU. (or a stack of Raspberry pi's)
However for creating a cluster running some thing like "condor" Which cand be used for things like Disease out break modelling, where you are running the same mathematical model millions of times with varible starting points. (size of out break, wind direction, how infectious the disease is etc.. ) so thing like the Pi would be ideal. as you are general looking for a full blown CPU that can run standard code.http://research.cs.wisc.edu/condor/
some well known usage of this aproach are "Seti" or "folding at home" (search for aliens and cancer research)
A lot of universities have a cluster such as this so I can see some of them trying the approach of mutipl Raspberry Pi's
But for simulating nurons in a brain you require very low latency between the nodes they are special OS's and applications that make mutiply systems act as one. You also need special networks to link it togather to give latency between nodes in terms of < 1 millsecond.
http://en.wikipedia.org/wiki/InfiniBand
the Raspberry just will not manage this in any way.
So yes I think people will make clusters out of them and I think they will be very pleased. But I think more university's and small organisations. They arn't going to compete with the top supercomputers.
Saying that we are going to get a few and test them against our current nodes in our cluster to see how they compare with a desktop that has a duel core 3.2ghz CPU and cost £650! I reckon for that we could get 25 Raspberries and they will use much less power, so will be interesting to compare. This will be for disease outbreak modelling.
I am undertaking a large amount of neural network research in the area of chaotic time series prediction (with echo state networks).
Although I see using the raspberry PI's in this way will offer little to no benefit over say a strong cpu or a GPU, I have been using a raspberry PI to manage the distribution of simulation jobs to multiple machines. The processing power benefit of a large core will nail that possible on the raspberry PI, not just that but running multiple PI's in this configuration will generate large overheads of waiting for them to sync, data transfer etc.
Due to the low cost and robustness of the PI, i have it hosting the source of the network data, as well as mediating the jobs to the Agent machines. It can also hard reset and restart a machine if a simulation fails taking the machine down with it allowing for optimal uptime.
Neural networks are expensive to train, but very cheap to run. While I would not recommend using these (even clustered) to iterate over a learning set for endless epochs, once you have the weights, you can transfer the learning effort into them.
Used in this way, one raspberry pi should be useful for much more than a single neuron. Given the ratio of memory to cpu, it will likely be memory bound in its scale. Assuming about 300 megs of free memory to work with (which will vary according to OS/drivers/etc.) and assuming you are working with 8 byte double precision weights, you will have an upper limit on the order of 5000 "neurons" (before becoming storage bound), although so many other factors can change this and it is like asking: "How long is a piece of string?"
Some engineers at Southampton University built a Raspberry Pi supercomputer:
I have ported a spiking network (see http://www.raspberrypi.org/phpBB3/viewtopic.php?f=37&t=57385&e=0 for details) to the Raspberry Pi and it runs about 24 times slower than on my old Pentium-M notebook from 2005 with SSE and prefetch optimizations.
It all depends on the type of computing you want to do. If you are doing very numerically intensive algorithms with not much memory movement between the processor caches and RAM memory then a GPU solution is indicated. The middle ground is an Intel PC chip using the SIMD assembly language instructions - you can still easily end up being limited by rate you can transfer data to and from RAM. For nearly the same cost you can get 50 ARM boards with say 4 cores per board and 2Gb RAM per board. That's 200 cores and 100 Gb of RAM. The amount of data that can be shuffled between the CPUs and RAM per second is very high. It could be a good option for neural nets that use large weight vectors. Also the latest ARM GPU's and the new nVidea ARM based chip (used in the slate tablet) have GPU compute as well.

Intel atom or ARM for heavy Signal processing workload

I would like to know which is a better (in performance) option :
To get a Intel Dual core atom based board
To get a Arm cortex A9 based board (pandaboard etc)
I would like to run some light version of linux and do some very cpu intensive
computations like Image/Video processing (maybe 3D later) and also process audio
on them. Of-course all floating point mathematics.
Definitely #2, Pandaboard is an OMAP4 platform.
OMAP4 contains not only the ARM Cortex A9 (which is not likely to compete on it's own with dual core Atom), but, and this is crucial, a full C674x DSP core, both floating and fixed point mathematics.
The embedded DSP core in OMAP4 is fully capable of handling 1080p H.264 decode, with some resources to spare. I'm yet to see an Atom platform capable of that.
(shameless plug - my company is using OMAP3 and evaluating OMAP4 for some of our niche markets, and we might be interested in assisting in yours as well)

Any open-source ARM7 emulators suitable for linking with C?

I have an open-source Atari 2600 emulator (Z26), and I'd like to add support for cartridges containing an embedded ARM processor (NXP 21xx family). The idea would be to simulate the 6507 until it tries to read or write a byte of memory (which it will do every 841ns). If the 6507 performs a write, put the address and data on some of the ARM's I/O ports and let the ARM code run 20 cycles, confirm that the ARM is floating its data bus, and let the ARM run for another 38 cycles. If the 6507 performs a read, put the address on the ARM's I/O ports, let the ARM run 38 cycles, grab the data from the ARM's I/O port (hopefully the ARM software will have put it there), and let the ARM run another 20 cycles.
The ARM7 seems pretty straightforward to implement; I don't need to simulate a whole lot of hardware features. Any thoughts?
Edit
What I have in mind would be a routine that would take as a parameter a struct holding the machine state and pointers to a memory access routine. When called, the routine would emulate the ARM's instruction engine, generating appropriate reads, writes, and code fetches. I could then write the memory access routine to regard appropriate areas as flash (with roughly-approximated wait states), RAM, I/O ports, and timer registers. Some other areas would be marked as don't-care, and accesses to any other areas would flag an error and stop the emulator.
Perhaps QEMU uses such a thing internally. Since the ARM emulation would be integrated into an already-existing emulation engine (which I didn't write and don't fully understand--the only parts of Z26 I've patched have been the memory read/write logic) I would need something with a fairly small footprint.
Any idea how QEMU works inside? Any idea what the GPL licence would require if I just use 2% of the code in QEMU--whether I'd have to bundle the code for the whole thing, or just the part that I use, or what?
Try QEMU.
With some work, you can make my emulator do what you want. It was written for ARM920, and the Thumb instruction set isn't done yet. Neither is the MMU/cache interface. Also, it's slow because it is an interpreter. On the bright side, it's all written in C99.
http://code.google.com/p/gp2xemu/
I haven't worked on it for a while (The svn trunk is 2 years old), but if you're going to use the code, I'll be glad to help you out with the missing features. It is licensed under MIT, so it's just the same as the broad BSD license.

Resources