Writing "Power" Efficient Code [duplicate] - mobile

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Power Efficient Software Coding
Adobe announced at Google I/O that it's next version of Flash 10.1 is going to more efficient for devices where power consumption matters.
This got me to thinking: how do you write code that uses less power? Are there any helpful resources regarding this topic?
My guess would be that it is a combination of:
reducing the complexity of your application
writing efficient code that is executed quickly (presumably because processing time = power consumed)

There's actually one much bigger way to reduce power consumption that hasn't been touched on.
Let's take a computer and divide all functions into two basic groups. Those implemented in hardware and those implemented in software.
If a function is implemented in hardware (that is- there is circuitry for which you can put the inputs on one set of wires and the outputs come out another set of wires) then the power consumption is equal to the power consumed in the total number of gates. The clock ticks one time (draining a little power) and the bus goes hot for the output (draining a little power).
If a function is implemented in software (that is- there is no single circuit which is used to implement the function) then it requires the use of multiple circuits, multiple clock cycles, and often-times lots of memory calls. Keep in mind that SRAM (used for processor registers) is made of D flip-flops which are constantly draining power so long as they are in use.
As a simple example, let's look at the H.264 encoder. H.264 is a video encoding used by QuickTime videos. It's also used in MPEG videos, many AVIs, and it's used by Skype. Because it's so common someone sat down and found a way to make a chip in hardware to which you feed the encoded file on one end and the red, green, and blue video channels come out the other end.
Before this chip existed (and before Flash 10.1) you had to decode this using software. Decoding it involves lots of sines and cosines. Sine and cosine are transcendental functions (that is- there is no way to write them in the four basic math operations without an infinite series). This means that the best you could do what run a loop 32-64 times getting gradually more accurate, with each iteration of the loop adding, multiplying, and dividing. Each iteration of the loop also moves values in and out of registers (which- as you recall, uses power).
Flash used to decode video by mathematically decoding it in software. Now it just says "pass the video to the H.264 chip". Of course it also has to check for the existence of this chip and use software if it doesn't exist. This means Flash, as a whole, is now larger. But one any system (like HTC phones) with an H.264 chip, it now uses less power.
Apply this same logic for:
Multiplying (adding multiple times in software)
Modulus (an infinite series in software)
Comparing (subtracting and checking if negative in software)
Drawing (sines/cosines/nastiness in software. Easy to pass to a videocard)

Seeing as though this is probably aimed towards embedded devices, I would venture to say that the best way to save power is to not be on, and to minimize how long the device is on. This means putting the processor to sleep and waking up only when work needs to be done. The best way I can think of to do this would to make an application entirely interrupt-driven.

In addition to Kevin's suggestion, I would think that minimizing Internet communications would help. This would include fetching data in bulk so more time can be spent asleep.

Also keep in mind that accessing devices like drives and wifi increases power consumption. Try to minimize access to such devices.

Related

Benchmarking microcontrollers

currently I am working on setting up benchmark between microcontrollers (based on Powerpc). So I would like to know, if anyone can provide me some documentation showing in detail, what factors are most important to be considered for benchmarking?
In other words I am looking for documentation which provides detailed information about factors that should be considered for enhancement in the performance of
Core
Peripherals,
Memory banks
Plus, if someone could provide algorithms that will be lot helpful.
There is only one useful way and that is to write your application for both and time your application. Benchmarks are for the most part bogus there are too many factors and it is quite trivial to craft a benchmark that takes advantage of the differences, or even takes advantage of the common features in a way to make two things look different.
I perform this stunt on a regular basis, most recently this code
.globl ASMDELAY
ASMDELAY:
subs r0,r0,#1
bne ASMDELAY
bx lr
Run on a raspberry pi (bare metal) the same raspberry pi not comparing two just comparing it to itself, clearly assembly so not even taking into account compiler features/tricks that you can encode in the benchmark intentionally or accidentally. Two of those three instructions matter for benchmarking purposes, have the loop run many tens of thousands of times I think I used 0x100000. The punchline to that performance was those two instructions in a loop ran as fast as 93662 timer ticks and as slow as 4063837 timer ticks for 0x10000 loops. Certainly i cache and branch prediction were turned on and off for various tests. But even with both branch prediction on and the i cache on, these two instructions will vary in speed depending on where they lie within the fetch line and the cache line.
A microcontroller makes this considerably worse depending on what you are comparing, some have flashes that can use the same wait state for a wide range of clock speeds, some are speed limited and for every N Mhz you have to add another wait state, so depending on where you set your clock it affects performance across that range and definitely just below and just above the boundary where you add a wait state (24Mhz minus a smidge and 24Mhz with an extra wait state if it was from 2-3 wait states then fetching just got 50% slower 36Mhz minus a smidge it may still be at the 3 wait states but 3 wait states at 36minus a smidge is faster than 24mhz 3 wait states). if you run the same code in sram vs flash for those platforms there usually isnt a wait state issue the sram can usually match the cpu clock and so that code at any speed may be faster than the same code run from flash.
If you are comparing two microcontrollers from the same vendor and family then it is usually pointless, the internals are the same they usually just vary by how many, how many flash banks how many sram banks how many uarts, how many timers, how many pins, etc.
One of my points being if you dont know the nuances of the overall architecture, you can possibly make the same code you are running now on the same board a few percent to tens of times faster by simply understanding how things work. Enabling features you didnt know where there, proper alignment of the code that is exercised often (simply re-arranging your functions within a C file can/will affect performance) adding one or more nops in the bootstrap to change the alignment of the whole program can and will change performance.
Then you get into compiler differences and compiler options, you can play with those and also get some to several to dozens of times improvement (or loss).
So at the end of the day the only thing that matters is I have an application it is the final binary and how fast does it run on A, then I ported that application and the final binary for B is done and how fast does it run there. Everything else can be manipulated, the results cant be trusted.

Musical instrument automated teaching via microcontroller

The premise of the project will be:
There will be a prerecorded track of guitar, for example. The student will play the same track on his guitar. I need to compare these two sounds and find out whether the student played it good or not. I will be using STM32 microcontroller and Keil uVision software for simulation at first (programming at C).
I know that I will be using an ADC using DMA and I assume I would Fast Fourier Transform the wave signals and then somehow compare the two frequency responses. Also, would there be a problem with tempo? I mean it is not logical that every note will hit on the exact ms and then compare it
I've seen some methods like Hidden Markov Model or Goertzel algorithm but I am not quite sure what they do and if they are optimal and easy for the project. So my question would be: is there a specific algorithm that suits best and how would I implement it on my code (since I haven't really started working on code, mostly theoretical reading so far).
edit: I've made a similar post yesterday but my premise was too complicated to solve so I am posting on a new premise, much easier to accomplish. I thought not to ask on the first thread since it would mix up two different issues.
Assuming that you can use FFT to find out which notes are playing at what time (this may prove to be difficult for distorted guitar chords), you can do this e.g. 10 times per second for both streams, and then check how often the notes in both streams match. This will give you a percentage, if you need a binary value you'd have to use a threshold value.
If both streams are not equal length (different tempo) then you will have to stretch. You don't have to stretch the actual audio, just the times between the note measurements (e.g. every 100 ms for the first stream and every 125 seconds for the second stream).
So the biggest problem may be to find out what notes are playing at any given moment in time.
I'd start with constructing a mapping of frequencies to notes. Also it may be a good idea to low-pass filter the signal at around 1100 Hz to already get rid of some of the unwanted harmonics (you can't play higher than that on the guitar anyway) and similaryly high-pass filter the signal at 80 Hz. Then after the FFT or DFT (not sure if it matters which you choose), find the frequencies that are close to the real note frequencies. Then pick the loudest one and those that are above a certain threshold relative to the loudest one (e.g. drop anything that is less than half as loud as the loudest one, but some experimentation will be needed to find a good threshold value).

Is it possible to programmatically edit a sound file based on frequency?

Just wondering if it's possible to go through a flac, mp3, wav, etc file and edit portions, or the entire file by removing sections based on a specific frequency range?
So for example, I have a recording of a friend reciting a poem with a few percussion instruments in the background. Could I write a C program that goes through the entire file and cuts out everything except the vocals (human voice frequency ranges from 85-255 Hz, from what I've been reading)?
Thanks in advance for any ideas!
To address the OP's specific example: I think your understanding of human voice frequency is wrong. Perhaps the fundamental frequency of male spoken voice stays in that range (for tenor singing, or female speech or singing, or shouting, even the fundamental will go much higher, maybe 500-1000 Hz). But that doesn't even matter, because even if the fundamental is low, the overtones which create the different vowel sounds will go up to 2000-4000 Hz or higher. And the frequencies which define "noise" consonants like "t" and "s" go all the way to the top of the audio range, say 5000-10000 Hz. Percussion fills this same audio range, so I doubt that you can separate voice and percussion by filtering certain frequencies in or out.
It is certainly possible, otherwise digital studio mixing software wouldn't exist.
What your'e effectively asking for is to attenuate frequency ranges across an entire file. In analog land, you would apply a low-pass and a high-pass filter (or some other combination of filters) to attenuate the frequencies.
In software, you'd solve this problem by writing a digital filter of sorts that would reduce the output of various frequencies. Frequencies would be identified via an FFT computation.
The fastest thing to do would be to use an audio editing app and apply the changes there.
There is an audio library called PortAudio that may provide some support for editing an audio stream at the numerical level. It is written in C, and has a C API.
If you want to test out audio processing algorithms I strongly suggest Supercollider. It's free and has many kinds of audio filter built in. But eliminating voice could require considerable tweaking. Supercollider will allow you to write code driven by various parameters and then hook those parameters up to a GUI which you'll be able to tweak while supplying it with live (or recorded) data.
Even if you want to write C code, you'll learn a lot from using Supercollider first. Many of the filters are surprisingly easy to implement in C but you'll need to write a certain amount of framework code before you can get started.
Additionally, I learnt quite a bit about writing digital audio filters from this book. Among other things, it discusses some of the characteristics of human speech, as well as how to build filters to selectively enhance or knock out particular frequencies. It also provides working C code.
SciPy can do all sorts of signal processing.
You can also use MAX/MSP (but that's paid) or PureData (that's free) for working with music algorithms , they are the basis from which supercollider was created. And are excellent softwares if you want to do that on real-time envirollments.

1k of Program Space, 64 bytes of RAM. Is 1 wire communication possible?

(If your lazy see bottom for TL;DR)
Hello, I am planning to build a new (prototype) project dealing with physical computing. Basically, I have wires. These wires all need to have their voltage read at the same time. More than a few hundred microseconds difference between the readings of each wire will completely screw it up. The Arduino takes about 114 microseconds. So the most I could read is 2 or 3 wires before the latency would skew the accuracy of the readings.
So my plan is to have an Arduino as the "master" of an array of ATTinys. The arduino is pretty cramped for space, but it's a massive playground compared to the tinys. An ATTiny13A has 1k of flash ROM(program space), 64 bytes of RAM, and 64 bytes of (not-durable and slow) EEPROM. (I'm choosing this for price as well as size)
The ATTinys in my system will not do much. Basically, all they will do is wait for a signal from the Master, and then read the voltage of 1 or 2 wires and store it in RAM(or possibly EEPROM if it's that cramped). And then send it to the Master using only 1 wire for data.(no room for more than that!).
So far then, all I should have to do is implement trivial voltage reading code (using built in ADC). But this communication bit I'm worried about. Do you think a communication protocol(using just 1 wire!) could even be implemented in such constraints?
TL;DR: In less than 1k of program space and 64 bytes of RAM(and 64 bytes of EEPROM) do you think it is possible to implement a 1 wire communication protocol? Would I need to drop to assembly to make it fit?
I know that currently my Arduino programs linking to the Wiring library are over 8k, so I'm a bit concerned.
Since you only need to send data (which is simpler than receiving) and you can select your own protocol, it should not be a problem to fit the code in the available memory space.
I once created software for an industrial control panel that contained 8x14 segment LCD display, some LEDs, some buttons, a serial (I2C) EEPROM, and serial interface to the host. A 4 bit processor was used. The device did not have any serial interface, so both the RS232C interface and I2C bus had to be implemented in software. On top of that, there was Modbus protocol (which among other things requires CRC calculations some exact timing), and the application program.
The device had some 128 x 4 bits of RAM and 1kW, 2kW, 3kW or 4kW of ROM (10 bits per word). The size of the final program was about 1100 words, so it did not quite fit in the smallest device. I used Assembler, of course.
However, instead of using multiple microcontrollers, you could consider using a hardware solution.
You could use a sample and hold circuit. For that, you need an array of analog switches and capacitors and perhaps op-amps. Just issue a trigger to latch all the voltages into the capacitors. Then you can use as much time as you need to read the voltages with your master processor.
Update: Forgot to mention that there are ready-made sample-and-hold amplifiers that need very little or no external components. This is probably the easiest solution.
1k of program space should be plenty, considering that your protocol only needs to be complicated enough to send a single integer when tickled. Look into Manchester Encoding.
You can probably get away with using a C compiler that targets this architecture, but you'll have to create your own runtime environment and not rely on the one supplied with the compiler. That's doable, but I'm not sure if the additional work to essentially create your own mini-OS outweighs the productivity benefit of using C over assembler.
I've done embedded programming in similar constraints. I used Borland Turbo C (it was a long time ago) in the tiny model and obtained code that was hardly bulkier than I could have done in assembler, with a fraction of the effort. What I'm saying is: It's quite feasible and sensible to use C as a high level assembler.
Just like me, though, you will be facing the problem of providing C with a (tiny) runtime environment. Ideally, you will only need to set up the stack and a few registers. Also, you won't have room for the C library, so you will need to program any needed functions yourself.
Yes, probably, though if you know your compiler very well you might be able to get away with c.
What you could do, is use a compiler to emit any standalone functions you need based on c code, then glue them together with a little of your own. (You'll certainly have to do the c runtime setup yourself - stacks etc.)
You may consider upgrading to the ATTiny25. It is a more capable 8-pin AVR that includes Atmel's Universal Serial Interface. It is capable of doing 1-wire serial comms in hardware, given only a few bytes of software.
Why wouldn't you just use sample-and-hold hardware, rather than a pile of microcontrollers?
I designed a master-slave system recently using an AT90USB646 master and ATtiny85 slaves. Obviously I had a lot more memory to work with on the slaves, but what I wanted to share with you is this:
With regard to your communication protocol, bear in mind that the uncalibrated internal oscillator on the ATtiny13 has an accuracy of +/- 10%. This means you won't be able to use, e.g., RS-232 communications.
I used a variant of the Dallas 1-Wire protocol in my system. Including full support for slave enumeration etc., the C source code compiles into 1626 bytes.
Edit: Whoops, didn't realize the question is so old. Hopefully this may still be of some help.

Alternative Entropy Sources

Okay, I guess this is entirely subjective and whatnot, but I was thinking about entropy sources for random number generators. It goes that most generators are seeded with the current time, correct? Well, I was curious as to what other sources could be used to generate perfectly valid, random (The loose definition) numbers.
Would using multiple sources (Such as time + current HDD seek time [We're being fantastical here]) together create a "more random" number than a single source? What are the logical limits of the amount of sources? How much is really enough? Is the time chosen simply because it is convenient?
Excuse me if this sort of thing is not allowed, but I'm curious as to the theory behind the sources.
The Wikipedia article on Hardware random number generator's lists a couple of interesting sources for random numbers using physical properties.
My favorites:
A nuclear decay radiation source detected by a Geiger counter attached to a PC.
Photons travelling through a semi-transparent mirror. The mutually exclusive events (reflection — transmission) are detected and associated to "0" or "1" bit values respectively.
Thermal noise from a resistor, amplified to provide a random voltage source.
Avalanche noise generated from an avalanche diode. (How cool is that?)
Atmospheric noise, detected by a radio receiver attached to a PC
The problems section of the Wikipedia article also describes the fragility of a lot of these sources/sensors. Sensors almost always produce decreasingly random numbers as they age/degrade. These physical sources should be constantly checked by statistical tests which can analyze the generated data, ensuring the instruments haven't broken silently.
SGI once used photos of a lava lamp at various "glob phases" as the source for entropy, which eventually evolved into an open source random number generator called LavaRnd.
I use Random.ORG, they provide free random data from Atmospheric noise, that I use to periodically re-seed a Mersene-Twister RNG. Its about as random as you can get with no hardware dependencies.
Don't worry about a "good" seed for a random number generator. The statistical properties of the sequence do not depend on how the generator is seeded. There are other things, however. to worry about. See Pitfalls in Random Number Generation.
As for hardware random number generators, these physical sources have to be measured, and the measurement process has systematic errors. You might find "pseudo" random numbers to have higher quality than "real" random numbers.
Linux kernel uses device interrupt timing (mouse, keyboard, hard drives) to generate entropy. There's a nice article on Wikipedia on entropy.
Modern RNGs are both checked against correlations in nearby seeds and run several hundred iterations after the seeding. So, the unfortunately boring but true answer is that it really doesn't matter very much.
Generally speaking, using random physical processes have to be checked that they conform to a uniform distribution and are otherwise detrended.
In my opinion, it's often better to use a very well understood pseudo-random number generator.
I've used an encryption program that used the users mouse movement to generate random numbers. The only problem was that the program had to pause and ask the user to move the mouse around randomly for a few seconds to work properly which might not always be practical.
I found HotBits several years ago - the numbers are generated from radioactive decay, genuinely random numbers.
There are limits on how many numbers you can download a day, but it has always amused me to use these as really, really random seeds for RNG.
Some TPM (Trusted Platform Module) "chips" have a hardware RNG. Unfortunately, the (Broadcom) TPM in my Dell laptop lacks this feature, but many computers sold today come with a hardware RNG that uses truly unpredictable quantum mechanical processes. Intel has implemented the thermal noise variety.
Also, don't use the current time alone to seed an RNG for cryptographic purposes, or any application where unpredictability is important. Using a few low order bits from the time in conjunction with several other sources is probably okay.
A similar question may be useful to you.
Sorry I'm late to this discussion (what is it 3 1/2 years old now?), but I've a rekindled interest in PRN generation and alternate sources of entropy. Linux kernel developer Rusty Russell recently had a discussion on his blog on alternate sources of entropy (other than /dev/urandom).
But, I'm not all that impressed with his choices; a NIC's MAC address never changes (although it is unique from all others), and PID seems like too small a possible sample size.
I've dabbled with a Mersenne Twister (on my Linux box) which is seeded with the following algorithm. I'm asking for any comments/feedback if anyone's willing and interested:
Create an array buffer of 64 bits + 256 bits * number of /proc files below.
Place the time stamp counter (TSC) value in the first 64 bits of this buffer.
For each of the following /proc files, calculate the SHA256 sum:
/proc/meminfo
/proc/self/maps
/proc/self/smaps
/proc/interrupts
/proc/diskstats
/proc/self/stat
Place each 256-bit hash value into its own area of the array created in (1).
Create a SHA256 hash of this entire buffer. NOTE: I could (and probably should) use a different hash function completely independent of the SHA functions - this technique has been proposed as a "safeguard" against weak hash functions.
Now I have 256 bits of HOPEFULLY random (enough) entropy data to seed my Mersenne Twister. I use the above to populate the beginning of the MT Array (624 32-bit integers), and then initialize the remainder of that array with the MT author's code. Also, I could use a different hash function (e.g. SHA384, SHA512), but I'd need a different size array buffer (obviously).
The original Mersenne Twister code called for one single 32-bit seed, but I feel that's horribly inadequate. Running "merely" 2^32-1 different MTs in search of breaking the crypto is not beyond the realm of practical possibility in this day and age.
I'd love to read anyone's feedback on this. Criticism is more than welcome. I will defend my use of the /proc files as above because they're constantly changing (especially the /proc/self/* files, and the TSC always yields a different value (nanosecond [or better] resolution, IIRC). I've run Diehard tests on this (to the tune of several hundred billion bits), and it seems to be passing with flying colors. But that's probably more testament to the soundness of the Mersenne Twister as a PRNG than to how I'm seeding it.
Of course, these aren't totally impervious to someone hacking them, but I just don't see all of these (and SHA*) being hacked and broken to in my lifetime.
Some use keyboard input (timeouts between keystrokes), I heard of I think in a novel that radio static reception can be used - but of course that requires other hardware and software...
Noise on top of the Cosmic Microwave Background spectrum. Of course you must first remove some anisotropy, foreground objects, correlated detector noise, galaxy and local group velocities, polarizations etc. Many pitfalls remain.
Source of seed isn't that much important. More important is the pseudo numbers generator algorithm. However I've heard some time ago about generating seed for some bank operations. They took many factors together:
time
processor temperature
fan speed
cpu voltage
I don't remember more :)
Even if some of these parameters doesn't change much in time, you can put them into some good hashing function.
How to generate good random number?
Maybe we can take into account inifinite number of universes? If this is true, that all the time new parallel universes are being created, we can do something like this:
int Random() {
return Universe.object_id % MAX_INT;
}
In every moment we should be on another branch of parallel universes, so we should have different id. The only problem is how to get Universe object :)
How about spinning off a thread that will manipulate some variable in a tight loop for a fixed amount of time before it is killed. What you end up with will depend on the processor speed, system load, etc... Very hokey, but better than just srand(time(NULL))...
Don't worry about a "good" seed for a random number generator. The statistical properties of the sequence do not depend on how the generator is seeded.
I disagree with John D. Cook's advice. If you seed the Mersenne Twister with all bits set to zero except one, it will initially generate numbers which are anything but random. It takes a long time for the generator to churn this state into anything that would pass statistical tests. Simply setting the first 32 bits of the generator to a seed will have a similar effect. Also, if the entire state is set to zero the generator will produce endless zeroes.
Properly written RNG code will have a properly written seeding algorithm that accepts say a 64 bit value and seeds the generator so it will produce decent random numbers for each possible input. So if you are using a reliable library then any seed will do. But if you hack together your own implementation then you need to be careful.

Resources