STM32F Discovery - Undefined reference to arm_sin_f32 - c

I'm new to programming the STM32F Discovery board. I followed the instructions here and managed to get the blinky led light working.
But now I'm trying to play an audio tone for which I have borrowed code from here. In my Makefile I have included CFLAGS += -lm which is where I understand that arm_sin_f32 is defined.
This is the code for main.c:
#include "stm32f4xx.h"
#define ARM_MATH_CM4
#include <arm_math.h>
#include <math.h>
#include "speaker.h"
//Quick hack, approximately 1ms delay
void ms_delay(int ms)
{
    while (ms-- > 0) {
        volatile int x=5971;
        while (x-- > 0)
            __asm("nop");
    }
}
volatile uint32_t msTicks = 0;
// SysTick Handler (every time the interrupt occurs, this is called)
void SysTick_Handler(void){ msTicks++; }
// initialize the system tick
void InitSystick(void){
    // division occurs in terms of seconds... divide by 1000 to get ms, for example
    if (SysTick_Config(SystemCoreClock / 10000)) { while (1); } // update every 0.0001 s, aka 10kHz
}
//Flash orange LED at about 1hz
int main(void)
{
    int16_t audio_sample;
    int loudness = 250;
    float audio_freq = 440;
    audio_sample = (int16_t) (loudness * arm_sin_f32(audio_freq*msTicks/10000));
}
But when trying to run make I get the following error:
main.c:42: undefined reference to `arm_sin_f32'

By using -lm, you're linking to libc's math library, which for floating points provides you with
Function: double sin (double x)
Function: float sinf (float x)
Function: long double sinl (long double x)
Function: _FloatN sinfN (_FloatN x)
Function: _FloatNx sinfNx (_FloatNx x)
Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.
These functions return the sine of x, where x is given in radians. The return value is in the range -1 to 1.
You'll want to use sinf as you're using a float.
If you'd like to use arm_sin_f32, then you should link to CMSIS's dsp library.
float32_t arm_sin_f32 (float32_t x)
Fast approximation to the trigonometric sine function for floating-point
You should link to the appropriate precompiled library as detailed in: CMSIS DSP Software Library
The latest version of CMSIS at this moment is available at:
I don't think you should simply copy the c-files, as it will 'pollute' your own project and updating will be hard.
Simply download the latest release and to your makefile add:
CMSISPATH = "C:/path/to/cmsis/top/directory"
LDFLAGS += -L$(CMSISPATH)/CMSIS/Lib/GCC/ -larm_cortexM4lf_math

First of all, arm_sin_32 does not exist; arm_sin_f32 does, and there are other variants as well. You need to add the appropriate C file from CMSIS to your project, for example: CMSIS/DSP/Source/FastMathFunctions/arm_sin_f32.c
I would suggest not using the one from Keil, as it is probably outdated. Just download the most current version of CMSIS from GitHub.
The arm_... functions are not part of the m library.
Do not use NOPs for the delay: they can be dropped from the pipeline without being executed, so their timing is not guaranteed. They are meant only for padding.


Enabling HVX SIMD in Hexagon DSP by using instruction intrinsics

I was using Hexagon-SDK 3.0 to compile my sample application for HVX DSP architecture. There are many tools related to Hexagon-LLVM available to use located folder at:
I wrote a small example to calculate the product of two arrays to make sure I can utilize the HVX hardware acceleration. However, when I generate my assembly, either with -S or with -S -emit-llvm, I don't find any HVX instructions such as vmem, vX, etc. My C application is executing on hexagon-sim for now, until I manage to find a way to run it on the board as well.
As far as I understood, I need to define my HVX part of the code in C Intrinsics, but was not able to adapt the existing examples to match my own needs. It would be great if somebody could demonstrate how this process can be done. Also in the Hexagon V62 Programmer's Reference Manual many of the intrinsic instructions are not defined.
Here is my small app in pure C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#if defined(__hexagon__)
#include "hexagon_standalone.h"
#include "subsys.h"
#endif
#include "io.h"
#include "hvx.cfg.h"
#define KERNEL_SIZE 9
#define Q 8
#define PRECISION (1<<Q)
double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}
int main (int argc, char* argv[])
{
    int n = 4; /* element count; the calls below use 4 */
    long long start_time, total_cycles;
    /* -----------------------------------------------------*/
    /* Allocate memory for input/output */
    /* -----------------------------------------------------*/
    //double *res = memalign(VLEN, 4 *sizeof(double));
    const double *x = memalign(VLEN, n *sizeof(double));
    const double *y = memalign(VLEN, n *sizeof(double));
    if ( x == NULL || y == NULL ){
        printf("Error: Could not allocate Memory for image\n");
        return 1;
    }
#if defined(__hexagon__)
#if LOG2VLEN == 7
#endif
#endif
    /* -----------------------------------------------------*/
    /* Call function */
    /* -----------------------------------------------------*/
    start_time = READ_PCYCLES();
    total_cycles = READ_PCYCLES() - start_time;
    printf("Array product of x[i] * y[i] = %f\n",vectors_dot_prod2(x,y,4));
#if defined(__hexagon__)
    printf("AppReported (HVX%db-mode): Array product of x[i] * y[i] =%f\n", VLEN, vectors_dot_prod2(x,y,4));
#endif
    return 0;
}
I compile it using hexagon-clang:
hexagon-clang -v -O2 -mv60 -mhvx-double -DLOG2VLEN=7 -I../../common/include -I../include -DQDSP6SS_PUB_BASE=0xFE200000 -o arrayProd.o -c arrayProd.c
Then link it with subsys.o (which is found in the SDK and is already compiled) and -lhexagon to generate my executable:
hexagon-clang -O2 -mv60 -o arrayProd.exe arrayProd.o subsys.o -lhexagon
Finally, run it using the sim:
hexagon-sim -mv60 arrayProd.exe
A bit late, but might still be useful.
Hexagon Vector eXtensions instructions are not emitted automatically, and the current instruction set (as of the 8.0 SDK) only supports integer manipulation, so the compiler will not emit anything for C code containing the "double" type. It is similar to SSE programming: you have to manually pack xmm registers and use SSE intrinsics to do what you need.
You need to define what your application really requires.
E.g., if you are writing something 3D-related and really need to calculate double (or float) dot products, you might convert your floats to 16.16 fixed point and then use instructions (i.e., C intrinsics) like
Q6_Vw_vmpyio_VwVh and Q6_Vw_vmpye_VwVuh to emulate fixed-point multiplication.
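To make the 16.16 idea concrete, here is a plain scalar C sketch of the arithmetic that the vector code would have to reproduce. It does not use any HVX intrinsics, and the type and helper names (q16_16, q16_from_double, q16_mul, q16_dot) are invented for this illustration.

```c
#include <stdint.h>

typedef int32_t q16_16;  /* 16 integer bits, 16 fractional bits */

/* Convert a double to 16.16, rounding to nearest (illustrative helper). */
static q16_16 q16_from_double(double v)
{
    return (q16_16)(v * 65536.0 + (v >= 0 ? 0.5 : -0.5));
}

/* Convert 16.16 back to double. */
static double q16_to_double(q16_16 v)
{
    return (double)v / 65536.0;
}

/* Multiply two 16.16 values: widen to 64 bits, then drop 16 fraction bits.
 * Right-shifting a negative value is implementation-defined in C; this
 * assumes the usual arithmetic shift. */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}

/* Scalar reference for a fixed-point dot product: accumulate the full
 * 64-bit products and rescale once at the end. */
static q16_16 q16_dot(const q16_16 *x, const q16_16 *y, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * (int64_t)y[i];
    return (q16_16)(acc >> 16);
}
```

A scalar reference like this is mainly useful for checking the results of a vectorized version on the simulator.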
To "enable" HVX you should use HVX-related types defined in
#include <hexagon_types.h>
#include <hexagon_protos.h>
The instructions like 'vmem' and 'vmemu' are emitted automatically for statements like
// I assume 64-byte mode, no `-mhvx-double`. For 128-byte mode use a 32-int array
int values[16] = { 1, 2, 3, ..... };
/* The following line compiles to
     r4 = __address_of_values
     v1 = vmem(r4 + #0)
   You can get the exact code by using the '-S' switch, as you already do */
HVX_Vector v = *(HVX_Vector*)values;
Your (fixed-point) version of dot_product may read out 16 integers at a time, multiply all 16 integers in a couple of instructions (see HVX62 programming manual, there is a tip to implement 32-bit integer multiplication from 16-bit one),
then shuffle/deal/ror data around and sum up rearranged vectors to get dot product (this way you may calculate 4 dot products almost at once and if you preload 4 HVX registers - that is 16 4D vectors - you may calculate 16 dot products in parallel).
If what you are doing is really just byte/int image processing, you might use specific 16-bit and 8-bit hardware dot products in Hexagon instruction set, instead of emulating doubles and floats.

2D array, prototype function and random numbers [duplicate]

I need a 'good' way to initialize the pseudo-random number generator in C++. I've found an article that states:
In order to generate random-like numbers, srand is usually initialized to some distinctive value, like those related with the execution time. For example, the value returned by the function time (declared in header ctime) is different each second, which is distinctive enough for most randoming needs.
Unixtime isn't distinctive enough for my application. What's a better way to initialize this? Bonus points if it's portable, but the code will primarily be running on Linux hosts.
I was thinking of doing some pid/unixtime math to get an int, or possibly reading data from /dev/urandom.
Yes, I am actually starting my application multiple times a second and I've run into collisions.
This is what I've used for small command line programs that can be run frequently (multiple times a second):
unsigned long seed = mix(clock(), time(NULL), getpid());
Where mix is:
// Robert Jenkins' 96 bit Mix Function
unsigned long mix(unsigned long a, unsigned long b, unsigned long c)
{
    a=a-b; a=a-c; a=a^(c >> 13);
    b=b-c; b=b-a; b=b^(a << 8);
    c=c-a; c=c-b; c=c^(b >> 13);
    a=a-b; a=a-c; a=a^(c >> 12);
    b=b-c; b=b-a; b=b^(a << 16);
    c=c-a; c=c-b; c=c^(b >> 5);
    a=a-b; a=a-c; a=a^(c >> 3);
    b=b-c; b=b-a; b=b^(a << 10);
    c=c-a; c=c-b; c=c^(b >> 15);
    return c;
}
The best answer is to use <random>. If you are using a pre C++11 version, you can look at the Boost random number stuff.
But if we are talking about rand() and srand()
The best simplest way is just to use time():
int main()
{
    srand(time(NULL));
}
Be sure to do this at the beginning of your program, and not every time you call rand()!
Side Note:
NOTE: There is a discussion in the comments below about this being insecure (which is true, but ultimately not relevant (read on)). So an alternative is to seed from the random device /dev/random (or some other secure real(er) random number generator). BUT: Don't let this lull you into a false sense of security. This is rand() we are using. Even if you seed it with a brilliantly generated seed it is still predictable (if you have any value you can predict the full sequence of next values). This is only useful for generating "pseudo" random values.
If you want "secure" you should probably be using <random> (though I would do some more reading on a security-informed site). See the answer below as a starting point.
Secondary note: Using the random device actually solves the issues with starting multiple copies per second better than my original suggestion below (just not the security issue).
Back to the original story:
Every time you start up, time() will return a unique value (unless you start the application multiple times a second). In 32 bit systems, it will only repeat every 60 years or so.
I know you don't think time is unique enough but I find that hard to believe. But I have been known to be wrong.
If you are starting a lot of copies of your application simultaneously you could use a timer with a finer resolution. But then you run the risk of a shorter time period before the value repeats.
OK, so if you really think you are starting multiple applications a second.
Then use a finer grain on the timer.
int main()
{
    struct timeval time;
    gettimeofday(&time, NULL);
    // microsecond has 1 000 000
    // Assuming you did not need quite that accuracy
    // Also do not assume the system clock has that accuracy.
    srand((time.tv_sec * 1000) + (time.tv_usec / 1000));
    // The trouble here is that the seed will repeat every
    // 24 days or so.
    // If you use 100 (rather than 1000) the seed repeats every 248 days.
    // Do not make the MISTAKE of using just the tv_usec
    // This will mean your seed repeats every second.
}
If you need a better random number generator, don't use the libc rand. Instead, use something like /dev/random or /dev/urandom directly (for example, read an int straight from it).
The only real benefit of the libc rand is that given a seed, it is predictable which helps with debugging.
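Reading an int from /dev/urandom could look roughly like the sketch below. The helper name seed_from_urandom is made up for this example, and falling back to time() when the device cannot be read is an assumption of the sketch, not something the answer above prescribes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: read a seed from /dev/urandom.
 * Falls back to time() if the device cannot be opened or read. */
static unsigned int seed_from_urandom(void)
{
    unsigned int seed = 0;
    FILE *f = fopen("/dev/urandom", "rb");
    if (f != NULL) {
        size_t got = fread(&seed, sizeof seed, 1, f);
        fclose(f);
        if (got == 1)
            return seed;
    }
    /* Fallback: coarse, but better than a constant seed. */
    return (unsigned int)time(NULL);
}
```

Call srand(seed_from_urandom()); once at program start if you still want to use rand() afterwards.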
On windows:
provides a better seed than time() since it's in milliseconds.
C++11 random_device
If you need reasonable quality then you should not be using rand() in the first place; you should use the <random> library. It provides lots of great functionality like a variety of engines for different quality/size/performance trade-offs, re-entrancy, and pre-defined distributions so you don't end up getting them wrong. It may even provide easy access to non-deterministic random data, (e.g., /dev/random), depending on your implementation.
#include <random>
#include <iostream>
int main() {
    std::random_device r;
    std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
    std::mt19937 eng(seed);
    std::uniform_int_distribution<> dist{1,100};
    for (int i=0; i<50; ++i)
        std::cout << dist(eng) << '\n';
}
eng is a source of randomness, here a built-in implementation of Mersenne Twister. We seed it using random_device, which in any decent implementation will be a non-deterministic RNG, and seed_seq to combine more than 32 bits of random data. For example, in libc++ random_device accesses /dev/urandom by default (though you can give it another file to access instead).
Next we create a distribution such that, given a source of randomness, repeated calls to the distribution will produce a uniform distribution of ints from 1 to 100. Then we proceed to using the distribution repeatedly and printing the results.
Best way is to use another pseudorandom number generator.
Mersenne twister (and Wichmann-Hill) is my recommendation.
I suggest you look at the unix_random.c file in the Mozilla code (I guess it is mozilla/security/freebl/ ...); it should be in the freebl library.
It uses system call info (like pwd, netstat ....) to generate noise for the random number; it is written to support most platforms (which can gain me bonus points :D).
The real question you must ask yourself is what randomness quality you need.
libc random is a LCG
The quality of randomness will be low whatever input you provide srand with.
If you simply need to make sure that different instances will have different initializations, you can mix process id (getpid), thread id and a timer. Mix the results with xor. Entropy should be sufficient for most applications.
Example :
struct timeb tp;
ftime(&tp);
srand(static_cast<unsigned int>(getpid()) ^
      static_cast<unsigned int>(pthread_self()) ^
      static_cast<unsigned int >(tp.millitm));
For better random quality, use /dev/urandom. You can make the above code portable in using boost::thread and boost::date_time.
The c++11 version of the top voted post by Jonathan Wright:
#include <ctime>
#include <random>
#include <thread>
const auto time_seed = static_cast<size_t>(std::time(0));
const auto clock_seed = static_cast<size_t>(std::clock());
const size_t pid_seed =
    std::hash<std::thread::id>()(std::this_thread::get_id());
std::seed_seq seed_value { time_seed, clock_seed, pid_seed };
// E.g seeding an engine with the above seed.
std::mt19937 gen;
gen.seed(seed_value);
#include <stdio.h>
#include <sys/time.h>
int main(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    printf("%ld\n", (long)tv.tv_usec);
    return 0;
}
tv.tv_usec is in microseconds. This should be an acceptable seed.
As long as your program is only running on Linux (and your program is an ELF executable), you are guaranteed that the kernel provides your process with a unique random seed in the ELF aux vector. The kernel gives you 16 random bytes, different for each process, which you can get with getauxval(AT_RANDOM). To use these for srand, use just an int of them, as such:
#include <stdlib.h>
#include <sys/auxv.h>
void initrand(void)
{
    unsigned int *seed;
    seed = (unsigned int *)getauxval(AT_RANDOM);
    srand(*seed);
}
It may be possible that this also translates to other ELF-based systems. I'm not sure what aux values are implemented on systems other than Linux.
Suppose you have a function with a signature like:
int foo(char *p);
An excellent source of entropy for a random seed is a hash of the following:
1. Full result of clock_gettime (seconds and nanoseconds) without throwing away the low bits - they're the most valuable.
2. The value of p, cast to uintptr_t.
3. The address of p, cast to uintptr_t.
At least the third, and possibly also the second, derive entropy from the system's ASLR, if available (the initial stack address, and thus current stack address, is somewhat random).
I would also avoid using rand/srand entirely, both for the sake of not touching global state, and so you can have more control over the PRNG that's used. But the above procedure is a good (and fairly portable) way to get some decent entropy without a lot of work, regardless of what PRNG you use.
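A minimal sketch of that procedure follows. The choice of FNV-1a as the hash is an assumption made for this example (the answer does not name one), and the function names fnv1a_u64 and entropy_seed are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Fold the 8 bytes of v into the running FNV-1a hash h.
 * FNV-1a is only an example; any reasonable hash would do. */
static uint64_t fnv1a_u64(uint64_t h, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 1099511628211ULL;          /* 64-bit FNV prime */
    }
    return h;
}

/* Hash the clock reading, the value of p and the address of p. */
static uint64_t entropy_seed(char *p)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);

    uint64_t h = 14695981039346656037ULL;        /* FNV offset basis */
    h = fnv1a_u64(h, (uint64_t)ts.tv_sec);       /* seconds */
    h = fnv1a_u64(h, (uint64_t)ts.tv_nsec);      /* nanoseconds, low bits kept */
    h = fnv1a_u64(h, (uint64_t)(uintptr_t)p);    /* value of p */
    h = fnv1a_u64(h, (uint64_t)(uintptr_t)&p);   /* address of p (on the stack) */
    return h;
}
```

The 64-bit result can be fed to whatever PRNG you use, or truncated to unsigned int if you do end up calling srand.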
For those using Visual Studio here's yet another way:
#include "stdafx.h"
#include <stdlib.h>
#include <time.h>
#include <windows.h>
const __int64 DELTA_EPOCH_IN_MICROSECS= 11644473600000000;
struct timezone2
{
    __int32 tz_minuteswest; /* minutes W of Greenwich */
    bool tz_dsttime; /* type of dst correction */
};
struct timeval2 {
    __int32 tv_sec; /* seconds */
    __int32 tv_usec; /* microseconds */
};
int gettimeofday(struct timeval2 *tv/*in*/, struct timezone2 *tz/*in*/)
{
    FILETIME ft;
    __int64 tmpres = 0;
    TIME_ZONE_INFORMATION tz_winapi;
    int rez = 0;
    ZeroMemory(&ft, sizeof(ft));
    ZeroMemory(&tz_winapi, sizeof(tz_winapi));
    GetSystemTimeAsFileTime(&ft);
    tmpres = ft.dwHighDateTime;
    tmpres <<= 32;
    tmpres |= ft.dwLowDateTime;
    /*converting file time to unix epoch*/
    tmpres /= 10; /*convert into microseconds*/
    tmpres -= DELTA_EPOCH_IN_MICROSECS;
    tv->tv_sec = (__int32)(tmpres * 0.000001);
    tv->tv_usec = (tmpres % 1000000);
    //_tzset(),don't work properly, so we use GetTimeZoneInformation
    rez = GetTimeZoneInformation(&tz_winapi);
    tz->tz_dsttime = (rez == 2) ? true : false;
    tz->tz_minuteswest = tz_winapi.Bias + ((rez == 2) ? tz_winapi.DaylightBias : 0);
    return 0;
}
int main(int argc, char** argv) {
    struct timeval2 tv;
    struct timezone2 tz;
    ZeroMemory(&tv, sizeof(tv));
    ZeroMemory(&tz, sizeof(tz));
    gettimeofday(&tv, &tz);
    unsigned long seed = tv.tv_sec ^ (tv.tv_usec << 12);
    srand(seed);
}
Maybe a bit overkill but works well for quick intervals. gettimeofday function found here.
Edit: upon further investigation rand_s might be a good alternative for Visual Studio, it's not just a safe rand(), it's totally different and doesn't use the seed from srand. I had presumed it was almost identical to rand just "safer".
To use rand_s just don't forget to #define _CRT_RAND_S before stdlib.h is included.
Assuming that the randomness of srand() + rand() is enough for your purposes, the trick is in selecting the best seed for srand. time(NULL) is a good starting point, but you'll run into problems if you start more than one instance of the program within the same second. Adding the pid (process id) is an improvement as different instances will get different pids. I would multiply the pid by a factor to spread them more.
But let's say you are using this for some embedded device and you have several in the same network. If they are all powered at once and you are launching the several instances of your program automatically at boot time, they may still get the same time and pid and all the devices will generate the same sequence of "random" numbers. In that case, you may want to add some unique identifier of each device (like the CPU serial number).
The proposed initialization would then be:
srand(time(NULL) + 1000 * getpid() + (uint) getCpuSerialNumber());
In a Linux machine (at least in the Raspberry Pi where I tested this), you can implement the following function to get the CPU Serial Number:
// Gets the CPU Serial Number as a 64 bit unsigned int. Returns 0 if not found.
uint64_t getCpuSerialNumber() {
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        return 0;
    }
    char line[256];
    uint64_t serial = 0;
    while (fgets(line, 256, f)) {
        if (strncmp(line, "Serial", 6) == 0) {
            serial = strtoull(strchr(line, ':') + 2, NULL, 16);
        }
    }
    fclose(f);
    return serial;
}
Include the <cstdlib> and <ctime> headers at the top of your program, and write srand(time(0)); in your program before you generate your random number. Here is an example of a program that prints a random number between one and ten:
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
using namespace std;
int main()
{
    //Initialize srand
    srand(time(0));
    //Create random number
    int n = rand() % 10 + 1;
    //Print the number
    cout << n << endl; //End the line
    //The main function is an int, so it must return a value
    return 0;
}

Implement sleep() in OpenCL C [duplicate]

I want to measure the performance of different devices viz CPU and GPUs.
This is my kernel code:
__kernel void dataParallel(__global int* A)
int pnp;//pnp=probable next prime
int pprime;//previous prime
int i,j;
while((j<i) && A[j]<=sqrt((float)pnp))
However, the sleep() function doesn't work. I am getting the following error in the build log:
<kernel>:4:2: warning: implicit declaration of function 'sleep' is invalid in C99
builtins: link error: Linking globals named '__gpu_suld_1d_i8_trap': symbol multiply defined!
Is there any other way to implement the function? Also, is there a way to record the time taken to execute this code snippet?
P.S. I have included #include <unistd.h> in my host code.
You don't need to use sleep in your kernel to measure the execution time.
There are two ways to measure the time.
1. Use OpenCL's built-in profiling: create the command queue with CL_QUEUE_PROFILING_ENABLE and query the kernel's event with clGetEventProfilingInfo.
look here: cl api
2. Get timestamps in your host code and compare them before and after execution.
double start = getTimeInMS();
//The kernel starts here
clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &tasksize, &local_size_in, 0, NULL, NULL);
//wait for kernel execution
clFinish(command_queue);
cout << "kernel execution time " << (getTimeInMS() - start) << endl;
Where getTimeInMS() is a function that returns a double value of milliseconds:
(Windows specific, override with another implementation if you don't use Windows)
static inline double getTimeInMS(){
    SYSTEMTIME st;
    GetSystemTime(&st);
    // Only seconds and milliseconds are used, so the value wraps every minute
    return (double)st.wSecond * (double)1000 + (double)st.wMilliseconds;}
Also you want to:
#include <time.h>
For Mac it would be (could work on Linux as well, not sure):
#include <sys/time.h>
static inline double getTime() {
    struct timeval starttime;
    gettimeofday(&starttime, 0x0);
    return (double)starttime.tv_sec * (double)1000 + (double)starttime.tv_usec / (double)1000;
}
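On POSIX systems that provide clock_gettime, a monotonic clock is another option for the host-side timestamps, since it is not affected by wall-clock adjustments while the kernel runs. The sketch below is only an illustration; the name monotonic_ms is made up here and is not from the answer above.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: milliseconds from a monotonic clock.
 * Only differences between two readings are meaningful. */
static double monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}
```

It would be used the same way as getTimeInMS(): take one reading before clEnqueueNDRangeKernel, one after the kernel has finished, and subtract.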

Sample timestamp C

I'm trying to understand what is the best way to sample timestamps in a Mac OS X 64 bit environment, using the gcc compiler. I read about the TSC register in x86 architectures and HPET for Intel processors, but I can't find a guide to use them. Actually, I tried with the function gettimeofday() but I need the precision of nanosecond.
Can anyone lead me?
On OS X, you can use the mach_absolute_time function to get a high-precision timestamp:
#include <mach/mach_time.h>
#include <stdint.h>
/* get timer units */
mach_timebase_info_data_t info;
mach_timebase_info(&info);
/* get timer value */
uint64_t ts = mach_absolute_time();
/* convert to nanoseconds */
ts *= info.numer;
ts /= info.denom;
Note that if you are trying to time something, you should perform the final nanosecond conversion on the difference between timestamps (the duration) to avoid overflow problems.
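The advice about converting the difference can be sketched as a small helper. To keep it compilable off macOS, the numer and denom fields of mach_timebase_info_data_t are passed in as plain parameters; the function name ticks_to_ns is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an interval between two raw timestamps to nanoseconds.
 * numer/denom are the timebase fraction (info.numer, info.denom). */
static uint64_t ticks_to_ns(uint64_t start, uint64_t end,
                            uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    /* Subtract first, while both values are still raw ticks, so the
     * multiplication is applied to a small duration rather than to a
     * large absolute timestamp. */
    uint64_t elapsed = end - start;
    return elapsed * numer / denom;
}
```

With the code above this would be called as ticks_to_ns(t0, t1, info.numer, info.denom), where t0 and t1 are two mach_absolute_time() readings.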

unable to link to gettimeofday on embedded system, elapsed time suggestions?

I am trying to use gettimeofday on an embedded ARM device, however it seems as though I am unable to use it:
gnychis#ubuntu:~/Documents/coexisyst/econotag_firmware$ make
Building for board: redbee-econotag
CC obj_redbee-econotag/econotag_coexisyst_firmware.o
LINK (romvars) econotag_coexisyst_firmware_redbee-econotag.elf
/home/gnychis/Documents/CodeSourcery/Sourcery_G++_Lite/bin/../lib/gcc/arm-none-eabi/4.3.2/../../../../arm-none-eabi/lib/libc.a(lib_a-gettimeofdayr.o): In function `_gettimeofday_r':
gettimeofdayr.c:(.text+0x1c): undefined reference to `_gettimeofday'
/home/gnychis/Documents/CodeSourcery/Sourcery_G++_Lite/bin/../lib/gcc/arm-none-eabi/4.3.2/../../../../arm-none-eabi/lib/libc.a(lib_a-sbrkr.o): In function `_sbrk_r':
sbrkr.c:(.text+0x18): undefined reference to `_sbrk'
collect2: ld returned 1 exit status
make[1]: *** [econotag_coexisyst_firmware_redbee-econotag.elf] Error 1
make: *** [mc1322x-default] Error 2
I am assuming I cannot use gettimeofday() ? Does anyone have any suggestions for being able to tell elapsed time? (e.g., 100ms)
What you need to do is create your own _gettimeofday() function to get it to link properly. This function could use the appropriate code to get the time for your processor, assuming you have a free-running system timer available.
#include <stdint.h>
#include <sys/time.h>
int _gettimeofday( struct timeval *tv, void *tzvp )
{
    uint64_t t = __your_system_time_function_here__(); // get uptime in nanoseconds
    tv->tv_sec = t / 1000000000; // convert to seconds
    tv->tv_usec = ( t % 1000000000 ) / 1000; // get remaining microseconds
    return 0; // return non-zero for error
} // end _gettimeofday()
What I usually do is have a timer running at 1 kHz, so it generates an interrupt every millisecond. In the interrupt handler I increment a global variable, say ms_ticks, then do something like:
volatile unsigned int ms_ticks = 0;
void timer_isr() { //every ms
    ms_ticks++;
}
void delay(int ms) {
    ms += ms_ticks;
    while (ms > ms_ticks)
        ;
}
It is also possible to use this as a timestamp, so let's say I want to do something every 500ms:
last_action = ms_ticks;
while (1) { //app super loop
    if (ms_ticks - last_action >= 500) {
        last_action = ms_ticks;
        //action code here
    }
    //rest of the code
}
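The subtraction in that loop keeps working when the tick counter wraps around, as long as the counter and the stored timestamp are the same unsigned type. A small sketch of the check, with the hypothetical name period_elapsed and an assumed 32-bit tick counter:

```c
#include <stdint.h>

/* Returns 1 and updates *last when at least `period` ticks have passed
 * since *last; returns 0 otherwise. Unsigned subtraction wraps modulo
 * 2^32, so the result stays correct across a counter overflow. */
static int period_elapsed(uint32_t now, uint32_t *last, uint32_t period)
{
    if ((uint32_t)(now - *last) >= period) {
        *last = now;
        return 1;
    }
    return 0;
}
```

In the super loop this would replace the explicit comparison: if (period_elapsed(ms_ticks, &last_action, 500)) { /* action */ }.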
Another alternative, since ARMs are 32-bit and your timer will probably be a 32-bit one, is to leave the timer free running instead of generating a 1 kHz interrupt, and simply use its counter as your ms_ticks.
Use one of the timers in the chip...
It looks like you are using the Econotag which is based on the MC13224v from Freescale.
The MACA_CLK register provides a very good timebase (assuming the radio is running). You can also use the RTC with CRM->RTC_COUNT. The RTC may or may not be very good depending on whether you have an external 32kHz crystal or not (the Econotag does NOT).
e.g. with MACA_CLK:
uint32_t t;
t = *MACA_CLK;
while (*MACA_CLK - t < SOMETIME); // busy-wait until SOMETIME ticks have elapsed
See also the timer examples in libmc1322x (tests/tmr.c).
Alternate methods are to use etimers or rtimers in Contiki (which has good support for the Econotag).
I've done this before in one of my applications. Just use :
