I'm using ASF4 API hal_timer for a ARM Cortex M4. I'm using the timer driver to timing a data sequence.
Why does no reset function exist? I'm using the timer on a TIMER_TASK_ONE_SHOT mode and want to reset it when ever I need to.
I thought a simple
timer_start(&TIMER_0);
timer_stop(&TIMER_0);
would do the trick but does not seem to work.
Is it necessary to re-initialize the timer for each timing event?
I'm probably missing something obvious. Am I approaching this problem incorrectly reason being why the method timer_reset() doesn't exist?
I have no experience of this API, but looking at the documentation it is apparent that a single timer can have multiple tasks on different periods, so resetting TIMER_0 makes little semantic sense; rather you need to reset the individual timer task attached to the timer - of which there may be more than one.
From the documentation (which is poor and contains errors), and the source code which is more reliable:
timer_task_instance.time_label = TIMER_0.time ;
where the timer_task_instance is the struct timer_task instance you want to reset. This sets the start time to the current time.
Probably best to wrap that in a function:
// Restart current interval, return interval.
uint32_t timer_restart( struct timer_descriptor* desc, struct timer_task* tsk )
{
tsk->time_label = desc->time
return tsk->interval ;
}
Then:
timer_restart( &TIMER_0, &timer_task_instance ) ;
Assuming you're using the (edited) example from the ASF4 Reference Manual:
/* TIMER_0 example */
static struct timer_task TIMER_0_task;
static void TIMER_0_task_cb(const struct timer_task *const timer_task)
{
// task you want to delay using non-existent reset function.
}
void TIMER_0_example(void)
{
TIMER_0_task.interval = 100;
TIMER_0_task.cb = TIMER_0_task_cb;
TIMER_0_task.mode = TIMER_TASK_ONE_SHOT;
timer_add_task(&TIMER_0, &TIMER_0_task);
timer_start(&TIMER_0);
}
Instead of resetting, which isn't supported by the API, you could use:
timer_remove_task(&TIMER_0, &TIMER_0_task);
timer_add_task(&TIMER_0, &TIMER_0_task);
which will effectively restart the delay associated with TIMER_0_task.
Under the hood, timer tasks are maintained as an ordered list, in order of when each task will expire, and using the functions provided by the API maintains the list order.
I want to create a realtime sine generator using apples core audio framework. I want do do it low level so I can learn and understand the fundamentals.
I know that using PortAudio or Jack would probably be easier and I will use them at some point but I would like to get this to work first so I can be confident to understand the fundamentals.
I literally searched for days now on this topic but no one seems to have ever created a real time wave generator using core audio trying to optain low latency while using C and not Swift or Objective-C.
For this I use a project I set up a while ago. It was first designed to be a game. So after the Application starts up, it will enter a run loop. I thought this would perfectly fit as I can use the main loop to copy samples into the audio buffer and process rendering and input handling as well.
So far I get sound. Sometimes it works for a while then starts to glitch, sometimes it glitches right away.
This is my code. I tried to simplify if and only present the important parts.
I got multiple questions. They are located in the bottom section of this post.
Applications main run loop. This is where it all starts after the window is created and buffers and memory is initialized:
while (OSXIsGameRunning())
{
OSXProcessPendingMessages(&GameData);
[GlobalGLContext makeCurrentContext];
CGRect WindowFrame = [window frame];
CGRect ContentViewFrame = [[window contentView] frame];
CGPoint MouseLocationInScreen = [NSEvent mouseLocation];
BOOL MouseInWindowFlag = NSPointInRect(MouseLocationInScreen, WindowFrame);
CGPoint MouseLocationInView = {};
if (MouseInWindowFlag)
{
NSRect RectInWindow = [window convertRectFromScreen:NSMakeRect(MouseLocationInScreen.x, MouseLocationInScreen.y, 1, 1)];
NSPoint PointInWindow = RectInWindow.origin;
MouseLocationInView= [[window contentView] convertPoint:PointInWindow fromView:nil];
}
u32 MouseButtonMask = [NSEvent pressedMouseButtons];
OSXProcessFrameAndRunGameLogic(&GameData, ContentViewFrame,
MouseInWindowFlag, MouseLocationInView,
MouseButtonMask);
#if ENGINE_USE_VSYNC
[GlobalGLContext flushBuffer];
#else
glFlush();
#endif
}
Through using VSYNC I can throttle the loop down to 60 FPS. The timing is not super tight but it is quite steady. I also have some code to throttle it manually using mach timing which is even more imprecise. I left it out for readability.
Not using VSYNC or using mach timing to get 60 iterations a second also makes the audio glitch.
Timing log:
CyclesElapsed: 8154360866, TimeElapsed: 0.016624, FPS: 60.155666
CyclesElapsed: 8174382119, TimeElapsed: 0.020021, FPS: 49.946926
CyclesElapsed: 8189041370, TimeElapsed: 0.014659, FPS: 68.216309
CyclesElapsed: 8204363633, TimeElapsed: 0.015322, FPS: 65.264511
CyclesElapsed: 8221230959, TimeElapsed: 0.016867, FPS: 59.286217
CyclesElapsed: 8237971921, TimeElapsed: 0.016741, FPS: 59.733719
CyclesElapsed: 8254861722, TimeElapsed: 0.016890, FPS: 59.207333
CyclesElapsed: 8271667520, TimeElapsed: 0.016806, FPS: 59.503273
CyclesElapsed: 8292434135, TimeElapsed: 0.020767, FPS: 48.154209
What is important here is the function OSXProcessFrameAndRunGameLogic. It is called 60 times a second and it is passed a struct containing basic information like a buffer for rendering, keyboard state and a sound buffer which looks like this:
typedef struct osx_sound_output
{
game_sound_output_buffer SoundBuffer;
u32 SoundBufferSize;
s16* CoreAudioBuffer;
s16* ReadCursor;
s16* WriteCursor;
AudioStreamBasicDescription AudioDescriptor;
AudioUnit AudioUnit;
} osx_sound_output;
Where game_sound_output_buffer is:
typedef struct game_sound_output_buffer
{
real32 tSine;
int SamplesPerSecond;
int SampleCount;
int16 *Samples;
} game_sound_output_buffer;
These get set up before the application enters its run loop.
The size for the SoundBuffer itself is SamplesPerSecond * sizeof(uint16) * 2 where SamplesPerSecond = 48000.
So inside OSXProcessFrameAndRunGameLogic is the sound generation:
void OSXProcessFrameAndRunGameLogic(osx_game_data *GameData, CGRect WindowFrame,
b32 MouseInWindowFlag, CGPoint MouseLocation,
int MouseButtonMask)
{
GameData->SoundOutput.SoundBuffer.SampleCount = GameData->SoundOutput.SoundBuffer.SamplesPerSecond / GameData->TargetFramesPerSecond;
// Oszi 1
OutputTestSineWave(GameData, &GameData->SoundOutput.SoundBuffer, GameData->SynthesizerState.ToneHz);
int16* CurrentSample = GameData->SoundOutput.SoundBuffer.Samples;
for (int i = 0; i < GameData->SoundOutput.SoundBuffer.SampleCount; ++i)
{
*GameData->SoundOutput.WriteCursor++ = *CurrentSample++;
*GameData->SoundOutput.WriteCursor++ = *CurrentSample++;
if ((char*)GameData->SoundOutput.WriteCursor >= ((char*)GameData->SoundOutput.CoreAudioBuffer + GameData->SoundOutput.SoundBufferSize))
{
//printf("Write cursor wrapped!\n");
GameData->SoundOutput.WriteCursor = GameData->SoundOutput.CoreAudioBuffer;
}
}
}
Where OutputTestSineWave is the part where the buffer is actually filled with data:
void OutputTestSineWave(osx_game_data *GameData, game_sound_output_buffer *SoundBuffer, int ToneHz)
{
int16 ToneVolume = 3000;
int WavePeriod = SoundBuffer->SamplesPerSecond/ToneHz;
int16 *SampleOut = SoundBuffer->Samples;
for(int SampleIndex = 0;
SampleIndex < SoundBuffer->SampleCount;
++SampleIndex)
{
real32 SineValue = sinf(SoundBuffer->tSine);
int16 SampleValue = (int16)(SineValue * ToneVolume);
*SampleOut++ = SampleValue;
*SampleOut++ = SampleValue;
SoundBuffer->tSine += Tau32*1.0f/(real32)WavePeriod;
if(SoundBuffer->tSine > Tau32)
{
SoundBuffer->tSine -= Tau32;
}
}
}
So when the Buffers are created at start up also Core audio is initialized which I do like this:
void OSXInitCoreAudio(osx_sound_output* SoundOutput)
{
AudioComponentDescription acd;
acd.componentType = kAudioUnitType_Output;
acd.componentSubType = kAudioUnitSubType_DefaultOutput;
acd.componentManufacturer = kAudioUnitManufacturer_Apple;
AudioComponent outputComponent = AudioComponentFindNext(NULL, &acd);
AudioComponentInstanceNew(outputComponent, &SoundOutput->AudioUnit);
AudioUnitInitialize(SoundOutput->AudioUnit);
// uint16
//AudioStreamBasicDescription asbd;
SoundOutput->AudioDescriptor.mSampleRate = SoundOutput->SoundBuffer.SamplesPerSecond;
SoundOutput->AudioDescriptor.mFormatID = kAudioFormatLinearPCM;
SoundOutput->AudioDescriptor.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsNonInterleaved | kAudioFormatFlagIsPacked;
SoundOutput->AudioDescriptor.mFramesPerPacket = 1;
SoundOutput->AudioDescriptor.mChannelsPerFrame = 2; // Stereo
SoundOutput->AudioDescriptor.mBitsPerChannel = sizeof(int16) * 8;
SoundOutput->AudioDescriptor.mBytesPerFrame = sizeof(int16); // don't multiply by channel count with non-interleaved!
SoundOutput->AudioDescriptor.mBytesPerPacket = SoundOutput->AudioDescriptor.mFramesPerPacket * SoundOutput->AudioDescriptor.mBytesPerFrame;
AudioUnitSetProperty(SoundOutput->AudioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
0,
&SoundOutput->AudioDescriptor,
sizeof(SoundOutput->AudioDescriptor));
AURenderCallbackStruct cb;
cb.inputProc = OSXAudioUnitCallback;
cb.inputProcRefCon = SoundOutput;
AudioUnitSetProperty(SoundOutput->AudioUnit,
kAudioUnitProperty_SetRenderCallback,
kAudioUnitScope_Global,
0,
&cb,
sizeof(cb));
AudioOutputUnitStart(SoundOutput->AudioUnit);
}
The initialization code for core audio sets the render callback to OSXAudioUnitCallback
OSStatus OSXAudioUnitCallback(void * inRefCon,
AudioUnitRenderActionFlags * ioActionFlags,
const AudioTimeStamp * inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList * ioData)
{
#pragma unused(ioActionFlags)
#pragma unused(inTimeStamp)
#pragma unused(inBusNumber)
//double currentPhase = *((double*)inRefCon);
osx_sound_output* SoundOutput = ((osx_sound_output*)inRefCon);
if (SoundOutput->ReadCursor == SoundOutput->WriteCursor)
{
SoundOutput->SoundBuffer.SampleCount = 0;
//printf("AudioCallback: No Samples Yet!\n");
}
//printf("AudioCallback: SampleCount = %d\n", SoundOutput->SoundBuffer.SampleCount);
int SampleCount = inNumberFrames;
if (SoundOutput->SoundBuffer.SampleCount < inNumberFrames)
{
SampleCount = SoundOutput->SoundBuffer.SampleCount;
}
int16* outputBufferL = (int16 *)ioData->mBuffers[0].mData;
int16* outputBufferR = (int16 *)ioData->mBuffers[1].mData;
for (UInt32 i = 0; i < SampleCount; ++i)
{
outputBufferL[i] = *SoundOutput->ReadCursor++;
outputBufferR[i] = *SoundOutput->ReadCursor++;
if ((char*)SoundOutput->ReadCursor >= (char*)((char*)SoundOutput->CoreAudioBuffer + SoundOutput->SoundBufferSize))
{
//printf("Callback: Read cursor wrapped!\n");
SoundOutput->ReadCursor = SoundOutput->CoreAudioBuffer;
}
}
for (UInt32 i = SampleCount; i < inNumberFrames; ++i)
{
outputBufferL[i] = 0.0;
outputBufferR[i] = 0.0;
}
return noErr;
}
This is mostly all there is to it. This is quite long but I did not see a way to present all needed information in a more compact way. I wanted to show all because I am by no means a professional programmer. If there is something you feel is missing, pleas tell me.
My feeling tells me that there is something wrong with the timing. I feel the function OSXProcessFrameAndRunGameLogic sometimes needs more time so that the core audio callback is already pulling samples out of the buffer before it is fully written by OutputTestSineWave.
There is actually more stuff going on in OSXProcessFrameAndRunGameLogic which I did not show here. I am "software rendering" very basic stuff into a framebuffer which is then displayed by OpenGL and I also do keypress checks in there because yeah, its the main function of functionality. In the future this is the place where I would like to handle the controls for multiple oscillators, filters and stuff.
Anyway even if I stop the Rendering and Input handling from being called every iteration I still get audio glitches.
I tried pulling all the sound processing in OSXProcessFrameAndRunGameLogic into an own function void* RunSound(void *GameData) and changed it to:
pthread_t soundThread;
pthread_create(&soundThread, NULL, RunSound, GameData);
pthread_join(soundThread, NULL);
However I got mixed results and was not even sure if multithreading is done like that. Creating and destroying threads 60 times a second didn't seem the way to go.
I also had the idea to let sound processing happen on a completely different thread before the application actually runs into the main loop. Something like two simultaneously running while loops where the first processes audio and the latter UI and input.
Questions:
I get glitchy audio. Rendering and input seem to work correctly but audio sometimes glitches, sometimes it doesn't. From the code I provided, can you maybe see me doing something wrong?
Am I using the core audio technology in a wrong way in order to achieve real time low latency signal generation?
Should I do sound processing in a separate thread like I talked about above? How would threading in this context be done correctly? It would make sense to have a thread only dedicated for sound am I right?
Am I right that the basic audio processing should not be done in the render callback of core audio? Is this function only for outputting the provided sound buffer?
And if sound processing should be done right here, how can I access information like the keyboard state from inside the callback?
Are there any resources you could point me to that I maybe missed?
This is the only place I know where I can get help with this project. I would really appreciate your help.
And if something is not clear to you please let me know.
Thank you :)
In general when dealing with low-latency audio you want to achieve the most deterministic behaviour possible.
This, for example, translates to:
Don't hold any locks on the audio thread (priority inversion)
No memory allocation on the audio thread (takes often too much time)
No file/network IO on the audio thread (takes often too much time)
Question 1:
There are indeed some problems with your code for when you want to achieve continuous, realtime, non-glitching audio.
1. Two different clock domains.
You are providing audio data from a (what I call) a different clock domain than the clock domain asking for data. Clock domain 1 in this case is defined by your TargetFramesPerSecond value, clock domain 2 defined by Core Audio. However, due too how scheduling works you have no guarantee that you thread is finishing in time and on time. You try to target your rendering to n frames per second, but what happens when you don't make it time wise? As far as I can see you don't compensate for the deviation a render cycle took compared to the ideal timing.
The way threading works is that ultimately the OS scheduler decides when your thread is active. There are never guarantees and this causes you render cycles to be not very precise (in term of precision you need for audio rendering).
2. There is no synchronisation between the render thread and the Core Audio rendercallback thread.
The thread where the OSXAudioUnitCallback runs is not the same as where your OSXProcessFrameAndRunGameLogic and thus OutputTestSineWave run. You are providing data from your main thread, and data is being read from the Core Audio render thread. Normally you would use some mutexes to protect you data, but in this case that's not possible because you would run into the problem of priority inversion.
A way of dealing with race conditions is to use a buffer which uses atomic variables to store the usage and pointers of the buffer and let only 1 producer and 1 consumer use this buffer.
Good examples of such buffers are:
https://github.com/michaeltyson/TPCircularBuffer
https://github.com/andrewrk/libsoundio/blob/master/src/ring_buffer.h
3. There are a lot of calls in you audio render thread which prevent deterministic behaviour.
As you wrote you are doing a lot more inside the same audio render thread. Changes are quite high that there will be stuff going on (under the hood) which prevents your thread from being on time. Generally, you should avoid calls which take either too much time or are not deterministic. With all the OpenGL/keypres/framebuffer rendering there is no way to be certain you thread will "arrive on time".
Below are some resources worth looking into.
Question 2:
AFAICT generally speaking, you are using the Core Audio technology correctly. The only problem I think you have is on the providing side.
Question 3:
Yes. Definitely! Although, there are multiple ways of doing this.
In your case you have a normal-priority thread running to do the rendering and a high-performance, realtime thread on which the audio render callback is being called. Looking at your code I would suggest putting the generation of the sine wave inside the render callback function (or call OutputTestSineWave from the render callback). This way you have the audio generation running inside a reliable high prio thread, there is no other rendering interfering with the timing precision and there is no need for a ringbuffer.
In other cases where you need to do "non-realtime" processing to get audiodata ready (think of reading from a file, reading from a network or even from another physical audio device) you cannot run this logic inside the Core Audio thread. A way to solve this is to start a separate, dedicated thread to do this processing. To pass the data to the realtime audio thread you would then make use of the earlier mentioned ringbuffer.
It basically boiles down to two simple goals: for the realtime thread it is necessary to have the audio data available at all times (all render calls), if this failes you will end up sending invalid (or better zeroed) audio data.
The main goal for the secondary thread is to fill up the ringbuffer as fast as possible and to keep the ringbuffer as full as possible. So, whenever there is room to put new audio data into the ringbuffer the thread should do just that.
The size of the ringbuffer in this case will dicate how much tolerance there will be for delay. The size of the ringbuffer will be a balance between certainty (bigger buffer) and latency (smaller buffer).
BTW. I'm quite certain Core Audio has all the facilities to do all this for you.
Question 4:
There are multiple ways of achieving you goal, and rendering the stuff inside the render callback from Core Audio is definitely one of them. The one thing you should keep in mind is that you have to make sure the function returns in time.
For changing parameters to manipulate the audio rendering you'll have to find a way of passing messages which enables the reader (audio renderer function) to get messages without locking and waiting. The way I have done this is to create a second ringbuffer which hold messages from which the audio renderer can consume. This can be as simple as a ringbuffer which hold structs with data (or even pointers to data). As long as you stick to the rules of no locking.
Question 5:
I don't know what resources you are aware of but here are some must-reads:
http://atastypixel.com/blog/four-common-mistakes-in-audio-development/
http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing
https://developer.apple.com/library/archive/qa/qa1467/_index.html
You basic problem is that you are trying to push audio from your game loop instead of letting the audio system pull it; e.g. instead of always having (or quickly being able to create *) enough audio samples ready for the amount requested by the audio callback to be pulled by the audio callback. The "always" has to account for enough slop to cover timing jitter (being called late or early or too few times) in your game loop.
(* with no locks, semaphores, memory allocation or Objective C messages)
So I have this CNN which I train on the GPU. During the training, I regularly save checkpoint.
Later on, I want to have a small script that reads .meta file and the checkpoint and do some tests on a CPU. I use the following the code:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
with sess.as_default():
with tf.device('/cpu:0'):
saver = tf.train.import_meta_graph('{}.meta'.format(model))
saver.restore(sess,model)
I keep getting this error which tell me that the saver is trying to put the operation on the GPU.
How can i change that?
Move all the ops to CPU using _set_device API. https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/framework/ops.py#L2255
with tf.Session() as sess:
g = tf.get_default_graph()
ops = g.get_operations()
for op in ops:
op._set_device('/device:CPU:*')
Hacky work-around, open your graph definition file (ending with .pbtxt), and remove all lines starting with device:
For programmatic approach you can see how TensorFlow exporter does this with clear_devices although that uses regular Saver, not meta graph exporter
Spark DStream has mapPartition API, while Flink DataStream API doesn't. Is there anyone who could help explain the reason. What I want to do is to implement a API similar to Spark reduceByKey on Flink.
Flink's stream processing model is quite different from Spark Streaming which is centered around mini batches. In Spark Streaming each mini batch is executed like a regular batch program on a finite set of data, whereas Flink DataStream programs continuously process records.
In Flink's DataSet API, a MapPartitionFunction has two parameters. An iterator for the input and a collector for the result of the function. A MapPartitionFunction in a Flink DataStream program would never return from the first function call, because the iterator would iterate over an endless stream of records. However, Flink's internal stream processing model requires that user functions return in order to checkpoint function state. Therefore, the DataStream API does not offer a mapPartition transformation.
In order to implement functionality similar to Spark Streaming's reduceByKey, you need to define a keyed window over the stream. Windows discretize streams which is somewhat similar to mini batches but windows offer way more flexibility. Since a window is of finite size, you can call reduce the window.
This could look like:
yourStream.keyBy("myKey") // organize stream by key "myKey"
.timeWindow(Time.seconds(5)) // build 5 sec tumbling windows
.reduce(new YourReduceFunction); // apply a reduce function on each window
The DataStream documentation shows how to define various window types and explains all available functions.
Note: The DataStream API has been reworked recently. The example assumes the latest version (0.10-SNAPSHOT) which will be release as 0.10.0 in the next days.
Assuming your input stream is single partition data (say String)
val new_number_of_partitions = 4
//below line partitions your data, you can broadcast data to all partitions
val step1stream = yourStream.rescale.setParallelism(new_number_of_partitions)
//flexibility for mapping
val step2stream = step1stream.map(new RichMapFunction[String, (String, Int)]{
// var local_val_to_different_part : Type = null
var myTaskId : Int = null
//below function is executed once for each mapper function (one mapper per partition)
override def open(config: Configuration): Unit = {
myTaskId = getRuntimeContext.getIndexOfThisSubtask
//do whatever initialization you want to do. read from data sources..
}
def map(value: String): (String, Int) = {
(value, myTasKId)
}
})
val step3stream = step2stream.keyBy(0).countWindow(new_number_of_partitions).sum(1).print
//Instead of sum(1), you can use .reduce((x,y)=>(x._1,x._2+y._2))
//.countWindow will first wait for a certain number of records for perticular key
// and then apply the function
Flink streaming is pure streaming (not the batched one). Take a look at Iterate API.
I want to write MJPEG picture internet stream viewer. I think getting jpeg images using sockets it's not very hard problem. But i want to know how to make accurate streaming.
while (1)
{
get_image()
show_image()
sleep (SOME_TIME) // how to make it accurate?
}
Any suggestions would be great.
In order to make it accurate, there are two possibilities:
Using framerate from the streaming server. In this case, the client needs to keep the same framerate (calculate each time when you get frame, then show and sleep for a variable amount of time using feedback: if the calculated framerate is higher than on server -> sleep more; if lower -> sleep less; then, the framerate on the client side will drift around the original value from server). It can be received from server during the initialization of streaming connection (when you get picture size and other parameters) or it can be configured.
The most accurate approach, actually, is using of timestamps from server per each frame (which is either taken from file by demuxer or generated in image sensor driver in case of camera device). If MJPEG is packeted into RTP stream, these timestamps are already in RTP header. So, client's task is trivial: show picture using time calculating from time offset, current timestamp and time base.
Update
For the first solution:
time_to_sleep = time_to_sleep_base = 1/framerate;
number_of_frames = 0;
time = current_time();
while (1)
{
get_image();
show_image();
sleep (time_to_sleep);
/* update time to sleep */
number_of_frames++;
cur_time = current_time();
cur_framerate = number_of_frames/(cur_time - time);
if (cur_framerate > framerate)
time_to_sleep += alpha*time_to_sleep;
else
time_to_sleep -= alpha*time_to_sleep;
time = cur_time;
}
, where alpha is a constant parameter of reactivity of the feedback (0.1..0.5) to play with.
However, it's better to organize queue for input images to make the process of showing smoother. The size of queue can be parametrized and could be somewhere around 1 sec time of showing, i.e. numerically equal to framerate.