Fast start for fdsink: first 5 MB async, following bytes synchronous

I wrote a little HTTP video streaming server using GStreamer. Essentially the client does a GET request and receives a continuous HTTP stream.
The stream should be sent synchronously, i.e. at the same speed as the bitrate. The problem is that some players (mplayer is a prominent example) don't buffer variable-bitrate content well and therefore lag every other second.
I want to circumvent the buffer underruns by transmitting the first, say, 5 MB immediately, ignoring the pipeline's clock. The rest of the stream should be transmitted at the appropriate speed.
I figured setting the fdsink to sync=FALSE for the first 5 MB and sync=TRUE from then on should do the trick, but that does not work: once sync is re-enabled, the fdsink patiently waits for the pipeline clock to catch up with the data that has already been sent. In my test with a very low bitrate, no data is transmitted for quite a few seconds.
My fdsink reader thread currently looks like this:
static void *readerThreadFun(void *) {
    int fastStart = TRUE;
    g_object_set(G_OBJECT(fdsink0), "sync", FALSE, NULL);
    for (uint64_t position = 0;;) {
        // (On the other side there is node.js,
        // that's why I don't do the HTTP chunking here)
        ssize_t readCount = splice(gstreamerFd, NULL, remoteFd,
                                   NULL, 1 << 20, SPLICE_F_MOVE | SPLICE_F_MORE);
        if (readCount == 0) {
            break;
        } else if (readCount < 0) {
            goto error;
        }
        position += readCount;
        if (fastStart && position >= 5 * 1024 * 1024) {
            fastStart = FALSE;
            g_object_set(G_OBJECT(fdsink0), "sync", TRUE, NULL);
        }
    }
    ...
}
How can I make GStreamer "forget" the duration the wall clock has to catch up with? Is there some "reset" function? Am I misunderstanding sync? Is there another method to realize a "fast start" in GStreamer?

That's not quite the solution I was looking for:
gst_base_sink_set_ts_offset(GST_BASE_SINK(fdsink0), -10ll*1000*1000*1000);
The sink will stream the first 10 seconds immediately.
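If it helps, the same offset can presumably also be applied through basesink's "ts-offset" property instead of calling gst_base_sink_set_ts_offset() directly; a minimal equivalent sketch (property name from GstBaseSink, value taken from the line above):

/* Shift the sink's synchronisation time 10 seconds into the past, so the
 * first 10 seconds of data are pushed out immediately. */
g_object_set(G_OBJECT(fdsink0), "ts-offset", (gint64) (-10 * GST_SECOND), NULL);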

Related

D3D11 Map call is slow even after fence signaled

I'm running a compute shader. It takes about .22 ms to complete (according to a D3D11 query), but real-world time is around .6 ms on average, so the Map call definitely takes up some of that time. The shader computes a dirty RECT that contains all the pixels that have changed since the last frame and returns it to the CPU. This all works great!
My issue is that the Map call is very slow. Normal performance is fine, at around .6 ms total, but if I run this in the background alongside any GPU-intensive application (Blender, an AAA game, etc.) it really starts to slow down: I see times jump from the normal .6 ms all the way to 9 ms! I need to figure out why the Map is taking so long.
The shader writes its result into this structured buffer:
buffer_desc.ByteWidth = sizeof(RECT);
buffer_desc.StructureByteStride = sizeof(RECT);
buffer_desc.Usage = D3D11_USAGE_DEFAULT;
buffer_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
buffer_desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
buffer_desc.CPUAccessFlags = 0;
hr = ID3D11Device5_CreateBuffer(ctx->device, &buffer_desc, NULL, &ctx->reduction_final);
With a UAV of
uav_desc.Format = DXGI_FORMAT_UNKNOWN;
uav_desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uav_desc.Buffer.FirstElement = 0;
uav_desc.Buffer.NumElements = 1;
uav_desc.Buffer.Flags = 0;
hr = ID3D11Device5_CreateUnorderedAccessView(ctx->device, (ID3D11Resource*) ctx->reduction_final, &uav_desc, &ctx->reduction_final_UAV);
Copying it into this staging buffer here
buffer_desc.ByteWidth = sizeof(RECT);
buffer_desc.StructureByteStride = sizeof(RECT);
buffer_desc.Usage = D3D11_USAGE_STAGING;
buffer_desc.BindFlags = 0;
buffer_desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
buffer_desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
hr = ID3D11Device5_CreateBuffer(ctx->device, &buffer_desc, NULL, &ctx->cpu_staging);
and mapping it like so
hr = ID3D11DeviceContext4_Map(ctx->device_ctx, (ID3D11Resource*) ctx->cpu_staging, 0, D3D11_MAP_READ, 0, &mapped_res);
I added a fence to my system so I can time out if it takes too long (longer than 1 ms).
Creation of fence and event
hr = ID3D11Device5_CreateFence(ctx->device, 1, D3D11_FENCE_FLAG_NONE, &IID_ID3D11Fence, (void**) &ctx->fence);
ctx->fence_handle = CreateEventA(NULL, FALSE, FALSE, NULL);
Then, after my dispatch calls, I copy into the staging buffer and signal the fence:
ID3D11DeviceContext4_CopyResource(ctx->device_ctx, (ID3D11Resource*) ctx->cpu_staging, (ID3D11Resource*) ctx->reduction_final);
ID3D11DeviceContext4_Signal(ctx->device_ctx, ctx->fence, ++ctx->fence_counter);
hr = ID3D11Fence_SetEventOnCompletion(ctx->fence, ctx->fence_counter, ctx->fence_handle);
This all seems to be working. The fence signals and allows WaitForSingleObject to pass, and when I wait on it I can time out, move on to other things while this finishes, and come back later (which is obviously not OK; I want the data from the shader ASAP).
DWORD waitResult = WaitForSingleObject(ctx->fence_handle, 1);
if (waitResult != WAIT_OBJECT_0)
{
    LOG("Timeout Count: %u", ++timeout_count);
    return false;
}
else
{
    LOG("Timeout Count: %u", timeout_count);
    ID3D11DeviceContext4_Map(ctx->device_ctx, (ID3D11Resource*) ctx->cpu_staging, 0, D3D11_MAP_READ, 0, &mapped_res);
    memcpy(rect, mapped_res.pData, sizeof(RECT));
    ID3D11DeviceContext4_Unmap(ctx->device_ctx, (ID3D11Resource*) ctx->cpu_staging, 0);
    return true;
}
The problem is, the Map can still take a long time (ranging from .6 to 9 ms). The fence signals once everything is done, right? Which takes less than 1 ms. But the Map call itself can still take a long time. What gives? What am I missing about how mapping works? Is it possible to reduce the time to map? Is the fence signaling before the copy happens?
So, just to recap: my shader runs fast under normal operation, around .6 ms total time, with .22 ms for the shader itself (and I'm assuming the rest is the mapping). Once I start running my shader while rendering with Blender, it slows down seriously. The fence tells me that the work is still being completed in record time, but the Map call is still taking a very long time (upwards of 9 ms).
Other random things I've tried:
Setting SetGPUThreadPriority to 7.
Setting AvSetMmThreadCharacteristicsW to "DisplayPostProcessing", "Games", and "Capture"
Setting AvSetMmThreadPriority to AVRT_PRIORITY_CRITICAL
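One thing that might be worth trying (my addition, not something from the original post): pass D3D11_MAP_FLAG_DO_NOT_WAIT so the Map call returns immediately instead of stalling while the GPU is busy with another application. A sketch reusing the names from the question:

/* Non-blocking readback attempt. If the copy into the staging buffer has not
 * retired yet, Map returns DXGI_ERROR_WAS_STILL_DRAWING instead of stalling. */
HRESULT map_hr = ID3D11DeviceContext4_Map(ctx->device_ctx,
                                          (ID3D11Resource*) ctx->cpu_staging, 0,
                                          D3D11_MAP_READ,
                                          D3D11_MAP_FLAG_DO_NOT_WAIT,
                                          &mapped_res);
if (map_hr == DXGI_ERROR_WAS_STILL_DRAWING)
{
    /* GPU still busy (e.g. preempted by Blender or a game): skip this round
     * and try again later instead of eating the multi-millisecond stall. */
    return false;
}
if (SUCCEEDED(map_hr))
{
    memcpy(rect, mapped_res.pData, sizeof(RECT));
    ID3D11DeviceContext4_Unmap(ctx->device_ctx, (ID3D11Resource*) ctx->cpu_staging, 0);
    return true;
}
return false;

Whether this removes the delay or merely moves it depends on why the GPU is preempted, but it at least keeps the calling thread responsive.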

gstreamer 1.14.5 multiple rtspsrc element pipeline, reconnect individual streams when disconnected via 'C' code

Hello GStreamer community & fans,
I have a working pipeline that connects to multiple H.264 IP camera streams using multiple rtspsrc elements aggregated into a single pipeline for downstream video processing.
Intermittently and randomly, streams coming in over remote and slower connections will have problems, time out, retry and go dead, leaving that stream as a black image when viewing the streams after processing. The other working streams continue to process normally. The rtspsrc elements are set up to retry the RTSP connection, and that somewhat works, but for the cases where it doesn't, I'm looking for a way to disconnect the stream entirely from the rtspsrc element and restart that particular stream without disrupting the other streams.
I haven't found any obvious examples or ways to accomplish this, so I've been tinkering with the rtspsrc element code itself, exporting this public function to access the rtspsrc internals that handle connecting.
__attribute__ ((visibility ("default"))) GstRTSPResult my_gst_rtspsrc_conninfo_reconnect(GstRTSPSrc *, gboolean);

GstRTSPResult
my_gst_rtspsrc_conninfo_reconnect(GstRTSPSrc *src, gboolean async)
{
    int retries = 0, port = 0;
    char portrange_buff[32];
    gchar *portrange_str = NULL;
    // gboolean manual_http;

    GST_ELEMENT_WARNING(src, RESOURCE, READ, (NULL),
        (">>>>>>>>>> Streamer: A camera closed the streaming connection. Trying to reconnect"));

    gst_rtspsrc_set_state (src, GST_STATE_PAUSED);
    gst_rtspsrc_set_state (src, GST_STATE_READY);
    gst_rtspsrc_flush(src, TRUE, FALSE);
    // manual_http = src->conninfo.connection->manual_http;
    // src->conninfo.connection->manual_http = TRUE;
    gst_rtsp_connection_set_http_mode(src->conninfo.connection, TRUE);

    if (gst_rtsp_conninfo_close(src, &src->conninfo, TRUE) == GST_RTSP_OK)
    {
        /* "port-range" is a string property, so fetch it as a newly allocated
         * string (the original code passed a char buffer here, which is not how
         * g_object_get works for string properties). */
        g_object_get(G_OBJECT(src), "port-range", &portrange_str, NULL);

        /* parse the leading digits, e.g. "5000-5007" -> 5000
         * (digits are converted with - '0', not + 48) */
        for (retries = 0; portrange_str && portrange_str[retries] && isdigit((unsigned char) portrange_str[retries]); retries++)
            port = (port * 10) + (portrange_str[retries] - '0');

        if (port != src->client_port_range.min)
            GST_ELEMENT_WARNING(src, RESOURCE, READ, (NULL), (">>>>>>>>>> Streamer: port range start mismatch"));

        GST_WARNING_OBJECT(src, ">>>>>>>>>> Streamer: old port.min: %d, old port.max: %d, old port-range: %s\n",
            (src->client_port_range.min), (src->client_port_range.max), (portrange_str));
        g_free(portrange_str);

        /* move to the next port window and push it back into the element */
        src->client_port_range.min += 6;
        src->client_port_range.max += 6;
        src->next_port_num = src->client_port_range.min;

        memset(portrange_buff, 0, sizeof(portrange_buff));
        sprintf(portrange_buff, "%d-%d", src->client_port_range.min, src->client_port_range.max);
        g_object_set(G_OBJECT(src), "port-range", portrange_buff, NULL);

        for (retries = 0; retries < 5 && gst_rtsp_conninfo_connect(src, &src->conninfo, async) != GST_RTSP_OK; retries++)
            sleep(10);
    }

    if (retries < 5)
    {
        gst_rtspsrc_set_state(src, GST_STATE_PAUSED);
        gst_rtspsrc_set_state(src, GST_STATE_PLAYING);
        return GST_RTSP_OK;
    }
    else
        return GST_RTSP_ERROR;
}
I realize this is probably not best practice; I'm doing it as a learning experience until I understand the internals well enough to find a better way.
I appreciate any feedback anyone has on this problem.
-Doug
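For comparison, here is a rough, untested sketch of the more conventional approach, which is to tear down and rebuild only the affected rtspsrc branch rather than poking at rtspsrc internals. The pipeline handle, the helper name, and the pad-added wiring are assumptions, not code from the post:

#include <gst/gst.h>

/* Replace a dead rtspsrc with a fresh one; the other camera branches keep running. */
static void restart_camera_branch(GstElement *pipeline, GstElement *old_src,
                                  const gchar *uri)
{
    /* Drop the dead source; the NULL state releases its sockets and threads. */
    gst_element_set_state(old_src, GST_STATE_NULL);
    gst_bin_remove(GST_BIN(pipeline), old_src);   /* also unrefs old_src */

    /* Create a fresh rtspsrc for the same camera. */
    GstElement *new_src = gst_element_factory_make("rtspsrc", NULL);
    g_object_set(new_src, "location", uri, NULL);
    /* Re-attach the same "pad-added" handler used for the other cameras here. */

    gst_bin_add(GST_BIN(pipeline), new_src);
    gst_element_sync_state_with_parent(new_src);  /* joins the running pipeline */
}

The new source's pads then get linked to the existing depay/decode branch from the pad-added callback, exactly as during the initial setup.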

Alternative to blocking code

I am attempting to use the mbed OS scheduler for a small project.
As mbed OS is asynchronous, I need to avoid blocking code.
However, the library for my wireless receiver uses this blocking line:
while (!(wireless.isRxData()));
Is there an alternative way to do this that won't block all the code until a message is received?
static void listen(void) {
    wireless.quickRxSetup(channel, addr1);
    sprintf(ackData, "Ack data \r\n");
    wireless.acknowledgeData(ackData, strlen(ackData), 1);
    while (!(wireless.isRxData()));
    len = wireless.getRxData(msg);
}

static void motor(void) {
    pc.printf("Motor\n");
    m.speed(1);
    n.speed(1);
    led1 = 1;
    wait(0.5);
    m.speed(0);
    n.speed(0);
}

static void sendData() {
    wireless.quickTxSetup(channel, addr1);
    strcpy(accelData, "Robot");
    wireless.transmitData(accelData, strlen(accelData));
}

void app_start(int, char**) {
    minar::Scheduler::postCallback(listen).period(minar::milliseconds(500)).tolerance(minar::milliseconds(1000));
    minar::Scheduler::postCallback(motor).period(minar::milliseconds(500));
    minar::Scheduler::postCallback(sendData).period(minar::milliseconds(500)).delay(minar::milliseconds(3000));
}
You should remove the while (!(wireless.isRxData())); loop in your listen function. Replace it with:
if (wireless.isRxData()) {
    len = wireless.getRxData(msg);
    // Process data
}
Then, you can process your data in that if statement, or you can call postCallback on another function that will do your processing.
Instead of looping until data is available, you'll want to poll for data. If RX data is not available, exit the function and set a timer to go off after a short interval. When the timer goes off, check for data again. Repeat until data is available. I'm not familiar with your OS so I can't offer any specific code. This may be as simple as adding a short "sleep" call inside the while loop, or may involve creating another callback from the scheduler.

NDIS filter driver' FilterReceiveNetBufferLists handler isn't called

I am developing an NDIS filter driver, and I found that its FilterReceiveNetBufferLists handler is never called (the network is blocked) under certain conditions (such as opening Wireshark or clicking its "Interface List" button). But when I start capturing, FilterReceiveNetBufferLists returns to normal (the network is restored), which is strange.
I found that when I manually return NDIS_STATUS_FAILURE instead of calling NdisFOidRequest at the place where the WinPcap driver originates OID requests (the BIOCQUERYOID & BIOCSETOID switch branch of NPF_IoControl), the driver no longer blocks the network (but then WinPcap can't work either).
Is there something wrong with the NdisFOidRequest call?
The DeviceIO routine in Packet.c that originates OID requests:
case BIOCQUERYOID:
case BIOCSETOID:

    TRACE_MESSAGE(PACKET_DEBUG_LOUD, "BIOCSETOID - BIOCQUERYOID");

    //
    // gain ownership of the Ndis Handle
    //
    if (NPF_StartUsingBinding(Open) == FALSE)
    {
        //
        // MAC unbindind or unbound
        //
        SET_FAILURE_INVALID_REQUEST();
        break;
    }

    // Extract a request from the list of free ones
    RequestListEntry = ExInterlockedRemoveHeadList(&Open->RequestList, &Open->RequestSpinLock);
    if (RequestListEntry == NULL)
    {
        //
        // Release ownership of the Ndis Handle
        //
        NPF_StopUsingBinding(Open);
        SET_FAILURE_NOMEM();
        break;
    }

    pRequest = CONTAINING_RECORD(RequestListEntry, INTERNAL_REQUEST, ListElement);

    //
    // See if it is an Ndis request
    //
    OidData = Irp->AssociatedIrp.SystemBuffer;

    if ((IrpSp->Parameters.DeviceIoControl.InputBufferLength == IrpSp->Parameters.DeviceIoControl.OutputBufferLength) &&
        (IrpSp->Parameters.DeviceIoControl.InputBufferLength >= sizeof(PACKET_OID_DATA)) &&
        (IrpSp->Parameters.DeviceIoControl.InputBufferLength >= sizeof(PACKET_OID_DATA) - 1 + OidData->Length))
    {
        TRACE_MESSAGE2(PACKET_DEBUG_LOUD, "BIOCSETOID|BIOCQUERYOID Request: Oid=%08lx, Length=%08lx", OidData->Oid, OidData->Length);

        //
        // The buffer is valid
        //
        NdisZeroMemory(&pRequest->Request, sizeof(NDIS_OID_REQUEST));
        pRequest->Request.Header.Type = NDIS_OBJECT_TYPE_OID_REQUEST;
        pRequest->Request.Header.Revision = NDIS_OID_REQUEST_REVISION_1;
        pRequest->Request.Header.Size = NDIS_SIZEOF_OID_REQUEST_REVISION_1;

        if (FunctionCode == BIOCSETOID)
        {
            pRequest->Request.RequestType = NdisRequestSetInformation;
            pRequest->Request.DATA.SET_INFORMATION.Oid = OidData->Oid;
            pRequest->Request.DATA.SET_INFORMATION.InformationBuffer = OidData->Data;
            pRequest->Request.DATA.SET_INFORMATION.InformationBufferLength = OidData->Length;
        }
        else
        {
            pRequest->Request.RequestType = NdisRequestQueryInformation;
            pRequest->Request.DATA.QUERY_INFORMATION.Oid = OidData->Oid;
            pRequest->Request.DATA.QUERY_INFORMATION.InformationBuffer = OidData->Data;
            pRequest->Request.DATA.QUERY_INFORMATION.InformationBufferLength = OidData->Length;
        }

        NdisResetEvent(&pRequest->InternalRequestCompletedEvent);

        if (*((PVOID *) pRequest->Request.SourceReserved) != NULL)
        {
            *((PVOID *) pRequest->Request.SourceReserved) = NULL;
        }

        //
        // submit the request
        //
        pRequest->Request.RequestId = (PVOID) NPF6X_REQUEST_ID;
        ASSERT(Open->AdapterHandle != NULL);
        Status = NdisFOidRequest(Open->AdapterHandle, &pRequest->Request);
        //Status = NDIS_STATUS_FAILURE;
    }
    else
    {
        //
        // Release ownership of the Ndis Handle
        //
        NPF_StopUsingBinding(Open);

        //
        // buffer too small
        //
        SET_FAILURE_BUFFER_SMALL();
        break;
    }

    if (Status == NDIS_STATUS_PENDING)
    {
        NdisWaitEvent(&pRequest->InternalRequestCompletedEvent, 1000);
        Status = pRequest->RequestStatus;
    }

    //
    // Release ownership of the Ndis Handle
    //
    NPF_StopUsingBinding(Open);

    //
    // Complete the request
    //
    if (FunctionCode == BIOCSETOID)
    {
        OidData->Length = pRequest->Request.DATA.SET_INFORMATION.BytesRead;
        TRACE_MESSAGE1(PACKET_DEBUG_LOUD, "BIOCSETOID completed, BytesRead = %u", OidData->Length);
    }
    else
    {
        if (FunctionCode == BIOCQUERYOID)
        {
            OidData->Length = pRequest->Request.DATA.QUERY_INFORMATION.BytesWritten;

            if (Status == NDIS_STATUS_SUCCESS)
            {
                //
                // check for the stupid bug of the Nortel driver ipsecw2k.sys v. 4.10.0.0 that doesn't set the BytesWritten correctly
                // The driver is the one shipped with Nortel client Contivity VPN Client V04_65.18, and the MD5 for the buggy (unsigned) driver
                // is 3c2ff8886976214959db7d7ffaefe724 *ipsecw2k.sys (there are multiple copies of this binary with the same exact version info!)
                //
                // The (certified) driver shipped with Nortel client Contivity VPN Client V04_65.320 doesn't seem affected by the bug.
                //
                if (pRequest->Request.DATA.QUERY_INFORMATION.BytesWritten > pRequest->Request.DATA.QUERY_INFORMATION.InformationBufferLength)
                {
                    TRACE_MESSAGE2(PACKET_DEBUG_LOUD, "Bogus return from NdisRequest (query): Bytes Written (%u) > InfoBufferLength (%u)!!", pRequest->Request.DATA.QUERY_INFORMATION.BytesWritten, pRequest->Request.DATA.QUERY_INFORMATION.InformationBufferLength);
                    Status = NDIS_STATUS_INVALID_DATA;
                }
            }

            TRACE_MESSAGE1(PACKET_DEBUG_LOUD, "BIOCQUERYOID completed, BytesWritten = %u", OidData->Length);
        }
    }

    ExInterlockedInsertTailList(&Open->RequestList, &pRequest->ListElement, &Open->RequestSpinLock);

    if (Status == NDIS_STATUS_SUCCESS)
    {
        SET_RESULT_SUCCESS(sizeof(PACKET_OID_DATA) - 1 + OidData->Length);
    }
    else
    {
        SET_FAILURE_INVALID_REQUEST();
    }

    break;
Three Filter OID routines:
_Use_decl_annotations_
NDIS_STATUS
NPF_OidRequest(
    NDIS_HANDLE FilterModuleContext,
    PNDIS_OID_REQUEST Request
    )
{
    POPEN_INSTANCE Open = (POPEN_INSTANCE) FilterModuleContext;
    NDIS_STATUS Status;
    PNDIS_OID_REQUEST ClonedRequest = NULL;
    BOOLEAN bSubmitted = FALSE;
    PFILTER_REQUEST_CONTEXT Context;
    BOOLEAN bFalse = FALSE;

    TRACE_ENTER();

    do
    {
        Status = NdisAllocateCloneOidRequest(Open->AdapterHandle,
                                             Request,
                                             NPF6X_ALLOC_TAG,
                                             &ClonedRequest);
        if (Status != NDIS_STATUS_SUCCESS)
        {
            TRACE_MESSAGE(PACKET_DEBUG_LOUD, "FilerOidRequest: Cannot Clone Request\n");
            break;
        }

        Context = (PFILTER_REQUEST_CONTEXT)(&ClonedRequest->SourceReserved[0]);
        *Context = Request;

        bSubmitted = TRUE;

        //
        // Use same request ID
        //
        ClonedRequest->RequestId = Request->RequestId;

        Open->PendingOidRequest = ClonedRequest;

        Status = NdisFOidRequest(Open->AdapterHandle, ClonedRequest);

        if (Status != NDIS_STATUS_PENDING)
        {
            NPF_OidRequestComplete(Open, ClonedRequest, Status);
            Status = NDIS_STATUS_PENDING;
        }
    } while (bFalse);

    if (bSubmitted == FALSE)
    {
        switch (Request->RequestType)
        {
        case NdisRequestMethod:
            Request->DATA.METHOD_INFORMATION.BytesRead = 0;
            Request->DATA.METHOD_INFORMATION.BytesNeeded = 0;
            Request->DATA.METHOD_INFORMATION.BytesWritten = 0;
            break;

        case NdisRequestSetInformation:
            Request->DATA.SET_INFORMATION.BytesRead = 0;
            Request->DATA.SET_INFORMATION.BytesNeeded = 0;
            break;

        case NdisRequestQueryInformation:
        case NdisRequestQueryStatistics:
        default:
            Request->DATA.QUERY_INFORMATION.BytesWritten = 0;
            Request->DATA.QUERY_INFORMATION.BytesNeeded = 0;
            break;
        }
    }

    TRACE_EXIT();
    return Status;
}
//-------------------------------------------------------------------
_Use_decl_annotations_
VOID
NPF_CancelOidRequest(
    NDIS_HANDLE FilterModuleContext,
    PVOID RequestId
    )
{
    POPEN_INSTANCE Open = (POPEN_INSTANCE) FilterModuleContext;
    PNDIS_OID_REQUEST Request = NULL;
    PFILTER_REQUEST_CONTEXT Context;
    PNDIS_OID_REQUEST OriginalRequest = NULL;
    BOOLEAN bFalse = FALSE;

    FILTER_ACQUIRE_LOCK(&Open->OIDLock, bFalse);

    Request = Open->PendingOidRequest;

    if (Request != NULL)
    {
        Context = (PFILTER_REQUEST_CONTEXT)(&Request->SourceReserved[0]);
        OriginalRequest = (*Context);
    }

    if ((OriginalRequest != NULL) && (OriginalRequest->RequestId == RequestId))
    {
        FILTER_RELEASE_LOCK(&Open->OIDLock, bFalse);
        NdisFCancelOidRequest(Open->AdapterHandle, RequestId);
    }
    else
    {
        FILTER_RELEASE_LOCK(&Open->OIDLock, bFalse);
    }
}
//-------------------------------------------------------------------
_Use_decl_annotations_
VOID
NPF_OidRequestComplete(
    NDIS_HANDLE FilterModuleContext,
    PNDIS_OID_REQUEST Request,
    NDIS_STATUS Status
    )
{
    POPEN_INSTANCE Open = (POPEN_INSTANCE) FilterModuleContext;
    PNDIS_OID_REQUEST OriginalRequest;
    PFILTER_REQUEST_CONTEXT Context;
    BOOLEAN bFalse = FALSE;

    TRACE_ENTER();

    Context = (PFILTER_REQUEST_CONTEXT)(&Request->SourceReserved[0]);
    OriginalRequest = (*Context);

    //
    // This is an internal request
    //
    if (OriginalRequest == NULL)
    {
        TRACE_MESSAGE1(PACKET_DEBUG_LOUD, "Status= %p", Status);
        NPF_InternalRequestComplete(Open, Request, Status);
        TRACE_EXIT();
        return;
    }

    FILTER_ACQUIRE_LOCK(&Open->OIDLock, bFalse);

    ASSERT(Open->PendingOidRequest == Request);
    Open->PendingOidRequest = NULL;

    FILTER_RELEASE_LOCK(&Open->OIDLock, bFalse);

    //
    // Copy the information from the returned request to the original request
    //
    switch (Request->RequestType)
    {
    case NdisRequestMethod:
        OriginalRequest->DATA.METHOD_INFORMATION.OutputBufferLength = Request->DATA.METHOD_INFORMATION.OutputBufferLength;
        OriginalRequest->DATA.METHOD_INFORMATION.BytesRead = Request->DATA.METHOD_INFORMATION.BytesRead;
        OriginalRequest->DATA.METHOD_INFORMATION.BytesNeeded = Request->DATA.METHOD_INFORMATION.BytesNeeded;
        OriginalRequest->DATA.METHOD_INFORMATION.BytesWritten = Request->DATA.METHOD_INFORMATION.BytesWritten;
        break;

    case NdisRequestSetInformation:
        OriginalRequest->DATA.SET_INFORMATION.BytesRead = Request->DATA.SET_INFORMATION.BytesRead;
        OriginalRequest->DATA.SET_INFORMATION.BytesNeeded = Request->DATA.SET_INFORMATION.BytesNeeded;
        break;

    case NdisRequestQueryInformation:
    case NdisRequestQueryStatistics:
    default:
        OriginalRequest->DATA.QUERY_INFORMATION.BytesWritten = Request->DATA.QUERY_INFORMATION.BytesWritten;
        OriginalRequest->DATA.QUERY_INFORMATION.BytesNeeded = Request->DATA.QUERY_INFORMATION.BytesNeeded;
        break;
    }

    (*Context) = NULL;

    NdisFreeCloneOidRequest(Open->AdapterHandle, Request);
    NdisFOidRequestComplete(Open->AdapterHandle, OriginalRequest, Status);

    TRACE_EXIT();
}
Below is the mail I received from Jeffrey; I think it is the best answer to this question. :)
The packet filter works differently for LWFs versus Protocols. Let me give you some background. You’ll already know some of this, I’m sure, but it’s always helpful to review the basics, so we can be sure that we’re both on the same page. The NDIS datapath is organized like a tree:
Packet filtering happens at two places in this stack:
(a) once in the miniport hardware, and
(b) at the top of the stack, just below the protocols.
NDIS will track each protocol's packet filter separately, for efficiency. If one protocol asks to see ALL packets (promiscuous mode), then not all protocols have to sort through all that traffic. So really, there are (P+1) different packet filters in the system, where P is the number of protocols:
Now if there are all these different packet filters, how does an OID_GEN_CURRENT_PACKET_FILTER actually work? What NDIS does is track each protocol's packet filter, but also merge the filters at the top of the miniport stack. So suppose protocol0 requests a packet filter of A+B, and protocol1 requests a packet filter of C, and protocol2 requests a packet filter of B+D:
Then at the top of the stack, NDIS merges the packet filters to A+B+C+D. This is what gets sent down the filter stack, and eventually to the miniport.
Because of this merging process, no matter what protocol2 sets as its packet filter, protocol2 cannot affect the other protocols. So protocols don’t have to worry about “sharing” the packet filter. However, the same is not true for a LWF. If LWF1 decides to set a new packet filter, it does not get merged:
In the above picture, LWF1 decided to change the packet filter to C+E. This overwrote the protocols’ packet filter of A+B+C+D, meaning that flags A, B, and D will never make it to the hardware. If the protocols were relying on flags A, B, or D, then the protocols’ functionality will be broken.
This is by design – LWFs have great power, and they can do anything to the stack. They are designed to have the power to veto the packet filters of all other protocols. But in your case, you don’t want to mess with other protocols; you want your filter to have minimal effects on the rest of the system.
So what you want to do is to always keep track of what the packet filter is, and never remove flags from the current packet filter. That means that you should query the packet filter when your filter attaches, and update your cached value whenever you see an OID_GEN_CURRENT_PACKET_FILTER come down from above.
If your usermode app needs more flags than what the current packet filter has, you can issue the OID and add additional flags. This means that the hardware’s packet filter will have more flags. But no protocol’s packet filter will change, so the protocols will still see the same stuff.
In the above example, the filter LWF1 is playing nice. Even though LWF1 only cares about flag E, LWF1 has still passed down all flags A, B, C, and D too, since LWF1 knows that the protocols above it want those flags to be set.
The code to manage this isn't too bad, once you get the idea of what needs to be done to manage the packet filter (a rough sketch follows the list below):
Always track the latest packet filter from protocols above.
Never let the NIC see a packet filter that has fewer flags than the protocols’ packet filter.
Add in your own flags as needed.
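For illustration only (this fragment is mine, not part of Jeffrey's mail): a minimal sketch of that bookkeeping in the filter's OID path. Open->TrackedPacketFilter and Open->CapturePromiscuous are assumed names, not fields from the NPF sources:

/* Inside the filter's OID handling, before forwarding the request down. */
if (Request->RequestType == NdisRequestSetInformation &&
    Request->DATA.SET_INFORMATION.Oid == OID_GEN_CURRENT_PACKET_FILTER &&
    Request->DATA.SET_INFORMATION.InformationBufferLength >= sizeof(ULONG))
{
    ULONG protocolsFilter = *(PULONG) Request->DATA.SET_INFORMATION.InformationBuffer;

    /* 1. Always remember what the protocols above currently want. */
    Open->TrackedPacketFilter = protocolsFilter;

    /* 2. Never remove their flags; only OR in our own when capturing promiscuously.
     *    (In a real filter this change would be applied to the cloned request that
     *    is actually handed to NdisFOidRequest.) */
    if (Open->CapturePromiscuous)
    {
        *(PULONG) Request->DATA.SET_INFORMATION.InformationBuffer =
            protocolsFilter | NDIS_PACKET_TYPE_PROMISCUOUS;
    }
}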
Ok, hopefully that gives you a good idea of what the packet filter is and how to manage it. The next question is how to map “promiscuous mode” and “non-promiscuous mode” into actual flags? Let’s define these two modes carefully:
Non-promiscuous mode: The capture tool only sees the receive traffic that the operating system would normally have received. If the hardware filters out traffic, then we don’t want to see that traffic. The user wants to diagnose the local operating system in its normal state.
Promiscuous mode: Give the capture tool as many receive packets as possible – ideally every bit that is transferred on the wire. It doesn’t matter whether the packet was destined for the local host or not. The user wants to diagnose the network, and so wants to see everything happening on the network.
I think when you look at it that way, the consequences for the packet filter flags are fairly straightforward. For non-promiscuous mode, do not change the packet filter. Just let the hardware packet filter be whatever the operating system wants it to be. Then for promiscuous mode, add in the NDIS_PACKET_TYPE_PROMISCUOUS flag, and the NIC hardware will give you everything it possibly can.
So if it’s that simple for a LWF, why did the old protocol-based NPF driver need so many more flags? The old protocol-based driver had a couple problems:
It can’t get “non-promiscuous mode” perfectly correct
It can’t easily capture the send-packets of other protocols
The first problem with NPF-protocol is that it can’t easily implement our definition of “non-promiscuous mode” correctly. If NPF-the-protocol wants to see receive traffic just as the OS sees it, then what packet filter should it use? If it sets a packet filter of zero, then NPF won’t see any traffic. So NPF can set a packet filter of Directed|Broadcast|Multicast. But that’s only an assumption of what TCPIP and other protocols are setting. If TCPIP decided to set a Promiscuous flag (certain socket flags cause this to happen), then NPF would actually be seeing fewer packets than what TCPIP would see, which is wrong. But if NPF sets the Promiscuous flag, then it will see more traffic than TCPIP would see, which is also wrong. So it’s tough for a capturing protocol to decide which flags to set so that it sees exactly the same packets that the rest of the OS sees. LWFs don’t have that problem, since LWFs get to see the combined OID after all protocols’ filters are merged.
The second problem with NPF-protocol is that it needed loopback mode to capture sent-packets. LWFs don’t need loopback -- in fact, it would be actively harmful. Let’s use the same diagram to see why. Here’s NPF capturing the receive path in promiscuous mode:
Now let’s see what happens when a unicast packet is received:
Since the packet matches the hardware’s filter, the packet comes up the stack. Then when the packet gets to the protocol layer, NDIS gives the packet to both protocols, tcpip and npf, since both protocols’ packet filters match the packet. So that works well enough.
But now the send path is tricky:
tcpip sent a packet, but npf never got a chance to see it! To solve this problem, NDIS added the notion of a “loopback” packet filter flag. This flag is a little bit special, since it doesn’t go to the hardware. Instead, the loopback packet filter tells NDIS to bounce all send-traffic back up the receive path, so that diagnostics tools like npf can see the packets. It looks like this:
Now the loopback path is really only used for diagnostics tools, so we haven't spent much time optimizing it. And, since it means that all send packets travel across the stack twice (once for the normal send path, and again in the receive path), it has at least double the CPU cost. This is why I said that an NDIS LWF would be able to capture at a higher throughput than a protocol, since LWFs don't need the loopback path.
Why not? Why don’t LWFs need loopback? Well if you go back and look at the last few diagrams, you’ll see that all of our LWFs saw all the traffic – both send and receive – without any loopback. So the LWF meets the requirements of seeing all traffic, without needing to bother with loopback. That’s why a LWF should normally never set any loopback flags.
Ok, that email got longer than I wanted, but I hope that clears up some of the questions around the packet filter, the loopback path, and how LWFs are different from protocols. Please let me know if anything wasn’t clear, or if the diagrams didn’t come through.

SDL: how to stop audio - not resume, which SDL_PauseAudio(1) does actually?

Developing an iOS application that uses the CoreAudio framework, I am dealing with what is, IMHO, nonsensical behavior in SDL regarding playing audio. SDL plays audio in a loop; the only way to trigger playback is to call SDL_PauseAudio(0), and the only way to stop it (without other side effects, which I won't talk about here) is to call SDL_PauseAudio(1). As far as I know.
What is the problem for me in SDL here? Simply this: after pausing with SDL_PauseAudio(1), the next call to SDL_PauseAudio(0) actually RESUMES the playback, causing the framework to play some leftover mess *before asking for new sound data*. This is because of the way SDL_CoreAudio.c implements the playback loop.
It means that SDL does not implement STOP; it implements just PAUSE/RESUME and mismanages the audio buffer in between. So if you play sampleA and want to play sampleB later on, you will also hear fragments of sampleA when you expect to hear only sampleB.
If I am wrong, please correct me.
If not, here's the diff I used to also implement STOP behavior: as soon as I finish playing sampleA, I call SDL_PauseAudio(2) so that the playback loop quits, and the next call to SDL_PauseAudio(0) starts it again, this time playing no leftovers from sampleA but correctly playing just the data from sampleB.
Index: src/audio/coreaudio/SDL_coreaudio.c
===================================================================
--- src/audio/coreaudio/SDL_coreaudio.c
+++ src/audio/coreaudio/SDL_coreaudio.c
@@ -250,6 +250,12 @@
         abuf = &ioData->mBuffers[i];
         SDL_memset(abuf->mData, this->spec.silence, abuf->mDataByteSize);
     }
+    if (2 == this->paused)
+    {
+        // this changes 'pause' behavior to 'stop' behavior - the next
+        // playing starts from the beginning, i.e. it won't resume
+        this->hidden->bufferOffset = this->hidden->bufferSize;
+    }
     return 0;
 }
I am ashamed that I edited the SDL code, but I have no connection to the authors and haven't found any help elsewhere. Still, it seems strange to me that no one else appears to need STOP behavior in SDL.
A way around your issue is, for instance, to manage your audio device with SDL more directly. Here is what I suggest:
void myAudioCallback(void *userdata, Uint8 *stream, int len) { ... }

SDL_AudioDeviceID audioDevice;

void startAudio()
{
    // prepare the device
    static SDL_AudioSpec audioSpec;
    SDL_zero(audioSpec);
    audioSpec.channels = 2;
    audioSpec.freq = 44100;
    audioSpec.format = AUDIO_S16SYS;
    audioSpec.userdata = (void*)myDataLocation;
    audioSpec.callback = myAudioCallback;
    audioDevice = SDL_OpenAudioDevice(NULL, 0, &audioSpec, &audioSpec, 0);

    // actually start playback
    SDL_PauseAudioDevice(audioDevice, 0);
}

void stopAudio()
{
    SDL_CloseAudioDevice(audioDevice);
}
This works for me: the callback is not called after stopAudio(), and no garbage is sent to the speaker either.
