In my program I am detecting a person's face. The code is working well, but I am worried about it: for eye detection "cascade.detectMultiScale()" takes many parameters, while for face detection I am using only these few. How does it detect the face when we have not set the size of the object to detect in "cascade.detectMultiScale()"?
cascade.detectMultiScale(gray, faces, 1.2, 2);
for (int i = 0; i < faces.size(); i++)
{
Rect r = faces[i];
rectangle(src, Point(r.x, r.y), Point(r.x + r.width, r.y + r.height), CV_RGB(0,0,255));
}
You should probably read the manual for the built-in OpenCV functions (the OpenCV cascade classifier). The last two parameters are "minSize" and "maxSize", which set the minimum and maximum size of the detected object. For my project I am detecting a narrator's face on a 1080p HDTV channel, so my configuration looks like this:
face_cascade.detectMultiScale( frame_gray, faces, 1.1, 2, 0|CV_HAAR_SCALE_IMAGE, Size(500, 500) );
...which means I have a scale factor of 1.1 and minNeighbors = 2 (a candidate needs at least two neighbouring detections to be kept); in practice only one face, the narrator's, ends up detected.
CV_HAAR_SCALE_IMAGE means that the algorithm scales the image rather than the detector (scaling the detector is, in general, slower). You can also use something like 0|CV_HAAR_FIND_BIGGEST_OBJECT if you want to extract only the biggest object among all the candidates. In my case, I also forced the detector to search only for objects no smaller than 500x500 px, which also sped up my real-time processing and prevented the detector from making false detections.
You should also keep in mind that the built-in detector was created with certain predefined parameters (especially in the training phase). If you are really interested in making a better detector (with better detection accuracy and/or performance) for your application, you should consider a custom-trained classifier. But be aware: although modified parameters (number of training iterations, training mode, object properties, object alignment, etc.) COULD make things better, a good understanding of each of them (and of how they interact and affect the final result), as well as careful fine-tuning, is needed in order to make any reasonable improvement.
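If it helps, here is a rough sketch of a call that sets both limits explicitly. This is not the question's code: the cascade file name, the size limits and the minNeighbors value are placeholders you would tune for your own input.

// Hedged sketch: constrain detectMultiScale with an explicit minSize and maxSize.
// The cascade file name and the size limits below are placeholders.
#include <opencv2/objdetect/objdetect.hpp>
#include <vector>

void detectFaces(const cv::Mat& gray, std::vector<cv::Rect>& faces)
{
    static cv::CascadeClassifier cascade("haarcascade_frontalface_alt.xml");
    cascade.detectMultiScale(gray, faces,
                             1.1,                  // scaleFactor: step of the image pyramid
                             3,                    // minNeighbors: neighbouring hits needed to keep a candidate
                             0 | CV_HAAR_SCALE_IMAGE,
                             cv::Size(80, 80),     // ignore detections smaller than 80x80 px
                             cv::Size(400, 400));  // ignore detections larger than 400x400 px
}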
I have fairly large files with 3D scan points (around 200,000) and try to make a TopoDS_Shape with BRepOffsetAPI_Sewing:
gp_Pnt p1(0,0,100);
...
TopoDS_Edge e1 = BRepBuilderAPI_MakeEdge( p4, p1);
...
TopoDS_Wire w1 = BRepBuilderAPI_MakeWire(e1, e2, e3, e4);
...
TopoDS_Face f1 = BRepBuilderAPI_MakeFace(w1);
...
BRepOffsetAPI_Sewing sew(0.1);
sew.Add(f1);sew.Perform();TopoDS_Shape sewedShape = sew.SewedShape();
Of course the real code runs with all the points in loops etc.; the above is just a sample of how I try to create things.
With 200,000 points it takes 20-30 seconds to produce the face.
My next approach was to save the produced shape once generated and load it later as a workaround:
BRepTools::Write(sewedShape, sFile);
but even that is slow.
I did similar things in Java3D and it was way faster, so I must be doing something wrong.
Only showing the points with
Handle_Graphic3d_ArrayOfPoints points3d = new Graphic3d_ArrayOfPoints(totPoints, true, false);
gp_Pnt pnt(x, y, z);
points3d->AddVertex(pnt, aColor); // adding 200.000 points
Handle(AIS_PointCloud) m_points = new AIS_PointCloud();
m_points->SetPoints(points3d);
m_occView->getContext()->Display(m_points, true);
is almost instant (less than a second).
My goal is to build two of those faces and find their intersection with BRepAlgoAPI_Section.
Thanks in advance for your help!
As far as I understand, your current approach is:
Create a TopoDS_Face per quad in the Point Cloud.
To avoid the unnecessary overhead of the sewing operation, you would need to reconsider your workflow and create shared shapes from scratch. E.g., instead of creating a TopoDS_Vertex for the same point in the Point Cloud multiple times, you should create a single one and reuse it in the construction of connected edges/quads; the same applies to TopoDS_Edge construction. What the sewing operation does for you is find and repair shared information between geometrically connected faces, which is plenty of work that could be avoided entirely.
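As a rough illustration of that idea (this is not code from the question; the indexing scheme and helper names are invented for the sketch), you could cache each TopoDS_Vertex and TopoDS_Edge the first time it is built and hand the cached copies to every later quad, so the resulting faces already share their topology:

// Hedged sketch: reuse vertices and edges so adjacent quads share topology.
// The point indices i0..i3 are assumed to come from however the scan is gridded.
#include <BRepBuilderAPI_MakeVertex.hxx>
#include <BRepBuilderAPI_MakeEdge.hxx>
#include <BRepBuilderAPI_MakeWire.hxx>
#include <BRepBuilderAPI_MakeFace.hxx>
#include <TopoDS_Vertex.hxx>
#include <TopoDS_Edge.hxx>
#include <TopoDS_Wire.hxx>
#include <TopoDS_Face.hxx>
#include <gp_Pnt.hxx>
#include <vector>
#include <map>
#include <utility>

struct SharedTopology
{
    std::vector<TopoDS_Vertex> vertices;              // one vertex per scan point, built once
    std::map<std::pair<int,int>, TopoDS_Edge> edges;  // shared edges keyed by point indices

    explicit SharedTopology(std::size_t nPoints) : vertices(nPoints) {}

    const TopoDS_Vertex& vertexAt(const std::vector<gp_Pnt>& pts, int i)
    {
        if (vertices[i].IsNull())
            vertices[i] = BRepBuilderAPI_MakeVertex(pts[i]);
        return vertices[i];
    }

    const TopoDS_Edge& edgeBetween(const std::vector<gp_Pnt>& pts, int a, int b)
    {
        std::pair<int,int> key = (a < b) ? std::make_pair(a, b) : std::make_pair(b, a);
        TopoDS_Edge& e = edges[key];
        if (e.IsNull())
            e = BRepBuilderAPI_MakeEdge(vertexAt(pts, a), vertexAt(pts, b));
        return e;
    }

    TopoDS_Face quadFace(const std::vector<gp_Pnt>& pts, int i0, int i1, int i2, int i3)
    {
        TopoDS_Wire w = BRepBuilderAPI_MakeWire(edgeBetween(pts, i0, i1),
                                                edgeBetween(pts, i1, i2),
                                                edgeBetween(pts, i2, i3),
                                                edgeBetween(pts, i3, i0));
        return BRepBuilderAPI_MakeFace(w);
    }
}

The loop in the question would then call quadFace for each quad, and the sewing pass becomes unnecessary because neighbouring faces already reference the very same vertices and edges.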
But as you have already noticed (by trying to dump the produced shape into a file), mapping a tessellation to B-Rep is an inefficient approach in general. Just take a look at all those TopoDS_Vertex, TopoDS_Edge, TopoDS_Wire and TopoDS_Face objects to figure out how many more data structures are needed in B-Rep to map a single triangle or quad. This structure is heavy not only from a memory-utilization point of view, but also for the algorithms you might want to run on it, such as Boolean operations.
Possible alternatives:
Create a Poly_Triangulation from your point cloud and a single TopoDS_Face from it. You would be able to visualize it efficiently in the 3D Viewer and perform some operations like computing the surface area. Unfortunately, such a geometry definition is not yet supported by all OCCT algorithms, so you would not be able to perform Boolean operations.
Create an approximated B-Spline surface from your Point Clouds. This could be done with the help of the GeomPlate or SSP (Surface from Scattered Points) algorithms. An approximated surface would be a better fit to the B-Rep geometry definition, but it might lose some details of the original surface and might be tricky to apply (you might need to split a complex surface into several pieces).
Use the OMF product (Mesh Framework) to perform Boolean operations on meshes. If a Boolean operation on meshes is all you need, OMF could be helpful.
I am using GL_LINE_LOOP to draw a circle in C and OpenGL. Is it possible for me to fill the circle with colors?
If needed, this is the code I'm using:
const int circle_points = 100;
const float cx = 50, cy = 50, r = 50;   /* circle centre and radius */
const float pi = 3.14159f;
int i;
glColor3f(1, 1, 1);
glBegin(GL_LINE_LOOP);
for(i=0;i<circle_points;i++)
{
const float theta=(2*pi*i)/circle_points;
glVertex2f(cx+r*cos(theta),cy+r*sin(theta));
}
glEnd();
Look up polygon triangulation!
I hope something here is somehow useful to someone, even though this question was asked back in February. There are many possible answers, even though a lot of people would give none at all. I could witter on forever, but I'll try to finish before then.
Some would even say, "You never would," or, "That's not appropriate for OpenGL," but I'd like to say more than they do about why. Converting polygons into the triangles that OpenGL likes so much is outside of OpenGL's job spec, and is probably better done on the CPU side anyway. Calculate that stage in advance, as few times as possible, rather than repeatedly sending such a chunky problem to the GPU on every draw call.
Perhaps the original questioner drifted away from OpenGL since February, or perhaps they've become an expert. Perhaps I'll re-inspire them to look at it again, to hack away at some original 'imposters'. Or maybe they'll say it's not the tool for them after all, but that would be disappointing. Whatever graphics code you're writing, you know that OpenGL can speed it up!
Triangles for convex polygons are easy
Do you just want a circle? Make a triangle fan (GL_TRIANGLE_FAN) with the shared point at the circle's centre, as in the sketch below. GL_POLYGON was, for better or worse, deprecated and then killed off entirely; it will not work with current or future core-profile implementations of OpenGL.
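For the concrete circle in the question, a fan-based fill might look roughly like this (immediate mode to match the question's code, reusing its cx, cy, r, pi, circle_points and i; the first vertex is the shared centre and the loop repeats the first rim vertex to close the fan):

/* Hedged sketch: fill the circle with GL_TRIANGLE_FAN instead of outlining it. */
glColor3f(1, 1, 1);
glBegin(GL_TRIANGLE_FAN);
glVertex2f(cx, cy);                       /* shared centre vertex */
for (i = 0; i <= circle_points; i++)      /* <= so the fan closes on the first rim vertex */
{
    const float theta = (2 * pi * i) / circle_points;
    glVertex2f(cx + r * cos(theta), cy + r * sin(theta));
}
glEnd();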
Triangles for concave polygons are hard
You'll want more general polygons later? Well, there are some tricks you could play with, for all manner of convex polygons, but concave ones will soon get difficult. It would be easy to start five different solutions without finishing a single one. Then it would be difficult, on finishing one, to make it quick, and nearly impossible to be sure that it's the quickest.
To achieve it in a future-proofed way you really want to stick with triangles -- so "polygon triangulation" is the subject you want to search for. OpenGL will always be great for drawing triangles. Triangle strips are popular because they reuse many vertices, and a whole mesh can be covered with only triangle strips, (perhaps including the odd lone triangle or pair of triangles). Drawing with only one primitive usually means the entire mesh can be rendered with a single draw call, which could improve performance. (Number of draw calls is one performance consideration, but not always the most important.)
Polygon triangulation gets more complex when you allow concave polygons or polygons with holes. (Finding algorithms for triangulating a general polygon, robustly yet quickly, is actually an area of ongoing research. Nonetheless, you can find some pretty good solutions out there that are probably fit for purpose.)
But is this what you want?
Is a filled polygon crucial to your final goals in OpenGL? Or did you simply choose what felt like it would be a simple early lesson?
Frustratingly, although drawing a filled polygon seems like a simple thing to do -- and indeed is one of the simplest things to do in many languages -- the solution in OpenGL is likely to be quite complicated. Of course, it can be done if we're clever enough -- but that could be a lot of effort, without being the best route to take towards your later goals.
Even in languages that implement filled polygons in a way that is simple to program with, you don't always know how much strain it puts on the CPU or GPU. If you send a sequence of vertices, to be linked and filled, once every animation frame, will it be slow? If a polygon doesn't change shape, perhaps you should do the difficult part of the calculation just once? You will be doing just that, if you triangulate a polygon once using the CPU, then repeatedly send those triangles to OpenGL for rendering.
OpenGL is very, very good at doing certain things, very quickly, taking advantage of hardware acceleration. It is worth appreciating what it is and is not optimal for, to decide your best route towards your final goals with OpenGL.
If you're looking for a simple early lesson, rotating brightly coloured tetrahedrons is ideal, and happens early in most tutorials.
If on the other hand, you're planning a project that you currently envision using filled polygons a great deal -- say, a stylized cartoon rendering engine for instance -- I still advise going to the tutorials, and even more so! Find a good one; stick with it to the end; you can then think better about OpenGL functions that are and aren't available to you. What can you take advantage of? What do you need or want to redo in software? And is it worth writing your own code for apparently simple things -- like drawing filled polygons -- that are 'missing from' (or at least inappropriate to) OpenGL?
Is there a higher-level graphics library, free to use -- perhaps relying on OpenGL for rasterisation -- that can already do what you want? If so, how much freedom does it give you to mess with the nuts and bolts of OpenGL itself?
OpenGL is very good at drawing points, lines, and triangles, and hardware accelerating certain common operations such as clipping, face culling, perspective divides, perspective texture accesses (very useful for lighting) and so on. It offers you a chance to write special programs called shaders, which operate at various stages of the rendering pipeline, maximising your chance to insert your own unique cleverness while still taking advantage of hardware acceleration.
A good tutorial is one that explains the rendering pipeline and puts you in a much better position to assess what the tool of OpenGL is best used for.
Here is one such tutorial that I found recently: Learning Modern 3D Graphics Programming
by Jason L. McKesson. It doesn't appear to be complete, but if you get far enough for that to annoy you, you'll be well placed to search for the rest.
Using imposters to fill polygons
Everything in computer graphics is an imposter, but the term often has a specialised meaning. Imposters display very different geometry from what they actually have -- only more so than usual! Of course, a 3D world is very different from the pixels representing it, but with imposters, the deception goes deeper than usual.
For instance, a rectangle that OpenGL actually constructs out of two triangles can appear to be a sphere if, in its fragment shader, you write a customised depth value to the depth coordinate, calculate your own normals for lighting and so on, and discard those fragments of the square that would fall outside the outline of the sphere. (Calculating the depth on those fragments would involve a square root of a negative number, which can be used to discard the fragment.) Imposters like that are sometimes called flat cards or billboards.
(The tutorial above includes a chapter on imposters, and examples doing just what I've described here. In fact, the rectangle itself is constructed only part way through the pipeline, from a single point. I warn that the scaling of their rectangle, to account for the way that perspective distorts a sphere into an ellipse in a wide FOV, is a non-robust fudge. The correct and robust answer is tricky to work out, using mathematics that would be slightly beyond the scope of the book. I'd say it is beyond the author's algebra skills to work it out, but I could be wrong; he'd certainly understand a worked example. However, when you have the correct solution, it is computationally inexpensive; it involves only linear operations plus two square roots, to find the four limits of a horizontally- or vertically-translated sphere. To generalise that technique for other displacements requires one more square root, for a vector normalisation to find the correct rotation, and one application of that rotation matrix when you render the rectangle.)
So just to suggest an original solution that others aren't likely to provide, you could use an inequality (like x * x + y * y <= 1 for a circle or x * x - y * y <= 1 for a hyperbola) or a system of inequalities (like three straight line forms to bound a triangle) to decide how to discard a fragment. Note that if inequalities have more than linear order, they can encode perfect curves, and render them just as smoothly as your pixelated screen will allow -- with no limitation on the 'geometric detail' of the curve. You can also combine straight and curved edges in a single polygon, in this way.
For instance, a fragment shader (which would be written in GLSL) for a semi-circle might have something like this:
if (y < 0) discard;
float rSq = x * x + y * y;
if (1 < rSq) discard;
// We're inside the semi-circle; put further shader computations here
However, the polygons that are easy to draw, in this way, are very different from the ones that you're used to being easy. Converting a sequence of connected nodes into inequalities means yet more code to write, and deciding on the Boolean logic, to deal with combining those inequalities, could then get quite complex -- especially for concave polygons. Performing inequalities in a sensible order, so that some can be culled based on the results of others, is another ill-posed headache of a problem, if it needs to be general, even though it is easy to hard-code an optimal solution for a single case like a square.
I suggest using imposters mainly for its contrast with the triangulation method. Something like either one could be a route to pursue, depending on what you're hoping to achieve in the end, and the nature of your polygons.
Have fun...
P.S. Here is a related topic... Polygon triangulation into triangle strips for OpenGL ES
As long as the link lasts, it's a more detailed explanation of 'polygon triangulation' than mine. Those are the two words to search for if the link ever dies.
A line loop is just an outline.
To fill the middle as well, you want to use GL_POLYGON.
I'm doing a project with a lot of calculation, and my idea is to throw pieces of the work at the GPU. But I wonder whether we can retrieve results from GLSL, and if it is possible, how?
GLSL does not provide outputs besides what is placed in the frame buffer.
To program a GPU and get results more conveniently, use CUDA (NVidia only) or OpenCL (cross-platform).
In general, what you want to do is use OpenCL for general-purpose GPU tasks. However, if you are insistent on pretending that OpenGL is not a rendering API...
Framebuffer Objects make it relatively easy to render to multiple outputs. This of course means that you have to structure your processing such that what gets rendered matches what you want. You can render to 32-bit floating-point "images", so you have access to plenty of precision. The biggest difficulty is what I stated: figuring out how to structure your task to match rendering.
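A minimal sketch of that setup might look like the following. It is only illustrative: it assumes a current GL 3.x context and the usual headers, the texture size and formats are placeholders, the actual computation would live in the fragment shader used to draw the full-screen quad, and error checking is omitted.

/* Hedged sketch: render into a 32-bit float texture through an FBO, then read it back. */
const int width = 512, height = 512;            /* placeholder problem size */
GLuint tex, fbo;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0);
glViewport(0, 0, width, height);

/* ... draw a full-screen quad here with a fragment shader that does the computation ... */

std::vector<float> results(width * height * 4); /* 4 floats (RGBA) per output "pixel" */
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, results.data());
glBindFramebuffer(GL_FRAMEBUFFER, 0);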
It's a bit easier when using transform feedback. This is the ability to write the output of the vertex (or geometry) shader processing to a buffer object. This still requires structuring your tasks into something like rendering, but it's easier because vertex shaders have a strict one-vertex-to-one-vertex mapping. For every input vertex, there is exactly one output. And if you draw GL_POINTS, it's not too difficult to use attributes to pass the data that changes.
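Roughly, the transform feedback route looks like this (again assuming a GL 3.x context and the usual headers; "program" is a GLSL program you have already compiled, "count" is the number of input points, and the varying name "outValue" and buffer size are illustrative rather than anything required by the API):

/* Hedged sketch: capture per-vertex results with transform feedback. */
const char* varyings[] = { "outValue" };        /* name of the vertex shader output to capture */
glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program);                         /* must (re)link after declaring the varyings */

GLuint tfo;
glGenBuffers(1, &tfo);
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, tfo);
glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, count * sizeof(float), NULL, GL_STATIC_READ);

glUseProgram(program);
glEnable(GL_RASTERIZER_DISCARD);                /* we only want the captured data, not pixels */
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfo);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, count);              /* one output value per input point */
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);

std::vector<float> results(count);
glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0, count * sizeof(float), results.data());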
Both easier and harder is the use of shader_image_load_store. This is effectively the ability to read/write from/to arbitrary images "whenever you want". I put that last part in quotes because there are lots of esoteric rules about data race conditions: reading from a value written by another shader invocation and so forth. These are not trivial to deal with. You can try to structure your code to avoid them, by not writing to the same image location in the same shader. But in many cases, if you could do that, you could just render to the framebuffer.
Ultimately, it's pretty much impossible to answer this question in the general case, without knowing what exactly you're trying to actually do. How you approach GPGPU through a rendering API depends greatly on exactly what you're trying to compute.
We're currently using the Silverlight VideoSink to capture video from users' local webcams, kinda like so:
protected override void OnSample(long sampleTime, long frameDuration, byte[] sampleData)
{
if (FrameShouldBeSubmitted())
{
byte[] resampledData = ResizeFrame(sampleData);
mediaController.SetVideoFrame(resampledData);
}
}
Now, on most of the machines that we've tested, the video sample provided in the byte[] sampleData parameter is upside-down, i.e., if you try to take the RGBA data and turn it into, say, a WriteableBitmap, the bitmap will be upside-down. That's odd, but fairly easy to correct, of course -- you just have to reverse the array as you encode it.
The problem is that at least on some machines (e.g., the single Macintosh in our test environment), the video sample provided is no longer upside-down, but right-side up, and hence, flipping the image actually results in an image that's received upside-down on the far side.
I reported this to MS as a bug, but their (terse) response was that it was "As Designed". Further attempts at clarification have so far been ignored.
Now, I'll grant that it's kinda entertaining to imagine the discussions behind this design decision: "OK, just to make it interesting, let's play the video rightside up on a Mac, but let's turn it upside down for Windows!" "Great idea!" "Yeah, that'll keep those developers guessing!" But beyond that, I can't find this, umm, "feature" documented anywhere, nor can I find any documentation on how one is supposed to be able to tell that a given video sample is upside down or rightside up. Any thoughts on how to tell this?
EDIT 3/29/10 4:50 pm - I got a response from MS which said that the appropriate way to tell was through the Stride property on the VideoFormat object, i.e., if the stride value is negative, the image will be upside-down. However, my own testing indicates that unless I'm doing something wrong, this isn't the case. At least on my own machine, whether the stride value is zero or negative (the only options I see), the sampled image is still upside-down.
I was going to suggest looking at VideoFormat.Stride provided at VideoSink.OnFormatChange, but then I noticed your edit. I went ahead and tested it on my dev machine: the image is upside down and the stride is negative, as expected. Have you checked again recently?
Even though stride made perfect sense for native applications (which use the stride in pointer arithmetic), I agree that the current behavior is not what you would expect from a modern API. Performance-wise, however, it is better not to modify the data received from the native API.
Still, while we are talking about performance, why not provide samples in formats other than PixelFormatType.Format32bppArgb so that we can avoid the color-space conversion? BTW, there is a VideoCaptureDevice.DesiredFormat property, but it only works for resolution, as there is no alternative to PixelFormatType.Format32bppArgb.
Not unlike a clap detector ("Clap on! clap clap Clap off! clap clap Clap on, clap off, the Clapper! clap clap ") I need to detect when a door closes. This is in a vehicle, which is easier than a room or household door:
Listen: http://ubasics.com/so/van_driver_door_closing.wav
Look:
It's sampling at 16bits 4khz, and I'd like to avoid lots of processing or storage of samples.
When you look at it in audacity or another waveform tool it's quite distinctive, and almost always clips due to the increase in sound pressure in the vehicle - even when the windows and other doors are open:
Listen: http://ubasics.com/so/van_driverdoorclosing_slidingdoorsopen_windowsopen_engineon.wav
Look:
I expect there's a relatively simple algorithm that would take readings at 4kHz, 8 bits, and keep track of the 'steady state'. When the algorithm detects a significant increase in the sound level it would mark the spot.
What are your thoughts?
How would you detect this event?
Are there code examples of sound pressure level calculations that might help?
Can I get away with less frequent sampling (1kHz or even slower?)
Update: Playing with Octave (open source numerical analysis - similar to Matlab) and seeing if the root mean square will give me what I need (which results in something very similar to the SPL)
Update2: Computing the RMS finds the door close easily in the simple case:
Now I just need to look at the difficult cases (radio on, heat/air on high, etc). The CFAR looks really interesting - I know I'm going to have to use an adaptive algorithm, and CFAR certainly fits the bill.
-Adam
Looking at the screenshots of the source audio files, one simple way to detect a change in sound level would be to do a numerical integration of the samples to find out the "energy" of the wave at a specific time.
A rough algorithm would be:
Divide the samples up into sections
Calculate the energy of each section
Take the ratio of the energies between the previous window and the current window
If the ratio exceeds some threshold, determine that there was a sudden loud noise.
Pseudocode
samples = load_audio_samples()  // Array containing audio samples
WINDOW_SIZE = 1000              // Sample window of 1000 samples (example)
last_energy = 0
energy = 0

for (i = 0; i + WINDOW_SIZE <= samples.length; i += WINDOW_SIZE):
    // Perform a numerical integration of the current window by summing the
    // squared samples (squaring stops the positive and negative halves of the
    // waveform from cancelling each other out).
    for (j = 0; j < WINDOW_SIZE; j++):
        energy += samples[i+j] * samples[i+j]

    // Take the ratio of the energies of the last window and the current
    // window, and see if there is a big jump. If so, there was a sudden
    // loud noise.
    if (last_energy > 0 and energy / last_energy > THRESHOLD):
        sudden_sound_detected()

    last_energy = energy
    energy = 0
I should add a disclaimer that I haven't tried this.
This approach should be possible without having all the samples recorded first. As long as there is a buffer of some length (WINDOW_SIZE in the example), a numerical integration can be performed to calculate the energy of the section of sound. This does mean, however, that there will be a delay in the processing, dependent on the length of WINDOW_SIZE. Determining a good length for a section of sound is another concern.
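To make the streaming form concrete, here is a minimal sketch (not from the pseudocode above) that assumes 16-bit signed PCM samples arriving one at a time; the window length and ratio threshold are placeholders that need the empirical tuning discussed below:

// Hedged sketch: streaming windowed-energy detector for 16-bit PCM samples.
#include <cstdint>

class DoorSlamDetector
{
public:
    DoorSlamDetector(int windowSize = 100, double threshold = 10.0)
        : windowSize_(windowSize), threshold_(threshold) {}

    // Feed one sample; returns true when the window that just finished is much
    // louder than the previous one.
    bool addSample(int16_t s)
    {
        energy_ += static_cast<double>(s) * s;   // square so positive and negative halves don't cancel
        if (++count_ < windowSize_)
            return false;

        const bool detected = (lastEnergy_ > 0.0) && (energy_ / lastEnergy_ > threshold_);
        lastEnergy_ = energy_;
        energy_ = 0.0;
        count_ = 0;
        return detected;
    }

private:
    int    windowSize_;
    double threshold_;
    int    count_      = 0;
    double energy_     = 0.0;
    double lastEnergy_ = 0.0;
};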
How to Split into Sections
In the first audio file, it appears that the duration of the sound of the door closing is 0.25 seconds, so the window used for numerical integration should probably be at most half of that, or even more like a tenth, so the difference between the silence and sudden sound can be noticed, even if the window is overlapping between the silent section and the noise section.
For example, if the integration window was 0.5 seconds, and the first window was covering the 0.25 seconds of silence and 0.25 seconds of door closing, and the second window was covering 0.25 seconds of door closing and 0.25 seconds of silence, it may appear that the two sections of sound has the same level of noise, therefore, not triggering the sound detection. I imagine having a short window would alleviate this problem somewhat.
However, having a window that is too short will mean that the rise in the sound may not fully fit into one window, and it may appear that there is little difference in energy between the adjacent sections, which can cause the sound to be missed.
I believe the WINDOW_SIZE and THRESHOLD are both going to have to be determined empirically for the sound which is going to be detected.
For the sake of determining how many samples that this algorithm will need to keep in memory, let's say, the WINDOW_SIZE is 1/10 of the sound of the door closing, which is about 0.025 second. At a sampling rate of 4 kHz, that is 100 samples. That seems to be not too much of a memory requirement. Using 16-bit samples that's 200 bytes.
Advantages / Disadvantages
The advantage of this method is that processing can be performed with simple integer arithmetic if the source audio is fed in as integers. The catch is, as mentioned already, that real-time processing will have a delay, depending on the size of the section that is integrated.
There are a couple of problems that I can think of to this approach:
If the background noise is too loud, the difference in energy between the background noise and the door closing will not be easily distinguished, and it may not be able to detect the door closing.
Any abrupt noise, such as a clap, could be mistaken for the door closing.
Perhaps this could be combined with the suggestions in the other answers, such as analyzing the frequency signature of the door closing using Fourier analysis; that would require more processing but would make the detection less prone to error.
It's probably going to take some experimentation before finding a way to solve this problem.
You should tap into the door-close switches in the car.
Trying to do this with sound analysis is over-engineering.
There are a lot of suggestions about different signal-processing approaches to take, but really, by the time you learn about detection theory, build an embedded signal-processing board, learn the processing architecture for the chip you chose, attempt an algorithm, debug it, and then tune it for the car you want to use it on (and then re-tune and re-debug it for every other car), you will be wishing you had just sticky-taped a reed switch inside the car and hot-glued a magnet to the door.
Not that it's not an interesting problem to solve for the DSP experts, but from the way you're asking this question, it's clear that sound processing isn't the route you want to take. It will just be a nightmare to make it work right.
Also, the Clapper is just a high-pass filter fed into a threshold detector (plus a timer to make sure the two claps come quickly enough together).
There is a lot of relevant literature on this problem in the radar world (it's called detection theory).
You might have a look at "cell averaging CFAR" (constant false alarm rate) detection. Wikipedia has a little bit here. Your idea is very similar to this, and it should work! :)
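To show roughly what that looks like when applied to audio rather than radar range cells, here is a sketch of cell-averaging CFAR run over a series of per-window energy (or RMS) values; the numbers of training and guard cells and the scale factor are placeholders to tune:

// Hedged sketch: cell-averaging CFAR over per-window energy values.
#include <vector>
#include <cstddef>

std::vector<std::size_t> cfarDetect(const std::vector<double>& energy,
                                    std::size_t numTrain = 8,   // training cells on each side
                                    std::size_t numGuard = 2,   // guard cells on each side
                                    double scale = 5.0)         // threshold factor above the noise estimate
{
    std::vector<std::size_t> detections;
    const std::size_t span = numTrain + numGuard;

    for (std::size_t i = span; i + span < energy.size(); ++i) {
        // Estimate the local noise level from the training cells on both sides,
        // skipping the guard cells immediately around the cell under test.
        double noise = 0.0;
        for (std::size_t k = i - span; k < i - numGuard; ++k)      noise += energy[k];
        for (std::size_t k = i + numGuard + 1; k <= i + span; ++k) noise += energy[k];
        noise /= 2.0 * numTrain;

        // Declare a detection when the cell under test exceeds the adaptive threshold.
        if (energy[i] > scale * noise)
            detections.push_back(i);
    }
    return detections;
}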
Good luck!
I would start by looking at the spectrum. I did this on the two audio files you gave, and there does seem to be some similarity you could use. For example, the main difference between the two seems to be around 40-50 Hz. My $.02.
UPDATE
I had another idea after posting this. If you can, add an accelerometer to the device, then correlate the vibration and acoustic signals. This should help with cross-vehicle door detection. I'm thinking they should be well correlated, since the door sound is vibrationally driven, whereas the stereo, for example, is not. I've had a device that was able to detect my engine RPM with a windshield mount (suction cup), so the sensitivity might be there. (I make no promises this works!)
(spectral plot of the two recordings; source: charlesrcook.com)
%% Test Script (Matlab)
clear
hold all %keep plots open
%% Van driver door
[data, Fs] = wavread('van_driver_door_closing.wav');
dt = 1/Fs;   % use the file's actual sample rate rather than a hard-coded value
%Frequency analysis
NFFT = 2^nextpow2(length(data));
Y = fft(data(:,2), NFFT)/length(data);
freq = (1/dt)/2*linspace(0,1,NFFT/2);
spectral = [freq' 2*abs(Y(1:NFFT/2))];
plot(spectral(:,1),spectral(:,2))
%% Repeat for van sliding door
[data, Fs] = wavread('van_driverdoorclosing.wav');
dt = 1/Fs;
%Frequency analysis
NFFT = 2^nextpow2(length(data));
Y = fft(data(:,2), NFFT)/length(data);
freq = (1/dt)/2*linspace(0,1,NFFT/2);
spectral = [freq' 2*abs(Y(1:NFFT/2))];
plot(spectral(:,1),spectral(:,2))
The process of finding distinct spikes in audio signals is called transient detection. Applications like Sony's Acid and Ableton Live use transient detection to find the beats in music for beat matching.
The distinct spike you see in the waveform above is called a transient, and there are several good algorithms for detecting one. The paper Transient detection and classification in energy matters describes three methods for doing this.
I would imagine that the frequency and amplitude would also vary significantly from vehicle to vehicle. The best way to determine that would be to take a sample in a Civic versus a big SUV. Perhaps you could have the user close the door in a "learning" mode to capture the amplitude and frequency signature. Then you could use that signature for comparison during normal usage.
You could also consider using Fourier analysis to eliminate background noises that aren't associated with the door close.
Maybe you should try to detect a significant instantaneous rise in air pressure, which should mark a door closing. You could pair that with the waveform and sound-level analysis, and together these might give you a better result.
On the issue of less frequent sampling: the highest sound frequency that can be captured is half the sampling rate (the Nyquist limit). Thus, if the car-door sound were strongest at 1000 Hz (for example), then a sampling rate below 2000 Hz would lose that sound entirely.
A very simple noise gate would probably do just fine in your situation. Simply wait for the first sample whose amplitude is above a specified threshold value (to avoid triggering with background noise). You would only need to get more complicated than this if you need to distinguish between different types of noise (e.g. a door closing versus a hand clap).
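For what it's worth, the gate itself is only a handful of lines; a minimal sketch, with the threshold as a placeholder to be set just above the background-noise level:

// Hedged sketch: simple noise gate over 16-bit PCM samples.
#include <cstdint>
#include <cstdlib>

bool gateTriggered(const int16_t* samples, std::size_t n, int threshold = 8000)
{
    for (std::size_t i = 0; i < n; ++i)
        if (std::abs(static_cast<int>(samples[i])) > threshold)   // first sample above the gate
            return true;                                          // treat as a candidate door-close
    return false;
}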