I'm trying to write an audio analysis application, and I need to identify local maxima in a 2D array which represents a spectrogram. I've already got an open source library that can generate the spectrogram using Fast Fourier Transforms, but I was wondering if anybody knew of any good libraries to help me with actually finding the maxima? I'm not quite sure what to search Google for - the best I could think of was "numerical library" but that hasn't got me very far.
Preferably in C, but I'm open to other suggestions.
Peak finding is a fairly general problem. It has already been discussed once on SO as Peak detection of measured signal.
The answers provided include several viable heuristics.
Of course, I prefer my own answer if you need rigor, but ROOT is written in C++ and is almost certainly too heavy for your application, so you'll need to strip out just the code you want...
The GNU Scientific Library features a multidimensional minimization framework that can be made to work for maximization easily enough (just minimize the negated function). It's designed to return a single minimum rather than a collection of local minima, however.
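If a library turns out to be overkill, a brute-force neighbourhood scan is only a few lines of C. Here is a minimal sketch (the strict 8-neighbour comparison and the noise floor are my assumptions; real spectrograms usually want smoothing or a wider window first):

    #include <stdio.h>

    #define ROWS 64
    #define COLS 128

    /* Report cells strictly greater than all 8 neighbours and above a
       noise floor. Border cells are skipped for simplicity. */
    void find_peaks(const double spec[ROWS][COLS], double noise_floor)
    {
        for (int r = 1; r < ROWS - 1; r++) {
            for (int c = 1; c < COLS - 1; c++) {
                double v = spec[r][c];
                if (v <= noise_floor)
                    continue;
                int is_peak = 1;
                for (int dr = -1; dr <= 1 && is_peak; dr++)
                    for (int dc = -1; dc <= 1; dc++)
                        if ((dr != 0 || dc != 0) && spec[r + dr][c + dc] >= v) {
                            is_peak = 0;
                            break;
                        }
                if (is_peak)
                    printf("peak at bin (%d, %d): %g\n", r, c, v);
            }
        }
    }

Flat plateaus and peaks on the border need extra care; the strict greater-than comparison simply ignores them.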
I'm currently doing a primality test on huge numbers (up to 10M digits).
Right now, I'm using a C program based on the GMP library. I did some parallelization using OpenMP and got a nice speedup (~3.5x with 4 cores). The problem is that I don't have enough CPU cores to make it feasible to run on my whole dataset.
I have an NVIDIA GPU, and I tried to find an alternative to GMP for GPUs. It can be either CUDA or OpenCL.
Is there an arbitrary-precision library that I can run on my GPU? I'm also open to using another programming language if there is a simpler or more elegant way of doing it.
It seems the Julia Language is already able to do multiprecision arithmetic and use the GPU (see here for a simple example combining these two), but you might have to learn Julia and rewrite your code.
The CUMP library is meant to be a GMP substitute for CUDA. It tries to make porting GMP code easier by offering an interface similar to GMP's: for example, you replace mpf_... variables and functions with cumpf_... ones (see the sketch after this list). There is a demo you can build to check whether it fits your CUDA setup. There is no documentation or support, though; you'll have to go through the code to see if it works.
The CAMPARY library from people at LAAS-CNRS might be worth a shot as well, but there is no documentation either. It has been used in more published research than CUMP, so the odds are better. An answer here gives some clarification on how to use it.
GRNS uses the residue number system on CUDA-compatible processors. It has no documentation, but an interesting paper is available. Also, see this one.
XMP comes directly from NVIDIA Labs, but it seems incomplete and also has no docs. There is some info here and a paper here.
XMP 2.0 seems newer but only supports sizes up to 32k bits for now.
GPUMP looks promising, but it doesn't seem to be available for download; you might get it by contacting the authors.
MPRES-BLAS is a multiprecision library for linear algebra on CUDA, and it does, of course, have code for basic arithmetic. There are also papers here and here.
I haven't tested any of those, so please let us know if any of those work well.
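To give a feel for what the CUMP-style port involves, here is plain GMP mpf_ code in C; per CUMP's stated convention you would swap the mpf_ prefix for cumpf_ (I haven't verified CUMP's exact signatures, so treat the rename as illustrative):

    #include <stdio.h>
    #include <gmp.h>

    /* Plain GMP: multiply two arbitrary-precision floats at 1024-bit
       precision. A CUMP port would, per its description, rename the
       mpf_... calls to cumpf_... (unverified here). */
    int main(void)
    {
        mpf_t a, b, c;
        mpf_set_default_prec(1024);
        mpf_init_set_str(a, "3.14159265358979323846", 10);
        mpf_init_set_str(b, "2.71828182845904523536", 10);
        mpf_init(c);
        mpf_mul(c, a, b);
        gmp_printf("a * b = %.30Ff\n", c);
        mpf_clears(a, b, c, NULL);
        return 0;
    }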
Facebook created an open-source fast lossless compression algorithm called ZStandard, targeting real-time compression scenarios with zlib-level and better compression ratios.
I have been looking for a tutorial that describes C-to-Swift wrapping, like this one, but it doesn't look comprehensive enough. What are the prerequisites I have to know before writing a wrapper?
When I finish writing it, I will also make it open source.
Thank you for a good question. I've looked at the library and played with it, it seems pretty interesting.
I would say you need to be comfortable using the ZSTD C library in a C program. You also need to be comfortable programming in Swift. Depending on the parts of the API you want to wrap, you may need to understand how to deal with raw memory in Swift (the Unsafe... types).
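For orientation, here is the kind of one-shot C round trip a wrapper ultimately drives; a minimal sketch using the stable simple API (error handling abbreviated):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    /* Compress and decompress a buffer with ZSTD's one-shot API. */
    int main(void)
    {
        const char *src = "zstd zstd zstd zstd zstd zstd zstd zstd";
        size_t srcSize = strlen(src) + 1;

        size_t bound = ZSTD_compressBound(srcSize);
        void *compressed = malloc(bound);
        size_t cSize = ZSTD_compress(compressed, bound, src, srcSize, 3);
        if (ZSTD_isError(cSize)) return 1;  /* see ZSTD_getErrorName(cSize) */

        void *restored = malloc(srcSize);
        size_t dSize = ZSTD_decompress(restored, srcSize, compressed, cSize);
        if (ZSTD_isError(dSize)) return 1;

        printf("%zu -> %zu -> %zu bytes: %s\n", srcSize, cSize, dSize,
               (char *)restored);
        free(compressed);
        free(restored);
        return 0;
    }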
Some challenges to consider when wrapping ZSTD:
- The streaming API with dictionaries is experimental and subject to change, yet use of dictionaries is one of the advantages of ZSTD.
- When dealing with memory buffers, we want to minimize copying them, since the buffers may be quite large and copying them would adversely affect performance. This, of course, complicates memory management.
There is a multitude of approaches you can choose from when writing a wrapper. For example, you can write wrappers in C that will expose simple APIs you will wrap in Swift. You can include C code in your wrapper framework, or you can keep it in separate C libraries.
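For instance, a C shim for the first approach can be as small as this (the szstd_ name is hypothetical); exposing only plain C types keeps the Swift side free of ZSTD error-code conventions:

    #include <stddef.h>
    #include <zstd.h>

    /* Hypothetical shim: a Swift-friendly surface over ZSTD_compress.
       Returns the compressed size, or -1 on error, so the Swift wrapper
       never has to interpret ZSTD's size_t error codes. */
    long szstd_compress(void *dst, size_t dst_capacity,
                        const void *src, size_t src_size, int level)
    {
        size_t r = ZSTD_compress(dst, dst_capacity, src, src_size, level);
        return ZSTD_isError(r) ? -1L : (long)r;
    }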
I have come up with a simple example of a wrapper around a small subset of the API; you can take a look at https://github.com/omniprog/SwiftZSTD.
I'm new to CUDA programming and I was wondering how the performance of pyCUDA is compared to programs implemented in plain C.
Will the performance be roughly the same? Are there any bottlenecks that I should be aware of?
EDIT:
I obviously tried to google this issue first, and was surprised not to find any information. I would have expected the pyCUDA people to have this question answered in their FAQ.
If you're using CUDA -- whether directly through C or with pyCUDA -- all the heavy numerical work you're doing is done in kernels that execute on the GPU and are written in CUDA C (directly by you, or indirectly with elementwise kernels). So there should be no real difference in performance in those parts of your code.
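For instance, a trivial elementwise kernel looks identical whether you compile it with nvcc in a C project or hand the same source string to pyCUDA's SourceModule (the saxpy example here is mine):

    // The same CUDA C source can be compiled by nvcc or passed verbatim
    // as a string to pycuda.compiler.SourceModule.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }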
Now, the initialization of arrays, and any post-work analysis, will be done in Python (probably with numpy) if you use pyCUDA, and that will generally be significantly slower than doing it directly in a compiled language (though if you've built your numpy/scipy in such a way that it links directly to high-performance libraries, those calls at least would perform the same in either language). But hopefully your initialization and finalization are small fractions of the total amount of work you have to do, so that even if there is significant overhead there, it won't have a huge impact on overall runtime.
And in fact, if it turns out that the Python parts of the computation do hurt your application's performance, starting out in pyCUDA may still be an excellent way to get going, as development is significantly easier; you can always re-implement the parts of the code that are too slow in Python in straight C and call those from Python, gaining some of the best of both worlds.
If you're wondering about performance differences by using pyCUDA in different ways, see SimpleSpeedTest.py included in the pyCUDA Wiki examples. It benchmarks the same task completed by a CUDA C kernel encapsulated in pyCUDA, and by several abstractions created by pyCUDA's designer. There's a performance difference.
I've been using pyCUDA for a little while, and I like prototyping with it because it speeds up the process of turning an idea into working code.
With pyCUDA you will be writing the CUDA kernels in C++, and it's still CUDA, so there shouldn't be a difference in the performance of running that code. But there will be a difference in the performance of the code you write in Python to set up and use the results of the pyCUDA kernel versus the code you would write in plain C.
I was looking for an answer to the original question in this post, and I see the problem is deeper than I thought.
In my experience, I compared CUDA kernels and cuFFT calls written in C with those written in PyCUDA. Surprisingly, I found that, on my computer, the performance of summing, multiplying, or computing FFTs varied between implementations. For example, I got almost the same performance in cuFFT for vector sizes up to 2^23 elements. However, summing and multiplying complex vectors showed some trouble. The speedup obtained in C/CUDA was ~6x for N=2^17, while in PyCUDA it was only ~3x. It also depended on the way the summation was performed. Using SourceModule and wrapping the raw CUDA code, I found that my kernel for complex128 vectors was limited to a lower N (<=2^16) than the one used with gpuarray (<=2^24).
In conclusion, it is worth testing and comparing both sides of the problem, and evaluating whether it is worth spending the time to write a CUDA program or better to gain readability and pay the cost of lower performance.
If you're using PyCUDA and you want high performance, make sure you're using -O3 optimizations and use nvprof/nvvp to profile your kernels. If you want to use CUDA from Python, PyCUDA is probably THE choice, because interfacing C++/CUDA code via Python is just hell otherwise: you have to write a hell of a lot of ugly wrappers, and for numpy integration even more hardcore wrapper code would be necessary.
I am faced with the task of building a new component to be integrated into a large existing C codebase. The component is essentially a kind of compiler, and will be complicated enough that I would like to write it in OCaml (for reasons along the lines of those given here). I know that OCaml-C interaction is possible (as per the manual and this tutorial), but it looks somewhat painful.
What I'd like to know is whether others here have attempted large-scale integration of OCaml and C code, what were some of the unexpected gotchas they found, and whether at the end of the day they concluded that they would have been better off just writing the new code in C.
Note, I'm not trying to start a debate about the merits of functional versus imperative programming: let's just say we assume that OCaml happens to be the right tool for the job I have in mind, and the potential difficulty in integration is the only issue. I also don't have the option of rewriting the rest of the codebase.
To give a little more detail about the task: the component I need to implement is a certain kind of query optimizer that incorporates some research ideas my group at UC Davis is working on, and will be integrated into PostgreSQL so that we can run experiments. (A query optimizer is, essentially, a compiler.) The component would be invoked from C code, would function mostly independently but would make a certain number of calls to other PostgreSQL components to retrieve things like system catalog information, and would construct a complex C data structure (representing a physical query plan) as output.
Apologies for the somewhat open-ended question, but I'm hoping the community might be able to save me a little trouble :)
Thanks,
TJ
Great question. You should be using the better tool for the job.
If in fact your intention is to use the better tool for the job (and you are sure lex and yacc are going to be a pain), then I have something to share with you: it's not painful at all to call OCaml from C, and vice versa. Most of the time I've been writing OCaml calling C, but I have written a few calls the other way; they've mostly been debug functions that don't return a result. The calling back and forth is really about packing and unpacking the OCaml value type on the C side. The tutorial you mention covers all of that, and very well.
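To give a flavour of the C-side packing the tutorial covers, here is a minimal sketch of calling a registered OCaml function from C (the optimize_plan name and int-to-int signature are made up for illustration):

    #include <caml/mlvalues.h>
    #include <caml/callback.h>

    /* Calls an OCaml function previously registered on the OCaml side
       with: Callback.register "optimize_plan" optimize_plan
       The OCaml runtime must have been started once via caml_startup. */
    int optimize_plan(int table_count)
    {
        static const value *closure = NULL;
        if (closure == NULL)
            closure = caml_named_value("optimize_plan");
        return Int_val(caml_callback(*closure, Val_int(table_count)));
    }

Val_int and Int_val do the boxing and unboxing of immediate values; richer results (your query-plan structure) are where the manual's rules about the GC and CAMLparam/CAMLlocal macros start to matter.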
I'm opposed to Ron Savage's remark that you have to be an expert in the language. I recall starting out where I work and, within a few months, without knowing what a "functor" was, being able to call C and write a thousand lines of C for numerical recipes and abstract data types. There were some hiccups (not with unpacking types, but with garbage collection of an abstract data type), but it wasn't bad at all. Most of the inner loops in the project are written in C, taking advantage of SSE, external libraries (LAPACK), tighter optimized loops, and some inlined hand-optimized assembly.
I think you might need to be experienced with designing a large project and demarcating functional and imperative sections. I would really assess how much OCaml you are going to be writing and what kinds of values you want to pass to C. I say this because I'd be wary of recommending that someone pass a recursive data structure from OCaml to C; it would mean a lot of unpacking of tuples and their contents, and thus a lot of opportunity for confusion and bugs.
I once wrote a reasonably complex OCaml-C hybrid program. I was frustrated by what I found to be inadequate documentation, and I ended up spending too much time dealing with garbage collection issues. However, the resulting program worked and was fast.
I think there is a place for OCaml-C integration, but make sure it is worth the hassle. It might be simpler to have the programs communicate over a socket (assuming such IO operations won't eliminate the performance you want). It might also be more sane to just write the whole thing in C.
Interoperability is the Achilles' heel of standalone implementations of statically typed languages, particularly those without JIT compilation like OCaml. My own experience, having used OCaml for over 5 years, is that the only reliable bindings are across simple APIs that do little more than pass large arrays, e.g. LAPACK. Even slightly more complicated bindings, like those to FFTW, took years to stabilize, and others, like OpenGL and GLU, remain an unsolved problem. In particular, I found major bugs in binding code written by two of the authors of the OCaml compiler. If they cannot get it right, there is little hope for the rest of us...
However, all is not lost. The solution is simply to use looser bindings. Rather than handling interoperability at the C level with a low-level type-unsafe interface, use a high-level interface like XML-RPC with string passing or even over sockets. This is much easier to get right and, as you say, will let you leverage the enormous practical benefits offered by OCaml for this application.
My rule of thumb is to stick with the language / model / style used in the existing code-base, so that future maintenance developers inherit a consistent and understandable set of application code.
The only way I could justify something like what you are suggesting would be if:
You are an Expert at OCaml AND a Novice at C (so you'll be 20x as productive)
You have successfully integrated it with a C library before (apparently not)
If you are at all more familiar with C than OCaml, you've just lost any "theoretical" gain from OCaml being easier to use when writing a compiler; plus, it seems as though you will have more peers familiar with C around you than with OCaml.
That's my "grumpy old coder" 2 cents (which used to only cost a penny!).
I'm working on a small application and thinking about integrating BLAST or other local alignment searches into it. My searching has only turned up programs that need to be installed and called as external programs.
Is there a way short of me implementing it from scratch? Any pre-made library perhaps?
Does it have to be in C, or would C++ also be OK? If so, you might want to look at the SeqAn library here.
This is a topic that also has to do with reproducibility of results: it is always better to use the raw BLAST binary provided by NCBI or UCSC, because it will make your results easier for other scientists to reproduce and will save you a lot of time otherwise spent on writing tests (more time than you can imagine).
For day-to-day work I have often used exonerate, a tool written in C that can do both global and local alignment, has a simple unix-like interface, and doesn't require you to format your input as BLAST does.
Moreover, keep in mind that people usually use a combination of makefiles and scripts to define a pipeline, instead of calling everything from a script: most programming languages are not good for defining pipelines, while automated build tools like Make are not useful for scripting tasks. Have a look at these examples: http://skam.sourceforge.net/skam-intro.html http://swc.scipy.org/lec/build.html
I just stumbled across the thing I would have wanted: The NCBI C++ Toolkit. Thanks for all the suggestions though.
The BLAST algorithm was implemented ~20 years ago; it is now a very big program, and I cannot imagine it being easily implemented from scratch. You can try to learn about it by looking at the sources of the 'blastall' program in the NCBI toolkit.
A simpler pairwise algorithm (Smith-Waterman, Needleman-Wunsch) should be easier to implement:
Computational Molecular Biology: An Introduction has code for Smith-Waterman and other dynamic programming alignment algorithms.
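If you do roll your own, the score-only core of Smith-Waterman is a short dynamic program. A minimal sketch in C (the match/mismatch/gap values are illustrative, not BLAST's defaults, and there is no traceback):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MATCH     2
    #define MISMATCH -1
    #define GAP      -2

    static int max4(int a, int b, int c, int d)
    {
        int m = a;
        if (b > m) m = b;
        if (c > m) m = c;
        if (d > m) m = d;
        return m;
    }

    /* Best local-alignment score between s1 and s2, linear gap penalty.
       Only two rows of the DP matrix are kept, since we want the score. */
    int smith_waterman(const char *s1, const char *s2)
    {
        size_t n = strlen(s1), m = strlen(s2);
        int *prev = calloc(m + 1, sizeof *prev);
        int *curr = calloc(m + 1, sizeof *curr);
        int best = 0;

        for (size_t i = 1; i <= n; i++) {
            for (size_t j = 1; j <= m; j++) {
                int sub = (s1[i-1] == s2[j-1]) ? MATCH : MISMATCH;
                curr[j] = max4(0,
                               prev[j-1] + sub,   /* substitute/match */
                               prev[j]   + GAP,   /* gap in s2 */
                               curr[j-1] + GAP);  /* gap in s1 */
                if (curr[j] > best) best = curr[j];
            }
            int *tmp = prev; prev = curr; curr = tmp;
            memset(curr, 0, (m + 1) * sizeof *curr);
        }
        free(prev);
        free(curr);
        return best;
    }

    int main(void)
    {
        printf("score = %d\n", smith_waterman("ACACACTA", "AGCACACA"));
        return 0;
    }

For production use you'd want affine gap penalties and a real scoring matrix, which is exactly where the libraries mentioned above earn their keep.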
I use NetBLAST through the blastcl3 client binary. I believe that the blastcl3 binary is a pretty thin client for the NetBLAST web service.
If so, it shouldn't be too hard to sniff the packets and implement your own client. Depending on your use case, this might be faster/easier than implementing your own alignment algorithm. It does, however, introduce a dependency to NCBI's web services.
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/netblast.html
I posted a similar question (running BLAST (bl2seq) without creating sequence files)
Basically, the answer I came up with was running this command:
    bl2seq -i <(echo sequence1) -j <(echo sequence2) -p blastn
That feeds the output of each echo command to the bl2seq (BLAST 2 sequences) program via bash process substitution.
But I couldn't get it to work when calling it via system from Python.