wrapping a library like facebook's ZStandard in swift? - c

Facebook created an opensource fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios, called ZStandard.
I have been looking for a tutorial that describes the c to swift wrapping, like this, but looks not comprehensive enough, what are the prerequests do I have to know before writing a wrapper?
when I finish writing it, I will also make it open source.

Thank you for a good question. I've looked at the library and played with it, it seems pretty interesting.
I would say you need to be comfortable using the ZSTD C library in a C program. You need to be also comfortable programming in Swift. Depending on the parts of the API you want to wrap, you may need to understand how to deal with raw memory in Swift (Unsafe... types).
Some challenges to consider when wrapping ZSTD:
The streaming API with dictionaries is experimental and subject to
change, yet use of dictionaries is one of the advantages of ZSTD.
When dealing with memory buffers, we want to minimize copying them,
since the buffers may be quite large and copying them would adversely
affect performance. This, of course, complicates memory management.
There is a multitude of approaches you can choose from when writing a wrapper. For example, you can write wrappers in C that will expose simple APIs you will wrap in Swift. You can include C code in your wrapper framework, or you can keep it in separate C libraries.
I have come up with a simple example of a wrapper around a small subset of the API, you can take a look at https://github.com/omniprog/SwiftZSTD.

Related

Looking for a source of middle sized code chunks

The longer I'm working as a C developer I find myself lacking some source of middle sized code chunks.
I have source of code snippets and libraries, but I can't find a good source for code sized in between. Something that is a header, or a header+implementation file but isn't a library but is included into the project.
Stuff like a dynamic array, or linked list or some debugging or logging helpers.
I know that its partially due to C developers DIY mentality, but I just don't believe that people don't share stuff like this.
You might want to check out http://nothings.org for some single file (moderately sized) projects that include (image) decompression, font rasterization and other useful things.
You may also want to look at CCAN.
http://www.koders.com/ is worth checking. You might find something usefull now and then.
You can also sort the results by license which is pretty handy feature.
There's a handful of utility libraries that spring to mind quickly; glib provides a wide variety of useful little utilities, including:
doubly- and singly-linked lists, hash tables, dynamic strings and string utilities, such as a lexical scanner, string chunks (groups of strings), dynamic arrays, balanced binary trees, N-ary trees
(And yes, glib is useful even in non-graphical environments; don't let its GNOME-background fool you. :)
The Apache portable runtime is a library that helps abstract away platform-specific knowledge; I've seen a handful of programs use it. It feels like enough programmers are content with "It runs on Linux" to not really worry about platform differences, and forgo learning Yet Another Library as a result. It feels more like a systems-level toolkit:
Memory allocation and memory pool functionality, Atomic operations, Dynamic library handling, File I/O, Command argument parsing, Locking, Hash tables and arrays, Mmap functionality, Network sockets and protocols, Thread, process and mutex functionality, Shared memory functionality, Time routines, User and group ID services
I always look at the Python (C) source code first when I am looking for the "best" way to code something up in C. Guido van Rossum's C coding style is concise and clear and given the number functions and features supported in the standard python libraries there is nearly always a useful/relevent snippet of code in there.

Mixing OCaml and C: is it worth the pain?

I am faced with the task of building a new component to be integrated into a large existing C codebase. The component is essentially a kind of compiler, and will be complicated enough that I would like to write it in OCaml (for reasons along the lines of those given here). I know that OCaml-C interaction is possible (as per the manual and this tutorial), but it looks somewhat painful.
What I'd like to know is whether others here have attempted large-scale integration of OCaml and C code, what were some of the unexpected gotchas they found, and whether at the end of the day they concluded that they would have been better off just writing the new code in C.
Note, I'm not trying to start a debate about the merits of functional versus imperative programming: let's just say we assume that OCaml happens to be the right tool for the job I have in mind, and the potential difficulty in integration is the only issue. I also don't have the option of rewriting the rest of the codebase.
To give a little more detail about the task: the component I need to implement is a certain kind of query optimizer that incorporates some research ideas my group at UC Davis is working on, and will be integrated into PostgreSQL so that we can run experiments. (A query optimizer is, essentially, a compiler.) The component would be invoked from C code, would function mostly independently but would make a certain number of calls to other PostgreSQL components to retrieve things like system catalog information, and would construct a complex C data structure (representing a physical query plan) as output.
Apologies for the somewhat open-ended question, but I'm hoping the community might be able to save me a little trouble :)
Thanks,
TJ
Great question. You should be using the better tool for the job.
If in fact your intentions are to use the better tool for the job (and you are sure lexx and yacc are going to be a pain) then I have something to share with you; it's not painful at all to call ocaml from c, and vice versa. Most of the time I've been writing ocaml calling C, but I have written a few the other way. They've mostly been debug functions that don't return a result. Although, the callings back and fourth is really about packing and unpacking the ocaml value type on the C side. That tutorial you mention covers all of that, and very well.
I'm opposed to Ron Savage remarks that you have to be an expert in the language. I recall starting out where I work, and within a few months, without knowing what a "functor" was, being able to call C, and writing a thousand lines of C for numerical recipes, and abstract data types, and there were some hiccups (not with unpacking types, but with garbage collection of an abstract data-types), but it wasn't bad at all. Most of the inner loops in the project are written in C --taking advantage of SSE, external libraries (lapack), tighter optimized loops, and some in-lined hand optimized assembly.
I think you might need to be experienced with designing a large project and demarcating functional and imperative sections. I would really assess how much ocaml you are going to be writing, and what kind of values you want to pass to C --I'm saying this because I'd be fearful of recommending to someone to pass a recursive data-structure from ocaml to C, actually, it would be lots of unpacking tuples, their contents, and thus a lot of possibility for confusion and bugs.
I one wrote a reasonably complex OCaml-C hybrid program. I was frustrated by what I found to be inadequate documentation, and I ended up spending too much time dealing with garbage collection issues. However, the resulting program worked and was fast.
I think there is a place for OCaml-C integration, but make sure it is worth the hassle. It might be simpler to have the programs communicate over a socket (assuming such IO operations won't eliminate the performance you want). It might also be more sane to just write the whole thing in C.
Interoperability is the achilles heel of standalone implementations of statically typed languages, particularly those without JIT compilation like OCaml. My own experience having been using OCaml for over 5 years is that the only reliable bindings are across simple APIs that do little more than pass large arrays, e.g. LAPACK. Even slightly more complicated bindings like those to FFTW took years to stabilize and others, like OpenGL and GLU, remain an unsolved problem. In particular, I found major bugs in binding code written by two of the authors of the OCaml compiler. If they cannot get it right then there is little hope for the rest of us...
However, all is not lost. The solution is simply to use looser bindings. Rather than handling interoperability at the C level with a low-level type-unsafe interface, use a high-level interface like XML-RPC with string passing or even over sockets. This is much easier to get right and, as you say, will let you leverage the enormous practical benefits offered by OCaml for this application.
My rule of thumb is to stick with the language / model / style used in the existing code-base, so that future maintenance developers inherit a consistent and understandable set of application code.
The only way I could justify something like what you are suggesting would be if:
You are an Expert at OCaml AND a Novice at C (so you'll be 20x as productive)
You have successfully integrated it with a C library before (apparently not)
If you are at all more familiar with C than OCaml, you've just lost any "theoretical" gain from OCaml being easier to use when writing a compiler - plus it seems at though you will have more peers familiar with C around you than OCaml.
That's my "grumpy old coder" 2 cents (which used to only cost a penny!).

What language should we use to let people extend our terminal/sniffer program?

We have a very versatile terminal/sniffer application which can do all sorts of things with TCP, UDP and serial connections.
We are looking to make it extensible -- i.e, allow people to write their own protocol parsers, highlighters, etc.
We created a C-like language for extending the product, and then discovered that for some coders, this presents a steep learning curve.
We are now pondering the question: Should we stick to C or go with something like Ruby or Lua?
C is beautiful for low-level stuff (like parsing binary data), because it supports pointers. But for exactly that reason, it can be tough to learn.
Ruby (etc) are easy to learn, but don't have pointers, so anything that has to do with parsing binary data gets ugly very fast.
What do you think? For extending a product that parses binary data -- Ruby/Lua or C/C++?
Would be great if you could give some background when you respond -- especially if you've done something similar.
Wireshark, the "world's foremost network protocol analyzer", is also a packet sniffer/analyzer, formerly also called Ethereal. It uses Lua to enable writing custom dissectors and taps, see the manual.
However, note that I have not used it, so I cannot tell how nice/effective/easy to learn the API is.
Like TCL, Lua was designed to be tightly integrated with an application. Personally, I find Lua's syntax and idioms to be much easier to deal with than TCL.
Lua is easy to integrate with an existing system, and easy to extend. It is also fairly easy to create safe sandboxes in which user-supplied code can run without full access to the innards of your product.
If you have an API written does it make a difference? The person using the C-like API would only have to understand the difference between passing by value or reference.
Your core does one thing very good, so fine. Let it be that way. I think you should create an API based on std in/out, just like the way of good unix design. Then anyone can extend it in any language of choice.
Tcl was designed with the goal to allow scripting for C programs, so it would be much easier to implement.
http://en.wikipedia.org/wiki/Tcl#Interfacing_with_other_languages
I second Johan's idea. Although in past when I had to do something like this I stuck to
C language APIs and people were restricted to use C language only. But now I look at it,
I realize that it would have been more efficient if we would have done the way Johan describes
PS: And by coincidence it was a protocol testing app using packet sniffer
perl, sed, awk, lex, antler, ... These are languages I'm somewhat familiar with that I'd like to write something like this in. It depends on the data flow, though.
It's really hard to say what the correct thing to use is. Something that I don't think anyone else has mentioned is to keep in mind that the scripts will have bugs. It's very easy to design something like this in such a way that bugs in the scripts (especially run time errors) either just show up a "error in script" or kill the whole system.
You should keep that the scripts should be unit testable and that failures should be reproducible.
I don't think it matters what you do as long as you do one thing, drop the in-house language. It sounds like you choose to make C into a scripting language. One issue I see with this is it will look familiar to C programmers, but not be the same. I can't imagine you have mimicked the semantics of C that would make existing C programmers comfortable. And as you have mentioned, others will find it hard to learn.
The company I am working at have developed their own language. It uses XML for structure so parsing is easy. The language grows "as needed." Meaning if a feature is missing then it will be added. I'm pretty sure it went from an XML database to something that needed control flow. But my point is that if you aren't thinking about building it as a language, then you'll be limiting what users can do with it unintentionally.
Personally I've been looking at how I can get the company to start taking advantage of Lua. And specifically Lua for several reasons. Lua was developed as an extension language that was general purpose. It easily interfaces with the language, including Python and Ruby. It is small and simple for use by non-programmers (not really needed in your case). It is simple enough to replace XML, INI... for configuration settings and powerful enough to replace the need for another programming language.
http://www.lua.org/spe.html

Best Practice for Multi-programming-language Projects

Does anyone have any experience with doing this? I'm working on a Java decompiler right now in C++, but would like a higher level language to do the actual transformations of the internal trees. I'm curious if the overhead of marshaling data between languages is worth the benefit of a more expressive and language for better articulating what I'm trying to accomplish (like Haskell). Is this actually done in the "real world", or is it usually pick a language at the beginning of a project and stick with it? Any tips from those who have attempted it?
I'm a big advocate of always choosing the right programming language for each challenge. If there is another language which handles some otherwise tricky task easily, I'd say go for it.
Does it happen in the real world? Yes. I am currently working on a project which is made up of both PHP and objective-c code.
The trick is, as you pointed out, the communication between the two languages. If at all possible, let each language stick to its own domain, and have the two sections communicate in the simplest way possible. In my case, it was XML documents sent via http. In your case, some kind of formatted text file might be the answer.
Marshalling costs depend on the languages and architecture you're working with. For example, if you're on the CLR or JVM, there are low-cost interop solutions available - though I know you are working with probably unmanaged C++.
Another avenue is an embedded domain-specific language. Tree transformations are often expressible via pattern matching and application of a relatively small number of functions. You could consider writing a simple tree pattern-matcher - e.g. something that looks like Lisp s-exprs but uses placeholders to capture variables - with associated actions that are functions that transform the matched subtree.
John Ousterhout, the inventor of Tcl/Tk was a stong advocate of multi-language programming and wrote quite extensively about it. In order to do it, you need a clean interface mechanism between the languages you are using for it. There are quite a few mechanisms for this. Examples of different mechanisms for doing this are:
SWIG (Simplified Wrapper and
Interface Generator can take a c
or c++ (or several other languages)
header file and generate an
interface for a high level language
such as perl or python that allows
you to access the API. There are
other systems that use this
approach.
Java supports JNI, and various
other systems such as Python's
ctypes, VisualWorks DLL/C
connect are native mechanisms
that allow you to explicitly
construct the call to the lower
level subsystem.
Tcl/Tk was designed explicitly to be
embeded, and has a native API
for a C library to add hooks into
the language. The constructs for
this resemble argv[] structures in
C, and were designed to make it
relatively easy to interface a
command-line based C program into
Tcl. This is similar to the above
example, but coming from the opposite
direction. Many scripting languages
such as Python, Lua and Tcl support
this type of mechanism.
Explicit glue mechanisms such as
Pyrex, which are similar to a
wrapper generator, but have their
own language for defining the
interface. Pyrex is actually a
complete programming language.
Middleware such as COM or
CORBA allow a generic
interface definition to be built
externally to the application in an
interface definition language
and language bindings for the
languages concerned to use the
common interface mechanism.

A C library for finding local maxima?

I'm trying to write an audio analysis application, and I need to identify local maxima in a 2D array which represents a spectrogram. I've already got an open source library that can generate the spectrogram using Fast Fourier Transforms, but I was wondering if anybody knew of any good libraries to help me with actually finding the maxima? I'm not quite sure what to search Google for - the best I could think of was "numerical library" but that hasn't got me very far.
Preferably in C, but I'm open to other suggestions.
Peak finding is a fairly general problem. It has already been discussed once on SO as Peak detection of measured signal.
The answers provided include several viable heuristics.
Of course, I prefer my own answer if you need rigor, but ROOT is written in c++, and is almost certainly too heavy for your application, so you'll need to strip out just the code you want...
The GNU Scientific Library features a multidimensional minimization framework that can be made to work for maximization easily enough. It's designed to only return a single minimum rather than a bunch of different minima, however.

Resources