I've been assigned a project that consists of changing the quantization in the JPEG source code from quantization tables to Lloyd-Max quantization. The problem is not knowing what to do (I know how to change the quantization), but where to find the code I'm supposed to change.
If someone is familiar with libjpeg-turbo, could you give me some advice on doing so?
I refrained from responding because it has been a long time since I prowled around in the libjpeg code, and I understand that it has been rewritten. The code functions well and is efficient, but it is quite tortuous to read and understand.
This is a C++ library that apparently was written for instructive purposes. For understandability it is about as good as you are going to get with JPEG:
http://www.colosseumbuilders.com/sourcecode/imagelib403.zip
However, if I remember correctly, this one, like libjpeg, combines some steps of the DCT and quantization.
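To make the difference between the two quantization schemes concrete, here is a minimal sketch in C. It is not taken from libjpeg-turbo or from the library linked above; the function names are hypothetical, and the thresholds and reconstruction levels a real Lloyd-Max quantizer would use have to be trained on coefficient statistics, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>

/* Table-based (uniform) quantization as in baseline JPEG:
   divide the DCT coefficient by its table entry and round to nearest. */
int quantize_uniform(float coef, int step)
{
    assert(step > 0);
    float q = coef / (float)step;
    return (q >= 0.0f) ? (int)(q + 0.5f) : (int)(q - 0.5f);
}

/* Non-uniform (Lloyd-Max style) quantization: return the index of the
   interval that contains coef. thresholds[] holds n_levels - 1 decision
   boundaries in ascending order (hypothetical, pre-trained values). */
size_t quantize_nonuniform(float coef, const float *thresholds, size_t n_levels)
{
    assert(n_levels >= 1);
    size_t idx = 0;
    while (idx + 1 < n_levels && coef >= thresholds[idx])
        idx++;
    return idx;
}

/* Dequantization becomes a lookup of the reconstruction level
   instead of a multiplication by the table entry. */
float dequantize_nonuniform(size_t idx, const float *levels)
{
    return levels[idx];
}
```

Because the DCT and quantization steps may be fused in the library, swapping in a lookup like this probably means locating the place where the coefficient is divided by the table entry and replacing that division, and the matching multiplication in the decoder.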
I am testing some convolution algorithms I found on various sites, but none of them applies the matrix filters as it should.
I am writing a very simple 24-bit BMP library on my own, and now I need a little help with the convolution. I don't need an FFT or a complex algorithm; running time is not important at this point.
The last code I tested was this: http://lodev.org/cgtutor/filtering.html, but it didn't work correctly.
Could someone point me to code or an algorithm in C?
Thank you very much.
You can have a look at this algorithm; it is the closest I could find:
Convolution to blur the image
Note that the basic convolution algorithm is more or less always the same; the effect changes only with the kernel values.
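As a rough illustration of that basic algorithm, here is a minimal C sketch for a 3x3 kernel applied to a single 8-bit channel. The function names, the clamp-to-edge border handling, and the factor/bias parameters are my own choices for the example, not something taken from the linked pages. Strictly speaking it computes a correlation (the kernel is not flipped), which gives the same result for symmetric kernels such as blurs.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an index into [0, n-1] so border pixels reuse the nearest edge pixel. */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Apply a 3x3 kernel to one 8-bit channel.
   src and dst must be different buffers of width * height bytes.
   The weighted sum is scaled by factor, offset by bias, then clamped to 0..255. */
void convolve3x3(const unsigned char *src, unsigned char *dst,
                 int width, int height,
                 const float kernel[9], float factor, float bias)
{
    assert(src != dst && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int ky = -1; ky <= 1; ky++) {
                for (int kx = -1; kx <= 1; kx++) {
                    int sx = clamp_index(x + kx, width);
                    int sy = clamp_index(y + ky, height);
                    sum += src[sy * width + sx] * kernel[(ky + 1) * 3 + (kx + 1)];
                }
            }
            float v = sum * factor + bias;
            if (v < 0.0f) v = 0.0f;
            if (v > 255.0f) v = 255.0f;
            dst[y * width + x] = (unsigned char)(v + 0.5f);
        }
    }
}
```

For a 24-bit BMP you would either split the image into three planes and call this once per channel, or adapt the indexing to step through the interleaved bytes. A common mistake is writing the result back into the source buffer, which is why the sketch insists on a separate destination.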
There is an open source C# library which provides methods to perform image convolution of simple filters. It would be an easy port to C.
The actual methods to perform convolution can be found here. The BitmapContext class is used to just wrap a pointer to bitmap. I believe in C# this is treated as int*, so this code is operating on 4 bytes at a time.
I created Image Convolution library for simple cases - https://github.com/RoyiAvital/Projects/tree/master/ImageConvolution.
It is pretty fast (OpenMP + SIMD).
I'm not an advanced programmer, though; I just tried it as a first step in using SIMD.
Still, from what can be seen in VS 2015, the CPU utilization is pretty good.
If you have ideas to make it even faster, I will be happy.
Feel free to use it in any manner you'd like.
I use MATLAB to write a program with many iterations. It cannot be vectorized since the data processing in each iteration depends on that of the previous iteration.
Then I converted the MATLAB code to MEX using the built-in MATLAB Coder, and the result is even slower. I don't know whether I need to write the MEX code myself, since the generated MEX code doesn't seem to help.
I'd suggest that if you can, you get in touch with MathWorks to ask them for some advice. If you're not able to do that, then I would suggest really reading through the documentation and trying everything you find before giving up.
I've found that a few small changes to the way one implements the MATLAB code, and a few small changes to the project settings (such as disabling responsiveness to Ctrl-C, extrinsic calls back to MATLAB) can make a speed difference of an order of magnitude or more in the generated code. There are not many people outside MathWorks who would be able to give good advice on exactly what changes might be worthwhile/sensible for you.
I should say that I've only used MATLAB Coder on one project, and I'm not at all an expert (actually not even a competent) C programmer. Nevertheless I've managed to produce C code that was about 10-15 times as fast as the original MATLAB code when mexed. I achieved that by a) just fiddling with all the different settings to see what happened and b) methodically going through the documentation, and seeing if there were places in my MATLAB code where I could apply any of the constructs I came across (such as coder.nullcopy, coder.unroll etc). Of course, your code may differ substantially.
I often write codes in MATLAB/Python to test whether my algorithm is feasible (& actually works). I then need to convert the entire code into C and sometimes, in FORTRAN90.
What would be a good way to manually convert a medium sized code from one language to another?
I have tried :
Converting the entire code from one into another and then testing it.
(Sometimes there are errors and bugs which just won't go away, and finding the source of the error becomes a problem)
Go line by line and check for consistency of outputs every few lines.
(Too time consuming)
Use converters like f2c.
(In my experience, they are extremely horrible. I link to a lot of libraries which have different function calls for C and Fortran)
Also,:
I am fairly conversant with the programming languages I deal with so I don't need manuals or reference guides for my work (i.e. I know the syntax).
I am not asking this question specifically about MATLAB and C but rather as a translation paradigm.
Regarding the size, the codes are less than 100 lines long.
I don't want to call code written in one language from another. Please don't suggest that.
Different languages call for different paradigms. You definitely don't write and design code the same way in eg. Matlab, Python, C# or C++. Even object hierarchies will change a lot depending on the language.
That said, if your code consists of a few interconnected procedures, then you may get away with a direct line-by-line translation (every language allows you to write two or three interconnected functions while remaining idiomatic). But this is the case only for the simplest programs.
Prototyping in a high level language and then implementing the same idea in a robust and clean way in a "production" language is a very good practice, but involves two very different things :
Prototype in whatever language you want. Test, experiment, and convince yourself that the idea works. Pay attention to the big picture, don't focus on performance but on the high level ideas. Pay also attention to difficulties that you encounter when implementing, as you'll face them again in step 2.
Implement from scratch the idea in the production environment in language X. It will be quicker than if you did not do the prototyping stage, since most of the difficulties have been met in stage 1. Use idiomatic X, and focus on correctness. Pay attention to corner cases, general robustness, and once it works correctly, performance. You'll notice that roughly half of your code is made of new things which did not appear in 1. (eg. error checking, corner case handling, input/output, unit testing, etc).
You can see that line by line translation is obviously not a good idea, since you don't translate into the same program.
Also, when not prototyping, I find myself throwing away the first version and making another one that I like better, ie. I find myself prototyping ! Implementing the same thing twice is not a loss of time, it is normal development flow.
You may want to consider using a higher level domain specific language with multiple backends (e.g., Matlab, C, Fortran), producing clean and idiomatic code for each target language, probably with some optimisations. If your problem domain is narrow and every piece of code is more or less typical, it should be fairly trivial to design and implement such a DSL.
Break the source down into pseudo-code with input/process/output and then write your new code base to fit that spec.
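One way to check a port against such an input/output spec is to dump the prototype's outputs for a few fixed inputs and compare the new code's outputs against them within a tolerance. The helper below is a minimal sketch of that idea in C; the name `first_mismatch` and the choice of an absolute tolerance are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Compare the port's output against reference values dumped from the
   prototype. Returns the index of the first element that differs by more
   than tol (or is NaN), or -1 if every element matches. */
long first_mismatch(const double *reference, const double *actual,
                    size_t n, double tol)
{
    assert(tol >= 0.0);
    for (size_t i = 0; i < n; i++) {
        double diff = reference[i] - actual[i];
        if (diff < 0.0) diff = -diff;
        if (!(diff <= tol))   /* written this way so NaN counts as a mismatch */
            return (long)i;
    }
    return -1;
}
```

Running this after each stage of the computation, rather than every few lines, tends to narrow a discrepancy down to one function without the cost of a full line-by-line comparison.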
It is summer, and so I have decided to take it upon myself to write a data-compression program, preferably in C code. I have a decent beginners understanding of how compression works. I just have a few questions:
1) Would c be a suitable programming language to accomplish this task?
2) Should I be working in byte's with the input file? Or at a binary level somehow?
If someone could just give me a nudge in the correct direction, I'd really appreciate it. I would like to code this myself however, and not use a pre-existing compression library or anything like that.
You could start by looking at Huffman Encoding. A lot of computer science classes implement that as a project so it should be manageable. C would be appropriate for Huffman encoding, but it might be easier to do it first in a higher-level language so that you understand the concepts. There are slides, hints, and an example project available in Java for a master's-level project at the University of Pennsylvania (search for "huff" on that page).
To answer your questions:
C is suitable.
It depends on the algorithm, or the way you are thinking about 'compression'.
My opinion will be, first decide whether you want to do a lossless compression or a lossy compression, then pick an algorithm to implement. Here are a few pointers:
For the lossless one, some are very intuitive, such as the run-length encoding,
e.g., if there are 11 a's followed by 5 b's, you just encode them as 11a5b.
Some algorithms use a dictionary, please refer to LZW encoding.
Finally, I do recommend Huffman encoding since it is very straight-forward, simple and helpful to gain experience in learning algorithm (for your educational purpose).
For lossy ones, JPEG compression uses the Discrete Cosine Transform (DCT), a close relative of the Discrete Fourier Transform (DFT), while JPEG 2000 uses wavelets. These are useful for understanding multimedia compression.
Wikipedia page is a good starting point.
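The run-length example above (11 a's and 5 b's becoming 11a5b) can be sketched in C roughly as follows. The function name and the text output format are assumptions made for illustration; a real compressor would normally write binary counts rather than decimal digits, and this text form only works if the input itself contains no digits.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Run-length encode the NUL-terminated string `in` into `out` as
   <count><char> pairs. Returns the number of characters written
   (excluding the terminator), or -1 if out_size is too small.
   Assumes the input contains no digits, otherwise decoding is ambiguous. */
int rle_encode(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t used = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    while (*in != '\0') {
        char c = *in;
        size_t run = 0;
        while (*in == c) {      /* c is never NUL here, so this stops at the end */
            in++;
            run++;
        }
        int n = snprintf(out + used, out_size - used, "%zu%c", run, c);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Note that input without repeated characters grows under this scheme ("abc" becomes "1a1b1c"), which is a useful first lesson: every lossless compressor makes some inputs larger.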
Yes, C is well suited for this kind of work.
Whether you work with bytes or bits will depend on the algorithm that you decide to implement. For example, Huffman coding is inherently bit-oriented whereas many other compression algorithms are not.
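To show what "bit-oriented" means in practice, here is a minimal sketch of a bit writer in C that packs variable-length codes (such as Huffman codes) into a byte buffer, most significant bit first. The `BitWriter` type and the `bw_*` function names are hypothetical, and the bit order is just one common convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *buf;       /* destination bytes, zeroed by bw_init */
    size_t capacity;    /* size of buf in bytes */
    size_t bit_count;   /* number of bits written so far */
} BitWriter;

void bw_init(BitWriter *bw, uint8_t *buf, size_t capacity)
{
    bw->buf = buf;
    bw->capacity = capacity;
    bw->bit_count = 0;
    for (size_t i = 0; i < capacity; i++)
        buf[i] = 0;
}

/* Append the low nbits bits of code, most significant bit first.
   Returns 0 on success, -1 if the buffer would overflow. */
int bw_put(BitWriter *bw, uint32_t code, unsigned nbits)
{
    assert(nbits <= 32);
    if (bw->bit_count + nbits > bw->capacity * 8) return -1;
    for (unsigned i = nbits; i-- > 0; ) {
        if ((code >> i) & 1u)
            bw->buf[bw->bit_count / 8] |= (uint8_t)(0x80u >> (bw->bit_count % 8));
        bw->bit_count++;
    }
    return 0;
}

/* Number of bytes touched so far; the last one may be only partly filled. */
size_t bw_bytes_used(const BitWriter *bw)
{
    return (bw->bit_count + 7) / 8;
}
```

The file is still read and written in bytes; the bit-level work happens in a small layer like this one that sits between the coder and the byte buffer.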
C is a great choice for writing a compression program. You can use plenty of other languages too, though.
Your computer probably can't directly address units of memory smaller than a byte (pretty much by definition), so working with bytes is probably a good choice. Some of how you work with the data will be affected by the compression algorithm you choose.
Good luck!
1) Would c be a suitable programming language to accomplish this task?
Yes.
2) Should I be working in byte's with the input file? Or at a binary level somehow?
They're the same, so the question makes no sense.
not use a pre-existing compression library
Can you use a pre-existing compression algorithm? There are dozens and "compression algorithm" -- when used with Google -- will reveal a great deal of helpful information.
Every C program is converted to machine code if it is distributed as a binary. Since the instruction set of a computer is well known, is it possible to get back the original C program?
You can never get back to the exact same source since there is no meta-data about that saved with the compiled code.
But you can re-create code out from the assembly-code.
Check out this book if you are interested in these things: Reversing: Secrets of Reverse Engineering.
Edit
Some compilers-101 here, if you were to define a compiler with another word and not as technical as "compiler", what would it be?
Answer: Translator
A compiler translates the syntax / phrases you have written into another language. A C compiler translates to assembly or even machine code; C# code is translated to IL, and so forth.
The executable you have is just a translation of your original text / syntax and if you want to "reverse it" hence "translate it back" you will most likely not get the same structure as you had at the start.
A more real-life example would be translating from English to German and then from German back to English: the sentence structure will most likely be different and other words might be used, but the meaning, the context, will most likely not have changed.
The same goes for a compiler / translator: if you go from C to ASM, the logic is the same, it's just a different way of reading it (and of course it's optimized).
It depends on what you mean by original C program. Things like local variable names, comments, etc... are not included in the binary, so there's no way to get the exact same source code as the one used to produce the binary. Tools such as IDA Pro might help you disassemble a binary.
I would guesstimate the conversion rate of a really skilled hacker at about 1 kilobyte of machine code per day. At common Western salaries, that puts the price of, say, a 100 KB executable at about $25,000. After spending that much money, all that's gained is a chunk of C code that does exactly what yours does, minus the benefit of comments and whatnot. It is in no way competitive with your version: you'll be able to deliver updates and improvements much quicker. Reverse engineering those updates is a non-trivial effort as well.
If that price tag doesn't impress you, you can arbitrarily raise the conversion cost by adding more code. Just keep in mind that skilled hackers that can tackle large programs like this have something much better to do. They write their own code.
One of the best works on this topic that I know about is:
Pigs from sausages? Reengineering from assembler to C via FermaT.
The claim is you get back a reasonable C program, even if the original asm code was not written in C! Lots of caveats apply.
The Hex-Rays decompiler (extension to IDA Pro) can do exactly that. It's still fairly recent and upcoming but showing great promise. It takes a little getting used to but can potentially speed up the reversing process. It's not a "silver bullet" - no c decompiler is, but it's a great asset.
The common name for this procedure is "turning hamburger back into cows." It's possible to reverse engineer binary code into a functionally equivalent C program, but whether that C code bears a close resemblance to the original is an open question.
Working on tools that do this is a research activity. That is, it is possible to get something in the easy cases (you won't recover local variables names unless debug symbols are present, for instance). It's nearly impossible in practice for large programs or if the programmer had decided to make it difficult.
There is not a 1:1 mapping between a C program and the ASM/machine code it will produce: one C program can compile to a different result on different compilers or with different settings, and sometimes two different bits of C could produce the same machine code.
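As a small illustration of the second half of that point, the three functions below are written differently but always return the same value. An optimizing compiler will typically emit the same instructions for all of them, although that depends on the compiler and settings and is an assumption here rather than something this snippet proves. A decompiler looking at the binary has no way to tell which form the author wrote.

```c
#include <assert.h>

/* Three different spellings of "double x". Unsigned arithmetic is used so
   that wraparound is well defined and identical in all three. */
unsigned twice_mul(unsigned x)   { return x * 2u; }
unsigned twice_add(unsigned x)   { return x + x; }
unsigned twice_shift(unsigned x) { return x << 1; }

/* Sanity check that the three forms agree for a given input. */
void check_equivalent(unsigned x)
{
    assert(twice_mul(x) == twice_add(x));
    assert(twice_add(x) == twice_shift(x));
}
```

To see what your own compiler does, you can ask it for assembly output (for example `-O2 -S` with GCC or Clang) and compare the three function bodies.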
You definitely can generate C code from a compiled EXE. You just can't know how similar in structure it will be to the original code - apart from variable/function names being lost, I assume it won't know the original way the code was split amongst many files.
You can try hex-rays.com; it has a really nice decompiler which can decompile assembly code into C with 99% accuracy.