Extract function source code from an existing open source C library

I need to extract the source code of a function from an existing C library (the library is open source). The problem is that the functions are generated by macros in header files, so when I write a test project, link the library to it, and use the debugger's 'go to definition' action, it points me at that header file. I have the source code of the library, and I guess I need to build it together with my test code (maybe this is not correct, I am not sure). Any advice on how to proceed and what to use? Thank you.

I need to extract source code for a function from the existing C library (the library is open source).
Several C compilers are themselves open source. Both GCC and Clang are (and so is tinycc), so you could legally improve them (but that could take months of work).
In addition, recent GCC versions (e.g. GCC 10, as of July 2020) accept plugins. Your GCC plugin could work on some internal GCC representations (e.g. GIMPLE, GENERIC), so it would know about functions, even those obtained by preprocessor expansion.
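To give an idea of the shape of such a plugin, here is a bare skeleton (the callback body and any real analysis logic are left out; the build and load commands in the comment are the usual ones but may need adjusting for your GCC installation):

/* skeleton.c - minimal GCC plugin skeleton.
 * Build against the GCC plugin headers (recent GCC is built as C++, so use g++):
 *   g++ -shared -fPIC -fno-rtti -I$(gcc -print-file-name=plugin)/include skeleton.c -o skeleton.so
 * Load it with: gcc -fplugin=./skeleton.so yourcode.c */
#include <gcc-plugin.h>
#include <plugin-version.h>

int plugin_is_GPL_compatible;            /* GCC refuses to load the plugin without this symbol */

static void on_finish_unit(void *gcc_data, void *user_data)
{
    /* Here you could walk the GENERIC/GIMPLE representation of the translation unit,
     * including functions that only exist after macro expansion. */
    (void) gcc_data;
    (void) user_data;
}

int plugin_init(struct plugin_name_args *plugin_info,
                struct plugin_gcc_version *version)
{
    if (!plugin_default_version_check(version, &gcc_version))
        return 1;                        /* plugin built against a different GCC version */
    register_callback(plugin_info->base_name, PLUGIN_FINISH_UNIT,
                      on_finish_unit, NULL);
    return 0;
}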
You could also consider using some open source static program analyzers, such as Frama-C or Clang static analyzer.
PS. Take open source license issues (legal ones) into account. I am not a lawyer (and you might need to ask one if you mix software under different open source licenses).

Related

How do I read the source code for a C library in CLion

OS: Deepin 20 (based on Debian 10)
CLion: 2020.1.2
GCC: gcc (Uos 8.3.0.3-3+rebuild) 8.3.0
Make: 4.2.1 x86_64-pc-linux-gnu
Cmake: 3.18.1
I am a newcomer who has just started learning the C language. When I write C code in CLion, I can jump to a declaration with Ctrl + mouse click.
For example, if I call printf, I can jump into the stdio.h file, where line 332 shows extern int printf (const char *__restrict __format, ...);.
But if I want to see the implementation details of this function, I can't.
According to Navigate in the code structure, Ctrl+Alt+Home should switch to the related file, but the IDE reports No related file.
How can I get to the source code of a library function? I want to learn from the good experience of others by reading the implementation logic in their libraries.
Thank you for your review. I would really appreciate it if you could help me.
Even if most GNU/Linux software is open source, it is not installed (in source code form) by default on your computer.
Regarding C programming, see Modern C (and the C11 standard n1570) and read the documentation of your C compiler (perhaps GCC or Clang, or simpler ones like nwcc or tinycc), of your linker (probably binutils), and of your build automation tool (e.g. GNU make, ninja, or cmake). Enable all warnings and DWARF debug info, so if using GCC compile with at least gcc -Wall -Wextra -g; then improve your C code until you get no warnings. Once you have debugged your C source code (using GDB and perhaps valgrind), add optimization flags such as -O2. The order of arguments to gcc matters!
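As a small illustration (the file and function names below are made up), a program where those flags catch real problems:

/* warn.c - build with all warnings and debug info:
 *   gcc -Wall -Wextra -g warn.c -o warn */
#include <stdio.h>

int add(int a, int b, int unused)    /* -Wextra warns about the unused parameter */
{
    printf("%d\n", a + b);
}                                    /* -Wall warns: control reaches end of non-void function */

int main(void)
{
    add(1, 2, 3);
    return 0;
}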
Consider, for some tasks, generating some of your C code (perhaps some #include-d header file) with tools like GNU bison, ANTLR, SWIG, RPCGEN, AWK, GUILE, GPP, GNU m4, GNU autoconf - or your own program or script.
I want to learn from the good experiences of others by looking at their implementation logic in their libraries
You need to fetch the source code from elsewhere.
For example, see GNU libc or musl-libc, and the Linux kernel (and others: GTK, PostgreSQL, sqlite, GUILE, etc., including many open source programs mentioned in this answer), and look also on websites like github, gitlab, and sourceforge.
Read also Advanced Linux Programming and syscalls(2). See also http://linuxfromscratch.org/
In 2020, a recent GCC compiler happens to handle calls to printf specially when asked to optimize. See also the softwareheritage and Frama-C projects.
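For instance (a tiny, generic example), GCC with optimization enabled rewrites a printf of a constant string ending in a newline into a call to puts; you can see this in the generated assembly:

#include <stdio.h>

int main(void)
{
    printf("hello world\n");    /* with gcc -O2 this compiles to a call to puts, not printf */
    return 0;
}
/* Check with: gcc -O2 -S hello.c && grep call hello.s */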
In some cases, consider accepting plugins in your program with dlopen(3) and dlsym(3) (see also elf(5) and How to Write Shared Libraries). You might even generate some code at runtime with libraries like libgccjit (or generate C code at runtime, then compile it as a plugin and load it; such an approach is called metaprogramming and is related to partial evaluation; see also the blog of the late J. Pitrat for more insights).
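A minimal sketch of that dlopen/dlsym idea (the plugin path "./plugin.so" and the symbol name "plugin_entry" are invented for the example):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("./plugin.so", RTLD_NOW);     /* load the plugin */
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    void (*entry)(void) = (void (*)(void)) dlsym(handle, "plugin_entry");
    if (entry)
        entry();                                        /* run the code provided by the plugin */
    dlclose(handle);
    return 0;
}
/* Build with: gcc -Wall -g main.c -ldl (recent glibc no longer needs -ldl, but it does not hurt) */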
Of course, you need tools to navigate in source code. Consider using GNU emacs combined with GNU grep for that, or some other source navigator. For large programs of millions of source code lines, consider writing your own GCC plugin to understand them.
Use also tools like strace(1) and GDB to understand the dynamic behavior of programs.
Expect several months of full time work to explore all this.
You could be interested by ACM conference papers also.
For your own source code, consider using some version control tool such as git. Of course, read its documentation. And use LibreOffice, Lout, LaTeX, or Markdown (perhaps combined with inkscape or diagrams for figures) to write the documentation of your software.
In some cases, you might consider generating parts of the documentation from parts of your source code (e.g. using literate programming techniques like nuweb or documentation generators like doxygen).

When do I need to care about static vs. dynamic linking in C for programs which use std functionality?

All of my programs tend to be rather rudimentary console applications in C. For example, I may write some code to parse a file header and then print some data from the header to the screen. To do this, I would just use functions/symbols from stdio.h, stdlib.h, string.h, and stdbool.h, such as printf(), fopen(), fread(), etc. I usually get away with writing my code in the main.c file as well as several .h files and .c files to go along with them. When it comes time to compile, I will do something like: gcc main.c file1.c file2.c -g -Wall -o my_program
The program runs fine, and I simply share the source code with my colleagues, or, if they're on the same OS, I share the binary; they can typically either build the code just as I did and run it, or run the binary directly. If a colleague is on a different OS, they just build the source on that machine, or I build it for them on a machine I have with that OS.
I've never really had to consider how my dependencies were being linked at all in fact. This is probably because I write mostly internal tools and am not releasing to large audiences. That being said, in which situations would the above method fail to run on a system? Is it possible that somebody who has the same version of gcc installed would not be able to just run my executable or build the code themselves, then run it, when I'm only using std C functionality? In fact, I've taken my very same C code from a linux box, copy/pasted it into Visual Studio and compiled with MSVC, and it still works fine with the standard functions... So even cross-compiler, I've not needed to think about the linking yet.
For Linux/Linux compatibility:
Usually, as long as you move to a machine which has the same or a newer glibc (and the other libraries that you use), you will not face problems. glibc (the C standard library/runtime) is very good at being backward compatible, and it will usually work across distributions. In many cases you can even take your binary to a machine with a slightly older minor version of the library and it will work (minor versions are supposed to contain only bug fixes, so unless your code triggers one of those bugs it should work).
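If you want to check which glibc a given machine runs, glibc exposes its version at runtime (this is a glibc-specific API, so treat the snippet as a Linux/glibc-only sketch):

#include <stdio.h>
#include <gnu/libc-version.h>    /* glibc-specific header */

int main(void)
{
    /* Prints something like "2.31"; compare the output across the machines involved. */
    printf("glibc version: %s\n", gnu_get_libc_version());
    return 0;
}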
For Linux/Windows compatibility: in most cases you will need to recompile, as the libraries, runtimes, and executable formats are different. Disclaimer: I'm not an expert on this topic.

why can I use stdio.h without a corresponding stdio.c [duplicate]

This may seem a little stupid :) but it's been bothering me for a while. When I include header files written by others in my C++/C program, how does the compiler know where the implementation of the class member functions declared in those header files is?
Say I want to write some program which takes advantage of the OpenCV library. Normally I would want to use:
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
However, these are just header files which, as far as I can tell, only declare functions without providing implementations. Then how does the compiler know where to find the implementation? Especially when I want to build a .so file.
There is a similar post. Basically it said that third-party libraries, especially commercial products, don't release source code, so they ship the lib file together with the header. However, it didn't make clear how the compiler knows where to find the lib file. In addition, the answer in that post mentioned that if I want to compile the code on my own, I would need the source code of the implementation behind those header files. Does that mean I cannot build a .so file without the source of the implementation?
In general, the implementation is distributed in the form of pre-compiled libraries. You need to tell the compiler where they are located.
For example, for gcc, quoting the online manual
-llibrary
-l library
Search the library named library when linking. [...]
and,
-Ldir
Add directory dir to the list of directories to be searched for -l.
Note: you don't need to explicitly specify the standard libraries; they are linked automatically. Rather, if you don't want them to be linked with your binary, you need to inform the compiler by passing the -nostdlib option.
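As a concrete (made-up) example, a program that calls a function from the math library is built like this; the header only declares the function, while the implementation lives in libm:

/* hypot_demo.c */
#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("%f\n", hypot(3.0, 4.0));    /* prints 5.000000 */
    return 0;
}
/* Link the library explicitly:
 *   gcc hypot_demo.c -o hypot_demo -lm
 * For a library installed in a non-standard place, add -L/path/to/libdir as well. */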
The exact answer is platform specific, but in general I'd say that some libraries are in fact header-only, and others include the implementation of the library's methods in binary object files. I believe OpenCV belongs to the second kind, i.e. provides the implementation in object files, for either dynamic or static linking, and your program links against them. If your build works, then it is already configured to link against those libraries. At this point the details become very much platform and build-system specific.
Note that for common platforms like Windows, Mac and Linux you seldom need to build popular libraries like OpenCV yourself. You mentioned .so files, which implies dynamic linking on Linux. This library is open-source so in theory you could build it yourself, but in practice I'd much rather use my distribution's package installation tool (e.g. apt-get or yum) to install opencv-dev (or something similar) from my distribution's repository.
As the others already explained, you need to tell your compiler where to look for the files.
This implies that you should know which path to specify for your compiler.
Some components provide a mechanism where you don't need to know the exact path but can automatically retrieve it from your system.
For example if you want to compile using GTK+3 you need to specify these flags for your compiler:
CFLAGS:= -I./ `pkg-config --cflags gtk+-3.0`
LIBS:= -lm `pkg-config --libs gtk+-3.0`
This will automatically result in the required include and library path flags for GCC.
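A minimal program that those flags would build (assuming GTK+ 3 is installed; the file name is arbitrary) might look like this:

#include <gtk/gtk.h>

int main(int argc, char *argv[])
{
    gtk_init(&argc, &argv);    /* needs the include paths from pkg-config --cflags gtk+-3.0 */
    GtkWidget *window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    g_signal_connect(window, "destroy", G_CALLBACK(gtk_main_quit), NULL);
    gtk_widget_show(window);
    gtk_main();
    return 0;
}
/* Build with: gcc demo.c `pkg-config --cflags gtk+-3.0` `pkg-config --libs gtk+-3.0` -o demo */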
The compiler toolchain contains at least two major tools: the compiler and the link editor (it is very common to call the whole chain the compiler, but strictly speaking that is wrong).
The compiler is in charge of producing object code from the available source code. In that phase the compiler knows where to locate the standard headers, and can be told to use non-standard directories to locate headers. For example, gcc uses -I to let you specify additional directories that may contain headers.
The link editor is in charge of producing executable files (its most common basic usage) from object code. To produce an executable it needs to find an implementation of everything declared at compile time for which you didn't provide source code. These can be other object files, object code in libraries, etc. The link editor knows where the standard libraries are located and can be told to search non-standard directories; for example, you can tell the gcc toolchain with -L to search alternate directories that may contain libraries. You may be aware that link editing is now usually a two-phase process: location and name resolution at link time, and the real link editing at run time (dynamic libraries are very common).
Basically, a library is just a collection of object code. Consult the internet to see how you can easily build libraries either from source code or from object code.
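As a small sketch of that last point (all names here are invented), building a static library from object code and then letting the link editor resolve the call:

/* mylib.c - the implementation that goes into the library */
int triple(int x)
{
    return 3 * x;
}

/* A separate main.c would only contain the declaration and a call:
 *     int triple(int);
 *     int main(void) { return triple(2) == 6 ? 0 : 1; }
 *
 * Build the library from object code, then link against it:
 *     gcc -c mylib.c                  # compile: produces mylib.o
 *     ar rcs libmylib.a mylib.o       # collect object code into a static library
 *     gcc main.c -L. -lmylib -o app   # -L. adds a search dir, -lmylib names the library */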

How can I compile ANSI C99-based MEX code delivered with Linux makefiles under Win64 MATLAB?

It seems I've got a real problem here due to my lack of any knowledge about Linux systems:
I have downloaded some open source code, which
is written in C
uses complex.h, so I assume it is ANSI C99
comes with makefiles designed for compilation under Linux systems
provides interfaces to IDL, MATLAB, Python etc.
I am indeed familiar with compiling C/MEX files under Windows-based MATLAB environments, but in this case I don't even know where to start. The project is distributed across several folders and consists of dozens of source and header files. And, to begin with, the Visual Studio 2010 compiler I've used to compile MEX files until now does not comply with the C99 standard, i.e. it does not recognize the complex.h header.
Any help towards getting this project compiled would be highly appreciated. In particular, I have the following questions:
1) Is there any possibility to automatically extract compilation information from the MEX files and transfer it to Windows reality?
2) Is there any free compiler being able to compile C99 stuff, which is also easy to embed in MATLAB?
I have done this (moved in-house legacy code inc. mex files to Win64). I can't recommend the experience.
You will have to recompile, no way around it.
Supported compilers for mex depend on your MATLAB version
This File Exchange entry for using Pelles C may be a starting point (if it works with your version of MATLAB).
I am guessing that there is a main makefile which then works through the makefiles in the subdirectories - have a read through the instructions for compiling under Linux, it will give you some idea of what's going on and may also discuss what to do if you want to change compiler. Once you've found a compatible compiler, the next stage is to understand what the makefiles are doing and edit them accordingly (change paths, compiler, compiler flags, etc.)
Then, from memory (it was a while ago), you get to enjoy a magical mystery tour through increasingly obscure compiler errors. Document everything because if you do get it working, you won't be in a mood to do this twice.
MATLAB R2016b on Windows now supports the MinGW compiler. I'm successfully using this to compile code written primarily for Linux/gcc. I installed this from the Add-On menu in MATLAB (search MinGW).
For my case, I'm building with the legacy code tool. The only thing I needed to do differently than normal was to tell the compiler to support c99 via a compiler flag. This does the trick:
legacy_code('compile', def, {'CFLAGS=-std=c99'})
I had trouble getting the flag command just right (I had some extra quotes that apparently broke things), and asked The MathWorks, so credit is due to their support team for this.
If you are using mex, I would expect to do something very similar.
I would guess that the makefiles are irrelevant for your application; you will need to tell the mex or legacy_code function about all of the files necessary to build the whole application or link against pre-built libraries (which it sounds like you don't have).
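For context (a generic illustration, not code from the project in question), this is the kind of C99 construct that makes the -std=c99 flag necessary; Visual Studio 2010 rejects it, while gcc/MinGW accept it in C99 mode:

#include <complex.h>
#include <stdio.h>

int main(void)
{
    double complex z = 1.0 + 2.0 * I;    /* C99 complex type and the I macro */
    printf("re=%f im=%f |z|=%f\n", creal(z), cimag(z), cabs(z));
    return 0;
}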
I hope this helps!

Writing a GCC-compatible wrapper around a .lib file

I recently received a closed-source SDK consisting of a C header file (.h), a library file (.lib), and a dynamic library (.dll). They were compiled using Microsoft's Visual C++. However, I am attempting to write my code using MinGW (GCC ported to Windows, for anyone unfamiliar with the project). It appears that ld is unable to link to the .lib file. I was wondering if it was possible to write a compatibility wrapper between the VS-compiled code and the GCC code I'm writing.
Is there an ABI mismatch, or does it just not want to link against the object format? If it's just a linking problem, you can extract the functions you care about, disassemble them, and then reassemble them into an object your linker can handle. Even easier, maybe objcopy(1) can speak both formats and can help you out?
If you do have an ABI problem to deal with, you can do the same but also add a shim layer to thunk the ABI so that the function calls will work. How complicated that layer is and how difficult it will be to write will depend on the interfaces of the functions you're trying to use.
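One pragmatic variant of such a shim (not from the answer above, and the DLL name and function name below are invented) is to skip the .lib import library entirely and bind to the DLL at runtime with the Win32 API, declaring the calling convention yourself:

#include <windows.h>
#include <stdio.h>

/* Suppose the SDK header declares (hypothetically): int __stdcall VendorDoThing(int arg); */
typedef int (__stdcall *VendorDoThing_t)(int);

int main(void)
{
    HMODULE dll = LoadLibraryA("vendor.dll");    /* the MSVC-built DLL */
    if (!dll) {
        fprintf(stderr, "LoadLibraryA failed: %lu\n", GetLastError());
        return 1;
    }
    VendorDoThing_t VendorDoThing =
        (VendorDoThing_t) GetProcAddress(dll, "VendorDoThing");
    if (!VendorDoThing) {
        fprintf(stderr, "GetProcAddress failed: %lu\n", GetLastError());
        FreeLibrary(dll);
        return 1;
    }
    printf("result: %d\n", VendorDoThing(42));
    FreeLibrary(dll);
    return 0;
}
/* Build with MinGW: gcc wrapper.c -o wrapper.exe
 * This only works for a plain C interface; exported C++ classes would still have
 * an ABI mismatch that a thunk layer cannot easily paper over. */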
Don't get too discouraged by the comments - it's software, so pretty much anything is possible.
