Counting the number of functions and data structures in a C codebase

Is there a way to take a C file (or a directory/project) and count the number of functions plus data structures? This is similar to counting LOC, but instead focuses on counting the number of "conceptual units" the program handles, as a way to measure its complexity.

It sounds like you need a way to survey your source code. Doxygen is an excellent tool for summarizing just about every aspect of a C project (and projects in many other languages). It is open source and easily downloaded, and its list of features is extensive.
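As a sketch, a near-minimal Doxyfile for indexing every function and struct in a tree might look like this (the INPUT path is a placeholder):

# Document entities even if they have no comment blocks.
EXTRACT_ALL    = YES
# Include file-static functions in the output.
EXTRACT_STATIC = YES
# Placeholder: point this at your source directory.
INPUT          = src
RECURSIVE      = YES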

In a Linux environment, you'll want to look at a tool like objdump, which will show you a great deal of information about the compiled output.
There are pages that explain some of its complicated output, such as this.
But perhaps one of the simplest invocations is objdump -T, which lists the dynamic symbol table.
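As an illustration, a rough function count for a compiled binary can be had by filtering the symbol table for entries flagged F (function) in .text; the file name here is a placeholder:

objdump -t myprog | grep -c 'F \.text'

Note this counts what survived compilation, so inlined or optimized-out functions will be missing.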

Related

C theory/general practice related to "splitting" the system into x number of source files

I have a quick question about programming in C. I am writing a simple application in C, as the title suggests, but I find myself defining rather large functions in separate source files so that maintenance and debugging are much easier. My question is: is there a standard number of lines in a C source file before you should "split" it up into multiple files, or is it very dependent on the system/functions in question?
Say, for example, I have 20 source files with one function in each. The functions are somewhat related, but they all do different things (e.g. they all manipulate the same struct in some way). Should you in theory have these 20 files, or one larger file with 20 functions, keeping the modification of that structure in the same file?
My feeling is that the more "split" the code is, the better/easier the coding becomes, but then again I'm quite new to C.
Any input will be appreciated.
Cheers,
Chris.
It makes sense to put code related to the same conceptual area together. If you have functions which work on matrices, for example, it would seem sensible to have a file called matrices.c within which there are some number of matrix functions. A function called render would obviously not belong there.
Yet if the number of matrix functions grows huge, it starts to feel wrong to shove them all into a single file. In such a situation I would look for sub-categories and create separate files for each, e.g. 2d_matrix.c, 3d_matrix.c, etc.
As for the number of functions you place in a file before you recategorize it, that is up to personal choice, and sometimes to the development rules of the team you work for.
The same consideration sometimes applies to the size of a function. One team I have worked for would not allow code which is over two screens high, feeling that such code should be broken up into a number of smaller functions which would make the code more readable.
My advice: structure your code in a way that makes sense. Keep related code together, and be sensible about the sizes of functions and the number of functions in a file (both too few and too many are problems).
The larger a function gets, the easier it is to accidentally break it.
The more code you shove into one file, the more likely it is that other people will be a little sloppy and shove more, possibly unrelated, code into the same file.
Splitting up a file is not function/system dependent; it depends entirely on the programmer. I have seen 1000-1500 or even more lines of code in a single C file. Keeping twenty functions in the same file makes sense if they are not very different from each other. However, if you split the functions among several files, make sure that you write the Makefile properly when compiling them. The claim that "the more split, the easier coding becomes" is debatable.
I liked alk's answer in the closed duplicate: if you follow an object-oriented style in C, i.e. use structures and operations on them, the files separate quite naturally in the same way they would in C++. Operations on the same data type, together forming a "poor man's class", go together.
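A minimal sketch of that style, with hypothetical names: the header declares one struct and the operations on it, and the matching .c file holds the definitions.

/* matrix.h -- a "poor man's class": one type plus its operations */
#ifndef MATRIX_H
#define MATRIX_H

#include <stddef.h>

typedef struct {
    size_t rows, cols;
    double *cells;
} Matrix;

/* Everything that touches Matrix internals lives in matrix.c. */
Matrix *matrix_create(size_t rows, size_t cols);
void    matrix_destroy(Matrix *m);
void    matrix_multiply(Matrix *out, const Matrix *a, const Matrix *b);

#endif /* MATRIX_H */

A render function would then live in its own render.c/render.h pair, exactly as the matrices.c example above suggests.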

How Debuggers Find Expressions From Code Lines

A debugger gets the line number of an expression and translates it into a program address. What does the implementation look like? I want to implement this in a program I'm writing, and the most promising library I've found to accomplish it is libbfd. All I would need is the address of the expression, and I can then wait for it with ptrace(2). I can imagine that the debugger looks up the function name from the C file within the executable, but after that I'm lost.
Does anyone know? I don't need a code example, just enough info so that I can get an idea.
And I don't mind architecture-specific answers; the only ones I really care about are ARM and x86-64.
You should take a look at the DWARF2 format to understand how the mapping is done: the compiler emits the line-number-to-address table into the .debug_line section. Be warned that DWARF2 is vast and complex, and not for everyone, but reading about it might satisfy your curiosity faster and more easily than reading the source for GCC/GDB.
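For a concrete look at that mapping, binutils can dump the decoded line table of a binary built with -g, and addr2line performs the reverse (address-to-line) lookup; the binary name and address below are placeholders:

readelf --debug-dump=decodedline ./a.out
addr2line -e ./a.out 0x1139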

Best Practice in Module (file) Size for C

I have read that it is best to aim to keep functions to no more than approximately a screen full of lines.
Is there a similar guideline for module (file) sizes?
I have read several C programming style guidelines but cannot find any reference to recommended module sizes (only function sizes).
I apologise if this is akin to asking how long a piece of string is, but I would be very interested to see whether there is any agreement among experts on this.
I would recommend using a separate .h and .c file for each struct and its associated functions, and if possible having no more than 1000 lines per file.
I have been taught that module size isn't the issue, but rather code readability. That is why a "screen full of lines" is best for functions, as well as lines no more than around 80-100 characters long, no more than 2 levels of cyclomatic nesting (for loop-if/then-for loop-if/then...), etc. As long as your code is organized, I don't see any real limit to the size of a module, provided the principle of cohesion is practiced when constructing it. That is the real standard: it allows the user of your code to include, as far as possible, only what he or she needs to get the job done and not much else.
If you mean file name length, that is implementation-defined; the ANSI C standard exposes the limit as FILENAME_MAX in stdio.h. Note that most implementations support file names longer than the minimum the standard requires.
Don't make files very large. On average, a file might run from 1000 to 2000 lines, but it depends on how you have written the functions.
Keep your source code manageable. Separate cohesive units into modules, keep each module in a separate .c file, and give each .c file an accompanying .h file. Following this system, and depending on the complexity of your project, you may have .c files ranging from a few lines to approximately 1000 lines. Those numbers are reasonable and easy on your compiler and platform.

c99 dynamic array

I'm writing a very small, project-specific OpenGL ES engine for iPhone, and I really need a good, solid, proven dynamic array library/macro set in the C99 dialect (no C++, Obj-C, or STL whatsoever).
It is strongly needed for render batching and polygon meshes, so it should be able to handle various types of data while incurring minimal overhead when the array is resized and new data is inserted.
I've been searching around and found two candidates for my need.
The first one is ccCArray from Cocos2d,
and the other is utarray, written by Troy D. Hanson.
ccCArray is rock solid, thoroughly proven by the community. utarray looks fine, but I cannot find anyone who actually uses it.
Any more suggestions?
A library?! A C++ template would be more than suitable for this need. I'd say at most 15 functions (excluding alternative constructors and const getters) and you're done. It would also work for any type, any size, and any size type (byte, int, etc.), and it's just one file: a .h or, better said, a .hpp.
Any reason you're rejecting that? It seems like you want to make life harder for yourself :)
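For reference, here is a minimal sketch of the kind of structure both candidates implement: a growable buffer with the element size fixed at creation and capacity doubling on growth. All names are illustrative, not taken from either library.

#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;   /* raw storage */
    size_t elem_size;      /* bytes per element */
    size_t count;          /* elements currently stored */
    size_t capacity;       /* elements that fit before growing */
} dyn_array;

static int dyn_array_init(dyn_array *a, size_t elem_size, size_t cap)
{
    a->data = malloc(elem_size * cap);
    if (!a->data)
        return -1;
    a->elem_size = elem_size;
    a->count = 0;
    a->capacity = cap;
    return 0;
}

static int dyn_array_push(dyn_array *a, const void *elem)
{
    if (a->count == a->capacity) {
        /* Doubling amortizes the cost of reallocation. */
        size_t new_cap = a->capacity ? a->capacity * 2 : 8;
        unsigned char *p = realloc(a->data, a->elem_size * new_cap);
        if (!p)
            return -1;
        a->data = p;
        a->capacity = new_cap;
    }
    memcpy(a->data + a->count * a->elem_size, elem, a->elem_size);
    a->count++;
    return 0;
}

static void dyn_array_free(dyn_array *a)
{
    free(a->data);
    a->data = NULL;
    a->count = a->capacity = 0;
}

utarray wraps essentially this pattern in macros driven by a per-type descriptor, which is why it can hold any element type; if I recall correctly, ccCArray instead stores pointers rather than copying elements.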

Is there a widespread C library for reading name/value pairs from a file?

My program reads a text file containing various lines of text as a settings file. Some of the lines can get very long. Currently the buffer size is 4096 chars, and it is possible that some lines will exceed this, whether through maliciousness or due to various factors operating within the program.
The current routines were rather tedious to write, and now I want to expand the possible contents of the file, which will require more of this tedious, repetitive code. (This is a settings-type file, consisting of name/value pairs and the occasional section header. Some numerical values need to be read as strings because they are multiple-precision.)
The main thing I want is to read a line of arbitrary length without buffer overflow. I've just discovered that getline can do this for me, but is there, for heaven's sake, a library that will just do the whole lot of this tediousness for me?
edit:
I don't wish to be forced to place an = sign between the name and the value; a blank space should suffice as the separator.
By widespread, I mean the library should be available in the standard packages of the popular Linux distributions.
I'm aware of libconfig but it seems complete overkill for my requirements.
Look into libini; it sounds about right. It is quite old and not exactly undergoing frantic development, but if it already works for your problem, that should be fine.
A more up-to-date library, with a bunch of other benefits, is GLib, which has a key/value parser API (GKeyFile).
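A sketch of the GLib route, for reference; note that GKeyFile expects ini-style key=value lines, so it would not meet the blank-space separator requirement above. The file, group, and key names are placeholders.

#include <glib.h>

int main(void)
{
    GKeyFile *kf = g_key_file_new();
    GError *err = NULL;

    if (!g_key_file_load_from_file(kf, "settings.ini",
                                   G_KEY_FILE_NONE, &err)) {
        g_printerr("load failed: %s\n", err->message);
        g_error_free(err);
        g_key_file_free(kf);
        return 1;
    }

    /* Reads key "name" from the [general] group; caller frees it. */
    gchar *value = g_key_file_get_string(kf, "general", "name", NULL);
    if (value) {
        g_print("name = %s\n", value);
        g_free(value);
    }

    g_key_file_free(kf);
    return 0;
}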
My suggestion is DIY, since it's quite easy:
Read each line
count chars until your separator and after your separator
allocate buffers
and read name value pairs with sscanf
like:
sscanf(line, "%[^:]: %[^\n]", key, value);
You will be safe, since you counted the chars before calling sscanf.
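A minimal sketch of those steps, using POSIX getline(3) so lines can be arbitrarily long; the file name is a placeholder, and the colon separator follows the sscanf line above (adapt the scanset if you use a blank space instead):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

int main(void)
{
    FILE *f = fopen("settings.conf", "r");
    if (!f)
        return 1;

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;

    /* getline reallocates `line` as needed, so any length is safe. */
    while ((len = getline(&line, &cap, f)) != -1) {
        if (!strchr(line, ':'))
            continue;               /* skip malformed lines */

        /* Size both buffers from the actual line length. */
        char *key = malloc(len + 1);
        char *value = malloc(len + 1);
        if (key && value &&
            sscanf(line, "%[^:]: %[^\n]", key, value) == 2)
            printf("key='%s' value='%s'\n", key, value);

        free(key);
        free(value);
    }

    free(line);
    fclose(f);
    return 0;
}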
I contributed an updated fork of libini at CCAN. It also contains a very useful dictionary implementation as well as some simple hashing algorithms. Rusty put it in the repo, so I guess I did a reasonably good job of bringing it up to date and fixing the few minor bugs.
The latest version of the library can be found if you poke through this tree; it contains basic token support as well as basic transaction support (useful for re-reading configuration files and reverting if there's a parsing error). It also contains a considerably updated set of unit tests.
I don't actively maintain the fork any more, as the original author of libini became active again; however, the module is still maintained in CCAN.
