When someone builds a C library that deals with I/O (say, reading a specific file format), they pretty much have to provide the following:
/* usual opaque struct setup */
struct my_context;
typedef struct my_context my_context_t;
/* Open a context for reading via user-specified callbacks */
my_context_t* my_open_callback(void* userdata,
    size_t (*read_cb)(void* data, size_t size, size_t count, void* userdata),
    int (*close_cb)(void* userdata),
    void (*error_cb)(const char* error_msg)
);
And then later provide some common ones:
/* Open directly from file */
my_context_t* my_open_file(const char * filename);
/* Open from an existing memory block */
my_context_t* my_open_memory(const char* buf, size_t len);
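One nice property of this convention is that the convenience openers can be implemented as thin adapters over the callback-based one. A minimal sketch of how that composition might look (the adapter callbacks here are my own illustration, not taken from any particular library):

#include <stdio.h>

/* Callbacks that adapt a FILE* to the generic read interface. */
static size_t file_read_cb(void* data, size_t size, size_t count, void* userdata) {
    return fread(data, size, count, (FILE*)userdata);
}

static int file_close_cb(void* userdata) {
    return fclose((FILE*)userdata);
}

static void file_error_cb(const char* error_msg) {
    fprintf(stderr, "my_lib: %s\n", error_msg);
}

/* The file-based opener just wires a FILE* into the callback opener. */
my_context_t* my_open_file(const char* filename) {
    FILE* fp = fopen(filename, "rb");
    if (fp == NULL)
        return NULL;
    return my_open_callback(fp, file_read_cb, file_close_cb, file_error_cb);
}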
As far as I understand there are possibly other approaches, but is this one considered best practice for reducing inconsistencies, unsafe practices and inefficiencies in the design, or is something else preferred? Is there a name for this convention/best practice?
These are interface design questions. A good interface provides a useful abstraction and hides implementation details. In your example, my_context_t elides some of the implementation details from your user base, provided you don't fully define the type in a public header. This provides you with the freedom to make substantial changes to your implementation without forcing your entire user base to rewrite their code. It is a very good practice, provided the rest of your abstraction is a good fit to the problem space. Sometimes you just have to commit to exposing additional detail at the interface level.
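To make the hiding concrete, a minimal sketch of the header/implementation split (the field names are illustrative only):

/* my_lib.h: public header, only a forward declaration is exposed */
struct my_context;
typedef struct my_context my_context_t;

/* my_lib.c: the full definition stays private to the implementation */
struct my_context {
    void* userdata;
    size_t (*read_cb)(void* data, size_t size, size_t count, void* userdata);
    int (*close_cb)(void* userdata);
    void (*error_cb)(const char* error_msg);
    /* fields here can change freely without breaking client code */
};

Clients can only hold my_context_t* values, so everything behind the pointer remains yours to change.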
I have a very basic question about the C language.
Situation
There is a library named "lib".
The library contains an array named "tmp":
int tmp[ARRAY_SIZE];
Currently, ARRAY_SIZE is defined inside lib.
Even though it is treated as a library, we compile the library and the app at the same time (we are in a non-OS environment).
We can only use static memory allocation (no dynamic allocation), because in our embedded environment use of the heap is to be avoided.
Since this is a library, we would like it to be as independent from the application layer as possible.
Goal
We would like to make ARRAY_SIZE configurable from app layer.
Question
In this situation, how would you modify this library to achieve the goal?
Here are my ideas:
define "tmp" at application layer,
then pass it as a pointer to the library at initialization time.
define the MACRO at compile time, like -DARRAY_SIZE=10
create header file like lib_setting.h at app layer, the include it from lib.h
Any other ideas?
If you were me, how would you implement it?
John
Any other ideas?
No, these are exactly the solutions to such problems.
If you were me, how would you implement it?
It strongly depends on what the library is for and what exactly ARRAY_SIZE represents. For sure I would not call it ARRAY_SIZE, as ARRAY_SIZE() is a macro used in the Linux kernel, which I'm familiar with, so I would definitely pick a LIB_unique LIB_name LIB_with LIB_prefix.
If it's a library meant for normal regular use, I would definitely go with option 1 whenever possible.
#include <assert.h>
#include <stddef.h>

struct lib_s {
    char *buf;        /* storage supplied by the application */
    size_t bufsize;
};

int lib_init(struct lib_s *t, char *buf, size_t bufsize) {
    assert(t);
    assert(buf);
    assert(bufsize);
    t->buf = buf;
    t->bufsize = bufsize;
    return 0;
}

int lib_dosomething(struct lib_s *t);

/* client */
int main(void) {
    char mysuperbuffer[2046];
    struct lib_s lib;
    lib_init(&lib, mysuperbuffer, sizeof(mysuperbuffer));
}
Such a design is easy to unit test and is re-entrant. The lifetime is manageable. It's not spaghetti code. It's easy to understand and to track variables. If users change their mind, they can choose whether or not to malloc the buffer. It's easy to refactor and extend later. This design is found in many APIs; fmemopen() from POSIX comes to mind.
If it's a highly specialized library where profiling shows it is the bottleneck, the library needs maximum execution speed, and you don't want to spend sizeof(char*) + sizeof(size_t) bytes of memory plus the extra indirection, I would go with option 3. Many projects use it: autotools generates a main config.h file, mbedtls has an mbedtls/config.h file, etc.
If the configuration option is very stable and rarely changes (for example, only when switching platforms: it's one thing on Windows and another on Linux), then maybe I would consider option 2, but I believe I would prefer option 3 anyway. A file is easier to track in version control systems, and build systems will recompile only the files that depend on the header containing the definition. With a macro passed on the command line, it's harder to track which files use it (i.e., spaghetti code), and usually, when you change its value, you have to recompile the whole project.
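For illustration, option 3 might look like the sketch below; the file and macro names are made up for the example:

/* lib.h (library side) */
#include "lib_setting.h"   /* supplied by the application layer */

#ifndef LIB_TMP_SIZE
#define LIB_TMP_SIZE 16    /* sensible default if the app doesn't override it */
#endif

extern int lib_tmp[LIB_TMP_SIZE];

/* lib_setting.h (application side) */
#define LIB_TMP_SIZE 128

The library compiles against whatever the application configured, and the dependency stays visible to the build system through the ordinary include graph.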
Let's say we have a function:
void persist_result(FILE* to, unsigned char* b, int b_len) {...}
which saves some result to the given FILE* to.
Now I would like to get at the data before it is written to to, do something with it (say, encrypt it), and then perform the actual I/O operation, directly or indirectly.
One solution could be installing a buffer, but I don't know how to trigger my encryption routine when the write happens.
I was also thinking of getting some kind of handle to a file in memory, but I don't know whether there is any ISO C way to do that.
Or is there a better solution?
Consider the following:
The size of the data written by persist_result is unknown; it could be 1 or more bytes.
I cannot change the source of persist_result.
No C++; it must be a portable C solution.
What you are looking for is the Observer Pattern.
When your function is called, you can first capture that call, do whatever you prefer, and then continue with the original operation. You can implement this in C using pointers to functions.
A sketch of the idea follows.
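This is a minimal illustration (all names besides persist_result are made up): the caller registers a callback that sees the buffer before the real write happens.

#include <stdio.h>

void persist_result(FILE* to, unsigned char* b, int b_len);   /* the unchangeable original */

/* Observer: called with the data just before it is persisted. */
typedef void (*pre_write_cb)(unsigned char* b, int b_len);

static pre_write_cb g_observer = NULL;

void register_pre_write(pre_write_cb cb) {
    g_observer = cb;
}

/* Wrapper that notifies the observer, then performs the real call. */
void persist_result_observed(FILE* to, unsigned char* b, int b_len) {
    if (g_observer != NULL)
        g_observer(b, b_len);         /* e.g. encrypt the buffer in place */
    persist_result(to, b, b_len);     /* original function, unchanged */
}

The catch, as the next answer points out, is that callers must invoke the wrapper rather than persist_result() directly.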
There is no way to capture every operation in standard C without changing the calls. Things like encryption need context (such as a key) to work; that complicates life in general, but maybe persist_result() handles that automatically. How will you handle things like fseek() or rewind()?
I think you are in for a world of pain unless you write your I/O operations to a non-standard C API that allows you to do what's necessary cleanly. For example, your code might be written to call functions such as pr_fwrite(), pr_putc(), pr_fprintf(), pr_vfprintf(), pr_fseek(), pr_rewind(), etc — you probably wouldn't be applying this to either stdin or stdout — and have those do what's necessary.
If I were going to try this, I'd adopt prefixes (pr_ and PR) and create a header "prstdio.h" to be used in place of, or in addition to, <stdio.h>. It could contain (along with comments and header guards, etc):
#include <stdarg.h>
// No need for #include <stdio.h>
typedef struct PRFILE PRFILE;
extern PRFILE *pr_fopen(const char *name, const char *mode);
extern int pr_fclose(PRFILE *fp);
extern int pr_fputc(char c, PRFILE *fp);
extern size_t pr_fwrite(const void *buffer, size_t size, size_t number, PRFILE *fp);
extern int pr_fprintf(PRFILE *fp, char *fmt, ...);
extern int pr_vfprintf(PRFILE *fp, char *fmt, va_list args);
extern int pr_fseek(PRFILE *fp, long offset, int whence);
extern void pr_rewind(PRFILE *fp);
…
and all the existing I/O calls that need to work with the persist_result() function would be written to use the prstdio.h interface instead. In your implementation file, you actually define the structure struct PRFILE, which would include a FILE * member plus any other information you need. You then write those pr_* functions to do what's necessary, and your code that needs to persist results is changed to call the pr_* functions (and use the PRFILE * type) whenever you currently use the stdio.h functions.
This has the merit of being simply compliant with the C standard and can be made portable. Further, the changes to existing code that needs to use the 'persistent result' library are very systematic.
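To make that concrete, here is a sketch of what the implementation file might contain. The transform hook, the field names, and the buffering strategy are my assumptions for illustration, not a fixed part of the design:

/* prstdio.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "prstdio.h"

struct PRFILE {
    FILE *fp;                                 /* underlying stream */
    void (*transform)(void *buf, size_t n);   /* e.g. encrypt in place; may be NULL */
};

size_t pr_fwrite(const void *buffer, size_t size, size_t number, PRFILE *pf)
{
    size_t nbytes = size * number;
    /* Work on a copy so the caller's buffer is left untouched. */
    void *copy = malloc(nbytes);
    if (copy == NULL)
        return 0;
    memcpy(copy, buffer, nbytes);
    if (pf->transform != NULL)
        pf->transform(copy, nbytes);
    size_t written = fwrite(copy, size, number, pf->fp);
    free(copy);
    return written;
}

The other pr_* functions would follow the same shape: apply the transform where it makes sense, then delegate to the corresponding stdio call on pf->fp.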
In a comment to the main question — originally responding to a now-deleted comment of mine (the contents of which are now in this answer) — the OP asked:
I need to do the encryption operation before the plain data write operation. The encryption context is ready for work. I was thinking using the disk on memory, but is it ISO and can be used in Android NDK and iOS too?
Your discussion so far is in terms of encrypting and writing the data. Don't forget the other half of the I/O equation — reading and decrypting the data. You'd need appropriate input functions in the header to be able to handle that. The pr_ungetc() function could cause some interesting discussions.
The scheme outlined here will be usable on other systems where you can write the C code. It doesn't rely on anything non-standard. This is a reasonable way of achieving data hiding in C. Only the implementation files for the prstdio library need know anything about the internals of the PRFILE structure.
Since 'disk in memory' is not part of standard C, any code using such a concept must be using non-standard C. You'd need to consider carefully what it means for portability, etc. Nevertheless, the external interface for the prstdio library could be much the same as described here, except that you might need one or more control functions to manipulate the placement of the data in memory. Or you might modify pr_fopen() to take extra arguments which control the memory management. That would be your decision. The general I/O interface need not change, though.
I need to create a library in C and I am wondering how to manage objects: returning allocated objects (e.g. fopen, opendir) or initializing them in place (e.g. GNU hcreate_r).
I understand that it is mostly a question of taste, and I'm inclined to choose the allocating API because of the convenience when doing lazy initialization (by testing if the object pointer is NULL).
However, after reading Ulrich's paper (PDF), I'm wondering if this design will cause locality of reference problems, especially if I compose objects from others:
struct opaque_composite {
    struct objectx *member1;
    struct objecty *member2;
    struct objectz *member3;
    /* ... */
};
Allocation of such an object will trigger a cascade of further sub-allocations. Is this a problem in practice? And are there other issues I should be aware of?
The thing to consider is whether the type of the object the function constructs is opaque. An opaque type is only forward-declared in the header file, and the only thing you can do with it is hold a pointer to it and pass that pointer to separately compiled API functions. FILE in the standard library is such an opaque type. For an opaque type, you have no option but to provide an allocation and a deallocation function, as the user has no other way to obtain an object of that type.
If the type is not opaque – that is, the definition of the struct is in the header file – it is more versatile to have a function that does only initialization – and, if required, another that does finalization – but no allocation and deallocation. The reason is that with this interface, the user can decide whether to put the objects on the stack…
struct widget w;
widget_init(&w, 42, "lorem ipsum");
// use widget…
widget_fini(&w);
…or on the heap.
struct widget * wp = malloc(sizeof(struct widget));
if (wp == NULL)
    exit(1); // or do whatever
widget_init(wp, 42, "lorem ipsum");
// use widget…
widget_fini(wp);
free(wp);
If you think that this is too much typing, you – or your users themselves – can easily provide convenience functions.
static inline struct widget *
new_widget(const int n, const char *const s)
{
    struct widget *wp = malloc(sizeof(struct widget));
    if (wp != NULL)
        widget_init(wp, n, s);
    return wp;
}

static inline void
del_widget(struct widget *wp)
{
    widget_fini(wp);
    free(wp);
}
Going the other way round is not possible.
Interfaces should always provide the essential building blocks to compose higher-level abstractions but not make legitimate uses impossible by being overly restrictive.
Of course, this leaves us with the question of when to make a type opaque. A good rule of thumb – which I first saw in the coding standards for the Linux kernel – is to make a type opaque only if there are no data members your users could meaningfully access. I think this rule should be refined a little to take into account that non-opaque types allow "member" functions to be provided as inline versions in the header files, which might be desirable from a performance point of view. On the other hand, opaque types provide better encapsulation (especially since C has no way to restrict access to a struct's members). I would also lean towards an opaque type more readily if making it non-opaque would force me to #include headers into my library's header file because they provide the definitions of the types used as members in my type. (I'm okay with #includeing <stdint.h> for uint32_t. I'm a little less comfortable #includeing a large header such as <unistd.h>, and I'd certainly try to avoid having to #include a header from a third-party library such as <curses.h>.)
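As a small illustration of that last point (all names made up): a pointer member only needs a forward declaration, so the third-party header can stay out of your public header entirely.

/* my_lib.h */
struct third_party_ctx;              /* forward declaration is enough... */

struct my_thing {
    struct third_party_ctx *ctx;     /* ...for a pointer member */
    int x, y;
};

/* Only my_lib.c needs to #include the actual third-party header. */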
IMO the "cascade of sub-allocations" is not a problem if you keep the object opaque so you can keep it in a consistent state. The creation and destruction routines will have some added complexity dealing with an allocation failure part way through creation, but nothing too onerous.
Besides the option to have a static/stack-allocated copy (which I'm generally not fond of anyway), in my mind, the main advantage of a scheme like:
x = initThang(thangPtr);
is the ease of returning a variety of more specific error codes.
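A quick sketch of what that buys you; the type, the error codes, and the acquire_handle() helper are all hypothetical:

struct thang { int handle; };

enum thang_err { THANG_OK = 0, THANG_BAD_ARG, THANG_NO_RESOURCE };

/* Stand-in for whatever resource acquisition the library really does. */
static int acquire_handle(void) { return 1; }

enum thang_err initThang(struct thang *t)
{
    if (t == NULL)
        return THANG_BAD_ARG;        /* caller error, reported distinctly */
    t->handle = acquire_handle();
    if (t->handle < 0)
        return THANG_NO_RESOURCE;    /* environment error, reported distinctly */
    return THANG_OK;
}

An allocating API, by contrast, typically has only NULL to return, which flattens every failure into one.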
In ex26 of 'Learn C the Hard Way', in the db.c file, Zed defines two functions:
static FILE *DB_open(const char *path, const char *mode) {
    return fopen(path, mode);
}

static void DB_close(FILE *db) {
    fclose(db);
}
I have a hard time understanding the purpose/need for wrapping these very simple calls to fopen and fclose. What are, if any, the advantages of wrapping very simple functions, like the example given above?
In this particular case a wrapper is used to hide the detail that DB_open, DB_read or DB_close all map to file operations.
This approach implements an abstraction layer through which all database related functions are to be accessed. Also this provides modularity, which may later allow adding more methods to open/read/close databases.
As explained by Michael Kohne in the comments, this wrapper should be improved to hide the FILE *db type entirely, substituting it with a struct DB_context *context, as sketched below.
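A possible shape for that improvement (the context layout and error handling are my assumptions):

/* db.h: the public interface no longer mentions FILE at all */
struct DB_context;
struct DB_context *DB_open(const char *path, const char *mode);
void DB_close(struct DB_context *ctx);

/* db.c: the file-based implementation detail stays private */
#include <stdio.h>
#include <stdlib.h>

struct DB_context {
    FILE *fp;   /* could later become a socket, an mmap region, etc. */
};

struct DB_context *DB_open(const char *path, const char *mode)
{
    struct DB_context *ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->fp = fopen(path, mode);
    if (ctx->fp == NULL) {
        free(ctx);
        return NULL;
    }
    return ctx;
}

void DB_close(struct DB_context *ctx)
{
    if (ctx != NULL) {
        fclose(ctx->fp);
        free(ctx);
    }
}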
Wrappers (or stubs) are often used to guard other areas of your code from changes in the functions being wrapped.
It's also a useful way of interacting with dynamic libraries and shared objects.
Basically, a wrapper hides the details of the underlying routines behind a custom function written by the developer.
You mentioned that the file itself is named db.c, so perhaps the developer wants all the critical and important functions used or declared there to start with DB_.
I was writing some code and had about five include files in it. I was defining functions in this file and then realized: why shouldn't I just make separate header files for all the functions and then include them in one file at the end? But I have seen that this is not usually done. Why not? Is there a particular disadvantage to doing this?
This is not a real answer, because the question rests on a wrong assumption:
But I have seen that this is not usually done.
This is not true. It is common practice. A good example is ffmpeg.h. The header is a front end to an extensive library.
The argument about long compilation times is bogus. Systems today are very fast. Compilation time only matters for really huge systems, and I really don't think you work with those; I have never encountered such systems myself.
And compilation time isn't execution time. That is another misconception.
For your convenience the whole code of ffmpeg.h:
#ifndef _INCLUDE_FFMPEG_H_
#define _INCLUDE_FFMPEG_H_

#ifdef HAVE_FFMPEG
#include <avformat.h>
#endif

#include <stdio.h>
#include <stdarg.h>

/* Define a codec name/identifier for timelapse videos, so that we can
 * differentiate between normal mpeg1 videos and timelapse videos.
 */
#define TIMELAPSE_CODEC "mpeg1_tl"

struct ffmpeg {
#ifdef HAVE_FFMPEG
    AVFormatContext *oc;
    AVStream *video_st;
    AVCodecContext *c;
    AVFrame *picture;       /* contains default image pointers */
    uint8_t *video_outbuf;
    int video_outbuf_size;
    void *udata;            /* U & V planes for greyscale images */
    int vbr;                /* variable bitrate setting */
    char codec[20];         /* codec name */
#else
    int dummy;
#endif
};

/* Initialize FFmpeg stuff. Needs to be called before ffmpeg_open. */
void ffmpeg_init(void);

/* Open an mpeg file. This is a generic interface for opening either an mpeg1 or
 * an mpeg4 video. If non-standard mpeg1 isn't supported (FFmpeg build > 4680),
 * calling this function with "mpeg1" as codec results in an error. To create a
 * timelapse video, use TIMELAPSE_CODEC as codec name.
 */
struct ffmpeg *ffmpeg_open(
    char *ffmpeg_video_codec,
    char *filename,
    unsigned char *y,    /* YUV420 Y plane */
    unsigned char *u,    /* YUV420 U plane */
    unsigned char *v,    /* YUV420 V plane */
    int width,
    int height,
    int rate,            /* framerate, fps */
    int bps,             /* bitrate; bits per second */
    int vbr              /* variable bitrate */
);

/* Puts the image pointed to by the picture member of struct ffmpeg. */
void ffmpeg_put_image(struct ffmpeg *);

/* Puts the image defined by u, y and v (YUV420 format). */
void ffmpeg_put_other_image(
    struct ffmpeg *ffmpeg,
    unsigned char *y,
    unsigned char *u,
    unsigned char *v
);

/* Closes the mpeg file. */
void ffmpeg_close(struct ffmpeg *);

/* Deinterlace the image. */
void ffmpeg_deinterlace(unsigned char *, int, int);

/* Setup an avcodec log handler. */
void ffmpeg_avcodec_log(void *, int, const char *, va_list);

#endif /* _INCLUDE_FFMPEG_H_ */
Some people argue that putting functions in a separate file and including them through a header adds some overhead to the project and increases compilation (not execution) time. Although strictly speaking this is true, in practice the increased compilation time is negligible.
My take on the issue is based more on the purpose of the functions. I am against putting a single function per file with an associated header, as this quickly becomes a mess to maintain. I do not think that putting everything together in a single file is a good approach either (also a mess to maintain, although for different reasons).
My opinion is that the ideal trade-off is to look at the purpose of the functions. You should ask yourself whether the functions can be used somewhere else or not. In other words, can these functions serve as a library for a series of common tasks in other programs? If yes, those functions should be grouped in a single file. Use as many files as you have general tasks: for instance, all functions to perform numerical integration in one file, all functions to handle file I/O in another, all functions to deal with strings in a third, and so on. This way your libraries stay consistent.
Finally, I would place a function that performs a task only meaningful to a specific program in the same file as the main function, for instance any function whose purpose is to initialize a series of variables.
But most of all, you should take any advice as just that: advice. At the end of the day, you should adopt the approach that makes your (or your team's) development most productive.
You should only make one "super include header" if you are writing some sort of API or library and want to make it easy for users of that library to access the functions inside. The Windows OS API is the most obvious example, where one #include gives you access to thousands upon thousands of functions.
But even when writing such libraries, you should be wary of "super-headers". The reason why you should try to avoid them is related to program design. Object-oriented design dictates that you should strive to make isolated, autonomous modules that focus on their own task without knowing or caring about the rest of the program.
The rationale behind that design rule is to reduce the phenomenon known as tight coupling, where every module is heavily dependent on other modules. Computer science research (like this study, for example) shows that tight coupling combined with complexity leads to far more software errors, and also to far more severe errors. If a program with tight coupling gets a bug in one module, the bug might escalate throughout the whole program and cause disaster, while a bug in one autonomous, loosely coupled module only leads to that particular module failing.
Each time you include a header file, you create a dependency between your program and that header file. So while it is tempting to just make one file that includes everything, you should avoid this, as it would create tight coupling between all modules in the project. It also exposes the modules to each other's global namespace, possibly leading to more collisions between identical variable names, etc.
Tight coupling is also, aside from the safety concerns, very annoying when you link/build your program: with tight coupling, suddenly your database module cannot work if something completely unrelated, like the GUI library, isn't linked.