I am not quite sure I understand the difference between direct and sequential file access in C. Let's say we have a structure in some file:
struct students_s {
int points[5];
int number[10];
int id;
}student;
In this case, we can use direct access to that structure since all of the variables are the same type and therefore the same fixed size. This is my reasoning, but I am not sure it is correct. Can someone please help me with this?
Another question: is it necessary for the students in that file to be arranged from 1 to n by the id variable, or does their arrangement not matter?
I tried searching online, but I cannot seem to find a detailed answer.
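A sketch of what direct access means in practice: every record written with `fwrite()` occupies exactly `sizeof(struct students_s)` bytes, so record `k` starts at byte `k * sizeof(struct students_s)` and a single `fseek()` reaches it. What makes this work is the fixed record size, not the fields sharing a type; a struct mixing `int`, `double` and `char[20]` has a fixed size too. The helpers `read_student()` and `write_student()` are hypothetical names, and a file written this way is only readable by code that shares the same struct layout (padding, `int` size, byte order).

```c
#include <stdio.h>

struct students_s {
    int points[5];
    int number[10];
    int id;
};

/* Record k starts at byte k * sizeof(struct students_s), so it can be
   reached with one fseek() instead of reading the k records before it. */
int read_student(FILE *fp, long k, struct students_s *out)
{
    if (fseek(fp, k * (long)sizeof *out, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}

/* Overwrite (or append) record k in place. */
int write_student(FILE *fp, long k, const struct students_s *in)
{
    if (fseek(fp, k * (long)sizeof *in, SEEK_SET) != 0)
        return -1;
    return fwrite(in, sizeof *in, 1, fp) == 1 ? 0 : -1;
}
```

On the ordering question: the records do not have to be sorted, but if you want to find a student by `id` with a single seek, the usual approach is to store each student in slot `id` (or `id - 1` for ids 1..n). Otherwise you are back to reading sequentially until the id matches.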
I was reading the chapter in Beautiful Code on the Linux kernel, and the author discusses how the Linux kernel implements inheritance in the C language (amongst other topics). In a nutshell, a 'base' struct is defined, and in order to inherit from it the 'subclass' struct places a copy of the base at the end of the subclass struct definition. The author then spends a couple of pages explaining a clever and complicated macro to figure out how many bytes to back up in order to convert from the base part of the object to the subclass part of the object.
My question: Within the subclass struct, why not declare the base struct as the first thing in the struct, instead of the last thing?
The main advantage of putting the base struct stuff first is when casting from the base to the subclass you wouldn't need to move the pointer at all - essentially, doing the cast just means telling the compiler to let your code use the 'extra' fields that the subclass struct has placed after the stuff that the base defines.
Just to clarify my question a little bit let me throw some code out:
struct device { // this is the 'base class' struct
int a;
int b;
//etc
};
struct usb_device { // this is the 'subclass' struct
int usb_a;
int usb_b;
struct device dev; // This is what confuses me -
// why put this here, rather than before usb_a?
};
If one happens to have a pointer to the "dev" field inside a usb_device object, then in order to cast it back to that usb_device object one needs to subtract 8 from that pointer (the offset of dev, with 4-byte ints and no padding). But if "dev" were the first thing in a usb_device, the cast wouldn't need to move the pointer at all.
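The "clever and complicated macro" boils down to subtracting the member's offset, which `offsetof` from `<stddef.h>` computes for you. Below is a minimal, portable sketch; the kernel's own `container_of` additionally uses GCC extensions such as `typeof` to type-check the pointer, and `to_usb_device()` is a hypothetical helper. Hard-coding 8 is only right for 4-byte `int` with no padding, whereas `offsetof` is right on any platform.

```c
#include <stddef.h>

/* Portable variant of the kernel's container_of(): step back from a
   pointer to `member` by that member's offset inside `type`. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct device {
    int a;
    int b;
};

struct usb_device {
    int usb_a;
    int usb_b;
    struct device dev;    /* embedded "base class", not at offset 0 */
};

/* Hypothetical helper: recover the usb_device from its embedded dev. */
struct usb_device *to_usb_device(struct device *d)
{
    return container_of(d, struct usb_device, dev);
}
```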
Any help on this would be greatly appreciated. Even advice on where to find an answer would be appreciated - I'm not really sure how to Google for the architectural reason behind a decision like this. The closest I could find here on StackOverflow is:
why to use these weird nesting structure
And, just to be clear - I understand that a lot of bright people have worked on the Linux kernel for a long time so clearly there's a good reason for doing it this way, I just can't figure out what it is.
The Amiga OS uses this "common header" trick in a lot of places and it looked like a good idea at the time: Subclassing by simply casting the pointer type. But there are drawbacks.
Pro:
You can extend existing data structures
You can use the same pointer in all places where the base type is expected, no pointer arithmetic needed, saving precious cycles
It feels natural
Con:
Different compilers tend to align data structures differently. If the base structure ended with char a;, then you could have 0, 1 or 3 pad bytes afterwards before the next field of the subclass starts. This led to quite nasty bugs, especially when you had to maintain backwards compatibility (i.e. for some reason, you have to have a certain padding because an ancient compiler version had a bug and now, there is lots of code which expects the buggy padding).
You don't notice quickly when you pass the wrong structure around. With the code in your question, fields get trashed very quickly if the pointer arithmetic is wrong. That is a good thing, since it raises the chances that a bug is discovered early.
It leads to a "my compiler will fix it for me" attitude (which it sometimes won't), and all the casts lead to an "I know better than the compiler" attitude. The latter makes you insert casts automatically before understanding the error message, which leads to all kinds of odd problems.
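For comparison, a minimal sketch of the "common header" style: the base struct is the first member, and C guarantees (C11 6.7.2.1) that a pointer to a struct, suitably converted, points to its initial member, so upcasts and downcasts need no pointer arithmetic. `struct node`, `struct int_node` and the helper functions are hypothetical names.

```c
#include <stddef.h>

struct node {             /* hypothetical base type */
    struct node *next;
    int type;
};

struct int_node {         /* hypothetical "subclass" */
    struct node base;     /* must be the FIRST member for the casts to be valid */
    int value;
};

/* Works on any "subclass" through the base pointer. */
size_t count_nodes(const struct node *n)
{
    size_t count = 0;
    for (; n != NULL; n = n->next)
        count++;
    return count;
}

/* Downcast by plain cast: only valid if n really is the base of an
   int_node. Nothing checks this, which is the "wrong structure" risk above. */
int int_node_value(const struct node *n)
{
    return ((const struct int_node *)n)->value;
}
```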
The Linux kernel is putting the common structure elsewhere; it can be but doesn't have to be at the end.
Pro:
Bugs will show early
You will have to do some pointer arithmetic for every structure, so you're used to it
You don't need casts
Con:
Not obvious
Code is more complex
I'm new to the Linux kernel code, so take my ramblings here with a grain of salt. As far as I can tell, there is no requirement as to where to put the embedded struct. That is exactly what the macros provide: you can cast back to the "subclass" structure regardless of its layout. This makes your code robust (the layout of a structure can be changed without having to change your code).
Perhaps there is a convention of placing the "base class" struct at the end, but I'm not aware of it. I've seen lots of code in drivers, where different "base class" structs are used to cast back to the same "subclass" structure (from different fields in the "subclass" of course).
I don't have fresh experience from the Linux kernel, but from other kernels. I'd say that this doesn't matter at all.
You are not supposed to cast from one to the other. Allowing casts like that should only be done in very specific situations. In most cases it reduces the robustness and flexibility of the code and is considered quite sloppy. So the deepest "architectural reason" you're looking for might just be "because that's the order someone happened to write it in". Or alternatively, that's what the benchmarks showed would be the best for performance of some important code path in that code. Or alternatively, the person who wrote it thinks it looks pretty (I always build upside-down pyramids in my variable declarations and structs if I have no other constraints). Or someone happened to write it this way 20 years ago and since then everyone else has been copying it.
There might be some deeper design behind this, but I doubt it. There's just no reason to design those things at all. If you want to find out from an authoritative source why it's done this way, just submit a patch to linux that changes it and see who yells at you.
It's for multiple inheritance. struct device isn't the only interface you can apply to a struct in the Linux kernel, and if you have more than one, just casting the subclass to a base class wouldn't work. For example:
struct device {
int a;
int b;
// etc...
};
struct asdf {
int asdf_a;
};
struct usb_device {
int usb_a;
int usb_b;
struct device dev;
struct asdf asdf;
};
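A sketch of that point: with two embedded "interfaces", at most one of them can sit at offset 0, so a plain cast cannot recover the outer struct from both. An offset-based macro (a portable stand-in for the kernel's `container_of`) works from either member. `usb_from_dev()` and `usb_from_asdf()` are hypothetical helpers.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct device { int a; int b; };
struct asdf   { int asdf_a; };

struct usb_device {
    int usb_a;
    int usb_b;
    struct device dev;
    struct asdf asdf;
};

/* Each embedded interface gets its own way back to the same container. */
struct usb_device *usb_from_dev(struct device *d)
{
    return container_of(d, struct usb_device, dev);
}

struct usb_device *usb_from_asdf(struct asdf *x)
{
    return container_of(x, struct usb_device, asdf);
}
```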
I want to know how to create a single linked list in C where each node has:
A string of any length.
A list of integers (the total number of integers to store will vary between nodes, so I can't use an array of integers).
Thanks in advance.
I'm not going to write the code for you, but here are some steps to get you started. This is one possible way of doing it; there are others that might work better given your actual needs.
First, determine your data structures. You need two. The first is the linked list of integers. The second is the linked list that contains both the string and the list. Here is what the nodes in the latter might look like:
typedef struct node {
char* someStr;
LIST* integerList;
struct node* next;
} NODE;
Then, have one function that creates the linked list of integers. Have another function that allocates memory for the string and returns a char*.
Have a third function that takes as parameters the linked list of NODEs, a string, and a linked list of ints. This function appends a new NODE to the list, setting the pointers to the linked list of int and the string.
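A minimal sketch of those three functions, assuming `LIST` is a simple singly linked list of `int` (the answer leaves it undefined; `struct intnode`, `int_list_push()`, `copy_string()`, `node_append()` and `node_list_free()` are hypothetical names). Ints are prepended for simplicity, so they come out in reverse order of insertion.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical int-list node standing in for the answer's LIST type. */
typedef struct intnode {
    int value;
    struct intnode *next;
} LIST;

typedef struct node {
    char *someStr;
    LIST *integerList;
    struct node *next;
} NODE;

/* Prepend an int; returns the new head, or NULL on allocation failure. */
LIST *int_list_push(LIST *head, int value)
{
    LIST *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Copy a string of any length onto the heap. */
char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Append a NODE that takes ownership of str and ints; returns 0 on success. */
int node_append(NODE **head, char *str, LIST *ints)
{
    NODE *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->someStr = str;
    n->integerList = ints;
    n->next = NULL;
    while (*head != NULL)          /* walk to the last next-pointer */
        head = &(*head)->next;
    *head = n;
    return 0;
}

/* Free every node, its string and its int list. */
void node_list_free(NODE *head)
{
    while (head != NULL) {
        NODE *next = head->next;
        LIST *ints = head->integerList;
        while (ints != NULL) {
            LIST *n = ints->next;
            free(ints);
            ints = n;
        }
        free(head->someStr);
        free(head);
        head = next;
    }
}
```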
At least get started with this. If you get stuck on any of the functions, you can ask specific questions about them. But you cannot just post the problem and expect someone to write the code for you, without saying what you have tried and where you are stuck.
From what I understood from:
A list of integers (the total number of integers to store will vary between nodes, so I can't use an array of integers).
It seems that you want a dynamic array for the numbers. You can use this CCAN module (it's just a header; remove the config.h inclusion and you should be good to go).
NB: a linked list of numbers seems like a bad idea, so a dynamic array is probably the most efficient way of doing this.
NB2: It's not hard to code your own dynamic array; just have a look at its algorithm.
Simplest usage is something like:
#include "darray.h"
struct node {
char *string;
darray(int) numbers;
};
static inline void push_number(struct node *node, int number) {
darray_push(node->numbers, number);
}
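Following NB2, a hand-rolled dynamic array takes only a few lines: keep a length and a capacity, and double the capacity with `realloc()` when it runs out, which keeps appends amortized O(1). `IntArray` and its functions are hypothetical names, not the CCAN API.

```c
#include <stdlib.h>

/* A minimal growable int array (hypothetical names). */
typedef struct {
    int *items;
    size_t len;
    size_t cap;
} IntArray;

/* Append x, doubling capacity when full; returns 0 on success, -1 on failure. */
int intarray_push(IntArray *a, int x)
{
    if (a->len == a->cap) {
        size_t new_cap = a->cap ? a->cap * 2 : 4;
        int *p = realloc(a->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;              /* old buffer is still valid */
        a->items = p;
        a->cap = new_cap;
    }
    a->items[a->len++] = x;
    return 0;
}

void intarray_free(IntArray *a)
{
    free(a->items);
    a->items = NULL;
    a->len = a->cap = 0;
}
```

Start from `IntArray a = {0};`. Since `realloc(NULL, n)` behaves like `malloc(n)`, no separate init call is needed.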
http://c-faq.com/strangeprob/ptralign.html
In 16.7, the author explains:
s.i32 = *(long int *)p;
s.i16 = *(int *)p;
will get into trouble because these cast pointers may not be properly aligned. So he uses byte-wise manipulation instead.
My question is, since this code:
struct mystruct {
char c;
long int i32;
int i16;
} s;
will have padding bytes after 'char c;', why didn't the author skip the padding when he tried to get the 'long int i32;'?
Keep in mind this is supposed to be a FAQ list. You have to read the Q and the A as if they were written by different people. Theoretically, the Q is something that actually gets Asked Frequently by people who don't know the answer. In reality it's probably not a direct quote from an actual question, but a sort of idealized version of the question made up by the FAQ list maintainer. But still, when writing the Q section the author adopts a different persona.
In this case the questioning persona doesn't know about alignment or padding. He writes char buf[7] and the struct definition and thinks they should both be 7 bytes long. The 7-byte buffer is an external data format (in a file or a network protocol stream) that the questioner is trying to parse, the struct represents the variables he would like to parse it into, and the statements like s.i32 = *(long int *)p; are his unsuccessful attempt at doing it.
In the A section, our author drops that persona and gives the correct method of transferring the data from the packed 7-byte buffer into the struct. He doesn't explain every detail of alignment and padding rules as applied to the struct and the char buffer because he wants to keep the answer brief.
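To make that concrete, here is a sketch of unpacking a packed 7-byte record into `struct mystruct` one byte at a time. The wire layout (1 byte, then a 4-byte and a 2-byte integer, most significant byte first) and the name `unpack_mystruct()` are assumptions for illustration, not the FAQ's exact code. Because each field is assembled from individual bytes, neither the buffer's alignment nor the struct's padding matters.

```c
struct mystruct {
    char c;
    long int i32;
    int i16;
};

/* Decode a packed record: p[0] = c, p[1..4] = i32, p[5..6] = i16,
   assumed big-endian. No multi-byte loads from p, so alignment is irrelevant. */
void unpack_mystruct(const unsigned char *p, struct mystruct *s)
{
    s->c = (char)p[0];
    unsigned long u32 = ((unsigned long)p[1] << 24) | ((unsigned long)p[2] << 16)
                      | ((unsigned long)p[3] << 8)  |  (unsigned long)p[4];
    unsigned int u16 = ((unsigned int)p[5] << 8) | (unsigned int)p[6];
    /* Sign-extend from the wire width without implementation-defined conversions. */
    s->i32 = (u32 & 0x80000000UL) ? -(long)(0xFFFFFFFFUL - u32) - 1 : (long)u32;
    s->i16 = (u16 & 0x8000U) ? -(int)(0xFFFFU - u16) - 1 : (int)u16;
}
```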
You're looking at a true old-style newsgroup FAQ list, which was designed to actually answer the questions that people ask frequently, not a corporate web-site style "FAQ" in which a marketing team makes up fake questions designed to flatter the company and avoid answering any complaints. (And does anybody else remember when there was a distinction between a FAQ which was a single question and a FAQL which was the FAQ List with answers? Where did that go?)
This is not exactly a technical question, since I know C well enough to do the things I need to (in the sense of not 'letting the language get in your way'), so this question is basically a 'which direction to take' question.
Situation is: I am currently taking an advanced algorithms course, and for the sake of 'growing up as programmers', I am required to use pure C to implement the practical assignments (it works well: pretty much any small mistake you make actually forces you to understand completely what you're doing in order to fix it). In the course of implementing, I obviously run into the problem of having to implement the 'basic' data structures from the ground up: actually not only linked lists, but also stacks, trees, et cetera.
I am focusing on lists in this topic because it's typically a structure I end up using a lot in the program, either as a 'main' structure or as a 'helper' structure for other bigger ones (for example, a hash tree that resolves conflicts by using a linked list).
This requires that the list stores elements of lots of different types. I am assuming here as a premise that I don't want to re-code the list for every type. So, I can come up with these alternatives:
Making a list of void pointers (kinda inelegant; harder to debug)
Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)
Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB, 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)
Your idea/solution
To make the question clear: which one of the above is best?
PS: Since I am basically in an academic context, I am also very interested in the view of people working with pure C out there in the industry. I understand that most pure C programmers are in the embedded devices area, where I don't think this kind of problem I am facing is common. However, if anyone out there knows how it's done 'in the real world', I would be very interested in your opinion.
A void * is a bit of a pain in a linked list since you have to manage its allocation separately from the list itself. One approach I've used in the past is to have a 'variable sized' structure like:
typedef struct _tNode {
struct _tNode *prev;
struct _tNode *next;
int payloadType;
char payload[1]; // or use different type for alignment.
} tNode;
Now I realize that doesn't look variable sized but let's allocate a structure thus:
typedef struct {
char Name[30];
char Addr[50];
} tPerson;
tNode *node = malloc (sizeof (tNode) - 1 + sizeof (tPerson));
Now you have a node that, for all intents and purposes, looks like this:
typedef struct _tNode {
struct _tNode *prev;
struct _tNode *next;
int payloadType;
char Name[30];
char Addr[50];
} tNode;
or, in graphical form (where [n] means n bytes, assuming 4-byte pointers and ints, as on a typical 32-bit system):
+----------------+
| prev[4]        |
+----------------+
| next[4]        |
+----------------+
| payloadType[4] |
+----------------+               +----------+
| payload[1]     | <- overlap -> | Name[30] |
+----------------+               +----------+
                                 | Addr[50] |
                                 +----------+
That is, assuming you know how to address the payload correctly. This can be done as follows:
node->prev = NULL;
node->next = NULL;
node->payloadType = PLTYP_PERSON;
tPerson *person = (tPerson *) node->payload; // cast for easy changes to payload.
strcpy (person->Name, "Bob Smith");
strcpy (person->Addr, "7 Station St");
That cast line simply casts the address of the payload character (in the tNode type) to be an address of the actual tPerson payload type.
Using this method, you can carry any payload type you want in a node, even different payload types in each node, without the wasted space of a union. This wastage can be seen with the following:
union {
int x;
char y[100];
} u;
where 96 bytes are wasted every time you store an integer type in the list (for a 4-byte integer).
The payload type in the tNode allows you to easily detect what type of payload this node is carrying, so your code can decide how to process it. You can use something along the lines of:
#define PAYLOAD_UNKNOWN 0
#define PAYLOAD_MANAGER 1
#define PAYLOAD_EMPLOYEE 2
#define PAYLOAD_CONTRACTOR 3
or (probably better):
typedef enum {
PAYLOAD_UNKNOWN,
PAYLOAD_MANAGER,
PAYLOAD_EMPLOYEE,
PAYLOAD_CONTRACTOR
} tPayLoad;
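Since C99, the `char payload[1]` trick has a sanctioned form: a flexible array member. This sketch assumes a C11 compiler for `max_align_t`, used as the element type so the payload is aligned for any type (a `char` array after an `int` does not guarantee that, although it happens to be fine for the all-`char` `tPerson`). `PAYLOAD_PERSON` and `node_new()` are hypothetical names, and allocating `sizeof *node + size` may over-allocate by a few bytes, which is harmless.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    PAYLOAD_UNKNOWN,
    PAYLOAD_PERSON          /* hypothetical tag for tPerson payloads */
} tPayLoad;

typedef struct tNode {
    struct tNode *prev;
    struct tNode *next;
    tPayLoad payloadType;
    max_align_t payload[];  /* flexible array member, aligned for any type */
} tNode;

typedef struct {
    char Name[30];
    char Addr[50];
} tPerson;

/* Allocate a node with room for `size` payload bytes copied from `data`. */
tNode *node_new(tPayLoad type, const void *data, size_t size)
{
    tNode *node = malloc(sizeof *node + size);
    if (node == NULL)
        return NULL;
    node->prev = NULL;
    node->next = NULL;
    node->payloadType = type;
    memcpy(node->payload, data, size);
    return node;
}
```

Reading the payload back is the same cast as before, e.g. `(tPerson *) node->payload`, after checking `payloadType`.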
My $.002:
Making a list of void pointers (kinda inelegant; harder to debug)
This isn't such a bad choice, IMHO, if you must write in C. You might add API methods to allow the application to supply a print() method for ease of debugging. Similar methods could be invoked when (e.g.) items get added to or removed from the list. (For linked lists this is usually not necessary, but for more complex data structures, such as hash tables, it can sometimes be a lifesaver.)
Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)
I would avoid this like the plague. (Well, you did ask.) Having a manually-configured, compile-time dependency from the data structure to its contained types is the worst of all worlds. Again, IMHO.
Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB (sglib.sourceforge.net), 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)
Intriguing idea, but since I don't know SGLIB, I can't say much more than that.
Your idea/solution
I'd go with the first choice.
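A sketch of the suggested debugging hook: the list stores `void *` and never interprets it, while the application registers a print callback that a dump function calls for each element. `vlist`, `print_fn` and the function names are hypothetical; the list frees its nodes, and the items stay owned by the caller.

```c
#include <stdio.h>
#include <stdlib.h>

typedef void (*print_fn)(FILE *out, const void *item);

typedef struct vnode {
    void *item;
    struct vnode *next;
} vnode;

typedef struct {
    vnode *head;
    vnode **tail;       /* points at the last next-pointer, for O(1) append */
    print_fn print;     /* application-supplied; may be NULL */
} vlist;

void vlist_init(vlist *l, print_fn print)
{
    l->head = NULL;
    l->tail = &l->head;
    l->print = print;
}

int vlist_append(vlist *l, void *item)
{
    vnode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->item = item;
    n->next = NULL;
    *l->tail = n;
    l->tail = &n->next;
    return 0;
}

/* Dump every element via the callback; the list never knows the type. */
void vlist_dump(const vlist *l, FILE *out)
{
    for (const vnode *n = l->head; n != NULL; n = n->next) {
        if (l->print)
            l->print(out, n->item);
        else
            fprintf(out, "%p", n->item);
        fputc('\n', out);
    }
}

/* Frees the nodes only; the items belong to the caller. */
void vlist_free(vlist *l)
{
    vnode *n = l->head;
    while (n != NULL) {
        vnode *next = n->next;
        free(n);
        n = next;
    }
    vlist_init(l, l->print);
}
```

Passing a callback such as `static void print_int(FILE *out, const void *item) { fprintf(out, "%d", *(const int *)item); }` to `vlist_init()` then lets `vlist_dump(&l, stderr)` show the contents while debugging.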
I've done this in the past, in our code (which has since been converted to C++), and at the time, decided on the void* approach. I just did this for flexibility - we were almost always storing a pointer in the list anyways, and the simplicity of the solution, and usability of it outweighed (for me) the downsides to the other approaches.
That being said, there was one time where it caused some nasty bug that was difficult to debug, so it's definitely not a perfect solution. I think it's still the one I'd take, though, if I was doing this again now.
Using a preprocessor macro is the best option. The Linux kernel linked list is an excellent and efficient implementation of a circularly-linked list in C. It is very portable and easy to use. Here is a standalone version of the Linux kernel 2.6.29 list.h header.
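To give a feel for the kernel approach without copying `list.h`, here is a simplified re-creation of its core idea, with names modeled on (but not identical to) the kernel's: the `list_head` links are embedded in your own struct, the list code never knows the element type, and an offset-based `list_entry` recovers the containing struct. `struct person` and `sum_ages()` are hypothetical.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the struct that contains the embedded list_head. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty circular list points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert `item` just before `head`, i.e. at the tail of the circular list. */
static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct person {                /* hypothetical element type */
    int age;
    struct list_head link;     /* embedded list node */
};

/* Example traversal: walk the links and recover each person. */
static int sum_ages(struct list_head *head)
{
    int total = 0;
    for (struct list_head *p = head->next; p != head; p = p->next)
        total += list_entry(p, struct person, link)->age;
    return total;
}
```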
The FreeBSD/OpenBSD sys/queue is another good option for a generic macro based linked list
I haven't coded C in years but GLib claims to provide "a large set of utility functions for strings and common data structures", among which are linked lists.
Although it's tempting to think about solving this kind of problem using the techniques of another language, say, generics, in practice it's rarely a win. There are probably some canned solutions that get it right most of the time (and tell you in their documentation when they get it wrong), but using one might miss the point of the assignment, so I'd think twice about it. For a very small number of cases it might be feasible to roll your own, but for a project of any reasonable size, it's not likely to be worth the debugging effort.
Rather, when programming in language x, you should use the idioms of language x. Don't write Java when you're using Python. Don't write C when you're using Scheme. Don't write C++ when you're using C99.
Myself, I'd probably end up using something like Pax's suggestion, but actually use a union of char[1], void* and int to make the common cases convenient (plus an enum type flag).
(I'd also probably end up implementing a Fibonacci tree, just because that sounds neat, and you can only implement RB trees so many times before they lose their flavor, even if they are better for the common cases it'd be used for.)
edit: based on your comment, it looks like you've got a pretty good case for using a canned solution. If your instructor allows it, and the syntax it offers feels comfortable, give it a whirl.
This is a good problem. There are two solutions I like:
Dave Hanson's C Interfaces and Implementations uses a list of void * pointers, which is good enough for me.
For my students, I wrote an awk script to generate type-specific list functions. Compared to preprocessor macros, it requires an extra build step, but the operation of the system is much more transparent to programmers without a lot of experience. And it really helps make the case for parametric polymorphism, which they see later in their curriculum.
Here's what one set of functions looks like:
int lengthEL (Explist *l);
Exp* nthEL (Explist *l, unsigned n);
Explist *mkEL (Exp *hd, Explist *tl);
The awk script is a 150-line horror; it searches C code for typedefs and generates a set of list functions for each one. It's very old; I could probably do better now :-)
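The same kind of type-specific functions can be produced with the preprocessor instead of awk (option 3 in the question). This sketch is not the author's script; `DEFINE_LIST`, the cons-cell layout, the placeholder `Exp` struct, and the 0-based, NULL-when-out-of-range behavior of `nth` are all assumptions. It only shows the shape of generated code for the signatures listed above.

```c
#include <stdlib.h>

/* Generate a cons-style list type LT holding T* elements, plus
   length/nth/mk functions whose names end in SUF. */
#define DEFINE_LIST(T, LT, SUF)                                  \
    typedef struct LT { T *hd; struct LT *tl; } LT;              \
    static int length##SUF(LT *l) {                              \
        int n = 0;                                               \
        for (; l != NULL; l = l->tl) n++;                        \
        return n;                                                \
    }                                                            \
    static T *nth##SUF(LT *l, unsigned n) {                      \
        while (l != NULL && n-- > 0) l = l->tl;                  \
        return l ? l->hd : NULL;                                 \
    }                                                            \
    static LT *mk##SUF(T *hd, LT *tl) {                          \
        LT *cell = malloc(sizeof *cell);                         \
        if (cell != NULL) { cell->hd = hd; cell->tl = tl; }      \
        return cell;                                             \
    }

typedef struct Exp { int value; } Exp;   /* hypothetical element type */

/* Expands to Explist, lengthEL(), nthEL() and mkEL(). */
DEFINE_LIST(Exp, Explist, EL)
```

Compared with the awk approach, nothing extra runs at build time, but compiler errors inside the generated functions point at the one-line macro call, which is the transparency cost the answer mentions.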
I wouldn't give a list of unions the time of day (or space on my hard drive). It's not safe, and it's not extensible, so you may as well just use void * and be done with it.
One improvement over making it a list of void* would be making it a list of structs that contain a void* and some meta-data about what the void* points to, including its type, size, etc.
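A minimal sketch of that improvement: each element carries its own copy of the data, its size and a type tag, and a checked accessor returns NULL on a type mismatch instead of letting the caller misinterpret the bytes. `enum item_type`, `item_push()` and `item_get()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum item_type { ITEM_INT, ITEM_STRING };   /* hypothetical type tags */

/* The list element: the void* plus metadata about what it points to. */
typedef struct item {
    void *data;
    size_t size;
    enum item_type type;
    struct item *next;
} item;

/* Push a private copy of `size` bytes at `data`. Returns the new head,
   or NULL on allocation failure (the old list is left untouched). */
item *item_push(item *head, enum item_type type, const void *data, size_t size)
{
    item *it = malloc(sizeof *it);
    if (it == NULL)
        return NULL;
    it->data = malloc(size);
    if (it->data == NULL) {
        free(it);
        return NULL;
    }
    memcpy(it->data, data, size);
    it->size = size;
    it->type = type;
    it->next = head;
    return it;
}

/* Checked access: NULL when the caller asks for the wrong type, which turns
   a silent misinterpretation into a detectable error. */
void *item_get(const item *it, enum item_type expected)
{
    return it->type == expected ? it->data : NULL;
}
```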
Other ideas: embed a Perl or Lisp interpreter.
Or go halfway: link with the Perl library and make it a list of Perl SVs or something.
I'd probably go with the void* approach myself, but it occurred to me that you could store your data as XML. Then the list can just have a char* for data (which you would parse on demand for whatever sub elements you need)....