Creating a File System, need C advice - c

I am building this file system in C. At the moment I am on the first step of the assignment which is just to create a simple file system that works in memory. My question is more based around C than it is around a Unix File System. I am trying to "emulate" a raw disk. I have the following structure:
struct disk {
void *data;
unsigned int numOfBlocks;
};
Let's pretend a block on this disk will be 512 Bytes (like the original Unix file system). I have some functions defined to create a disk, read from a disk, and write to a disk. It is then my job to implement the various things such as data blocks, i-node blocks, the super block etc.
Look at the void *data variable above. I want this to be a two-dimensional array: an array of block arrays. What makes the most sense to me is something like this:
unsigned char *data[30][512]; // Assuming the disk holds 30 blocks of 512 bytes each
Here comes the question: If I have other structures defined to represent an i-node, a super block, or a data block, and each also has a struct size of 512 bytes, how can I properly cast this unsigned char * to the i-node struct, or the data struct, etc?
Thanks.

I'm not sure I understand the question.
From what you say:
My question isn't how to access the characters in the data variable...it's how to take a 512 byte array of unsigned char for instance and convert it to some other type of struct that is 512 bytes long.
If you have:
unsigned char block[512];
and:
//total struct size: 512 bytes
struct something {
//members here
};
You could do this:
int main(void)
{
struct something *ptr;
ptr = block;
return 0;
}
You can just use the array bytes to hold the structure data. To avoid warnings, cast the address:
ptr = (struct something *)block;
After you assigned memory to the pointer you can use it as you normally would.
note: I may be wrong since I'm a beginner.
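A hedged alternative to the pointer cast above: on some platforms a plain unsigned char array may not be suitably aligned for the struct, so copying the bytes with memcpy is a more conservative way to get a 512-byte block into a struct and back. The sketch below assumes a hypothetical struct inode (its fields and the helper names are invented for illustration) padded to exactly one block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Hypothetical i-node layout, padded out to exactly one block. */
struct inode {
    uint32_t size;          /* file size in bytes */
    uint32_t direct[12];    /* direct block numbers */
    unsigned char pad[BLOCK_SIZE - 13 * sizeof(uint32_t)];
};

/* Fail at compile time if the struct ever stops matching the block size. */
_Static_assert(sizeof(struct inode) == BLOCK_SIZE, "inode must fill one block");

/* Copy a raw block into a properly aligned struct. */
void block_to_inode(const unsigned char block[BLOCK_SIZE], struct inode *out)
{
    assert(block != NULL && out != NULL);
    memcpy(out, block, sizeof *out);
}

/* Copy a struct back into a raw block. */
void inode_to_block(const struct inode *in, unsigned char block[BLOCK_SIZE])
{
    assert(in != NULL && block != NULL);
    memcpy(block, in, sizeof *in);
}
```

The compile-time check is the main design point: it turns "each struct is 512 bytes" from a comment into something the compiler enforces.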

A void * can point to anything, so if you control what the void * means, then the definition you have is enough.
To access the data as a two-dimensional array, you just need to cast it to the right type: (char (*)[512])data
or
char (*array)[512] = data;
byte = array[x][y];
The compiler only needs to know the size of the second dimension (512); it does not check the first, so it is your responsibility to make sure your program never accesses an invalid location.
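Putting the pieces together, here is a minimal sketch of an in-memory disk that keeps the void *data member from the question and views it as rows of 512 bytes. The function names disk_create, disk_block and disk_destroy are assumptions made for this example; the question does not name its own functions.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_SIZE 512

struct disk {
    void *data;
    unsigned int numOfBlocks;
};

/* Allocate a zero-filled disk; returns 0 on success, -1 on failure. */
int disk_create(struct disk *d, unsigned int numOfBlocks)
{
    assert(d != NULL);
    d->data = calloc(numOfBlocks, BLOCK_SIZE);
    d->numOfBlocks = d->data ? numOfBlocks : 0;
    return d->data ? 0 : -1;
}

/* Return a pointer to block n, or NULL if n is out of range. */
unsigned char *disk_block(struct disk *d, unsigned int n)
{
    assert(d != NULL);
    /* View the flat allocation as an array of 512-byte rows. */
    unsigned char (*blocks)[BLOCK_SIZE] = d->data;
    return n < d->numOfBlocks ? blocks[n] : NULL;
}

/* Release the disk's memory and reset the struct. */
void disk_destroy(struct disk *d)
{
    assert(d != NULL);
    free(d->data);
    d->data = NULL;
    d->numOfBlocks = 0;
}
```

Because the bounds check lives in disk_block, callers never index the raw allocation directly.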

Related

How to create a C struct with specific size to send over socket to DalmatinerDB?

I'm trying to create a C client for dalmatinerdb but am having trouble understanding how to combine the variables, write them to a buffer, and send it to the database. The fact that dalmatinerdb is written in Erlang makes it more difficult. However, by looking at a Python client for dalmatinerdb I have (probably) found the necessary variable sizes and order.
The Erlang client has a function called "encode", see below:
encode({stream, Bucket, Delay}) when
is_binary(Bucket), byte_size(Bucket) > 0,
is_integer(Delay), Delay > 0, Delay < 256->
<<?STREAM,
Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>;
According to the official dalmatinerdb protocol we can see the following:
-define(STREAM, 4).
-define(DELAY_SIZE, 8). % bits
-define(BUCKET_SS, 8). % bits
Let's say I would like to create this kind of structure in C.
Would it look something like the following?
struct package {
unsigned char mode[1]; // = "4"
unsigned char delay[1]; // = for example "5"
unsigned char bucketNameSize[1]; // = "5"
unsigned char bucketName[1]; // for example "Test1"
};
Update:
I realized that the dalmatinerdb frontend (web interface) only reacts and updates when values have been sent to the bucket. In other words, just sending the first struct won't give me any clue whether it's right or wrong. Therefore I will try to create a secondary struct with the actual values.
The Erlang code snippet which encodes values looks like this:
encode({stream, Metric, Time, Points}) when
is_binary(Metric), byte_size(Metric) > 0,
is_binary(Points), byte_size(Points) rem ?DATA_SIZE == 0,
is_integer(Time), Time >= 0->
<<?SENTRY,
Time:?TIME_SIZE/?SIZE_TYPE,
(byte_size(Metric)):?METRIC_SS/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):?DATA_SS/?SIZE_TYPE, Points/binary>>;
The different sizes:
-define(SENTRY, 5)
-define(TIME_SIZE, 64)
-define(METRIC_SS, 16)
-define(DATA_SS, 32)
Which gives me:
<<5,
Time:64/?SIZE_TYPE,
(byte_size(Metric)):16/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):32/?SIZE_TYPE, Points/binary>>;
My guess is that my struct containing a value should look like this:
struct Package {
unsigned char sentry;
uint64_t time;
unsigned char metricSize;
uint16_t metric;
unsigned char pointSize;
uint32_t point;
};
Any comments on this structure?
The binary created by the encode function has this form:
<<?STREAM, Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>
First let's replace all the preprocessor macros with their actual values:
<<4, Delay:8/unsigned-integer,
(byte_size(Bucket)):8/unsigned-integer, Bucket/binary>>
Now we can more easily see that this binary contains:
a byte of value 4
the value of Delay as a byte
the size of the Bucket binary as a byte
the value of the Bucket binary
Because of the Bucket binary at the end, the overall binary is variable-sized.
A C99 struct that resembles this value can be defined as follows:
struct EncodedStream {
unsigned char mode;
unsigned char delay;
unsigned char bucket_size;
unsigned char bucket[];
};
This approach uses a C99 flexible array member for the bucket field, since its actual size depends on the value set in the bucket_size field, and you are presumably using this structure by allocating memory large enough to hold the fixed-size fields together with the variable-sized bucket field, where bucket itself is allocated to hold bucket_size bytes. You could also replace all uses of unsigned char with uint8_t if you #include <stdint.h>. In traditional C, bucket would be defined as a 0- or 1-sized array.
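As a rough sketch of the allocation described above, the function below builds one stream message in a single malloc'd block. The name make_stream and its parameters are assumptions for this example; the value 4 and the field order come from the Erlang encode function, and the input checks loosely mirror its guards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct EncodedStream {
    unsigned char mode;
    unsigned char delay;
    unsigned char bucket_size;
    unsigned char bucket[];   /* C99 flexible array member */
};

/*
 * Build a stream message for the given bucket name.
 * Returns a malloc'd struct (caller frees) and stores the number of
 * bytes to send in *len, or returns NULL on bad input or allocation failure.
 */
struct EncodedStream *make_stream(unsigned char delay, const char *bucket, size_t *len)
{
    assert(bucket != NULL && len != NULL);
    size_t n = strlen(bucket);
    if (n == 0 || n > 255 || delay == 0)   /* size must fit in one byte */
        return NULL;

    struct EncodedStream *m = malloc(sizeof *m + n);
    if (m == NULL)
        return NULL;

    m->mode = 4;                      /* ?STREAM */
    m->delay = delay;
    m->bucket_size = (unsigned char)n;
    memcpy(m->bucket, bucket, n);     /* no NUL terminator on the wire */
    *len = offsetof(struct EncodedStream, bucket) + n;
    return m;
}
```

Since every member is an unsigned char, no padding is expected between the fields, so the struct bytes can be sent as they are; *len is the count to pass to the send call.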
Update: the OP extended the question with another struct, so I've extended my answer below to cover it too.
The obvious-but-wrong way to write a struct corresponding to the metric/time/points binary is:
struct Wrong {
unsigned char sentry;
uint64_t time;
uint16_t metric_size;
unsigned char metric[];
uint32_t points_size;
unsigned char points[];
};
There are two problems with the Wrong struct:
Padding and alignment: Normally, fields are aligned on natural boundaries corresponding to their sizes. Here, the C compiler will align the time field on an 8-byte boundary, which means there will be padding of 7 bytes following the sentry field. But the Erlang binary contains no such padding.
Illegal flexible array field in the middle: The metric field size can vary, but we can't use the flexible array approach for it as we did in the earlier example because such arrays can only be used for the final field of a struct. The fact that the size of metric can vary means that it's impossible to write a single C struct that matches the Erlang binary.
Solving the padding and alignment issue requires using a packed struct, which you can achieve with compiler support such as the gcc and clang __packed__ attribute (other compilers might have other ways of achieving this). The variable-sized metric field in the middle of the struct can be solved by using two structs instead:
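#include <stdint.h>
/* Fixed-size prefix, followed directly by the metric bytes. */
typedef struct __attribute__((__packed__)) {
unsigned char sentry;
uint64_t time;
uint16_t size;
unsigned char metric[];
} Metric;
/* Length prefix, followed directly by the points bytes. */
typedef struct __attribute__((__packed__)) {
uint32_t size;
unsigned char points[];
} Points;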
Packing both structs means their layouts will match the layouts of the corresponding data in the Erlang binary.
There's still a remaining problem, though: endianness. By default, fields in an Erlang binary are big-endian. If you happen to be running your C code on a big-endian machine, then things will just work, but if not — and it's likely you're not — the data values your C code reads and writes won't match Erlang.
Fortunately, endianness is easily handled: you can use byte swapping to write C code that can portably read and write big-endian data regardless of the endianness of the host.
To use the two structs together, you'd first have to allocate enough memory to hold both structs and both the metric and the points variable-length fields. Cast the pointer to the allocated memory — let's call it p — to a Metric*, then use the Metric pointer to store appropriate values in the struct fields. Just make sure you convert the time and size values to big-endian as you store them. You can then calculate a pointer to where the Points struct is in the allocated memory as shown below, assuming p is a pointer to char or unsigned char:
Points* points = (Points*)(p + sizeof(Metric) + <length of Metric.metric>);
Note that you can't just use the size field of your Metric instance for the final addend here since you stored its value as big-endian. Then, once you fill in the fields of the Points struct, again being sure to store the size value as big-endian, you can send p over to Erlang, where it should match what the Erlang system expects.
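If compiler-specific packing is not an option, a different route is to skip the structs and write each field into a plain byte buffer in big-endian order. This is a sketch of that alternative, not the packed-struct method described above; encode_entry and put_be are hypothetical names, and the guard on the points length (a multiple of ?DATA_SIZE) is not checked because that constant is not shown in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Write the low `nbytes` bytes of `v` at `p`, most significant first; return the new end. */
static unsigned char *put_be(unsigned char *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        *p++ = (unsigned char)(v >> (8 * (nbytes - 1 - i)));
    return p;
}

/*
 * Encode <<5, Time:64, MetricLen:16, Metric, PointsLen:32, Points>>.
 * Returns a malloc'd buffer (caller frees) and sets *out_len, or NULL on failure.
 */
unsigned char *encode_entry(uint64_t time,
                            const unsigned char *metric, uint16_t metric_len,
                            const unsigned char *points, uint32_t points_len,
                            size_t *out_len)
{
    assert(metric != NULL && points != NULL && out_len != NULL);
    size_t total = 1 + 8 + 2 + (size_t)metric_len + 4 + (size_t)points_len;
    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return NULL;

    unsigned char *p = buf;
    *p++ = 5;                            /* ?SENTRY */
    p = put_be(p, time, 8);              /* ?TIME_SIZE = 64 bits */
    p = put_be(p, metric_len, 2);        /* ?METRIC_SS = 16 bits */
    memcpy(p, metric, metric_len);
    p += metric_len;
    p = put_be(p, points_len, 4);        /* ?DATA_SS = 32 bits */
    memcpy(p, points, points_len);

    *out_len = total;
    return buf;
}
```

Shifting bytes out of the integer gives the same result on little-endian and big-endian hosts, so no separate byte-swapping step is needed.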

Copying structure to char* buffer

Basically, I have a custom structure that contains different kinds of data. For example:
typedef struct example_structure{
uint8_t* example_1[4];
int example_2[4];
int example_3;
} example_structure;
What I need to do is copy the contents of this structure to a const char* buffer so I can send that copied data (buffer) using winsock2's send(SOCKET s, const char* buffer, int len, int flags) function. I tried using memcpy(), but wouldn't I just copy the addresses held in the pointers and not the data?
Yes. If you copied or sent that structure through a socket, you would end up copying or sending pointers, which would be meaningless to the recipient. Moreover, if the recipient is running on different hardware (e.g. with a different endianness), all of the data may be meaningless anyway. On top of that, differences in the amount of padding between structure members may also become a problem.
For non-trivial situations it is best to use an existing protocol (such as protobuf), or roll your own protocol, keeping in mind the potential differences in hardware representation of your data.
You need to design a protocol before you can encode the data in accord with that protocol. Decide exactly how the data will be encoded at the byte level. Then write code to encode and decode to that format that you decided on.
Do not skip the step of actually documenting the wire protocol at the byte level. It will save you pain later, I promise.
See this answer for a bit more detail.
const char* buffer
A const char* buffer cannot be written through, so you can't copy anything into it. You probably don't need to copy anything anyway. Just call send on an instance of the structure (here named ex):
send(s, (const char*)&ex, sizeof(ex), flags)
But there is still the problem with the pointers in your structure (uint8_t* example_1[4];).
Sending pointers between different applications or machines does not make sense.
Hmm, your struct contains uint8_t * fields, which look like C strings... It does not make sense to copy or send a pointer, which is merely a memory address in the sending process's address space.
If your struct had been (note, no pointers):
typedef struct example_structure{
uint8_t example_1[4];
int example_2[4];
int example_3;
} example_structure;
and provided you transfer it between exactly the same architecture (same hardware, same compiler, same compiler options), you could simply do:
example_structure ex_struc;
// initialize the struct
...
send(s, (const char *)&ex_struc, sizeof(ex_struc), flags);
And even in that case, I would strongly advise you to define and use a protocol. As already said by @DavidSchwartz, it could save you time and headaches later...
But as you have pointers, you cannot do that and must define a protocol.
It could be (but you are free to prefer little-endian order, or 2 or 8 bytes for each int depending on your actual data):
one byte (or two) for the length of the first uint8_t array, followed by the array
the above repeated 3 more times
four bytes in big-endian order for the first int of example_2
repeated 3 more times
four bytes in big-endian order for the int of example_3
This clearly defines the format of a message.
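A minimal sketch of an encoder for the format just listed, using the one-byte length variant and assuming each int fits in 32 bits. The function name encode_example and the separate lens array are assumptions: the original structure does not record how long each uint8_t array is, so the lengths have to come from somewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct example_structure {
    uint8_t *example_1[4];
    int example_2[4];
    int example_3;
} example_structure;

/* Write a 32-bit value as four bytes, most significant first. */
static unsigned char *put_u32_be(unsigned char *p, uint32_t v)
{
    *p++ = (unsigned char)(v >> 24);
    *p++ = (unsigned char)(v >> 16);
    *p++ = (unsigned char)(v >> 8);
    *p++ = (unsigned char)v;
    return p;
}

/*
 * Serialize `s` into `out` (capacity `cap`). `lens[i]` is the length of
 * s->example_1[i], passed in because the struct does not store it.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
size_t encode_example(const example_structure *s, const uint8_t lens[4],
                      unsigned char *out, size_t cap)
{
    assert(s != NULL && lens != NULL && out != NULL);
    size_t need = (size_t)4 + lens[0] + lens[1] + lens[2] + lens[3] + 4 * 4 + 4;
    if (need > cap)
        return 0;

    unsigned char *p = out;
    for (int i = 0; i < 4; i++) {
        *p++ = lens[i];                              /* one length byte */
        if (lens[i] > 0)
            memcpy(p, s->example_1[i], lens[i]);     /* then the bytes */
        p += lens[i];
    }
    for (int i = 0; i < 4; i++)
        p = put_u32_be(p, (uint32_t)s->example_2[i]);
    p = put_u32_be(p, (uint32_t)s->example_3);
    return (size_t)(p - out);
}
```

The returned count is what would be passed as the len argument of send, with out cast to const char *.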

Store struct in array

I am trying to write some code that mimics a simple malloc function (in C), though it should only manage the memory of a big array, not the actual physical memory. To manage the "memory", I would like to store segments of metadata in the memory array. The metadata is stored as a struct.
My question is, how do I correctly store the struct in the bytes of the array? In the example shown here, I try to store some initial metadata in the first element of the memory array; however, I have the syntax wrong.
typedef struct _xMetaData{
size_t xSize;
int* piNextBlock;
int iBlockFree;
}xMetaData;
int8_t memory[ALLOCATE_SIZE];
// Pointer to struct
xMetaData* pxMetaPtr;
xMetaData xInitialData = {BLOCKSIZE, &memory[INITIAL_BLOCK_ADDRESS], BLOCK_FREE};
&memory[0] = xInitialData;
You need to cast the block of memory to xMetaData:
*(xMetaData *) (&memory[0]) = xInitialData;
You should also be aware of structure padding if you're using a struct for this kind of thing (for example, make sure ALLOCATE_SIZE uses sizeof(xMetaData) and not a hardcoded length, and make sure you always access the memory using the struct.)
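The cast above assumes the target address is suitably aligned for xMetaData, which holds at offset 0 on most platforms but not necessarily at arbitrary offsets. A more cautious sketch copies the header in and out with memcpy. Note the assumptions: ALLOCATE_SIZE is given an arbitrary value, the helper names are invented, and the int * link from the question is replaced by a byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALLOCATE_SIZE 1024   /* assumed value; the question does not give one */

typedef struct xMetaData {
    size_t xSize;        /* size of the block that follows the header */
    size_t xNextOffset;  /* offset of the next header within the array */
    int iBlockFree;      /* nonzero if the block is free */
} xMetaData;

static int8_t memory[ALLOCATE_SIZE];

/* Copy a header into the array at `offset`; returns 0 on success, -1 if it does not fit. */
int meta_write(size_t offset, const xMetaData *meta)
{
    assert(meta != NULL);
    if (offset > ALLOCATE_SIZE - sizeof *meta)
        return -1;
    memcpy(&memory[offset], meta, sizeof *meta);
    return 0;
}

/* Copy the header stored at `offset` out of the array; same return convention. */
int meta_read(size_t offset, xMetaData *meta)
{
    assert(meta != NULL);
    if (offset > ALLOCATE_SIZE - sizeof *meta)
        return -1;
    memcpy(meta, &memory[offset], sizeof *meta);
    return 0;
}
```

Storing offsets instead of pointers also keeps the metadata valid if the array is ever copied or saved.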

Accessing array as a struct *

This is one of those "I think this should work, but it's best to check" questions. It compiles and works fine on my machine.
Is this guaranteed to do what I expect (i.e. allow me to access the first few elements of the array with a guarantee that the layout, alignment, padding etc of the struct is the same as the array)?
struct thingStruct
{
int a;
int b;
int c;
};
void f()
{
int thingsArray[5];
struct thingStruct *thingsStruct = (struct thingStruct *)&thingsArray[0];
thingsArray[0] = 100;
thingsArray[1] = 200;
thingsArray[2] = 300;
printf("%d", thingsStruct->a);
printf("%d", thingsStruct->b);
printf("%d", thingsStruct->c);
}
EDIT: Why on earth would I want to do something like this? I have an array which I'm mmapping to a file. I'm treating the first part of the array as a 'header', which stores various pieces of information about the array, and the rest of it I'm treating as a normal array. If I point the struct to the start of the array I can access the pieces of header data as struct members, which is more readable. All the members in the struct would be of the same type as the array.
While I have seen this done frequently, you cannot (meaning it is not legal, standard C) make assumptions about the binary layout of a structure, as it may have padding between fields.
This is explained in the comp.lang.c FAQ: http://c-faq.com/struct/padding.html
Although it's likely to work in most places, it's still a bit iffy. If you want to give symbolic names to parts of the header, why not just do:
enum { HEADER_A, HEADER_B, HEADER_C };
/* ... */
printf("%d", thingsArray[HEADER_A]);
printf("%d", thingsArray[HEADER_B]);
printf("%d", thingsArray[HEADER_C]);
As Evan commented on the question, this will probably work in most cases (again, probably best if you use #pragma pack to ensure there is no padding), assuming all the types in your struct are the same type as your array. Given the rules of C, this is legal.
My question to you is "why?" This isn't a particularly safe thing to do. If a float gets thrown into the middle of the struct, this all falls apart. Why not just use the struct directly? This really isn't a technique that I'd recommend in most cases.
Another solution for representing a header and the rest of file data is using a structure like this:
struct header {
long headerData1;
int headerData2;
int headerData3;
int fileData[ 1 ]; // <- data begin here
};
Then you allocate the memory block with a file contents and cast it as struct header *myFileHeader (or map the memory block on a file) and access all your file data with
myFileHeader->fileData[ position ]
for an arbitrarily large position. The compiler does not check the index, so it is your responsibility to keep position within the actual size of the memory block you allocated (or the mapped file's size).
One more important note: apart from switching off struct member padding, which has already been described by others, you should carefully choose data types for the header members, for example the fixed-width types from <stdint.h>, so that they fit the actual file data layout regardless of the compiler you use (so that, say, an int won't change from 32 to 64 bits...)
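One way to stop guessing about padding is to have the compiler verify the layout. The sketch below uses C11 _Static_assert with offsetof so that the build fails if struct thingStruct ever differs from three consecutive ints, and then copies the header out of the array instead of aliasing it. The helper name header_from_array is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct thingStruct {
    int a;
    int b;
    int c;
};

/* Compile-time checks: the build fails if the compiler inserts padding. */
_Static_assert(offsetof(struct thingStruct, a) == 0 * sizeof(int), "a misplaced");
_Static_assert(offsetof(struct thingStruct, b) == 1 * sizeof(int), "padding before b");
_Static_assert(offsetof(struct thingStruct, c) == 2 * sizeof(int), "padding before c");
_Static_assert(sizeof(struct thingStruct) == 3 * sizeof(int), "trailing padding");

/* Copy the first three ints of an array into a header struct. */
struct thingStruct header_from_array(const int *array)
{
    assert(array != NULL);
    struct thingStruct h;
    memcpy(&h, array, sizeof h);
    return h;
}
```

With the assertions in place, adding a float or a char to the struct later produces a compile error rather than silently misreading the header.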

Reading and Writing Structures [C]

IMPORTANT EDIT:
Sorry everyone, I made a big mistake in the structure.
char *name; is meant to be outside of the structure, written to the file after the structure.
This way, you read the structure, find out the size of the name, then read in the string. This also explains why there is no need for a null terminator.
However, I feel my actual question has been answered somewhere. If someone would like to edit their response so I can choose the best-fitting one, I'd appreciate it.
Again, the question I was asking is "If you read in a structure, are you also reading in the data it holds, or do you need to access it some other way".
Sorry for the confusion
For an assignment, I've been tasked with a program which writes and reads structures to a disk (using fread and fwrite).
I'm having trouble grasping the concept.
Lets say we have this structure:
typedef struct {
short nameLength;
char* name;
}attendenceList;
attendenceList names;
now assume we give it this data:
names.name = "John Doe\0";
names.nameLength = strlen(names.name); /*potentially -1?*/
and then we use fwrite... given a file pointer fp.
fwrite(&names,sizeof(names),1,fp);
now we close the file, and open it later to read in the structure.
the question is this: when we read in the structure, are we also reading in the variables it stores?
Can we then now do something like:
if(names.nameLength < 10)
{
...
}
Or do we have to fread something more than just the structure, or assign them somehow?
Assuming the fread is:
fread(&names,sizeof(names),1,fp);
Also assuming we've defined the structure in our current function, as above.
Thanks for the help!
You have a problem here:
fwrite(&names,sizeof(names),1,fp);
Since attendenceList saves the name as a char * this will just write out the pointer, not the actual text. When you read that back in, the memory the pointer is referencing will most likely have something else in it.
You have two choices:
Put a character array (char names[MAXSIZE]) in attendenceList.
Don't write the raw data structure, but write the necessary fields.
You're writing the memory layout of the structure, which includes its members.
You'll get them back if you read the structure back in again, at least if you do it on the same platform, with a program compiled with the same compiler and compiler settings.
Your name member is declared just as a char, so you can't store a string in it.
If name was a pointer like this:
typedef struct {
short nameLength;
char *name;
}attendenceList;
You really should not read/write the struct to a file. You will write the structure as it's laid out in memory, and that includes the value of the name pointer.
fwrite knows nothing about pointers inside your structure; it will not follow pointers and write whatever they point to.
When you read the structure back again, you'll read in the address in the name pointer, and that might not point to anything sensible anymore.
If you declare name as an array, you'll be ok, as the array and its content is part of the structure.
typedef struct {
short nameLength;
char name[32];
}attendenceList;
As always, make sure you don't try to copy a string (including its NUL terminator) that's larger than 32 bytes into name. And when you read it back again, set yourstruct.name[31] = 0; so you are sure the buffer is NUL-terminated.
To write a structure, you'd do
attendenceList my_list;
//initialize my_list
if(fwrite(&my_list,sizeof my_list,1,f) != 1) {
//handle error
}
And to read it back again:
attendenceList my_list;
if(fread(&my_list,sizeof my_list,1,f) != 1) {
//handle error
}
I'm assuming you meant char* name instead of char name.
Also, sizeof(name) will return the size of a char* (typically 4 or 8 bytes), not the length of the char array, because you are taking the size of the pointer. So you should write strlen(name), not sizeof(name), inside your fwrite.
In your above example I would recommend storing the string exact size without the null termination. You don't need to store the string length as you can get that after.
If you are reading just a string from a file, and you wrote the exact size without the null termination. Then you need to manually null terminate your buffer after you read the data in.
So make sure you allocate at least the size of your data you are reading in plus 1.
Then you can set the last byte of that array to '\0'.
If you write a whole struct at a time to the buffer, you should be careful because of padding. The padding may not always be the same.
when we read in the structure, are we also reading in the variables it stores?
Yes you are, but the problem, as I mentioned above, is that you will be storing the char* pointer (typically 4 or 8 bytes) and not the actual char array. I would recommend storing the struct elements individually.
You ask:
now we close the file, and open it later to read in the structure. the question is this: when we read in the structure, are we also reading in the variables it stores?
No. sizeof(names) is a constant value defined at compile time. It will be the same as
sizeof(short) + sizeof(void*) + some_amount_of_padding_to_align_things
it will NOT include the size of what names.name points to, it will only include the size of the pointer itself.
So you have two problems when writing this to a file.
you aren't actually writing the name string to the file
you are writing a pointer value to the file that will have no meaning when you read it back.
As your code is currently written, when you read back the names, names.name will point to somewhere, but it won't point to "John Doe\0".
What you need to do is to write the string pointed to by names.name instead of the pointer value.
This is sometimes called "flattening" the structure: you make a structure in memory that contains no pointers but holds the same data as the structure you want to use, then you write the flattened structure to disk. This is one way to do that.
typedef struct {
short nameLength;
char name[1]; // this will be variable sized at runtime.
}attendenceListFlat;
size_t cbFlat = sizeof(attendenceListFlat) + strlen(names.name);
attendenceListFlat * pflat = malloc(cbFlat); // check for NULL in real code
pflat->nameLength = names.nameLength;
strcpy(pflat->name, names.name);
fwrite(pflat, cbFlat, 1, fp);
free(pflat);
The flattened structure ends with an array that has a minimum size of 1, but when we malloc, we add strlen(names.name) so we can treat that as an array of strlen(names.name)+1 size.
A few things.
Structures are just chunks of memory. It's just taking a bunch of bytes and drawing boundaries on them. Accessing structure elements is just a convenient way of getting a particular memory offset cast as a particular type of data.
You are attempting to assign a string to a char type. This will not work. In C, strings are arrays of characters with a NUL byte at the end. The easiest way to get this to work is to set aside a fixed buffer for the name. When you create your structure you'll have to copy the name into the buffer (being very careful not to write more bytes than the buffer can hold). You can then write/read the buffer from the file in one step.
struct attendanceList {
int namelen;
char name[256]; //fixed size buffer for name
};
Another way you could do it is by having the name be a pointer to a string. This makes what you're trying to do more complicated, because in order to write/read the struct to/from a file, you will have to take into account that the name is stored in a different place in memory. This means two writes and two reads (depending on how you do it) as well as correctly assigning the name pointer to wherever you read the data for the name.
struct attendanceList {
int namelen;
char* name; //the * means "this is a pointer to a char somewhere else in memory"
};
There's a third way you could do it, with a dynamically sized struct using a trick with a zero-length array at the end of a struct. Once you know how long the name is, you allocate the correct amount (sizeof(struct attendanceList) + length of string). Then you have it in one contiguous buffer. You just need to remember that sizeof(struct attendanceList) is not the size you need to write/read. This might be a little confusing for a beginner. It is also kind of a hack that's not supported by all compilers.
struct attendanceList {
int namelen;
char name[0]; //this just allows easy access to the data following the struct. Be careful!
};
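To illustrate the "two writes and two reads" option described earlier in this answer, here is a small sketch that stores a 16-bit length followed by the name bytes, then reads them back into a freshly allocated, NUL-terminated string. The function names write_name and read_name are hypothetical, and the length is written in host byte order, so the file is only portable between machines with the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write a 16-bit length followed by the name bytes (no terminator). Returns 0 on success. */
int write_name(FILE *fp, const char *name)
{
    assert(fp != NULL && name != NULL);
    size_t len = strlen(name);
    if (len > UINT16_MAX)
        return -1;
    uint16_t n = (uint16_t)len;
    if (fwrite(&n, sizeof n, 1, fp) != 1)
        return -1;
    if (len > 0 && fwrite(name, 1, len, fp) != len)
        return -1;
    return 0;
}

/* Read one record; returns a malloc'd, NUL-terminated string (caller frees) or NULL. */
char *read_name(FILE *fp)
{
    assert(fp != NULL);
    uint16_t n;
    if (fread(&n, sizeof n, 1, fp) != 1)
        return NULL;
    size_t len = n;
    char *name = malloc(len + 1);        /* +1 for the terminator added below */
    if (name == NULL)
        return NULL;
    if (len > 0 && fread(name, 1, len, fp) != len) {
        free(name);
        return NULL;
    }
    name[len] = '\0';
    return name;
}
```

No pointer value ever reaches the file: only the length and the characters themselves are written.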

Resources