How to implement overlap-checking memcpy in C

This is a learning exercise. I'm attempting to augment memcpy by notifying the user if the copy operation will pass or fail before it begins. My biggest question is the following. If I allocate two char arrays of 100 bytes each, and have two pointers that reference each array, how do I know which direction I am copying? If I copy everything from the first array to the second how do I ensure that the user will not be overwriting the original array?
My current solution compares the distance between the pointers with the size of the destination array. If the distance is smaller than the size, I say an overwrite will occur. But what if it's copying in the other direction? I'm just kind of confused.
int memcpy2(void *target, void *source, size_t nbytes) {
char * ptr1 = (char *)target;
char * ptr2 = (char *)source;
int i, val;
val = abs(ptr1 - ptr2);
printf("%d, %d\n", val, nbytes + 0);
if (val > nbytes) {
for (i = 0; i < val; i++){
ptr1[i] = ptr2[i];
}
return 0; /*success */
}
return -1; /* error */
}
int main(int argc, char **argv){
char src [100] = "Copy this string to dst1";
char dst [20];
int p;
p = memcpy2(dst, src, sizeof(dst));
if (p == 0)
printf("The element\n'%s'\nwas copied to \n'%s'\nSuccesfully\n", src, dst);
else
printf("There was an error!!\n\nWhile attempting to copy the elements:\n '%s'\nto\n'%s', \n Memory was overlapping", src, dst);
return 0;
}

The only portable way to determine if two memory ranges overlap is:
int overlap_p(void *a, void *b, size_t n)
{
    char *x = a, *y = b;
    size_t i; /* loop counter, missing in the first draft */
    for (i=0; i<n; i++) if (x+i==y || y+i==x) return 1;
    return 0;
}
This is because comparison of pointers with the relational operators is undefined unless they point into the same array. In reality, the comparison does work on most real-world implementations, so you could do something like:
int overlap_p(void *a, void *b, size_t n)
{
char *x = a, *y = b;
return (x<=y && x+n>y) || (y<=x && y+n>x);
}
I hope I got that logic right; you should check it. You can simplify it even more if you want to assume you can take differences of arbitrary pointers.
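If you are willing to rely on implementation-defined behavior, converting both pointers to uintptr_t sidesteps the relational comparison of unrelated pointers. The following is only a sketch under that assumption: it presumes a flat address space in which the integer values order the same way as the addresses (common on mainstream platforms, not guaranteed by the standard), and ranges_overlap is a name made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [a, a+n) and [b, b+n) share at least one byte, 0 otherwise.
   Assumption: uintptr_t values compare like the addresses they came from. */
int ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;
    uintptr_t gap = (x < y) ? y - x : x - y; /* distance between the two starts */

    /* Two ranges of n bytes overlap exactly when their starts are closer than n. */
    return n != 0 && gap < n;
}
```

Working with the distance between the starts avoids computing x + n, which could wrap around near the top of the address space.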

What you want to check is the position in memory of the source relative to the destination:
If the source is ahead of the destination (i.e. source < destination), then you should start from the end. If the source is after, you start from the beginning. If they are equal, you don't have to do anything (trivial case).
Here are some crude ASCII drawings to visualize the problem.
|_;_;_;_;_;_| (source)
|_;_;_;_;_;_| (destination)
>-----^ start from the end to shift the values to the right
|_;_;_;_;_;_| (source)
|_;_;_;_;_;_| (destination)
^-----< start from the beginning to shift the values to the left
Following a very accurate comment below, I should add that you can use the difference of the pointers (destination - source), but to be on the safe side cast those pointers to char * beforehand.
In your current setting, I don't think that you can check if the operation will fail. Your memcpy prototype prevents you from doing any form of checking for that, and with the rule given above for deciding how to copy, the operation will succeed (outside of any other considerations, like prior memory corruption or invalid pointers).
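The direction rule above can be sketched as follows. This is an illustration, not a replacement for the library memmove(): copy_overlapping is a hypothetical name, and comparing the pointers through uintptr_t assumes a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies n bytes from source to target, picking the direction so that
   overlapping ranges are copied correctly. Returns target. */
void *copy_overlapping(void *target, const void *source, size_t n)
{
    unsigned char *dst = target;
    const unsigned char *src = source;
    size_t i;

    if ((uintptr_t)dst < (uintptr_t)src) {
        /* destination starts before the source: copy from the beginning */
        for (i = 0; i < n; i++)
            dst[i] = src[i];
    } else if ((uintptr_t)dst > (uintptr_t)src) {
        /* destination starts after the source: copy from the end */
        for (i = n; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
    /* equal pointers: nothing to do */
    return target;
}
```

Note that there is no failure path: as long as both buffers are large enough, one of the two directions always works.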

I don't believe that "attempting to augment memcpy by notifying the user if the copy operation will pass or fail before it begins." is a well-formed notion.
First, memcpy() doesn't succeed or fail in the normal sense. It just copies the data, which might cause a fault/exception if it reads outside the source array or writes outside the destination array, and it might also read or write outside one of those arrays without causing any fault/exception and just silently corrupting data. When I say "memcpy does this" I'm not talking just about the implementation of the C stdlib memcpy, but about any function with the same signature -- it doesn't have enough information to do otherwise.
Second, if your definition of "succeed" is "assuming the buffers are big enough but may be overlapping, copy the data from source to dst without tripping over yourself while copying" -- that is indeed what memmove() does, and it's always possible. Again, there's no "return failure" case. If the buffers don't overlap it's easy, if the source is overlapping the end of the destination then you just copy byte by byte from the beginning; if the source is overlapping the beginning of the destination then you just copy byte by byte from the end. Which is what memmove() does.
Third, when writing this kind of code, you have to be very careful about overflow cases for your pointer arithmetic (including addition, subtraction, and array indexing). In val = abs(ptr1 - ptr2), ptr1 - ptr2 has type ptrdiff_t, a signed type that is usually wider than int (and the subtraction is only defined when both pointers point into the same array). abs() takes an int, so a large difference can be truncated before abs() ever sees it, and int is the wrong type to store the result in. Just so you know.

Related

Abort trap: 6 error with arrays in c

The following code compiled fine yesterday for a while, started giving the abort trap: 6 error at one point, then worked fine again for a while, and again started giving the same error. All the answers I've looked up deal with strings of some fixed specified length. I'm not very experienced in programming so any help as to why this is happening is appreciated. (The code is for computing the Zeckendorf representation.)
If I simply use printf to print the digits one by one instead of using strings the code works fine.
#include <string.h>
// helper function to compute the largest fibonacci number <= n
// this works fine
void maxfib(int n, int *index, int *fib) {
int fib1 = 0;
int fib2 = 1;
int new = fib1 + fib2;
*index = 2;
while (new <= n) {
fib1 = fib2;
fib2 = new;
new = fib1 + fib2;
(*index)++;
if (new == n) {
*fib = new;
}
}
*fib = fib2;
(*index)--;
}
char *zeckendorf(int n) {
int index;
int newindex;
int fib;
char *ans = ""; // I'm guessing the error is coming from here
while (n > 0) {
maxfib(n, &index, &fib);
n -= fib;
maxfib(n, &newindex, &fib);
strcat(ans, "1");
for (int j = index - 1; j > newindex; j--) {
strcat(ans, "0");
}
}
return ans;
}
Your guess is quite correct:
char *ans = ""; // I'm guessing the error is coming from here
That makes ans point to a read-only array of one character, whose only element is the string terminator. Trying to append to this will write out of bounds and give you undefined behavior.
One solution is to dynamically allocate memory for the string, and if you don't know the size beforehand then you need to reallocate to increase the size. If you do this, don't forget to add space for the string terminator, and to free the memory once you're done with it.
Basically, you have two approaches when you want to receive a string from a function in C:
1. The caller allocates a buffer (either statically or dynamically) and passes it to the callee as a pointer and a size. The callee writes data to the buffer. If it fits, it returns success as a status. If it does not fit, it returns an error. You may decide that in such a case either the buffer is left untouched or it contains all the data that fits in the size. Choose whatever suits you better, just document it properly for future users (including your future self).
2. The callee allocates the buffer dynamically, fills it and returns a pointer to it. The caller must free the memory to avoid a memory leak.
In your case the zeckendorf() function can determine how much memory is needed for the string. The index of the largest Fibonacci number not exceeding the parameter determines the length of the result. Add 1 for the terminating zero and you know how much memory you need to allocate.
So, if you choose the first approach, you need to pass two additional parameters to the zeckendorf() function, char *buffer and int size, and write to the buffer instead of ans. You also need some marker to know whether it's the first iteration of the while() loop. If it is, check the condition index+1<=size after maxfib(n, &index, &fib);. If the condition is true, you can proceed with your function. If not, you can return an error immediately.
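A minimal sketch of the caller-allocates approach might look like this. It is not the original maxfib-based loop: zeckendorf_into is a hypothetical name, it walks the Fibonacci sequence itself, it uses size_t for the size, and it assumes int is at most 32 bits so that the Fibonacci values fit in long long.

```c
#include <stddef.h>

/* Writes the Zeckendorf digits of n (most significant first) into buffer.
   Returns 0 on success, -1 if n < 1 or the buffer is too small.
   On failure the buffer is left untouched. */
int zeckendorf_into(int n, char *buffer, size_t size)
{
    long long lo = 1, hi = 2; /* consecutive Fibonacci numbers */
    size_t digits = 1, i;

    if (n < 1 || buffer == NULL)
        return -1;
    /* advance until lo is the largest Fibonacci number <= n */
    while (hi <= n) {
        long long next = lo + hi;
        lo = hi;
        hi = next;
        digits++;
    }
    if (digits + 1 > size) /* +1 for the terminating '\0' */
        return -1;
    for (i = 0; i < digits; i++) {
        long long prev = hi - lo; /* the Fibonacci number just below lo */
        if (lo <= n) {
            buffer[i] = '1';
            n -= (int)lo;
        } else {
            buffer[i] = '0';
        }
        hi = lo;
        lo = prev;
    }
    buffer[digits] = '\0';
    return 0;
}
```

The caller would declare something like char out[64]; and call zeckendorf_into(10, out, sizeof out), checking the return value before printing out.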
For second approach initialize the ans as:
char *ans = NULL;
after maxfib(n, &index, &fib); add:
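if(ans==NULL) {
    ans=malloc(index+1);
    if(ans==NULL) return NULL; /* allocation failed */
    ans[0]='\0'; /* start with an empty string so strcat() has a terminator to find */
}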
and continue as you did. Return ans from the function. Remember to call free() in the caller when the result is no longer needed, to avoid a memory leak.
In both cases remember to write the terminating \0 to buffer.
There is also a third approach. You can declare ans as:
static char ans[20];
inside zeckendorf(). The function then behaves as in the first approach, but the buffer and its size are hardcoded. I recommend adding #define BUFSIZE 20, declaring the variable as static char ans[BUFSIZE]; and using BUFSIZE when checking the available size. Because the buffer persists between calls, reset it with ans[0] = '\0'; at the start of each call before appending. Please be aware that this works only in a single-threaded environment, and every call to zeckendorf() will overwrite the previous result. Consider the following code.
char *a,*b;
a=zeckendorf(10);
b=zeckendorf(15);
printf("%s\n",a);
printf("%s\n",b);
The zeckendorf() function always returns the same pointer, so a and b would point to the same buffer, where the string for 15 would be stored. So you either need to copy the result somewhere, or do the processing in the proper order:
a=zeckendorf(10);
printf("%s\n",a);
b=zeckendorf(15);
printf("%s\n",b);
As a rule of thumb, the majority (if not all) of the Linux standard C library functions use either the first or the third approach.

Copying unsigned char in C

I want to use memcpy but it seems to me that it's copying the array from the start?
I wish to copy from A[a] to A[b]. So, instead I found an alternative way,
void copy_file(char* from, int offset, int bytes, char* to) {
int i;
int j = 0;
for (i = offset; i <= (offset+bytes); i++) to[i] = from[j++];
}
I'm getting seg faults but I don't know where I am getting this seg fault from?
each entry holds 8 bytes so my second attempt was
void copy_file(char* from, int offset, int bytes, char* to) {
int i;
int j = 0;
for (i = 8*offset; i <= 8*(offset+bytes); i++) to[i] = from[j++];
}
but still seg fault. If you need more information please don't hesitate to ask!
I'm getting seg faults but I don't know where I am getting this seg fault from?
Primary Suggestion: Learn to use a debugger. It provides helpful information about erroneous instruction(s).
To answer your query about the code snippet shown in the question above:
Check the incoming pointers (to and from) against NULL before dereferencing them.
Put a check on the boundary limits for indexes used. Currently they can overrun the allocated memory.
To use memcpy() properly:
as per the man page, the signature of memcpy() indicates
void *memcpy(void *dest, const void *src, size_t n);
It copies n bytes from the address pointed to by src to the address pointed to by dest.
Also, a very very important point to note:
The memory areas must not overlap.
So, to copy A[a] to A[b], you may write something like
memcpy(destbuf, &A[a], (b-a) );
it seems to me that memcpy copying the array from the start
No, it does not. In fact, memcpy does not have a slightest idea that it is copying from or to an array. It treats its arguments as pointers to unstructured memory blocks.
If you wish to copy from A[a] to A[b], pass an address of A[a] and the number of bytes between A[b] and A[a] to memcpy, like this:
memcpy(Dest, &A[a], (b-a) * sizeof(A[0]));
This would copy the content of A from index a, inclusive, to index b, exclusive, into a memory block pointed to by Dest. If you wish to apply an offset to Dest as well, use &Dest[d] for the first parameter. Multiplication by sizeof is necessary for arrays of types other than char, signed or unsigned.
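As a small sketch of that call for an array of int (copy_int_range is a hypothetical helper name, and the caller is assumed to guarantee that the destination is large enough and does not overlap the source range):

```c
#include <stddef.h>
#include <string.h>

/* Copies elements src[first] .. src[last - 1] into dest.
   The caller must ensure first <= last, that dest has room for
   (last - first) ints, and that the two ranges do not overlap. */
void copy_int_range(int *dest, const int *src, size_t first, size_t last)
{
    /* memcpy counts bytes, hence the multiplication by the element size */
    memcpy(dest, &src[first], (last - first) * sizeof src[0]);
}
```

For a char array the multiplication is by 1, which is why the plain (b-a) form works there.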
Change the last line from
for (i = offset; i <= (offset+bytes); i++)
to[i] = from[j++];
to
for (i = offset; i <= bytes; i++,j++)
to[j] = from[i];
This works fine for me. I have treated offset as the start index and bytes as the end index, i.e. it copies from[offset] through from[bytes] into to[], starting at to[0].

C: create String existing out of ints and chars

Since I'm very new to C programming, I have a probably very simple problem.
I got a struct looking like this
typedef struct Vector{
int a;
int b;
int c;
}Vector;
Now I want to write an array of Vectors in a file. To achieve that, I thought to create following method scheme
String createVectorString(Vector vec){
// (1)
}
String createVectorArrayString(Vector arr[]){
int i;
String arrayString;
for(i=0; i<sizeof(arr); i++){
//append createVectorString(arr[i]) to arrayString (2)
}
}
void writeInFile(Vector arr[]){
FILE *file;
file = fopen("sorted_vectors.txt", "a+");
fprintf(file, "%s", createVectorArrayString(arr);
fclose(file);
}
int main(void){
// create here my array of Vectors (this has already been made and is not part of the question)
// then call writeInFile
return 0;
}
My main problems are at (1), which involves also (2) (since I have no clue how to work with Strings in C, eclipse is saying "Type "String" unknown", although I included <string.h>)
So I read at some point that transforming an int to a String is possible with the method itoa().
As I understood it, I can simply do following
char buf[33];
int a = 5;
itoa(a, buf, 10)
However, I cannot bring that to work, let alone that I can't figure out how to "paste" chars or ints into a String.
In my point (1), I would like to create a String of the Form (a,b,c), where as a, b and c are the "fields" of my struct Vector.
In point (2), I would like to create a single String of the Form (a1,b1,c1)\n(a2,b2,c2)\n...(an,bn,cn), whereby n is the amount of Vectors in the array.
Is there a quick solution? Do I confuse the concept of Strings from Java with them of C?
Yes, you do confuse the concept of strings in Java and C.
The C strings are rather inconvenient to work with. They require dynamic memory allocation, and what is worse, corresponding deallocation (which is possible but tedious). In your case, it might be best to remove strings completely, and implement whatever you need without strings.
To write a vector directly to file:
Vector vec;
FILE* file = ...;
fprintf(file, "%d %d %d\n", vec.a, vec.b, vec.c);
To write an array of vectors, just do the above in a loop.
A string, in C, is just a null-terminated array of characters. It is generally declared as a char *, though if you have a fixed maximum length, and can allocate it on the stack or inline in a structure, it might be declared as char str[LENGTH].
One of the easiest ways to build a string out of a mix of characters and numbers is to use snprintf(). This is like printf(), but instead of printing to standard output, will print into a string (an array of char). Note that you need to allocate and pass in the buffer yourself; so you will either need to know the maximum length beforehand, or find out by trying to call snprintf(), finding out how many characters it would print, allocating an array of that size, and calling snprintf() again to actually print the result.
So if you have a vector of three integers, and want to build a string out of it, you could write:
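char *createVectorString(Vector vec){
    /* first pass: ask snprintf() how many characters are needed */
    int count = snprintf(NULL, 0, "(%d,%d,%d)", vec.a, vec.b, vec.c);
    if (count < 0)
        return NULL;
    /* +1 for the terminating '\0', which snprintf() does not count */
    char *result = malloc((size_t)count + 1);
    if (result == NULL)
        return NULL;
    count = snprintf(result, (size_t)count + 1, "(%d,%d,%d)", vec.a, vec.b, vec.c);
    if (count < 0) {
        free(result); /* don't leak the buffer on failure */
        return NULL;
    }
    return result;
}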
Note that because you called malloc() to allocate this buffer, you will need to call free() once you are done with it, to avoid a memory leak.
Note that snprintf() only returns the length that you need as of C99. Some older compilers (such as MSVC before Visual Studio 2015) don't implement the C99 behavior and return -1 instead of the length that the string would be. In those cases, there may be another function that you can call to determine the size of buffer you need (in MSVC, it's _scprintf, or _vscprintf if you have a va_list), or you may need to just guess at a size, and if that doesn't work, allocate a buffer twice that size and try again, until it succeeds.
In short: yes, you are confusing Java Strings with C, where you do not have a standard string type. What is called a string is in reality a sequence of chars terminated with a char with value 0 (or '\0', if you want to be a purist).
The quickest solution is not to generate strings (and manually allocate all the memory), but rather to use fprintf with a FILE*. Instead of functions that create strings, write functions that write various things into a supplied FILE*, for example int writeVector(FILE* output, Vector v). That will be easier to begin with. I don't think all the gory details of manual memory management required for constructing such strings are a good start.
(Note the return type of int in proposed prototype; this is for error codes.)
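One possible shape for such a function, as a sketch only: the (a,b,c) output format is taken from the question, while the choice of 0 and -1 as return codes is an assumption made for this example.

```c
#include <stdio.h>

typedef struct Vector {
    int a;
    int b;
    int c;
} Vector;

/* Writes one vector as "(a,b,c)" followed by a newline.
   Returns 0 on success, -1 if the write failed. */
int writeVector(FILE *output, Vector v)
{
    /* fprintf() returns a negative value on an output error */
    if (fprintf(output, "(%d,%d,%d)\n", v.a, v.b, v.c) < 0)
        return -1;
    return 0;
}
```

No memory is allocated here, so there is nothing for the caller to free.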
Additionally, as one of the commenters noted, you misunderstand sizeof. For a real array, sizeof(arr) returns the size of all the elements combined, in bytes (well, technically in chars, but it's a distinction you don't need to worry about right now). To get the number of elements in an array, you'd need to use sizeof(arr)/sizeof(arr[0]). That does not work with your function argument, though, which is technically a pointer despite the fancy syntax: applying sizeof to a pointer returns the size of the pointer itself, not of the data it points to.
Which is why in C you would usually provide size of an array in an extra function argument, like:
String createVectorArrayString(Vector arr[], size_t n)
or more in line with what I wrote above:
int writeVectorArray(FILE *output, Vector arr[], size_t n)
{
    int retcode = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        if ( (retcode = writeVector(output, arr[i])) != 0)
            return retcode;
    }
    return retcode; /* 0: every vector was written */
}
Yes, you are confusing Java Strings with C.
you can't pass arrays in C, only pointers to the first element.
sizeof (arr) where arr is a function argument is the size of the pointer.
You can't return a block-scope String, only a pointer to a string. But a pointer to a local automatic variable becomes invalid when the function returns, because the variable's lifetime ends there.
I'd write a loop more along
#define N 42
/* Typedef for Vector assumed somewhere.*/
Vector arr[N];
/* Fill arr[]. */
for (i = 0; i < N; ++i) {
fprintf (file, "arr[%d] = { a=%d, b=%d, c=%d }\n", i, arr[i].a, arr[i].b, arr[i].c);
}
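To illustrate the sizeof point from the list above, here is a sketch with hypothetical names (ARRAY_LEN and sum_ints are not from the question): the element count is computed where the real array is visible and then passed along explicitly.

```c
#include <stddef.h>

/* Element count of a true array; must not be applied to a pointer
   or to a function parameter declared with [] syntax. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* arr is really a pointer here, so the length has to arrive separately in n. */
long sum_ints(const int arr[], size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += arr[i];
    return total;
}
```

A caller that owns int values[] = {1, 2, 3, 4}; would write sum_ints(values, ARRAY_LEN(values)).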

How can I copy a repeating pattern into a memory buffer?

I want to write a repeating pattern of bytes into a block of memory. My idea is to write the first example of the pattern, and then copy it into the rest of the buffer. For example, if I start with this:
ptr: 123400000000
Afterward, I want it to look like this:
ptr: 123412341234
I thought I could use memcpy to write to intersecting regions, like this:
memcpy(ptr + 4, ptr, 8);
The standard does not specify what order the copy will happen in, so if some implementation makes it copy in reverse order, it can give different results:
ptr: 123412340000
or even combined results.
Is there any workaround that lets me still use memcpy, or do I have to implement my own for loop? Note that I cannot use memmove because it does exactly what I'm trying to avoid: it makes ptr become 123412340000, while I want 123412341234.
I program for Mac/iPhone(clang compiler) but a general answer will be good too.
There is no standard function to repeat a pattern of bytes upon a memory range. You can use the memset_pattern* function family to get fixed-size patterns; if you need the size to vary, you'll have to roll your own.
// fills the 12 first bytes at `ptr` with the 4 first bytes of `ptr`
memset_pattern4(ptr, ptr, 12);
Be aware that memset_pattern4, memset_pattern8 and memset_pattern16 exist only on Mac OS/iOS, so don't use them for cross-platform development.
Otherwise, rolling a (cross-platform) function that does a byte-per-byte copy is pretty easy.
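void byte_copy(void* into, const void* from, size_t size)
{
    /* void * cannot be indexed, so go through byte pointers */
    unsigned char* dst = into;
    const unsigned char* src = from;
    for (size_t i = 0; i < size; i++)
        dst[i] = src[i];
}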
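Applied to the pattern problem from the question, a forward byte-by-byte loop could look like the sketch below. repeat_pattern is a hypothetical name, and the buffer is assumed to already hold the first copy of the pattern at its start.

```c
#include <stddef.h>

/* Repeats the first pattern_len bytes of buf across all total_len bytes.
   The forward order is what makes the overlap work: every byte that is
   read has either been written by an earlier step or belongs to the
   original pattern. */
void repeat_pattern(void *buf, size_t pattern_len, size_t total_len)
{
    unsigned char *bytes = buf;
    size_t i;

    if (pattern_len == 0)
        return; /* nothing to repeat */
    for (i = pattern_len; i < total_len; i++)
        bytes[i] = bytes[i - pattern_len];
}
```

With ptr holding 123400000000, calling repeat_pattern(ptr, 4, 12) leaves 123412341234 in the buffer.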
Here is what kernel.org says:
The memcpy() function copies n bytes
from memory area src to memory area
dest. The memory areas must not
overlap. Use memmove(3) if the
memory areas do overlap.
And here is what MSDN says:
If the source and destination overlap,
the behavior of memcpy is undefined.
Use memmove to handle overlapping
regions.
The C++ answer for all platforms is std::fill_n(destination, elementRepeats, elementValue).
For what you've asked for:
short val = 0x1234;
std::fill_n(ptr, 3, val);
This will work for val of any type; chars, shorts, ints, int64_t, etc.
Old answer
You want memmove(). Full description:
The memmove() function shall copy n bytes from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n bytes from the object pointed to by s2 are first copied into a temporary array of n bytes that does not overlap the objects pointed to by s1 and s2, and then the n bytes from the temporary array are copied into the object pointed to by s1.
From the memcpy() page:
If copying takes place between objects that overlap, the behaviour is undefined.
You have to use memmove() anyway. This is because the result of using memcpy() is not reliable in any way.
Relevant bits to the actual question
You're asking for memcpy(ptr + 4, ptr, 8); which says copy 8 bytes from ptr and put them at ptr+4. ptr is 123400000000, the first 8 bytes are 12340000, so it is doing this:
Original : 123400000000
Writes : 12340000
Result : 123412340000
You'd need to call:
memcpy(ptr+4, ptr, 4);
memcpy(ptr+8, ptr, 4);
To achieve what you're after. Or implement an equivalent. This ought to do it, but it is untested, and is equivalent to memcpy; you'll need to either add the extra temporary buffer or use two non-overlapping areas of memory.
void memexpand(void* result, const void* start,
const uint64_t cycle, const uint64_t limit)
{
uint64_t count = 0;
const uint8_t* source = start; /* keep const: start is a const void* */
uint8_t* dest = result;
while ( count < limit )
{
*dest = *source;
dest++;
count++;
if ( count % cycle == 0 )
{
source = start;
}
else
{
source++;
}
}
}
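You can do that by copying the pattern once, then using memcpy to copy everything written so far into the bytes that follow, and repeating that so the filled region doubles each time. It's better understood in code:
void block_memset(void *destination, const void *source, size_t source_size, size_t repeats) {
    unsigned char *dst = destination; /* byte pointer: arithmetic on void * is not standard C */
    if (repeats == 0) return;
    memcpy(dst, source, source_size);
    /* i counts the copies already in place; each pass doubles it */
    for (size_t i = 1; i < repeats; i += i) {
        size_t n = (i < repeats - i) ? i : repeats - i;
        memcpy(dst + i * source_size, dst, n * source_size);
    }
}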
I benchmarked; it's as fast as regular memset for large number of repeats, and the source_size is quite dynamic without much performance penalty too.
Why not just allocate an 8 byte buffer, move it there, then move it back to where you want it? (As @cnicutar says, you shouldn't pass overlapping memory regions to memcpy.)

How do I declare an array of undefined or no initial size?

I know it could be done using malloc, but I do not know how to use it yet.
For example, I wanted the user to input several numbers using an infinite loop with a sentinel to put a stop into it (i.e. -1), but since I do not know yet how many he/she will input, I have to declare an array with no initial size, but I'm also aware that it won't work like this int arr[]; at compile time since it has to have a definite number of elements.
Declaring it with an exaggerated size like int arr[1000]; would work, but it feels dumb (and wastes memory, since it would reserve space for 1000 integers), and I would like to know a more elegant way to do this.
This can be done by using a pointer, and allocating memory on the heap using malloc.
Note that there is no way to later ask how big that memory block is. You have to keep track of the array size yourself.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv)
{
/* declare a pointer to an integer */
int *data;
/* we also have to keep track of how big our array is - I use 50 as an example*/
const int datacount = 50;
data = malloc(sizeof(int) * datacount); /* allocate memory for 50 int's */
if (!data) { /* If data == 0 after the call to malloc, allocation failed for some reason */
perror("Error allocating memory");
abort();
}
/* at this point, we know that data points to a valid block of memory.
Remember, however, that this memory is not initialized in any way -- it contains garbage.
Let's start by clearing it. */
memset(data, 0, sizeof(int)*datacount);
/* now our array contains all zeroes. */
data[0] = 1;
data[2] = 15;
data[49] = 66; /* the last element in our array, since we start counting from 0 */
/* Loop through the array, printing out the values (mostly zeroes, but even so) */
for(int i = 0; i < datacount; ++i) {
printf("Element %d: %d\n", i, data[i]);
}
free(data); /* release the block once we are done with it */
return 0;
}
That's it. What follows is a more involved explanation of why this works :)
I don't know how well you know C pointers, but array access in C (like array[2]) is actually a shorthand for accessing memory via a pointer. To access the memory pointed to by data, you write *data. This is known as dereferencing the pointer. Since data is of type int *, then *data is of type int. Now to an important piece of information: (data + 2) means "add the byte size of 2 ints to the address pointed to by data".
An array in C is just a sequence of values in adjacent memory. array[1] is just next to array[0]. So when we allocate a big block of memory and want to use it as an array, we need an easy way of getting the direct address of every element inside. Luckily, C lets us use the array notation on pointers as well. data[0] means the same thing as *(data+0), namely "access the memory pointed to by data". data[2] means *(data+2), and accesses the third int in the memory block.
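The scaling by the element size can be observed directly. In this small sketch (byte_offset is a hypothetical helper), the distance is measured through char pointers, which count single bytes:

```c
#include <stddef.h>

/* Number of bytes between data and data + k, measured through char pointers.
   data must point to an array with at least k elements. */
size_t byte_offset(const int *data, size_t k)
{
    return (size_t)((const char *)(data + k) - (const char *)data);
}
```

For any valid k the result equals k * sizeof(int), which is exactly the "byte size of k ints" that data + k adds.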
The way it's often done is as follows:
allocate an array of some initial (fairly small) size;
read into this array, keeping track of how many elements you've read;
once the array is full, reallocate it, doubling the size and preserving (i.e. copying) the contents;
repeat until done.
I find that this pattern comes up pretty frequently.
What's interesting about this method is that it allows one to insert N elements into an empty array one-by-one in amortized O(N) time without knowing N in advance.
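The steps above can be sketched roughly like this. The IntArray type, the function name and the starting capacity of 4 are all choices made for this example, not part of any standard API.

```c
#include <stdlib.h>

typedef struct {
    int *items;
    size_t count;    /* elements in use */
    size_t capacity; /* elements allocated */
} IntArray;

/* Appends value, doubling the capacity when the array is full.
   Returns 0 on success, -1 if realloc fails (the array is left unchanged). */
int int_array_push(IntArray *arr, int value)
{
    if (arr->count == arr->capacity) {
        size_t new_capacity = arr->capacity ? arr->capacity * 2 : 4;
        /* realloc(NULL, size) behaves like malloc(size), so the first call works too */
        int *grown = realloc(arr->items, new_capacity * sizeof *arr->items);
        if (grown == NULL)
            return -1;
        arr->items = grown;
        arr->capacity = new_capacity;
    }
    arr->items[arr->count++] = value;
    return 0;
}
```

A caller would start from IntArray arr = {NULL, 0, 0};, push each number until the sentinel is read, and call free(arr.items) at the end.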
Modern C, aka C99, has variable length arrays (VLAs). Unfortunately, not all compilers support them (C11 made them an optional feature), but if yours does, this would be an alternative.
Try to implement a dynamic data structure such as a linked list.
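A bare-bones singly linked list for this purpose might look like the following sketch. Node, list_push and list_free are names invented here, and prepending is chosen only because it is the shortest to write.

```c
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Prepends value to the list; returns the new head, or NULL if malloc fails
   (the existing list is untouched in that case). */
Node *list_push(Node *head, int value)
{
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->value = value;
    node->next = head;
    return node;
}

/* Frees every node in the list. */
void list_free(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

Because each value is prepended, walking the list from the head visits the numbers in reverse input order; keep a tail pointer instead if the original order matters.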
Here's a sample program that reads stdin into a memory buffer that grows as needed. It's simple enough that it should give some insight into how you might handle this kind of thing. One thing that would probably be done differently in a real program is how much the array grows in each allocation - I kept it small here to help keep things simpler if you wanted to step through in a debugger. A real program would probably use a much larger allocation increment (often, the allocation size is doubled, but if you're going to do that you should probably 'cap' the increment at some reasonable size - it might not make sense to double the allocation when you get into the hundreds of megabytes).
Also, I used indexed access to the buffer here as an example, but in a real program I probably wouldn't do that.
#include <stdlib.h>
#include <stdio.h>
void fatal_error(void);
int main( int argc, char** argv)
{
int buf_size = 0;
int buf_used = 0;
char* buf = NULL;
char* tmp = NULL;
int c; /* int rather than char, so EOF can be told apart from every valid byte */
int i = 0;
while ((c = getchar()) != EOF) {
if (buf_used == buf_size) {
//need more space in the array
buf_size += 20;
tmp = realloc(buf, buf_size); // get a new larger array
if (!tmp) fatal_error();
buf = tmp;
}
buf[buf_used] = c; // pointer can be indexed like an array
++buf_used;
}
puts("\n\n*** Dump of stdin ***\n");
for (i = 0; i < buf_used; ++i) {
putchar(buf[i]);
}
free(buf);
return 0;
}
void fatal_error(void)
{
fputs("fatal error - out of memory\n", stderr);
exit(1);
}
This example combined with examples in other answers should give you an idea of how this kind of thing is handled at a low level.
One way I can imagine is to use a linked list, if you need all the numbers entered before the user enters the value that terminates the loop. (I'm posting this as the first option because I have never done it for user input and it just seemed interesting. Wasteful but artistic.)
Another way is to do buffered input: allocate a buffer, fill it, and reallocate it if the loop continues (not elegant, but the most rational for the given use case).
I don't consider either of these elegant, though. I would probably change the use case instead (the most rational option).
