How can I copy a repeating pattern into a memory buffer? - c

I want to write a repeating pattern of bytes into a block of memory. My idea is to write the first instance of the pattern, and then copy it into the rest of the buffer. For example, if I start with this:
ptr: 123400000000
Afterward, I want it to look like this:
ptr: 123412341234
I thought I could use memcpy to write to intersecting regions, like this:
memcpy(ptr + 4, ptr, 8);
The standard does not specify what order the copy will happen in, so if some implementation makes it copy in reverse order, it can give different results:
ptr: 123412340000
or even combined results.
Is there any workaround that lets me still use memcpy, or do I have to implement my own for loop? Note that I cannot use memmove because it does exactly what I'm trying to avoid; it makes ptr become 123412340000, while I want 123412341234.
I program for Mac/iPhone(clang compiler) but a general answer will be good too.

There is no standard function to repeat a pattern of bytes upon a memory range. You can use the memset_pattern* function family to get fixed-size patterns; if you need the size to vary, you'll have to roll your own.
// fills the 12 first bytes at `ptr` with the 4 first bytes of `ptr`
memset_pattern4(ptr, ptr, 12);
Be aware that memset_pattern4, memset_pattern8 and memset_pattern16 exist only on Mac OS/iOS, so don't use them for cross-platform development.
Otherwise, rolling a (cross-platform) function that does a byte-per-byte copy is pretty easy.
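void byte_copy(void* into, const void* from, size_t size)
{
    /* void* cannot be indexed, so work through byte pointers */
    unsigned char* dst = into;
    const unsigned char* src = from;
    /* copy forward, one byte at a time */
    for (size_t i = 0; i < size; i++)
        dst[i] = src[i];
}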
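As a minimal sketch of why a forward byte-by-byte copy repeats the pattern: the helper below, `repeat_pattern` (a hypothetical name, not a standard function), assumes the first `pattern_len` bytes of the buffer already hold the pattern and fills each later byte from the byte `pattern_len` positions behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: extends the pattern stored in the first pattern_len
   bytes of buf across the whole buffer of total_len bytes. */
void repeat_pattern(void *buf, size_t pattern_len, size_t total_len)
{
    unsigned char *bytes = buf;

    assert(pattern_len > 0);
    /* Copy strictly forward: each byte is read from pattern_len bytes
       earlier, which has already been written by the time it is needed. */
    for (size_t i = pattern_len; i < total_len; i++)
        bytes[i] = bytes[i - pattern_len];
}
```

With the question's buffer, `repeat_pattern(ptr, 4, 12)` turns 123400000000 into 123412341234. Because the loop is hand-written, its copy order is fixed, unlike memcpy on overlapping regions.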

Here is what kernel.org says:
The memcpy() function copies n bytes
from memory area src to memory area
dest. The memory areas must not
overlap. Use memmove(3) if the
memory areas do overlap.
And here is what MSDN says:
If the source and destination overlap,
the behavior of memcpy is undefined.
Use memmove to handle overlapping
regions.

The C++ answer for all platforms is std::fill_n(destination, elementRepeats, elementValue).
For what you've asked for:
short val = 0x1234;
std::fill_n(ptr, 3, val);
This will work for val of any type; chars, shorts, ints, int64_t, etc.

Old answer
You want memmove(). Full description:
The memmove() function shall copy n bytes from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n bytes from the object pointed to by s2 are first copied into a temporary array of n bytes that does not overlap the objects pointed to by s1 and s2, and then the n bytes from the temporary array are copied into the object pointed to by s1.
From the memcpy() page:
If copying takes place between objects that overlap, the behaviour is undefined.
You have to use memmove() anyway. This is because the result of using memcpy() is not reliable in any way.
Relevant bits to the actual question
You're asking for memcpy(ptr + 4, ptr, 8); which says copy 8 bytes from ptr and put them at ptr+4. ptr is 123400000000, the first 8 bytes are 12340000, so it is doing this:
Original : 123400000000
Writes : 12340000
Result : 123412340000
You'd need to call:
memcpy(ptr+4, ptr, 4);
memcpy(ptr+8, ptr, 4);
To achieve what you're after. Or implement an equivalent. This ought to do it, but it is untested, and is equivalent to memcpy; you'll need to either add the extra temporary buffer or use two non-overlapping areas of memory.
void memexpand(void* result, const void* start,
const uint64_t cycle, const uint64_t limit)
{
uint64_t count = 0;
const uint8_t* source = start;
uint8_t* dest = result;
while ( count < limit )
{
*dest = *source;
dest++;
count++;
if ( count % cycle == 0 )
{
source = start;
}
else
{
source++;
}
}
}

You can do that by copying the pattern once, then repeatedly using memcpy to copy everything written so far into the bytes that follow, doubling the filled region each time. It's easier to understand in code:
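void block_memset(void *destination, const void *source, size_t source_size, size_t repeats) {
    char *dst = destination; /* void* does not support pointer arithmetic */
    if (repeats == 0) return;
    memcpy(dst, source, source_size);
    /* i counts the repeats already written; each pass doubles it */
    for (size_t i = 1; i < repeats; i += i) {
        size_t n = (i < repeats - i) ? i : repeats - i;
        memcpy(dst + i * source_size, dst, source_size * n);
    }
}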
I benchmarked; it's as fast as regular memset for large number of repeats, and the source_size is quite dynamic without much performance penalty too.

Why not just allocate an 8 byte buffer, move it there, then move it back to where you want it? (As @cnicutar says, you shouldn't have overlapping address spaces for memcpy.)

Related

Copying unsigned char in C

I want to use memcpy but it seems to me that it's copying the array from the start?
I wish to copy from A[a] to A[b]. So, instead I found an alternative way,
void copy_file(char* from, int offset, int bytes, char* to) {
int i;
int j = 0;
for (i = offset; i <= (offset+bytes); i++) to[i] = from[j++];
}
I'm getting seg faults but I don't know where I am getting this seg fault from?
each entry holds 8 bytes so my second attempt was
void copy_file(char* from, int offset, int bytes, char* to) {
int i;
int j = 0;
for (i = 8*offset; i <= 8*(offset+bytes); i++) to[i] = from[j++];
}
but still seg fault. If you need more information please don't hesitate to ask!
I'm getting seg faults but I don't know where I am getting this seg fault from?
Primary Suggestion: Learn to use a debugger. It provides helpful information about erroneous instruction(s).
To answer your query about the code snippet shown in the question above:
Check the incoming pointers (to and from) against NULL before dereferencing them.
Put a check on the boundary limits for indexes used. Currently they can overrun the allocated memory.
To use memcpy() properly:
as per the man page, the signature of memcpy() indicates
void *memcpy(void *dest, const void *src, size_t n);
It copies n bytes from the address pointed to by src to the address pointed to by dest.
Also, a very very important point to note:
The memory areas must not overlap.
So, to copy A[a] to A[b], you may write something like
memcpy(destbuf, &A[a], (b-a) );
it seems to me that memcpy copying the array from the start
No, it does not. In fact, memcpy does not have a slightest idea that it is copying from or to an array. It treats its arguments as pointers to unstructured memory blocks.
If you wish to copy from A[a] to A[b], pass an address of A[a] and the number of bytes between A[b] and A[a] to memcpy, like this:
memcpy(Dest, &A[a], (b-a) * sizeof(A[0]));
This would copy the content of A from index a, inclusive, to index b, exclusive, into a memory block pointed to by Dest. If you wish to apply an offset to Dest as well, use &Dest[d] for the first parameter. Multiplication by sizeof is necessary for arrays of types other than char, signed or unsigned.
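A short sketch of that call on an int array. `copy_range` is a hypothetical wrapper, and it assumes `a <= b`, that the destination has room for `b - a` elements, and that the destination does not overlap the source range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies elements A[a] .. A[b-1] into dest.
   Assumes a <= b and that dest has room for (b - a) ints and does not
   overlap the source range. */
void copy_range(int *dest, const int *A, size_t a, size_t b)
{
    assert(a <= b);
    /* memcpy counts bytes, so scale the element count by the element size. */
    memcpy(dest, &A[a], (b - a) * sizeof A[0]);
}
```

Forgetting the `sizeof` factor here would copy only `b - a` bytes, which is less than one full int for small ranges.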
Change the last line from
for (i = offset; i <= (offset+bytes); i++)
to[i] = from[j++];
to
for (i = offset; i <= bytes; i++,j++)
to[j] = from[i];
This works fine for me. I have considered offset as the start of the array and byte as the end of the array. ie to copy from[offset] to from[bytes] to to[].

C - Append strings until end of allocated memory

Let's consider following piece of code:
int len = 100;
char *buf = (char*)malloc(sizeof(char)*len);
printf("Appended: %s\n",struct_to_string(some_struct,buf,len));
Someone allocated some amount of memory to be filled with string data. The problem is that the string data taken from some_struct could be ANY length. So what I want to achieve is to make the struct_to_string function do the following:
Do not allocate any memory that goes outside (so, buf has to be allocated outside of the function, and passed)
Inside the struct_to_string I want to do something like:
char* struct_to_string(const struct type* some_struct, char* buf, int len) {
//it will be more like pseudo code to show the idea :)
char var1_name[] = "int l1";
buf += var1_name + " = " + some_struct->l1;
//when l1 is a int or some non char, I need to cast it
char var2_name[] = "bool t1";
buf += var2_name + " = " + some_struct->t1;
// buf+= (I mean appending function) should check if there is a place in a buf,
//if there is not it should fill buf with
//as many characters as possible (without writing to memory) and stop
//etc.
return buf;
}
Output should be like:
Appended: int l1 = 10 bool t1 = 20 //if there was good amount of memory allocated or
ex: Appended: int l1 = 10 bo //if there was not enough memory allocated
To sum up:
I need a function (or a couple of functions) that appends given strings to the base string without overwriting the base string;
do nothing when base string memory is full
I can not use C++ libraries
Other things I would like to ask, though they are not so important right now:
Is there a way (in C) to iterate through a structure's variable list to get the names, or at least to get the values without the names? (for example, iterate through a structure like through an array ;d)
I do not normally use C, but for now I'm obligated to do, so I have very basic knowledge.
(sorry for my English)
Edit:
Good way to solve that problem is shown in post below: stackoverflow.com/a/2674354/2630520
I'd say all you need is the standard strncat function defined in the string.h header.
About the 'iterate through structure variable list' part, I'm not exactly sure what you mean. If you're talking about iterating over the structure's members, the short answer is: you can't introspect C structs for free.
You need to know beforehand what structure type you're using so that the compiler knows at what offset in memory it can find each member of your struct. Otherwise it's just an array of bytes like any other.
Feel free to ask if I wasn't clear enough or if you want more details.
Good luck.
So basically I did it like here: stackoverflow.com/a/2674354/2630520
int struct_to_string(const struct struct_type* struct_var, char* buf, const int len)
{
unsigned int length = 0;
unsigned int i;
length += snprintf(buf+length, len-length, "v0[%d]", struct_var->v0);
length += other_struct_to_string(struct_var->sub, buf+length, len-length);
length += snprintf(buf+length, len-length, "v2[%d]", struct_var->v2);
length += snprintf(buf+length, len-length, "v3[%d]", struct_var->v3);
....
return length;
}
snprintf writes as much as possible and discards everything left, so it was exactly what I was looking for.
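One caveat: snprintf returns the length the output would have had without truncation, so adding its return value to a running offset can push the offset past the buffer size once truncation happens. A minimal sketch of a clamped append, assuming a hypothetical helper named `append_fmt` (not part of any library) and assuming the caller passes an offset below the capacity:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: appends formatted text at offset used in buf (of
   capacity cap bytes) and returns the new offset. As long as used starts
   below cap, the result never exceeds cap - 1. */
size_t append_fmt(char *buf, size_t cap, size_t used, const char *fmt, ...)
{
    va_list args;
    int n;

    if (cap == 0 || used >= cap - 1)
        return used;                    /* buffer already full: do nothing */

    va_start(args, fmt);
    n = vsnprintf(buf + used, cap - used, fmt, args);
    va_end(args);

    if (n < 0) {
        buf[used] = '\0';               /* encoding error: keep old contents */
        return used;
    }
    if ((size_t)n >= cap - used)
        return cap - 1;                 /* output was truncated to fit */
    return used + (size_t)n;
}
```

Each call then looks like `used = append_fmt(buf, len, used, "v0[%d]", value);`, and once the buffer is full further calls leave it untouched.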

What is the most efficient way to implement mutable strings in C?

I'm currently implementing a very simple JSON parser in C and I would like to be able to use mutable strings (I can do it without mutable strings, however I would like to learn the best way of doing them anyway). My current method is as follows:
char * str = calloc(0, sizeof(char));
//The following is executed in a loop
int length = strlen(str);
str = realloc(str, sizeof(char) * (length + 2));
//I had to reallocate with +2 in order to ensure I still had a zero value at the end
str[length] = newChar;
str[length + 1] = 0;
I am comfortable with this approach, however it strikes me as a little inefficient given that I am always only appending one character each time (and for the sake of argument, I'm not doing any lookaheads to find the final length of my string). The alternative would be to use a linked list:
struct linked_string
{
char character;
struct linked_string * next;
};
Then, once I've finished processing I can find the length, allocate a char * of the appropriate length, and iterate through the linked list to create my string.
However, this approach seems memory inefficient, because I have to allocate memory for both each character and the pointer to the following character. Therefore my question is two-fold:
Is creating a linked list and then a C-string faster than reallocing the C-string each time?
If so, is the gained speed worth the greater memory overhead?
The standard way for dynamic arrays, regardless of whether you store chars or something else, is to double the capacity when you grow it. (Technically, any multiple works, but doubling is easy and strikes a good balance between speed and memory.) You should also ditch the 0 terminator (add one at the end if you need to return a 0 terminated string) and keep track of the allocated size (also known as capacity) and the number of characters actually stored. Otherwise, your loop has quadratic time complexity by virtue of using strlen repeatedly (Shlemiel the painter's algorithm).
With these changes, time complexity is linear (amortized constant time per append operation) and practical performance is quite good for a variety of ugly low-level reasons.
The theoretical downside is that you use up to twice as much memory as strictly necessary, but the linked list needs at least five times as much memory for the same amount of characters (with 64 bit pointers, padding and typical malloc overhead, more like 24 or 32 times). It's not usually a problem in practice.
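A minimal sketch of the doubling scheme described above, tracking both the number of characters stored and the capacity so that strlen is never needed. The type `char_buf`, both function names, and the starting capacity of 16 are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable character buffer: len bytes used out of cap. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} char_buf;

/* Appends one character, doubling the capacity when the buffer is full.
   Returns 1 on success, 0 if realloc fails (the buffer stays valid). */
int char_buf_push(char_buf *b, char c)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 16; /* 16 is an arbitrary start */
        char *p = realloc(b->data, new_cap);
        if (!p)
            return 0;
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    return 1;
}

/* Adds the 0 terminator on demand and returns the C string, or NULL on
   failure. The terminator is not counted in len, so later pushes
   overwrite it. */
char *char_buf_cstr(char_buf *b)
{
    if (!char_buf_push(b, '\0'))
        return NULL;
    b->len--;
    return b->data;
}
```

A buffer starts as `char_buf b = {NULL, 0, 0};` (realloc on a null pointer behaves like malloc) and is released with `free(b.data)`.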
No, linked lists are most certainly not "faster" (how and wherever you measure such a thing). This is a terrible overhead.
If you really find that your current approach is a bottleneck, you could always allocate or reallocate your strings in sizes of powers of 2. Then you would only have to do realloc when you cross such a boundary for the total size of the char array.
I would suggest that it would be reasonable to read the entire set of text into one memory allocation, then go through and NUL terminate each string. Then count the number of strings, and make a array of pointers to each of the strings. That way you have one memory allocation for the text area, and one for the array of pointers.
Most implementations of variable-length arrays/strings/whatever have a size and a capacity. The capacity is the allocated size, and the size is what's actually used.
typedef struct mutable_string {
    char* data;
    size_t capacity;
} mutable_string;
Allocating a new string looks like this:
#define INITIAL_CAPACITY 10
mutable_string* mutable_string_create_empty() {
mutable_string* str = malloc(sizeof(mutable_string));
if (!str) return NULL;
str->data = calloc(INITIAL_CAPACITY, 1);
if (!str->data) { free(str); return NULL; }
str->capacity = INITIAL_CAPACITY;
return str;
}
Now, any time you need to add a character to the string, you'd do this:
int mutable_string_concat_char(mutable_string* str, char chr) {
    size_t len = strlen(str->data);
    if (len + 1 >= str->capacity) {
        /* No room for chr plus the terminator: double the capacity */
        size_t new_capacity = str->capacity * 2;
        char* new_data = realloc(str->data, new_capacity);
        if (!new_data) return 0; //Failure; str is left unchanged
        str->data = new_data;
        str->capacity = new_capacity;
    }
    str->data[len] = chr;
    str->data[len + 1] = '\0';
    return 1; //Success
}
The linked list approach is worse because:
You still need to do an allocation call every time you add a character;
It consumes a LOT more memory. This approach consumes up to sizeof(size_t) + (string_length + 1) * 2. That approach consumes string_length * sizeof(linked_string).
Generally, linked lists are less cache-friendly than arrays.

how to implement overlap-checking memcpy in C

This is a learning exercise. I'm attempting to augment memcpy by notifying the user if the copy operation will pass or fail before it begins. My biggest question is the following. If I allocate two char arrays of 100 bytes each, and have two pointers that reference each array, how do I know which direction I am copying? If I copy everything from the first array to the second how do I ensure that the user will not be overwriting the original array?
My current solution compares the distance between the pointers with the size of the destination array. If the distance is smaller than the size, I say an overwrite will occur. But what if it's copying in the other direction? I'm just kind of confused.
int memcpy2(void *target, void *source, size_t nbytes) {
char * ptr1 = (char *)target;
char * ptr2 = (char *)source;
int i, val;
val = abs(ptr1 - ptr2);
printf("%d, %d\n", val, nbytes + 0);
if (val > nbytes) {
for (i = 0; i < val; i++){
ptr1[i] = ptr2[i];
}
return 0; /*success */
}
return -1; /* error */
}
int main(int argc, char **argv){
char src [100] = "Copy this string to dst1";
char dst [20];
int p;
p = memcpy2(dst, src, sizeof(dst));
if (p == 0)
printf("The element\n'%s'\nwas copied to \n'%s'\nSuccesfully\n", src, dst);
else
printf("There was an error!!\n\nWhile attempting to copy the elements:\n '%s'\nto\n'%s', \n Memory was overlapping", src, dst);
return 0;
}
The only portable way to determine if two memory ranges overlap is:
int overlap_p(void *a, void *b, size_t n)
{
char *x = a, *y = b;
for (size_t i=0; i<n; i++) if (x+i==y || y+i==x) return 1;
return 0;
}
This is because comparison of pointers with the relational operators is undefined unless they point into the same array. In reality, the comparison does work on most real-world implementations, so you could do something like:
int overlap_p(void *a, void *b, size_t n)
{
char *x = a, *y = b;
return (x<=y && x+n>y) || (y<=x && y+n>x);
}
I hope I got that logic right; you should check it. You can simplify it even more if you want to assume you can take differences of arbitrary pointers.
What you want to check is the position in memory of the source relatively to the destination:
If the source is ahead of the destination (ie. source < destination), then you should start from the end. If the source is after, you start from the beginning. If they are equal, you don't have to do anything (trivial case).
Here are some crude ASCII drawings to visualize the problem.
|_;_;_;_;_;_| (source)
|_;_;_;_;_;_| (destination)
>-----^ start from the end to shift the values to the right
|_;_;_;_;_;_| (source)
|_;_;_;_;_;_| (destination)
^-----< start from the beginning to shift the values to the left
Following a very accurate comment below, I should add that you can use the difference of the pointers (destination - source), but to be on the safe side cast those pointers to char * beforehand.
In your current setting, I don't think that you can check if the operation will fail. Your memcpy prototype prevents you from doing any form of checking for that, and with the rule given above for deciding how to copy, the operation will succeed (outside of any other considerations, like prior memory corruption or invalid pointers).
I don't believe that "attempting to augment memcpy by notifying the user if the copy operation will pass or fail before it begins." is a well-formed notion.
First, memcpy() doesn't succeed or fail in the normal sense. It just copies the data, which might cause a fault/exception if it reads outside the source array or writes outside the destination array, and it might also read or write outside one of those arrays without causing any fault/exception and just silently corrupting data. When I say "memcpy does this" I'm not talking just about the implementation of the C stdlib memcpy, but about any function with the same signature -- it doesn't have enough information to do otherwise.
Second, if your definition of "succeed" is "assuming the buffers are big enough but may be overlapping, copy the data from source to dst without tripping over yourself while copying" -- that is indeed what memmove() does, and it's always possible. Again, there's no "return failure" case. If the buffers don't overlap it's easy, if the source is overlapping the end of the destination then you just copy byte by byte from the beginning; if the source is overlapping the beginning of the destination then you just copy byte by byte from the end. Which is what memmove() does.
Third, when writing this kind of code, you have to be very careful about overflow cases for your pointer arithmetic (including addition, subtraction, and array indexing). In val = abs(ptr1 - ptr2), ptr1 - ptr2 has type ptrdiff_t, which can be wider than int, so passing it to abs() (which takes an int) can truncate it, and int is the wrong type to store the result in. Just so you know.
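The direction rule described in the answers above (copy from the end when the destination starts after the source, from the beginning when it starts before) can be sketched as follows. `copy_overlap_safe` is a hypothetical name, and comparing addresses through `uintptr_t` is an assumption that holds on common flat-memory platforms but is not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove-like copy that picks its direction from the
   relative position of the two buffers. The uintptr_t comparison is an
   assumption about the platform, not a guarantee of the C standard. */
void *copy_overlap_safe(void *target, const void *source, size_t nbytes)
{
    unsigned char *dst = target;
    const unsigned char *src = source;

    if ((uintptr_t)dst < (uintptr_t)src) {
        /* Destination starts first: copy from the beginning. */
        for (size_t i = 0; i < nbytes; i++)
            dst[i] = src[i];
    } else if ((uintptr_t)dst > (uintptr_t)src) {
        /* Destination starts later: copy from the end. */
        for (size_t i = nbytes; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
    /* Equal pointers: nothing to do. */
    return target;
}
```

There is no failure return because, given valid buffers of sufficient size, one of the two directions always works. In real code memmove already provides this behavior.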

keeping track of how much memory malloc has allocated

After a quick scan of related questions on SO, I have deduced that there's no function that would check the amount of memory that malloc has allocated to a pointer. I'm trying to replicate some of std::string basic functionality (mainly dynamic size) using simple char*'s in C and don't want to call realloc all the time. I guess I'll need to keep track of how much memory has been allocated. In order to do that, I'm considering creating a typedef that will contain the string itself and an integer with the amount of memory currently allocated, something like this:
typedef struct {
char * str;
int mem;
} my_string_t;
Is that an optimal solution, or perhaps you can suggest something that will bear better results? Thanks in advance for your help.
You will want to allocate the space for both the length and the string in the same block of memory. This may be what you intended with your struct, but you have reserved space for only a pointer to the string.
There must be space allocated to contain the characters of the string.
For example:
typedef struct
{
int num_chars;
char string[];
} my_string_t;
my_string_t * alloc_my_string(char *src)
{
my_string_t * p = NULL;
int N_chars = strlen(src) + 1;
p = malloc( N_chars + sizeof(my_string_t));
if (p)
{
p->num_chars = N_chars;
strcpy(p->string, src);
}
return p;
}
In my example, to access the pointer to your string, you address the string member of the my_string_t:
my_string_t * p = alloc_my_string("hello free store.");
printf("String of %d bytes is '%s'\n", p->num_chars, p->string);
Be careful to realize that you are obtaining the pointer for the string as a consequence of allocating space to store the characters. The resource you are allocating is the storage for the characters, the pointer obtained is a reference to the allocated storage.
In my example, the memory allocated is laid out sequentially as follows:
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| 00 | 00 | 00 | 12 | 'h'| 'e'| 'l'| 'l'| 'o'| 20 | 'f'| 'r'| 'e'| 'e'| 20 | 's'| 't'| 'o'| 'r'| 'e'| '.'| 00 |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
^^                   ^
||                   |
p|                   |
 p->num_chars        p->string
Notice that the value of p->string is not stored in the allocated memory, it is four bytes from the beginning of the allocated memory, immediately subsequent to the (presumed 32-bit, four-byte) integer.
Your compiler may require that you declare the flexible C array as:
typedef struct
{
int num_chars;
char string[0];
} my_string_t;
but the version lacking the zero is supposedly C99-compliant.
You can accomplish the equivalent thing with no array member as follows:
typedef struct
{
int num_chars;
} mystr2;
char * str_of_mystr2(mystr2 * ms)
{
return (char *)(ms + 1);
}
mystr2 * alloc_mystr2(char *src)
{
    mystr2* p = NULL;
    size_t N_chars = strlen(src) + 1;
    if (N_chars < INT_MAX)
        p = malloc(N_chars + sizeof(mystr2));
    if (p)
    {
        p->num_chars = (int)N_chars;
        strcpy(str_of_mystr2(p), src);
    }
    return p;
}
printf("String of %d bytes is '%s'\n", p->num_chars, str_of_mystr2 (p));
In this second example, the value equivalent to p->string is calculated by str_of_mystr2(). It will have approximately the same value as the first example, depending on how the end of structs are packed by your compiler settings.
While some would suggest tracking the length in a size_t I would look up some old Dr. Dobb's article on why I disagree. Supporting values greater than INT_MAX is of doubtful value to your program's correctness. By using an int, you can write assert(p->num_chars >= 0); and have that test something. With an unsigned, you would write the equivalent test something like assert(p->num_chars < UINT_MAX / 2); As long as you write code which contains checks on run-time data, using a signed type can be useful.
On the other hand, if you are writing a library which handles strings in excess of UINT_MAX / 2 characters, I salute you.
This is the obvious solution. And while you are at it, you might want to have a struct member that maintains the amount of allocated memory actually in use. This will avoid having to call strlen() all the time, and would enable you to support non null-terminated strings, as the C++ std::string class does.
That is how it was done in the Pleistocene, and that's how you should do it today. You are dead on the money that malloc does not offer any portable, supported, mechanism to query the size of an allocated block.
A more common way is to wrap malloc (and realloc) and keep a list of sizes and pointers
That way you don't need to change any string functions.
Write wrapper functions. If you are using malloc then you should do that anyway.
For an example look in "writing solid code"
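One way such wrappers could look, as a rough sketch: instead of keeping a separate list of sizes and pointers, this variant stores the requested size in a small header in front of each block. The names `sized_malloc`, `sized_size` and `sized_free` are hypothetical, and the code assumes a C11 compiler for `max_align_t`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every block; the union with max_align_t
   (C11) keeps the pointer handed to the caller suitably aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} block_header;

/* Hypothetical wrapper: allocates size bytes and records the size. */
void *sized_malloc(size_t size)
{
    block_header *h;

    if (size > SIZE_MAX - sizeof *h)
        return NULL;                    /* header + size would overflow */
    h = malloc(sizeof *h + size);
    if (!h)
        return NULL;
    h->size = size;
    return h + 1;                       /* user memory starts after the header */
}

/* Returns the size originally requested for a sized_malloc block. */
size_t sized_size(const void *ptr)
{
    return ((const block_header *)ptr - 1)->size;
}

/* Frees a block obtained from sized_malloc; NULL is ignored. */
void sized_free(void *ptr)
{
    if (ptr)
        free((block_header *)ptr - 1);
}
```

Pointers from these wrappers must not be mixed with plain malloc and free, since `sized_size` and `sized_free` both expect the header to be present.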
I think you could use malloc_usable_size.

Resources