(C) Recursive strcpy() that takes only 1 parameter - c

Let me be clear from the get go, this is not a dupe, I'll explain how.
So, I tasked myself to write a function that imitates strcpy but with 2 conditions:
it needs to be recursive
it must take a single parameter (which is the original string)
The function should return a pointer to the newly copied string. So this is what I've tried so far:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char * my_strcpy(char *original);
int main(void) {
char *string = my_strcpy("alpine");
printf("string = <%s>\n", string);
return 0;
}
char * my_strcpy(char *original){
char *string = (char *)malloc(10);
if(*original == '\0') {
return string;
}
*string++ = *original;
my_strcpy(original + 1);
}
The problem is somewhat obvious, string gets malloc-ed every time my_strcpy() is called. One of the solutions I could think of would be to allocate memory for string only the first time the function is called. Since I'm allowed to have only 1 parameter, only thing I could think of was to check the call stack, but I don't know whether that's allowed and it does feel like cheating.
Is there a logical solution to this problem?

You wrote it as tail recursive, but I think without making the function non-reentrant your only option is going to be to make the function head recursive and repeatedly call realloc on the return value of the recursive call to expand it, then add in one character. This has the same problem that just calling strlen to do the allocation has: it does something linear in the length of the input string in every recursive call and turns out to be an implicitly n-squared algorithm (0.5*n*(n+1)). You can improve it by making the amortized time complexity better, by expanding the string by a factor and only growing it when the existing buffer is full, but it's still not great.
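A minimal sketch of the head-recursive realloc idea described above, assuming the name my_strcpy_realloc (hypothetical, not from the question) and that returning NULL on allocation failure is acceptable. It deliberately keeps the quadratic behaviour the paragraph warns about, since every call measures and shifts the partial copy:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name. Returns a heap-allocated copy, or NULL on failure. */
char *my_strcpy_realloc(const char *original)
{
    assert(original != NULL);
    if (*original == '\0') {
        return calloc(1, 1);              /* fresh empty string "" */
    }
    /* Head recursion: copy the tail first, then prepend one character. */
    char *tail = my_strcpy_realloc(original + 1);
    if (tail == NULL) {
        return NULL;
    }
    size_t len = strlen(tail);            /* linear work in every call */
    char *grown = realloc(tail, len + 2); /* one more char plus the NUL */
    if (grown == NULL) {
        free(tail);
        return NULL;
    }
    memmove(grown + 1, grown, len + 1);   /* shift right, including the NUL */
    grown[0] = *original;
    return grown;
}
```

The caller owns the result and should free() it once it is no longer needed.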
There's a reason you wouldn't use recursion for this task (which you probably know): stack depth will be equal to input string length, and a whole stack frame pushed and a call instruction for every character copied is a lot of overhead. Even so, you wouldn't do it recursively with a single argument if you were really going to do it recursively: you'd make a single-argument function that declares some locals and calls a recursive function with multiple arguments.
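The wrapper arrangement mentioned above might look roughly like this. The names copy_chars and my_strcpy_wrapped are made up for illustration, and the sketch assumes that calling strlen once in the wrapper is acceptable:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-argument recursive helper: copies src, including the NUL. */
static void copy_chars(char *dst, const char *src)
{
    *dst = *src;
    if (*src != '\0') {
        copy_chars(dst + 1, src + 1);
    }
}

/* Hypothetical single-argument wrapper: allocates once, then recurses. */
char *my_strcpy_wrapped(const char *original)
{
    assert(original != NULL);
    char *copy = malloc(strlen(original) + 1);
    if (copy != NULL) {
        copy_chars(copy, original);
    }
    return copy;                          /* NULL if the allocation failed */
}
```

Only the helper is recursive here, so the public function keeps its single parameter without any static or global state.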
Even with the realloc trick, it'll be difficult or impossible to count the characters in the original as you go so that you can call realloc appropriately, remembering that other stdlib "str*" functions are off limits because they'll likely make your whole function n-squared, which I assumed we were trying to avoid.
Ugly tricks like verifying that the string is as long as a pointer and replacing the first few characters with a pointer by memcpy could be used, making the base case for the recursion more complicated, but, um, yuck.

Recursion is a technique for analysing problems. That is, you start with the problem and think about what the recursive structure of a solution might be. You don't start with a recursive structure and then attempt to shoe-horn your problem willy-nilly into it.
In other words, it's good to practice recursive analysis, but the task you have set yourself -- to force the solution to have the form of a one-parameter function -- is not the way to do that. If you start contemplating global or static variables, or extracting runtime context by breaking into the call stack, you have a pretty good hint that you have not yet found the appropriate recursive analysis.
That's not to say that there is not an elegant recursive solution to your problem. There is one, but before we get to it, we might want to abstract away a detail of the problem in order to provide some motivation.
Clearly, if we have a contiguous data structure already in memory, making a contiguous copy is not challenging. If we don't know how big it is, we can do two traverses: one to find its size, after which we can allocate the needed memory, and another one to do the copy. Both of those tasks are simple loops, which is one form of recursion.
The essential nature of a recursive solution is to think about how to step from a problem to a (slightly) simpler or smaller problem. Or, more commonly, a small number of smaller or simpler problems.
That's the nature of one of the most classic recursive problems: sorting a sequence of numbers. The basic structure: divide the sequence into two roughly equal parts; sort each part (the recursive step) and put the results back together so that the combination is sorted. That basic outline has at least two interesting (and very different) manifestations:
Divide the sequence arbitrarily into two almost equal parts either by putting alternate elements in alternate parts or by putting the first half in one part and the rest in the other part. (The first one will work nicely if we don't know in advance how big the sequence is.) To put the sorted parts together, we have to interleave ("merge") them. (This is mergesort.)
Divide the sequence into two ranges by estimating the middle value and putting all smaller values into one part and all larger values into the other part. To put the sorted parts together, we just concatenate them. (This is quicksort.)
In both cases, we also need to use the fact that a single-element sequence is (trivially) sorted so no more processing needs to be done. If we divide a sequence into two parts often enough, ensuring that neither part is empty, we must eventually reach a part containing one element. (If we manage to do the division accurately, that will happen quite soon.)
It's interesting to implement these two strategies using singly-linked lists, so that the lengths really are not easily known. Both can be implemented this way, and the implementations reveal something important about the nature of sorting.
But let's get back to the much simpler problem at hand, copying a sequence into a newly-allocated contiguous array. To make the problem more interesting, we won't assume that the sequence is already stored contiguously, nor that we can traverse it twice.
To start, we need to find the length of the sequence, which we can do by observing that an empty sequence has length zero and any other sequence has one more element than the subsequence starting after the first element (the "tail" of the sequence.)
Length(seq):
If seq is empty, return 0.
Else, return 1 + Length(Tail(seq))
Now, suppose we have allocated storage for the copy. Then we can copy by observing that an empty sequence is fully copied, and any other sequence can be copied by placing the first element into the allocated storage and then copying the tail of the sequence into the storage starting at the second position (and this procedure logically takes two arguments):
Copy(destination, seq):
If seq is not empty:
Put Head(seq) into the location destination
Call Copy (destination+1, Tail(seq))
But we can't just put those two procedures together, because that would traverse the sequence twice, which we said we couldn't do. So we need to somehow nest these algorithms.
To do that, we have to start by passing the accumulated length down through the recursion so that we can use it to allocate the storage once we know how big the object is. Then, on the way back up, we need to copy the element we counted on the way down:
Copy(seq, length):
    If seq is not empty:
        Set item to its first element (that is, Head(seq))
        Set destination to Copy(Tail(seq), length + 1)
        Store item at location destination - 1
        Return destination - 1
    Otherwise: (seq is empty)
        Set destination to Allocate(length)
        # (see important note below)
        Return destination + length
To correctly start the recursion, we need to pass in 0 as the initial length. It's bad style to force the user to insert "magic numbers", so we would normally wrap the function with a single-argument driver:
Strdup(seq):
Return Copy (seq, 0)
Important Note: if this were written in C using strings, we would need to NUL-terminate the copy. That means allocating length+1 bytes, rather than length, and then storing 0 at destination+length.
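For concreteness, here is one way the Copy/Strdup pseudocode above could be written in C for NUL-terminated strings. The names copy_rec and my_strdup_onepass are made up for this sketch. It allocates length+1 bytes as the note says, and it assumes the caller frees the result:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: counts on the way down, stores on the way back up. */
static char *copy_rec(const char *seq, size_t length)
{
    if (*seq != '\0') {
        char item = *seq;                              /* Head(seq) */
        char *destination = copy_rec(seq + 1, length + 1);
        if (destination == NULL) {
            return NULL;                               /* allocation failed */
        }
        destination[-1] = item;                        /* store just before */
        return destination - 1;
    }
    /* seq is empty: the full length is now known, so allocate once. */
    char *destination = malloc(length + 1);
    if (destination == NULL) {
        return NULL;
    }
    destination[length] = '\0';                        /* NUL-terminate */
    return destination + length;                       /* one past last char */
}

/* Single-argument driver, hiding the initial length of 0. */
char *my_strdup_onepass(const char *seq)
{
    assert(seq != NULL);
    return copy_rec(seq, 0);
}
```

The outermost call steps back to the start of the allocation, so the pointer returned by the driver is the one malloc produced and can be passed to free().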

You didn't say we couldn't use strcat.
So here is a logical (although somewhat useless) answer that uses recursion to do nothing other than chop off the last character and add it back on again.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char * my_strcpy(char *original);
int main(void) {
char *string = my_strcpy("alpine");
printf("string = <%s>\n", string);
return 0;
}
char * my_strcpy(char *original){
if(*original == '\0') {
return original;
}
int len = strlen(original);
char *string = (char *)malloc(len+1);
char *result = (char *)malloc(len+1);
string[0] = result[0] = '\0';
strcat (string, original);
len--;
char store[2] = {string[len] , '\0'}; // save last char
string[len] = '\0'; // cut it off
strcat (result, my_strcpy(string));
strcat (result, store); // add it back
return result;
}

Related

How to save memory in an array of which many elements are always 0?

I have a tensor T in C with four indices whose sizes look like:
int n =4;
int l =5;
int p =6;
int q=2;
I then initialize each element of T
//loop over each of the above indices
T[n][l][p][q]=...
However, many of them are zero and there are symmetries such as:
T[4][3][2][1]=-T[3][4][2][1]
How can I save memory on the elements of T which are zero? Ideally I would like to place something like NULL in those positions so they use 0 instead of 8 bytes. Also, later on in the calculation I can check if they are zero or not by checking if they are equal to NULL.
How do I implicitly include those symmetries in T without using excess memory?
Edit: the symmetry can perhaps be fixed with a different implementation. But what about the zeros? Is there any implementation to not have them waste memory?
You cannot influence the size of any variable by a value you write to it.
If you want to save memory you have not only to not use it, you have to not define a variable using it.
If you do not define a variable, then you have to not use it ever.
Then you have saved memory.
This is of course obvious.
Now, how to apply that to your problem.
Allow me to simplify, for one because you did not give enough information and explanation, at least not for me to understand every detail. For another, to keep the explanation simple.
So I hope that it suffices if I solve the following problem for you, which I think is kind of the little brother of your problem.
I have a large array in C (not really large, let's say N entries, with N==20).
But for special reasons, I will never need to actually read and write any even indices, they should act as if they contain 0, but I want to save the memory used by them.
So actually I want to only use M of the entries, with M*2==N.
So instead of
int Array[N]; /* all the theoretical elements */
I define
int Array[M]; /* only the actually used elements */
Of course I cannot access any of the elements which are not needed and it will not really be necessary.
But for the logic of my program, I want to be able to program as if I could access them, while being sure that they will always read as 0 and ignore any written value.
So what I do is wrap all accesses to the array.
int GetArray(int index)
{
if (index & 1)
{
/* odd, I need to really access the array,
but at a calculated index */
return Array[index/2];
} else
{
/* even, always 0 */
return 0;
}
}
void SetArray(int index, int value)
{
if (index & 1)
{
/* odd, I need to really access the array,
but at a calculated index */
Array[index/2] = value;
} else
{
/* even, no need to store anything, stays always "0" */
}
}
So I can read and write as if the array were twice as large, but guarantee not to ever use the faked elements.
And by mapping the indices as
actualindex = wantindex / 2
I ensure that I do not access beyond the size of the actually existing array.
Porting this concept now to the more complicated setup you have described is your job. You know all the details, you can test whether everything works.
I recommend to extend GetArray() and SetArray() by checks on the resulting index, to make sure that it is never outside of the actual array.
You can also add all kinds of self checks to verify that all your rules and expectations are met.
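As one illustration of porting the idea, here is a sketch for a two-index antisymmetric case, A[i][j] == -A[j][i], which is simpler than the four-index tensor in the question. The names DIM, packed, packed_index, get_antisym and set_antisym are all made up for this example. Only the entries with i < j are stored, and the asserts play the role of the index checks recommended above:

```c
#include <assert.h>
#include <stddef.h>

enum { DIM = 4 };                          /* assumed size, for illustration */

/* Only entries with i < j are stored: DIM*(DIM-1)/2 ints instead of DIM*DIM. */
static int packed[DIM * (DIM - 1) / 2];

/* Map (i, j) with i < j onto a position in the packed array. */
static size_t packed_index(size_t i, size_t j)
{
    assert(i < j && j < DIM);
    return i * DIM - i * (i + 1) / 2 + (j - i - 1);
}

int get_antisym(size_t i, size_t j)
{
    assert(i < DIM && j < DIM);
    if (i == j) {
        return 0;                           /* A[i][i] == -A[i][i] forces 0 */
    }
    if (i < j) {
        return packed[packed_index(i, j)];
    }
    return -packed[packed_index(j, i)];     /* mirrored entry, sign flipped */
}

void set_antisym(size_t i, size_t j, int value)
{
    assert(i < DIM && j < DIM);
    if (i == j) {
        return;                             /* always 0, nothing to store */
    }
    if (i < j) {
        packed[packed_index(i, j)] = value;
    } else {
        packed[packed_index(j, i)] = -value;
    }
}
```

The same pattern should extend to more indices by canonicalising the index tuple first and tracking the sign picked up along the way.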

Does Ruby create a copy when referencing an array with its [..] (slice) method?

I want to loop on a slice of an array. I have basically two main options.
ar.each_with_index{|e,i|
next if i < start_ind
break if i > end_ind
foo(e)
#maybe more code...
}
Another option, which I think is more elegant, would be to run:
ar[start_ind..end_ind].each{|e|
foo(e)
#maybe more code...
}
My concern is Ruby potentially creating a huge array under the hood and doing a lot of memory allocation. Or is there something "smarter" at play that does not create a copy?
You could do a loop of index values... not as elegant as your second solution but economical.
(start_ind..end_ind).each do |index|
foo(ar[index])
# maybe more code
end
You may want to refer to the methods' C source code, but it takes a bit of time to read. Let me help with that.
First: each_index
Its source code in C is tricky, but boils down to something similar to 'each', which looks like
VALUE rb_ary_each(VALUE ary) {
long i;
RETURN_SIZED_ENUMERATOR(ary, 0, 0, ary_enum_length);
for (i=0; i<RARRAY_LEN(ary); i++) {
rb_yield(RARRAY_AREF(ary, i));
}
return ary;
}
It does not create any other array internally by itself. What it effectively does is it simply loops through elements, takes each element and passes it into the block provided (rb_yield part). What's actually inside the block that you provide is a different story.
Second: [...].each
You actually have to notice that it is two function calls. The second, 'each', is of little interest to us since it is described above. The first function call is '[]'. Logically, you would expect it to output a subarray as a value, which has to be stored at least temporarily.
Let's verify. Source code for C is rather long, but the piece of the greatest importance to you is:
VALUE rb_ary_aref(int argc, const VALUE *argv, VALUE ary) {
// some code
if (argc == 2) {
beg = NUM2LONG(argv[0]);
len = NUM2LONG(argv[1]);
if (beg < 0) {
beg += RARRAY_LEN(ary);
}
return rb_ary_subseq(ary, beg, len);
}
// some more code
}
It's actually for a function call like ar[start_ind, length] and not ar[start_ind..end_ind]. The difference is immaterial, but this way is easier to understand.
The thing that answers your question is called "rb_ary_subseq". As you may guess from its name or learn from its source, it actually does create a new array. So it would create a copy under the hood, of a size equal to or less than that of the given array.
You'd also want to consider the computational cost of the function calls, but the question is about memory.

Simulating an appendable array in C...Kinda

I'm trying to write C code that does what a chunk of Python code I have written does.
I tried to keep all its lines simple, but there still turns out to be some stuff I wrote that C cannot do.
My code will take an array of coordinates and replace/add items to that array over time.
For example:
[[[0,1]],[[2,1],[1,14]],[[1,1]]] ==> [[[0,1]],[[2,1],[1,14],[3,2]],[[1,1]]]
or
[[[0,1]],[[2,1],[1,14]],[[1,1]]] ==> [[[0,1]],[[40]],[[1,1]]]
I think this is impossible in C, but how about instead using strings to represent the lists so they can be added to? Like this:
[['0$1$'],['2$1$1$14$'],['1$1$']] ==> [['0$1$'],['2$1$1$14$3$2'],['1$1$']]
and
[['0$1$'],['2$1$1$14$'],['1$1$']] ==> [['0$1$'],['40$'],['1$1$']]
In my code, I know each array in the array is either one or more pairs of numbers or just one number so this method works for me.
Can C do this? If so, please provide an example.
If you know that both the length of a string and the number of said strings won't exceed a certain value, you can do this:
char Strings[NUMBER_OF_STRINGS][MAX_STRING_LENGTH + 1]; // for the null terminator
It would then be a good practice to zero all this memory:
for (size_t i = 0; i < NUMBER_OF_STRINGS; i++)
memset(Strings[i], 0, MAX_STRING_LENGTH + 1);
And if you want to append a string, use strcat:
strcat(Strings[i], SourceString);
A safer (though slightly more costly since you need to call strlen which walks the entire string) solution would be:
strncat(Strings[i], SourceString, MAX_STRING_LENGTH - strlen(Strings[i]));
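Putting those pieces together, a bounded append and a replace could be sketched as below. The helper names append_piece and replace_string, and the sizes chosen for NUMBER_OF_STRINGS and MAX_STRING_LENGTH, are assumptions made for this example. Unlike strncat, this version refuses a piece that does not fit instead of truncating it:

```c
#include <assert.h>
#include <string.h>

#define NUMBER_OF_STRINGS 3               /* assumed sizes, for illustration */
#define MAX_STRING_LENGTH 31

/* Static storage is zero-initialised, so every string starts out empty. */
static char Strings[NUMBER_OF_STRINGS][MAX_STRING_LENGTH + 1];

/* Hypothetical helper: append only if the whole piece fits.
   Returns 1 on success, 0 if nothing was changed. */
int append_piece(size_t i, const char *piece)
{
    assert(piece != NULL);
    if (i >= NUMBER_OF_STRINGS) {
        return 0;
    }
    size_t used = strlen(Strings[i]);
    size_t extra = strlen(piece);
    if (extra > MAX_STRING_LENGTH - used) {
        return 0;                           /* would overflow the row */
    }
    memcpy(Strings[i] + used, piece, extra + 1);   /* include the NUL */
    return 1;
}

/* Hypothetical helper: overwrite row i with piece if it fits. */
int replace_string(size_t i, const char *piece)
{
    assert(piece != NULL);
    if (i >= NUMBER_OF_STRINGS || strlen(piece) > MAX_STRING_LENGTH) {
        return 0;
    }
    Strings[i][0] = '\0';
    return append_piece(i, piece);
}
```

With this, appending "3$2$" to a row holding "2$1$1$14$" and later replacing that row with "40$" mirrors the two transformations in the question.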

Most efficient way to check if elements in an array have changed

I have an array in C and I need to perform some operation only if the elements in the array have changed. However, the time and memory taken for this are very important. I realized that an efficient way to do this would probably be to hash all the elements of the array and compare the result with the previous result. If they match, that means the elements didn't change. I would, however, like to know if this is the most efficient way of doing things. Also, since the array is only 8 bytes long (1 byte for each element), which hashing function would be least time consuming?
The elements in an array are actually being received from another microcontroller. So they may or may not change depending on whether what the other micro-controller measured is the same or not
If you weren't tied to a simple array, you could create a "MRU" List of structures where the structure could contain a flag that indicates if the item was changed since it was last inspected.
Every time an item changes set the "changed flag" and move it to the head of the list. When you need to check for the changed items you traverse the list from the head and unset the changed flags and stopping at the first element with its change flag not set.
Sorry, I missed the part about the array being only 8 bytes long. With that info and with the new info from your edit, I'm thinking the previous suggestion is not ideal.
If the array is only 8-bytes long why not just cache a copy of the previous array and compare it to the new array received?
Below is a clarification of my comment about "shortcutting" the compares. How you implement this would depend on what the sizeof(int) is on the platform used.
Using a 64-bit integer you could get away with one compare to determine if the array has changed. For example:
#define ARR_SIZE 8
unsigned char cachedArr[ARR_SIZE];
unsigned char targetArr[ARR_SIZE];
unsigned int *ic = (unsigned int *)cachedArr;
unsigned int *it = (unsigned int *)targetArr;
// This assertion needs to be true for this implementation to work
// correctly.
assert(sizeof(int) == sizeof(cachedArr));
/*
** ...
** assume initialization and other stuff here
** leading into the main loop that is receiving the target array data.
** ...
*/
if (*ic != *it)
{
// Target array has changed; find out which element(s) changed.
// If you only cared that there was a change and did not care
// to know which specific element(s) had changed you could forego
// this loop altogether.
for (int i = 0; i < ARR_SIZE; i++)
{
if (cachedArr[i] != targetArr[i])
{
// Do whatever needs to be done based on the i'th element
// changed
}
}
// Cache the array again since it has changed.
memcpy(cachedArr, targetArr, sizeof(cachedArr));
}
// else no change to the array
If the native integer size was smaller than 64-bit you could use the same theory, but you'd have to loop over the array sizeof(cachedArr) / sizeof(unsigned int) times; and there would be a worst-case scenario involved (but isn't there always) if the change was in the last chunk tested.
It should be noted that with doing any char to integer type casting you may need to take into consideration alignment (if the char data is aligned to the appropriate word-size boundary).
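If the alignment and type-casting concerns are a worry, one alternative sketch is to let memcmp do the comparison, since it works on any byte buffer without a pointer cast. The function name array_changed is made up here, and whether this is faster than the integer compare on a given microcontroller is something to measure rather than assume:

```c
#include <assert.h>
#include <string.h>

#define ARR_SIZE 8

static unsigned char cachedArr[ARR_SIZE];   /* previous contents, starts as zeros */

/* Hypothetical helper: returns 1 and refreshes the cache if targetArr
   differs from the cached copy, otherwise returns 0. */
int array_changed(const unsigned char *targetArr)
{
    assert(targetArr != NULL);
    if (memcmp(cachedArr, targetArr, ARR_SIZE) == 0) {
        return 0;                           /* no change */
    }
    memcpy(cachedArr, targetArr, ARR_SIZE); /* remember the new contents */
    return 1;
}
```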
Thinking further upon this however, it might be better altogether to just unroll the loop yourself and do:
if (cachedArr[0] != targetArr[0])
{
doElement0ChangedWork();
}
if (cachedArr[1] != targetArr[1])
{
doElement1ChangedWork();
}
if (cachedArr[2] != targetArr[2])
{
doElement2ChangedWork();
}
if (cachedArr[3] != targetArr[3])
{
doElement3ChangedWork();
}
if (cachedArr[4] != targetArr[4])
{
doElement4ChangedWork();
}
if (cachedArr[5] != targetArr[5])
{
doElement5ChangedWork();
}
if (cachedArr[6] != targetArr[6])
{
doElement6ChangedWork();
}
if (cachedArr[7] != targetArr[7])
{
doElement7ChangedWork();
}
Again, depending on whether or not you need to know which specific element(s) changed, that could be tightened up. This would result in more instruction memory needed but eliminates the loop overhead (the good old memory versus speed trade-off).
As with anything time- or memory-related: test, measure, compare, tweak and repeat until the desired results are achieved.
"only if the elements in an array have changed"
Who else but you is going to change them? You can just keep track of whether you've made a change since the last time you did the operation.
If you don't want to do that (perhaps because it'd require recording changes in too many places, or because the record-keeping would take too much time, or because another thread or other hardware is messing with the array), just save the old contents of the array in a separate array. It's only 8 bytes. When you want to see whether anything has changed, compare the current array to the copy element-by-element.
As others have said, the elements will only change if the code changed them.
Maybe this data can be changed by another user? Otherwise you would know that you had changed an entry.
As far as the hash function goes, the array is only 8 bytes (64 bits) long, so a hash would hardly be smaller than the data itself. A hash function won't really help here. Also, a hash function has to be computed, which costs time and memory, so I don't think that will work for your application.
I would just compare bytes until you find one that has changed. If exactly one has changed, you will check 4 to 5 bytes on average before you know that your array has changed (assuming that each byte is equally likely to change).
If none has changed, that is the worst-case scenario and you will have to check all eight bytes to conclude that none have changed.
If the array is only 8 bytes long, you can treat it as if it were a single long long number (assuming long long is 8 bytes and the array is suitably aligned). Suppose the original array is char data[8].
long long * pData = (long long *)data;
long long olddata = *pData;
// ... data is refilled with the newly received bytes here ...
if ( olddata != *pData )
{
// detect which one changed
}
I mean, this way you operate on all the data in one shot, which is much faster than accessing each element by index. A hash is slower in this case.
If it is byte oriented with only eight elements, an XOR-based comparison is another cheap option: XOR each pair of bytes and OR the results together, so that any difference leaves a nonzero value.
if ((LocalArray[0] ^ ReceivedArray[0]) | (LocalArray[1] ^ ReceivedArray[1]) | ...)
{
//Yes it is changed
}
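Written out as a loop, the XOR idea might look like the sketch below. The function name xor_changed is made up for this example. Note that the per-byte XOR results are combined with OR, so a difference in any single byte is enough to report a change:

```c
#include <assert.h>
#include <stddef.h>

#define ARR_SIZE 8

/* Hypothetical helper: returns 1 if any byte differs, 0 if all are equal. */
int xor_changed(const unsigned char *local, const unsigned char *received)
{
    assert(local != NULL && received != NULL);
    unsigned char diff = 0;
    for (size_t i = 0; i < ARR_SIZE; i++) {
        /* XOR is nonzero exactly when the two bytes differ. */
        diff |= (unsigned char)(local[i] ^ received[i]);
    }
    return diff != 0;
}
```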

Remove element from c array

I need to remove a specific element from an array, that array is dynamically resized in order to store an unknown number of elements with realloc.
To control the allocated memory and defined elements, I have two other variables:
double *arr = NULL;
int elements = 0;
int allocated = 0;
After some elements have been placed in the array, I may need to remove some of them. All the texts that I've found say to use memmove and reduce the variables by the number of elements removed.
My question is whether this method is safe and efficient.
I think this is the most efficient function you can use (memcpy is not an option because the source and destination overlap). As for safety, you will need to make sure that the parameters are OK, otherwise bad things will happen :)
Using memmove is certainly efficient, and not significantly less secure than iterating over the array. To know how secure the implementation actually is, we'd need to see the code, specifically the memmove call and how return results from realloc are being checked.
If you get your memmove wrong, or don't check any realloc returns, expect a crash.
In principle, assuming you calculate your addresses and lengths correctly, you can use memmove, but note that if you overwrite one or more elements with the elements at higher indexes, and these overwritten elements were structs that contained pointers to allocated memory, you could produce leaks.
IOW, you must first take care of properly disposing the elements you are overwriting before you can use memmove. How you dispose them depends on what they represent. If they are merely structs that contain pointers into other structures, but they don't "own" the allocated memory, nothing happens. If the pointers "own" the memory, it must be deallocated first.
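As a rough sketch of the memmove approach for the plain double array in the question (no owned pointers, so nothing needs disposing first), something like the following could be used. The helper name remove_at is made up for this example. It leaves the allocation size alone, so shrinking with realloc would be a separate step:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes arr[index], shifts the later elements down
   by one, and returns the new element count. Element order is preserved. */
int remove_at(double *arr, int elements, int index)
{
    assert(arr != NULL);
    if (index < 0 || index >= elements) {
        return elements;                    /* nothing to remove */
    }
    /* Source and destination overlap, so memmove (not memcpy) is required. */
    memmove(&arr[index], &arr[index + 1],
            (size_t)(elements - index - 1) * sizeof arr[0]);
    return elements - 1;
}
```

A call such as elements = remove_at(arr, elements, 2); keeps the count variable from the question in sync with the array contents.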
The performance of memmove() and realloc() can be improved by data partitioning, by which I mean using multiple array chunks rather than one big array.
Apart from memmove(), I found that overwriting the removed element with the last one is an efficient way. But there is a drawback: the array order may change.
int remove_element(int*from, int total, int index) {
if(index != (total-1))
from[index] = from[total-1];
return total-1; // return the number of elements
}
Interestingly, an array is randomly accessible by index, and removing an arbitrary element may change the indexes of other elements as well. If this removal is done inside a loop that traverses the array, then the reordering may cause unexpected results.
One way to fix that is to use an is_blank mask array and defer removal.
int remove_element(int*from, int total, int*is_blank, int index) {
is_blank[index] = 1;
return total; // **DO NOT DECREASE** the total here
}
It may create a sparse array. But it is also possible to fill it up as new elements are added in the blank positions.
Again, it is possible to make the array compact with the following efficient swap algorithm.
int sparse_to_compact(int*arr, int total, int*is_blank) {
int i = 0;
int last = total - 1;
// trim the trailing blank elements
for(; last >= 0 && is_blank[last]; last--);
// now keep filling each blank slot with the last valid element
for(i=0; i < last; i++) {
if(!is_blank[i])
continue;
arr[i] = arr[last]; // move the last valid element into the blank slot
is_blank[i] = 0; // slot i now holds a valid element
is_blank[last] = 1; // the slot it came from is now free
last--;
for(; last >= 0 && is_blank[last]; last--); // trim blank elements again
}
return last+1; // return the compact length of the array
}
Note that the algorithm above moves elements and so changes the element order. It is probably safer to use it outside of any loop that is operating on the array. And if the indices of the elements are saved somewhere, they need to be updated or rebuilt as well.

Resources