strchr(), APT_String and subtraction operation - c

I'm working with a program written in C that involves comparing hyphenated surnames. For example, it might compare Mary Jay-Blige to Mary Kay-Blige.
The code that finds the hyphen and sets a variable to its position is:
APT_String LAST_NAME;
char * p_ich;
int iPosHyphen;
p_ich = strchr(LAST_NAME,'-');
iPosHyphen = p_ich-LAST_NAME+1;
where APT_String is a data type for IBM's DataStage.
I inherited the above code, and it appears to "work", but I would like some clarification on the p_ich-LAST_NAME+1 operation.
Namely, if strchr() returns the location of the first '-', how is C handling this arithmetic?
If I call cout<<p_ich;, I get -Blige. So I guess it returns the remainder of the string once the specified char is found?

Yes, strchr returns the address of the first occurrence, not its index. Subtracting the address of the start of the string from it gives the 0-based index of the hyphen, and the +1 moves that to the index of the first character after the hyphen,
such that LAST_NAME[iPosHyphen] == 'B'.

This is very basic C pointer arithmetic and you can easily find a lot of information about it.
Subtracting one pointer from another yields the distance between their indices, provided both point into the same array. In your example it is the distance between p_ich and LAST_NAME.
With the standard char type the distance equals the difference between the memory addresses, but in general:
ptr1 - ptr2 == ((unsigned long)ptr1 - (unsigned long)ptr2) / sizeof(*ptr1)
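The arithmetic described above can be sketched as a small function. This is only an illustration: a plain NUL-terminated char string stands in for APT_String (an assumption, since the DataStage type is not shown), and the function name index_after_hyphen is made up for the example. Unlike the inherited code, it also checks for the case where strchr finds no hyphen and returns NULL.

```c
#include <stddef.h>
#include <string.h>

/* Returns the 0-based index of the character that follows the first '-',
   or -1 if the string contains no hyphen. */
ptrdiff_t index_after_hyphen(const char *last_name)
{
    const char *p = strchr(last_name, '-');   /* address of '-', or NULL */

    if (p == NULL)
        return -1;

    /* p - last_name is the index of the hyphen; +1 steps past it. */
    return (p - last_name) + 1;
}
```

For "Jay-Blige" the hyphen sits at index 3, so the function returns 4, and "Jay-Blige"[4] is 'B'.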

Related

(C) Recursive strcpy() that takes only 1 parameter

Let me be clear from the get go, this is not a dupe, I'll explain how.
So, I tasked myself to write a function that imitates strcpy but with 2 conditions:
it needs to be recursive
it must take a single parameter (which is the original string)
The function should return a pointer to the newly copied string. So this is what I've tried so far:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char * my_strcpy(char *original);
int main(void) {
char *string = my_strcpy("alpine");
printf("string = <%s>\n", string);
return 0;
}
char * my_strcpy(char *original){
char *string = (char *)malloc(10);
if(*original == '\0') {
return string;
}
*string++ = *original;
my_strcpy(original + 1);
}
The problem is somewhat obvious, string gets malloc-ed every time my_strcpy() is called. One of the solutions I could think of would be to allocate memory for string only the first time the function is called. Since I'm allowed to have only 1 parameter, only thing I could think of was to check the call stack, but I don't know whether that's allowed and it does feel like cheating.
Is there a logical solution to this problem?
You wrote it as tail recursive, but I think without making the function non-reentrant your only option is going to be to make the function head recursive and repeatedly call realloc on the return value of the recursive call to expand it, then add in one character. This has the same problem that just calling strlen to do the allocation has: it does something linear in the length of the input string in every recursive call and turns out to be an implicitly n-squared algorithm (0.5*n*(n+1)). You can improve it by making the amortized time complexity better, by expanding the string by a factor and only growing it when the existing buffer is full, but it's still not great.
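A rough sketch of the head-recursive realloc idea follows. The name my_strcpy_realloc is made up for this example, and the use of strlen and memmove inside each call is one possible way to "add in one character"; it is exactly what makes the whole thing quadratic in the length of the input.

```c
#include <stdlib.h>
#include <string.h>

/* Recursively copies `original` into newly allocated storage.
   Returns NULL on allocation failure; the caller frees the result. */
char *my_strcpy_realloc(const char *original)
{
    if (*original == '\0') {
        /* Base case: the copy of an empty string is a lone terminator. */
        char *empty = malloc(1);
        if (empty != NULL)
            empty[0] = '\0';
        return empty;
    }

    /* Copy the tail first, then grow the result by one byte. */
    char *tail = my_strcpy_realloc(original + 1);
    if (tail == NULL)
        return NULL;

    size_t tail_len = strlen(tail);               /* linear work per call */
    char *grown = realloc(tail, tail_len + 2);    /* +1 char, +1 NUL */
    if (grown == NULL) {
        free(tail);
        return NULL;
    }

    /* Shift the tail right by one and put the current character in front. */
    memmove(grown + 1, grown, tail_len + 1);
    grown[0] = *original;
    return grown;
}
```

Calling my_strcpy_realloc("alpine") yields a heap-allocated "alpine" that the caller must free.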
There's a reason you wouldn't use recursion for this task (which you probably know): stack depth will be equal to input string length, and a whole stack frame pushed and a call instruction for every character copied is a lot of overhead. Even so, you wouldn't do it recursively with a single argument if you were really going to do it recursively: you'd make a single-argument function that declares some locals and calls a recursive function with multiple arguments.
Even with the realloc trick, it'll be difficult or impossible to count the characters in the original as you go so that you can call realloc appropriately, remembering that other stdlib "str*" functions are off limits because they'll likely make your whole function n-squared, which I assumed we were trying to avoid.
Ugly tricks like verifying that the string is as long as a pointer and replacing the first few characters with a pointer by memcpy could be used, making the base case for the recursion more complicated, but, um, yuck.
Recursion is a technique for analysing problems. That is, you start with the problem and think about what the recursive structure of a solution might be. You don't start with a recursive structure and then attempt to shoe-horn your problem willy-nilly into it.
In other words, it's good to practice recursive analysis, but the task you have set yourself -- to force the solution to have the form of a one-parameter function -- is not the way to do that. If you start contemplating global or static variables, or extracting runtime context by breaking into the call stack, you have a pretty good hint that you have not yet found the appropriate recursive analysis.
That's not to say that there is not an elegant recursive solution to your problem. There is one, but before we get to it, we might want to abstract away a detail of the problem in order to provide some motivation.
Clearly, if we have a contiguous data structure already in memory, making a contiguous copy is not challenging. If we don't know how big it is, we can do two traversals: one to find its size, after which we can allocate the needed memory, and another one to do the copy. Both of those tasks are simple loops, which is one form of recursion.
The essential nature of a recursive solution is to think about how to step from a problem to a (slightly) simpler or smaller problem. Or, more commonly, a small number of smaller or simpler problems.
That's the nature of one of the most classic recursive problems: sorting a sequence of numbers. The basic structure: divide the sequence into two roughly equal parts; sort each part (the recursive step) and put the results back together so that the combination is sorted. That basic outline has at least two interesting (and very different) manifestations:
Divide the sequence arbitrarily into two almost equal parts either by putting alternate elements in alternate parts or by putting the first half in one part and the rest in the other part. (The first one will work nicely if we don't know in advance how big the sequence is.) To put the sorted parts together, we have to interleave ("merge") them. (This is mergesort.)
Divide the sequence into two ranges by estimating the middle value and putting all smaller values into one part and all larger values into the other part. To put the sorted parts together, we just concatenate them. (This is quicksort.)
In both cases, we also need to use the fact that a single-element sequence is (trivially) sorted so no more processing needs to be done. If we divide a sequence into two parts often enough, ensuring that neither part is empty, we must eventually reach a part containing one element. (If we manage to do the division accurately, that will happen quite soon.)
It's interesting to implement these two strategies using singly-linked lists, so that the lengths really are not easily known. Both can be implemented this way, and the implementations reveal something important about the nature of sorting.
But let's get back to the much simpler problem at hand, copying a sequence into a newly-allocated contiguous array. To make the problem more interesting, we won't assume that the sequence is already stored contiguously, nor that we can traverse it twice.
To start, we need to find the length of the sequence, which we can do by observing that an empty sequence has length zero and any other sequence has one more element than the subsequence starting after the first element (the "tail" of the sequence.)
Length(seq):
If seq is empty, return 0.
Else, return 1 + Length(Tail(seq))
Now, suppose we have allocated storage for the copy. We can copy by observing that an empty sequence is fully copied, and any other sequence can be copied by placing the first element into the allocated storage and then copying the tail of the sequence into the storage starting at the second position. (This procedure logically takes two arguments.)
Copy(destination, seq):
If seq is not empty:
Put Head(seq) into the location destination
Call Copy (destination+1, Tail(seq))
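For a NUL-terminated C string, the Length and Copy procedures might be sketched as below. The names length, copy and two_pass_strdup are chosen for this example, and the copy step also writes the terminator, which the pseudocode leaves out. Note that this version still traverses the input twice.

```c
#include <stdlib.h>

/* Length(seq): 0 for the empty string, otherwise 1 + length of the tail. */
static size_t length(const char *seq)
{
    return *seq == '\0' ? 0 : 1 + length(seq + 1);
}

/* Copy(destination, seq): store the head, then copy the tail one slot on. */
static void copy(char *destination, const char *seq)
{
    if (*seq != '\0') {
        *destination = *seq;
        copy(destination + 1, seq + 1);
    } else {
        *destination = '\0';   /* C strings need the terminator as well */
    }
}

/* Two traversals: one to measure, one to copy. Caller frees the result. */
char *two_pass_strdup(const char *seq)
{
    char *destination = malloc(length(seq) + 1);
    if (destination != NULL)
        copy(destination, seq);
    return destination;
}
```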
But we can't just put those two procedures together, because that would traverse the sequence twice, which we said we couldn't do. So we need to somehow nest these algorithms.
To do that, we have to start by passing the accumulated length down through the recursion, so that we can use it to allocate the storage once we know how big the object is. Then, on the way back, we need to copy the element we counted on the way down:
Copy(seq, length):
If seq is not empty:
Set item to its first element (that is, Head(seq))
Set destination to Copy(Tail(seq), length + 1)
Store item at location destination - 1
Return destination - 1
Otherwise: (seq is empty)
Set destination to Allocate(length)
# (see important note below)
Return destination + length
To correctly start the recursion, we need to pass in 0 as the initial length. It's bad style to force the user to insert "magic numbers", so we would normally wrap the function with a single-argument driver:
Strdup(seq):
Return Copy (seq, 0)
Important Note: if this were written in C using strings, we would need to NUL-terminate the copy. That means allocating length+1 bytes, rather than length, and then storing 0 at destination+length.
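Translated to C strings, the nested algorithm could look like the sketch below. The names copy_rec and single_pass_strdup are invented for the example; the allocation-failure handling is an addition that the pseudocode does not cover. The string is traversed once on the way down, and each character is stored on the way back up.

```c
#include <stdlib.h>

/* `length` is the number of characters already seen higher up the stack.
   Returns a pointer to where the caller's character belongs, plus one. */
static char *copy_rec(const char *seq, size_t length)
{
    if (*seq != '\0') {
        char item = *seq;                                   /* Head(seq) */
        char *destination = copy_rec(seq + 1, length + 1);  /* Tail(seq) */
        if (destination == NULL)
            return NULL;
        destination[-1] = item;     /* store item at destination - 1 */
        return destination - 1;
    }

    /* End of input: now the total length is known, so allocate. */
    char *destination = malloc(length + 1);   /* +1 for the terminator */
    if (destination == NULL)
        return NULL;
    destination[length] = '\0';
    return destination + length;
}

/* Single-argument driver that hides the initial length of 0. */
char *single_pass_strdup(const char *seq)
{
    return copy_rec(seq, 0);
}
```

After the outermost call returns, the pointer has been walked back to the start of the allocation, so the result can be passed straight to free.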
You didn't say we couldn't use strcat.
So here is a logical (although somewhat useless) answer that uses recursion to do nothing other than chop off the last character and add it back on again.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char * my_strcpy(char *original);
int main(void) {
char *string = my_strcpy("alpine");
printf("string = <%s>\n", string);
return 0;
}
char * my_strcpy(char *original){
if(*original == '\0') {
return original;
}
int len = strlen(original);
char *string = (char *)malloc(len+1);
char *result = (char *)malloc(len+1);
string[0] = result[0] = '\0';
strcat (string, original);
len--;
char store[2] = {string[len] , '\0'}; // save last char
string[len] = '\0'; // cut it off
strcat (result, my_strcpy(string));
strcat (result, store); // add it back
return result;
}

Number sequences length, element first and last indexes in array

I'm a beginner in programming. My question is: how do I count the sequences of ones in an input array? For example:
input array = [0,0,1,1,1,1,1,1,0,1,0,1,1,1]
output integer = 3 (the number of sequences of ones)
And how do I find the first and last index of each sequence in the input array? For example:
input array = [0,0,1,1,1,1,1,1,0,1,0,1,1,1]
output array = [3-8,10-10,12-14] (first and last position of each sequence of ones)
I tried to solve this problem in C with arrays. Thank you!
Your task is a good exercise to familiarize you with the 0-based array indexes used in C, iterating arrays, and adjusting the array indexes to 1-based when the output requires.
Taking the first two together, 0-based arrays in C, and iterating over the elements, you must first determine how many elements are in your array. This is something that gives new C programmers trouble. The reason is that for general arrays (as opposed to null-terminated strings), you must either know the number of elements in the array, or determine the number of elements within the scope where the array was declared.
What does that mean? It means the only time you can use the sizeof operator to determine the size of an array is inside the same scope (i.e. inside the same block of code {...}) where the array is declared. If the array is passed to a function, the parameter passing the array is converted (you may see it referred to as decays) to a pointer. When that occurs, the sizeof operator simply returns the size of a pointer (generally 8 bytes on x86_64 and 4 bytes on x86), not the size of the array.
So now you know the first part of your task. (1) declare the array; and (2) save the number of elements in the array to use in iterating over the elements. The first you can do with int array[] = {0,0,1,1,1,1,1,1,0,1,0,1,1,1}; and the second with sizeof array / sizeof array[0]; (sizeof array alone gives the size in bytes, so you divide by the size of one element).
Your next job is to iterate over each element in the array and test whether it is 0 or 1 and respond appropriately. To iterate over each element in the array (as opposed to a string), you will typically use a for loop coupled with an index variable ('i' below) that will allow you to access each element of the array. You may have something similar to:
size_t i = 0;
size_t n = sizeof array / sizeof array[0];   /* number of elements, not bytes */
...
for (i = 0; i < n; i++) {
    ... /* elements accessed as array[i] */
}
(note: you are free to use int as the type for 'i' as well, but for your choice of type, you generally want to ask: can 'i' ever be negative here? If not, choosing a type that holds only non-negative numbers will help the compiler warn you if you misuse the variable later in your code.)
To build the complete logic you will need to test for all changes from 0 to 1, which may require nested if ... else ... statements. (You may have to check whether you are dealing with array[0] specifically as part of your test logic.) You have 2 tasks here: (1) determine whether the previous element was 0 and the current element is 1, and if so update your sequence_count++; and (2) test whether the current element is 1, then store the adjusted index in a second array and update the count or index for the second array so you can keep track of where to store the next adjusted index value. I will let you work on the test logic and will help if you get stuck.
Finally, you need only print out your final sequence_count and then iterate over your second array (where you stored the adjusted index values for each time array was 1).
This will get you started. Edit your question and add your current code when you get stuck and people can help further.
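For reference, one possible shape of the finished logic is sketched below. It is not the only way to structure the tests: instead of storing every adjusted index, this variant records just the first and last 1-based position of each run. The function name find_runs and its max_runs parameter are assumptions made for the example.

```c
#include <stddef.h>

/* Scans a[0..n-1] for runs of consecutive 1s.
   Stores the 1-based first and last position of each run in first[] and
   last[] (each with room for max_runs entries) and returns the number of
   runs found. Runs beyond max_runs are counted but not stored. */
size_t find_runs(const int *a, size_t n,
                 size_t *first, size_t *last, size_t max_runs)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != 1)
            continue;

        /* A run starts at array[0] or right after a non-1 element. */
        if (i == 0 || a[i - 1] != 1) {
            if (count < max_runs)
                first[count] = i + 1;   /* adjust to 1-based */
            count++;
        }

        /* Every 1 extends the current run's end position. */
        if (count <= max_runs)
            last[count - 1] = i + 1;
    }
    return count;
}
```

With the array from the question, the function returns 3 and fills the two output arrays with the pairs 3-8, 10-10 and 12-14. The element count passed in should be computed as sizeof array / sizeof array[0] in the scope where the array is declared.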

Why does C support negative array indices?

From this post in SO, it is clear that C supports negative indices.
Why support such a potential memory violation in a program?
Shouldn't the compiler throw a Negative Index warning at least? (am using GCC)
Or is this calculation done in runtime?
EDIT 1: Can anybody hint at its uses?
EDIT 2: for 3.) Using counters of loops in [] of arrays/pointers indicates Run-time Calculation of Indices.
The calculation is done at runtime.
Negative indices don't necessarily have to cause a violation, and have their uses.
For example, let's say you have a pointer that is currently pointing to the 10th element in an array. Now, if you need to access the 8th element without changing the pointer, you can do that easily by using a negative index of -2.
char data[] = "01234567890123456789";
char* ptr = &data[9];
char c = ptr[-2]; // 7
Here is an example of use.
An Infinite Impulse Response filter is calculated partially from recent previous output values. Typically, there will be some array of input values and an array where output values are to be placed. If the current output element is y[i], then y[i] may be calculated as y[i] = a0*x[i] + a1*x[i-1] + a2*y[i-1] + a3*y[i-2].
A natural way to write code for this is something like:
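#include <stddef.h>

/* a0..a3 are the filter coefficients, assumed to be defined elsewhere. */
void IIR(float *x, float *y, size_t n)
{
    /* Signed index: i-1 and i-2 must be truly negative when i is 0 or 1. */
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; ++i)
        y[i] = a0*x[i] + a1*x[i-1] + a2*y[i-1] + a3*y[i-2];
}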
Observe that when i is zero, x[i-1], y[i-1] and y[i-2] have negative indices. In this case, the caller is responsible for creating an array, setting the initial two elements to “starter values” for the output (often either zero or values held over from a previous buffer), and passing a pointer to where the first new value is to be written; the input likewise needs one earlier value in front of x[0]. Thus, this routine, IIR, normally receives a pointer into the middle of an array and uses negative indices to address some elements.
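The caller's side of that arrangement might look like the following sketch. The coefficient values, the buffer sizes, and the names iir and run_iir_demo are all hypothetical and chosen only so the arithmetic is easy to follow; the input is a unit impulse with zero history.

```c
#include <stddef.h>

/* Hypothetical coefficients, for illustration only. */
static const float a0 = 0.5f, a1 = 0.25f, a2 = 0.125f, a3 = 0.0625f;

enum { N = 4 };   /* number of new samples to filter */

/* x must have 1 valid element before x[0]; y must have 2 before y[0]. */
static void iir(const float *x, float *y, size_t n)
{
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; ++i)
        y[i] = a0*x[i] + a1*x[i-1] + a2*y[i-1] + a3*y[i-2];
}

/* Filters a unit impulse and copies the N outputs into out[]. */
void run_iir_demo(float out[N])
{
    /* One slot of input history, then the impulse 1, 0, 0, 0. */
    float xbuf[1 + N] = { 0.0f, 1.0f, 0.0f, 0.0f, 0.0f };
    /* Two slots of output history ("starter values"), then the outputs. */
    float ybuf[2 + N] = { 0.0f };

    /* Pass pointers into the middle of each buffer. */
    iir(xbuf + 1, ybuf + 2, N);

    for (size_t i = 0; i < N; i++)
        out[i] = ybuf[2 + i];
}
```

With these made-up coefficients the first three outputs are 0.5, 0.3125 and 0.0703125.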
Why support such a potential memory violation in a program?
Because it follows from pointer arithmetic, and may be useful in certain cases.
Shouldn't the compiler throw a Negative Index warning at least? (am using GCC)
For the same reason the compiler in general does not warn you when you access array[10] and the array has only 10 elements: C leaves that check to the programmer.
Or is this calculation done in runtime?
Yes, the calculation is done at run time.
Elaborating on Taymon's answer:
float arr[10];
float *p = &arr[2];
p[-2]
is now perfectly OK. I haven't seen a good use of negative indices, but why should the standard exclude them when it is in general undecidable whether you are pointing outside of a valid range?
OP: Why support ... a potential memory violation?
It has potential uses, for as OP says it is a potential violation and not a certain memory violation. C is about allowing users to do many things, including giving them all the rope they need to hang themselves.
OP: ... throw a Negative Index warning ...
If concerned, use unsigned index or better yet, use size_t.
OP ... calculation done in runtime?
Yes, quite often as in a[i], where i is not a constant.
OP: hint at its uses?
Example: one is processing a point in an array of points (Pt) and wants to determine whether the mid-point is a candidate for removal because it is coincident. Assume the calling function has already determined that Mid is neither the first nor the last point.
static int IsCoincident(Pt *Mid) {
Pt *Left = &Mid[-1]; // fixed negative index
Pt *Right = &Mid[+1];
return foo(Left, Mid, Right);
}
Array subscripts are just syntactic sugar for dereferencing of pointers to arbitrary places in memory. The compiler can't warn you about negative indexes because it doesn't know at compile time where a pointer will be pointing to. Any given pointer arithmetic expression might or might not result in a valid address for memory access.
a[b] does the same thing as *(a+b). Since the latter allows the negative b, so does the former.
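A tiny sketch of that equivalence, using a made-up helper named element_via_pointer:

```c
#include <stddef.h>

/* Reads the element `offset` places away from `base`.
   *(base + offset) is by definition the same as base[offset],
   and offset may be negative as long as the result stays in the array. */
int element_via_pointer(const int *base, ptrdiff_t offset)
{
    return *(base + offset);
}
```

If mid points at the third element of an int array holding 10, 20, 30, 40, then element_via_pointer(mid, -2) and mid[-2] both read 10.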
Example of using negative array indices.
I use negative indices to check message protocols. For example, one protocol format looks like:
<nnn/message/f>
or, equally valid:
<nnn/message>
The parameter f is optional and must be a single character if supplied.
If I want to get to the value of character f, I first get a pointer to the > character:
char * end_ptr = strchr(msg, '>');
char f_char = '1'; /* default value */
Now I check if f is supplied and extract it (here is where the negative array index is used):
if (end_ptr[-2] == '/')
{
f_char = end_ptr[-1];
}
Note that I've left out error checking and other code that is not relevant to this example.
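With some of the omitted checks restored, the fragments above could be combined roughly as follows. The function name message_flag and its default_flag parameter are assumptions for this sketch; the important addition is making sure at least two characters precede '>' before indexing backwards.

```c
#include <string.h>

/* Returns the optional flag character f from "<nnn/message/f>",
   or default_flag when the message has no flag or is malformed. */
char message_flag(const char *msg, char default_flag)
{
    const char *end_ptr = strchr(msg, '>');

    /* No terminator, or too short for end_ptr[-2] to be inside msg. */
    if (end_ptr == NULL || end_ptr - msg < 2)
        return default_flag;

    if (end_ptr[-2] == '/')
        return end_ptr[-1];     /* negative index: character before '>' */

    return default_flag;
}
```

For "<123/hello/x>" this returns 'x', and for "<123/hello>" it returns the default. A message body that is itself a single character would look the same as a flag, so a real parser would need a further rule to tell them apart.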

How to shift array elements in RPG?

I am aware that a loop can be used to move each element to its new position, but is there a simpler and more elegant solution to the problem? MOVEA clearly doesn't help much here, as it does not work within the same array.
Shift down using the %SUBARR built-in function.
/FREE
%subarr(array:1) = %subarr(array:2); // Shift elements
array(%elem(array)) = *blanks; // Reset the remaining element
/END-FREE
C EVAL %SUBARR(array:1) = %SUBARR(array:2)
C EVAL array(%ELEM(array)) = *BLANKS
Shifting up requires the use of a temporary array or storage due to overlap.
The memmove() — Copy Bytes C Library function can safely copy overlapping memory areas left-to-right or right-to-left. One use for me is after scanning a string for an embedded apostrophe. If I find one, I take the string starting at that position and move (i.e., 'copy') the bytes one position to the right. This results in a doubled embedded apostrophe with all following bytes shifted one position.
The function also returns a pointer for the new position, and I use that (plus 1) as the starting point to scan the remainder of the string for the next embedded apostrophe. I loop until the scan returns no hits. A string is essentially just an 'array' of bytes, so it works just as well with actual arrays.
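Seen from the C side, the apostrophe-doubling loop described above might be sketched like this. The function name double_apostrophes and the cap parameter (total buffer capacity in bytes) are assumptions for the example, and strchr stands in for whatever scan the original program uses.

```c
#include <stddef.h>
#include <string.h>

/* Doubles every apostrophe in the NUL-terminated string in buf, in place.
   cap is the total size of buf in bytes.
   Returns 0 on success, -1 if the buffer is too small. */
int double_apostrophes(char *buf, size_t cap)
{
    size_t len = strlen(buf);
    char *p = buf;

    while ((p = strchr(p, '\'')) != NULL) {
        if (len + 2 > cap)              /* one more char plus the NUL */
            return -1;

        /* Bytes from the apostrophe through the terminator, inclusive. */
        size_t tail = len - (size_t)(p - buf) + 1;

        /* Overlapping copy one position to the right: the apostrophe
           at p stays put and a second copy now follows it. */
        memmove(p + 1, p, tail);
        len++;

        p += 2;                         /* resume after the doubled pair */
    }
    return 0;
}
```

Applied to a buffer holding it's, the result is it''s. The same memmove call shifts the elements of any array one slot up, as long as the byte count is the element count multiplied by the element size.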

bsearch and searching range?

bsearch is pretty good for direct search, but what should I use if I need, for example, to search for a range?
update
For example, if I want to find the range of values between a and b (a <= x < b).
update
The values need not match exactly.
So if I have the array (10,20,30) and I'm trying to find "15", I want to get the address (pointer) of the minimal range that is closest; in this example, that is the range (10,20).
One of the parameters bsearch takes is the number of elements to search. So instead of, for example, 100, make it search in 42 ...
bsearch("foo", data, /*100*/42, sizeof *data, cmpfx);
After the update
What I'd do is a manual (meaning I'd write the code) binary search.
The idea is to compare the middle element of the (remaining) array to both the lower and upper limit. If it's smaller than the lower limit, search again in the big half; if it's larger than the upper limit, search again in the small half; otherwise you've found an element in range.
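A minimal sketch of such a hand-written search, assuming a sorted int array and a half-open range lo <= x < hi (both assumptions, as is the name find_in_range):

```c
#include <stddef.h>

/* Returns a pointer to some element x of the sorted array a[0..n-1]
   with lo <= x < hi, or NULL if no element lies in that range. */
const int *find_in_range(const int *a, size_t n, int lo, int hi)
{
    size_t left = 0, right = n;     /* search window is a[left..right-1] */

    while (left < right) {
        size_t mid = left + (right - left) / 2;

        if (a[mid] < lo)
            left = mid + 1;         /* too small: look among larger values */
        else if (a[mid] >= hi)
            right = mid;            /* too large: look among smaller values */
        else
            return &a[mid];         /* inside the range */
    }
    return NULL;
}
```

Like bsearch, this returns an arbitrary match when several elements fall inside the range.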
After the 2nd update
You want to return a pair of pointers?
You have to wrap them inside a struct, or pass the addresses of the pointers to the functions ... or something.
But now you have a simpler search: search until you find the value (and return a 0-length range) or until you are about to fail. The range is between the array value you last looked at and, depending on exactly how you got to the fail situation, the value to one of the sides or EMPTY if you're at the end of the array.
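One way to avoid returning a pair of pointers is to return only the lower end of the bracketing range; the upper end is then the next array element, if there is one. The sketch below assumes a sorted int array, and the name floor_element is made up for the example.

```c
#include <stddef.h>

/* Returns a pointer to the largest element <= key in the sorted array
   a[0..n-1], or NULL if every element is greater than key (or n == 0).
   If the result is p and p is not the last element, key lies in the
   range [p[0], p[1]). */
const int *floor_element(const int *a, size_t n, int key)
{
    size_t left = 0, right = n;

    /* Invariant: everything before left is <= key,
       everything from right onwards is > key. */
    while (left < right) {
        size_t mid = left + (right - left) / 2;

        if (a[mid] <= key)
            left = mid + 1;
        else
            right = mid;
    }
    return left == 0 ? NULL : &a[left - 1];
}
```

For the array (10,20,30) and the key 15, the function returns a pointer to 10, so the enclosing range is (10,20). An exact hit such as 20 returns a pointer to 20 itself, and a key below 10 returns NULL.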
The bsearch() function is designed to find a single element matching some condition. According to the man page:
RETURN VALUE
The bsearch() function returns a pointer to a matching member of the
array, or NULL if no match is found. If there are multiple elements
that match the key, the element returned is unspecified.
The key here is that if there are multiple elements that match the key, the element returned is unspecified. So you don't know if the element you get is the first, last, or somewhere in the middle of the range.
If you can change your requirements so that you're looking for elements in the array between A and B, and you can guarantee that there is exactly one A and exactly one B in the array, then you could first search for A then search for B.
start = bsearch(A, array, N, sizeof(*array), compare);
end = bsearch(B, array, N, sizeof(*array), compare);
You'll probably have to write your own function to do exactly what you're wanting.

Resources