String concats onto another without an assignment, why is this? - c

Below is a function from a program:
//read the specified file and check for the input ssn
int readfile(FILE *fptr, PERSON **rptr){
int v=0, i, j;
char n2[MAXS+1], b[1]=" ";
for(i=0; i<MAXR; i++){
j=i;
if(fscanf(fptr, "%c\n%d\n%19s %19s\n%d\n%19s\n%d\n%19s\n%19s\n%d\n%d\n%19s\n\n",
&rptr[j]->gender, &rptr[j]->ssn, rptr[j]->name, n2, &rptr[j]->age,
rptr[j]->job, &rptr[j]->income, rptr[j]->major, rptr[j]->minor,
&rptr[j]->height, &rptr[j]->weight, rptr[j]->religion)==EOF) {
i=MAXR;
}
strcat(rptr[j]->name, b);
//strcat(rptr[j]->name, n2);
if(&rptr[MAXR]->ssn==&rptr[j]->ssn)
v=j;
}
return v;
}
the commented line is like that because for some reason the array 'b' contains the string 'n2' despite an obvious lack of an assignment. This occurs before the first strcat call, but after/during the fscanf call.
it does accomplish the desired goal, but why is n2 concatenated onto the end of b, especially when b only has reserved space for 1 array element?
Here is a snippet of variable definitions after the fscanf call:
*rptr[j]->name = "Rob"
b = " Low"
n2= "Low"

It works, because you got lucky. b and n2 happened to be next to each other in memory, in the right order. C doesn't do boundary checking on arrays and will quite happily let you overflow them. So you can declare an array like this:
char someArray[1] = "lots and lots of characters";
The C compiler (certainly old ones) is going to think this is fine, even though there clearly isn't enough space in the someArray to store that many characters. I'm not sure if it's defined what it'll do in this situation (I suspect not), but on my compiler it limits the population to the size of the array, so it doesn't overflow the boundary (someArray=={'l'}).
Your situation is the same (although less extreme). char b[1] is creating an array with enough room to store 1 byte. You're putting a space in that byte, so there's no room for the null terminator. strcat, keeps copying memory until it gets to a null terminator, consequently it'll keep going until it finds one, even if it's not until the end of the next string (which is what's happening in your case).
If you had been using a C++ compiler, it would have thrown at least a warning (or more likely an error) to tell you that you were trying to put too many items into the array.

B needs to be size 2, 1 for the space, 1 for the null.

Related

Why cant I just print the array and what are the conditions to avoid a loop?

I wrote a program that changes the order of any given array and now I wanted to print it. Often, I just use a for-loop for integers, chars and all the other types, because it looks like that this works every time. Nevertheless, I am curious about the short version and I would like to know when I can use it. At the moment it just prints stuff from a random place.
#include <stdio.h>
int main() {
int array1[] = { 4, 5, 2, 7, 5, 3, 8 };
int laenge = sizeof(array1) / sizeof(array1[0]);
printf("%d\n", laenge);
int array2[laenge];
for (int i = 0; i < laenge; i++) {
array2[laenge - 1 - i] = array1[i];
}
printf("%d\n\n", array2);
for(int j = 0; j < laenge; j++) {
printf("%d", array2[j]);
}
return 0;
}
When you say,
Nevertheless, Im curious about the short version and I would like to know when I can use it.
, I take you to be asking about this:
printf("%d\n\n", array2);
The short answer is "never".
A longer answer is that
the C standard library does not provide any functions for producing formatted output of whole arrays (except the various flavors of C strings as a rather special case), and in particular, printf does not provide such a facility; and
the particular printf call you present has undefined behavior, and even on its face does not mean what it sounds like you think it means. It is attempting to print a pointer (to the first element of array array1) as if that pointer were an int.
Often, I just use a for-loop for integers, chars and all the other types, because it looks like that this works every time.
If the loop uses the correct iteration and the body converts the array contents appropriately to a valid stream, it should work every time.
Nevertheless, I am curious about the short version and I would like to know when I can use it.
The only type of array for which printf has built-in support is arrays of char. If the array is null terminated, %s will print it, if it is not, %.*s can be used and the number of elements must be passed before the array pointer as an int argument.
Other types are not supported by default. Using a loop is the only portable solution.
At the moment it just prints stuff from a random place.
printf("%d\n\n", array2) has undefined behavior because %d expects an int argument and you pass a pointer to int. printf will retrieve an int value from the place where the code would have provided an int argument, array2 is passed as a pointer, possibly in a different place and with a different representation, so the output if any will be meaningless and the program could misbehave in other unpredictable ways.
There is no "short version" - except for strings, there's no format specifier for printf that will print the entire contents of an array at once.
There are a couple of reasons for this:
Arrays don't store any metadata about their size or type; when you declare an array like int a[N];, what you get in memory looks like +---+
a: | | a[0]
+---+
| | a[1]
+---+
| | a[2]
+---+
...
No storage is allocated for anything other than the array elements themselves;
Under most circumstances, array expressions "decay" to pointers to their first element - when you called printf("%d\n\n", array2);
it was equivalent to calling it asprintf("%d\n\n", &array2[0]);
What printf received was not the contents of the array, but only a pointer to its first element. The %d conversion specifier expects its corresponding argument to be an int, so it will do its best to interpret that pointer value as an int, but the behavior is undefined and you can get anything from garbled output to having your program crash outright.
%s works for strings because it gets the address of the first character, then prints successive characters until it sees the zero terminator - its behavior is sort of like the following:
while ( *p )
putchar( *p++ );
This won't work for arrays where 0 can be a valid, in-band value, and again arrays don't know how big they are at runtime. So there's no good way to print the contents of an array with a single printf call.
That's just how C is. There are valid reasons for why the language works this way, but it means you don't have those sorts of conveniences.

Searching for all integers that occure twice in a vector

I got the task in university to realize an input of a maximum of 10 integers, which shall be stored in a one dimensional vector. Afterwards, every integer of the vector needs to be displayed on the display (via printf).
However, I don't know how to check the vector for each number. I thought something along the lines of letting the pointer of the vector run from 0 to 9 and comparing the value of each element with all elements again, but I am sure there is a much smarter way. I don't in any case know how to code this idea since I am new to C.
Here is what I have tried:
#include <stdio.h>
int main(void)
{
int vector[10];
int a;
int b;
int c;
a = 0;
b = 0;
c = 0;
printf("Please input 10 integers.\n\n");
while (a <= 10);
{
for (scanf_s("%lf", &vektor[a]) == 0)
{
printf("This is not an integer. Please try again.\n");
fflush(stdin);
}
a++;
}
for (b <= 10);
{
if (vector[b] != vector[c]);
{
printf("&d", vector[b]);
c++;
}
b++;
}
return 0;
}
Your code has several problems, some syntactic and some semantic. Your compiler will help with many of the former kind, such as
misspelling of variable name vector in one place (though perhaps this was a missed after-the-fact edit), and
incorrect syntax for a for loop
Some compilers will notice that your scanf format is mismatched with the corresponding argument. Also, you might even get a warning that clues you in to the semicolons that are erroneously placed between your loop headers and their intended bodies. I don't know any compiler that would warn you that bad input will cause your input loop to spin indefinitely, however.
But I guess the most significant issue is that the details of your approach to printing only non-duplicate elements simply will not serve. For this purpose, I recommend figuring out how to describe in words how the computer (or a person) should solve the problem before trying to write C code to implement it. These are really two different exercises, especially for someone whose familiarity with C is limited. You can reason about the prose description without being bogged down and distracted by C syntax.
For example, here are some words that might suit:
Consider each element, E, of the array in turn, from first to last.
Check all the elements preceding E in the array for one that contains the same value.
If none of the elements before E contains the same value as E then E contains the first appearance of its value, so print it. Otherwise, E's value was already printed when some previous element was processed, so do not print it again.
Consider the next E, if any (go back to step 1).

code order with variable length array

In C99, is there a big different between these two?:
int main() {
int n , m;
scanf("%d %d", &n, &m);
int X[n][m];
X[n-1][m-1] = 5;
printf("%d", X[n-1][m-1]);
}
and:
int main(int argc, char *argv[]) {
int n , m;
int X[n][m];
scanf("%d %d", &n, &m);
X[n-1][m-1] = 5;
printf("%d", X[n-1][m-1]);
}
The first one seems to always work, whereas the second one appears to work for most inputs, but gives a segfault for the inputs 5 5 and 6 6 and returns a different value than 5 for the input 9 9. So do you need to make sure to get the values before declaring them with variable length arrays or is there something else going on here?
When the second one works, it's pure chance. The fact that it ever works proves that, thankfully, compilers can't yet make demons fly out of your nose.
Declaring a variable doesn't necessarily initialize it. int n, m; leaves both n and m with undefined values in this case, and attempting to access those values is undefined behavior. If the raw binary data in the memory those point to happen to be interpreted into a value larger than the values entered for n and m -- which is very, very far from guaranteed -- then your code will work; if not, it won't. Your compiler could also have made this segfault, or made it melt your CPU; it's undefined behavior, so anything can happen.
For example, let's say that the area of memory that the compiler dedicates to n happened to contain the number 10589231, and m got 14. If you then entered an n of 12 and an m of 6, you're golden -- the array happens to be big enough. On the other hand, if n got 4 and m got 2, then your code will look past the end of the array, and you'll get undefined behavior -- which might not even break, since it's entirely possible that the bits stored in four-byte segments after the end of the array are both accessible to your program and valid integers according to your compiler/the C standard. In addition, it's possible for n and m to end up with negative values, which leads to... weird stuff. Probably.
Of course, this is all fluff and speculation depending on the compiler, OS, time of day, and phase of the moon,1 and you can't rely on any numbers happening to be initialized to the right ones.
With the first one, on the other hand, you're assigning the values through scanf, so (assuming it doesn't error) (and the entered numbers aren't negative) (or zero) you're going to have valid indices, because the array is guaranteed to be big enough because the variables are initialized properly.
Just to be clear, even though variables are required to be zero-initialized under some circumstances doesn't mean you should rely on that behavior. You should always explicitly give variables a default value, or initialize them as soon as possible after their declaration (in the case of using something like scanf). This makes your code clearer, and prevents people from wondering if you're relying on this type of UB.
1: Source: Ryan Bemrose, in chat
int X[n][m]; means to declare an array whose dimensions are the values that n and m currently have. C code doesn't look into the future; statements and declarations are executed in the order they are encountered.
In your second code you did not give n or m values, so this is undefined behaviour which means that anything may happen.
Here is another example of sequential execution:
int x = 5;
printf("%d\n", x);
x = 7;
This will print 5, not 7.
The second one should produce bugs because n and m are initialized with pretty much random values if they're local variables. If they're global, they'll be with value 0.

memset() not setting memory in c

I apologize if my formatting is incorrect as this is my first post, I couldn't find a post on the site that dealt with the same issue I am running into. I'm using plain C on ubuntu 12.04 server. I'm trying to concatenate several strings together into a single string, separated by Ns. The string sizes and space between strings may vary, however. A struct was made to store the positional data as several integers that can be passed to multiple functions:
typedef struct pseuInts {
int pseuStartPos;
int pseuPos;
int posDiff;
int scafStartPos;
} pseuInts;
As well as a string struct:
typedef struct string {
char *str;
int len;
} myString;
Since there are break conditions for the concatenated string multiple nodes of a dynamically linked list were assembled containing an identifier and the concatenated string:
typedef struct entry {
myString title;
myString seq;
struct entry *next;
} entry;
The memset call is as follows:
} else if ((*pseuInts)->pseuPos != (*pseuInts)->scafStartPos) {
(*pseuEntry)->seq.str = realloc ((*pseuEntry)->seq.str, (((*pseuEntry)->seq.len) + (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)))); //realloc the string being extended to account for the Ns
memset (((*pseuEntry)->seq.str + ((*pseuEntry)->seq.len)), 'N', (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos))); //insert the correct number of Ns
(*pseuEntry)->seq.len += (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)); //Update the length of the now extended string
(*pseuInts)->pseuPos += (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)); //update the position values
}
These are all being dereferenced as this else if decision is in a function being called by a function called from main, but the changes to the pseuEntry struct need to be updated in main so as to be passed to another function for further processing.
I've double checked the numbers being used in pseuInts by inserting some printf commands and they are correct in the positioning of how many Ns need to be added, even as they change between different short strings. However, when the program is run the memset only inserts Ns the first time it's called. IE:
GATTGT and TAATTTGACT are separated by 4 spaces and they become:
GATTGTNNNNTAATTTGACT
The second time it is called on the same concatenated string it doesn't work though. IE:
TAATTTGACT and TCTCC are separated by 6 spaces so the long string should become:
GATTGTNNNNTAATTTGACTNNNNNNTCTCC
but it only shows:
GATTGTNNNNTAATTTGACTTCTCC
I've added printfs to display the concatenated string immediately before and after the memset and the they are identical in output.
Sometimes the insertion is adding extra character spaces, but not initializing them so they print nonsense (as would be expected). IE:
GAATAAANNNNNNNNNNNNNNNNN¬GCTAATG
should be
GAATAAANNNNNNNNNNNNNNNNNGCTAATG
I've switched the memset with a for or a while loop and I get the same result. I used an intermediate char * to realloc and still get the same result. I'm looking for for suggestions as to where I should look to try and detect the error.
If you are okay with considering a completely different approach, I would like to offer this:
I understand your intent to be: Replace existing spaces between two strings with an equal number of "N"s. memset() (and associated memory allocations) is the primary method to perform the concatenations.
The problems you have described with your current concatenation attempts are :
1) garbage embedded in resulting string.
2) writing "N" in some unintended memory locations.
3) "N" not being written in other intended memory locations.
Different approach:
First: verify that the memory allocated to the string being modified is sufficient to contain results
second: verify all strings to be concatenated are \0 terminated before attempting concatenation.
third: use strcat(), and a for(;;) loop to append all "N"s, and eventually, subsequent strings.
eg.
for(i=0;i<numNs;i++)//compute numNs with your existing variables
{
strcat(firstStr, "N");//Note: "N" is already NULL term. , and strcat() also ensures null term.
}
strcat(firstStr, lastStr); //a null terminated concatenation
I know this approach is vastly different from what you were doing, but it does address at least the issues identified from your problem statement. If this makes no sense, please let me know and I will address questions as I am able to. (currently have other projects going on)
Looking at your memset:
memset (((*pseuEntry)->seq.str + ((*pseuEntry)->seq.len))), ...
That's the destination. Shouldn't it be:
(memset (((*pseuEntry)->seq.str + ((*pseuEntry)->seq.len) + ((*pseuEntry)->seq.pseuStartPos))
Otherwise I'm missing the meaninging of pseuInts .

two mallocs returning same pointer value

I'm filling a structure with data from a line, the line format could be 3 different forms:
1.-"LD "(Just one word)
2.-"LD A "(Just 2 words)
3.- "LD A,B "(The second word separated by a coma).
The structure called instruccion has only the 3 pointers to point each part (mnemo, op1 and op2), but when allocating memory for the second word sometimes malloc returns the same value that was given for the first word. Here is the code with the mallocs pointed:
instruccion sepInst(char *linea){
instruccion nueva;
char *et;
while(linea[strlen(linea)-1]==32||linea[strlen(linea)-1]==9)//Eliminating spaces and tabs at the end of the line
linea[strlen(linea)-1]=0;
et=nextET(linea);//Save the direction of the next space or tab
if(*et==0){//If there is not, i save all in mnemo
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
nueva.op1[0]='k';nueva.op1[1]=0;//And set a "K" for op1
nueva.op2=NULL;
return nueva;
}
nueva.mnemo=malloc(et-linea+1);<-----------------------------------
strncpy(nueva.mnemo,linea,et-linea);
nueva.mnemo[et-linea]=0;printf("\nj%xj",nueva.mnemo);
linea=et;
while(*linea==9||*linea==32)//Move pointer to the second word
linea++;
if(strchr(linea,',')==NULL){//Check if there is a coma
nueva.op1=malloc(strlen(linea)+1);//Do this if there wasn't any coma
strcpy(nueva.op1,linea);
nueva.op2=NULL;
}
else{//Do this if there was a coma
nueva.op1=malloc(strchr(linea,',')-linea+1);<----------------------------------
strncpy(nueva.op1,linea,strchr(linea,',')-linea);
nueva.op1[strchr(linea,',')-linea]=0;
linea=strchr(linea,',')+1;
nueva.op2=malloc(strlen(linea)+1);
strcpy(nueva.op2,linea);printf("\n2j%xj2",nueva.op2);
}
return nueva;
}
When I print the pointers it happens to be the same number.
note: the function char *nextET(char *line) returns the direction of the first space or tab in the line, if there is not it returns the direction of the end of the line.
sepInst() is called several times in a program and only after it has been called several times it starts failing. These mallocs across all my program are giving me such a headache.
There are two main possibilities.
Either you are freeing the memory somewhere else in your program (search for calls to free or realloc). In this case the effect that you see is completely benign.
Or, you might be suffering from memory corruption, most likely a buffer overflow. The short term cure is to use a specialized tool (a memory debugger). Pick one that is available on your platform. The tool will require recompilation (relinking) and eventually tell you where exactly is your code stepping beyond previously defined buffer limits. There may be multiple offending code locations. Treat each one as a serious defect.
Once you get tired of this kind of research, learn to use the const qualifier and use it with all variable/parameter declarations where you can do it cleanly. This cannot completely prevent buffer overflows, but it will restrict them to variables intended to be writable buffers (which, for example, those involved in your question apparently are not).
On a side note, personally, I think you should work harder to call malloc less. It's a good idea for performance, and it also causes corruption less.
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
should be
// strlen has to traverse your string to get the length,
// so if you need it more than once, save its value.
cbLineA = strlen(linea);
// malloc for the string, and the 2 bytes you need for op1.
nueva.mnemo=malloc(cbLineA + 3);
// strcpy checks for \0 again, so use memcpy
memcpy(nueva.mnemo, linea, cbLineA);
nueva.mnemo[cbLineA] = 0;
// here we avoid a second malloc by pointing op1 to the space we left after linea
nueva.op1 = nueva.mnemo + cbLinea + 1;
Whenever you can reduce the number of mallocs by pre-calculation....do it. You are using C! This is not some higher level language that abuses the heap or does garbage collection!

Resources