I apologize if my formatting is incorrect, as this is my first post. I couldn't find a post on the site that dealt with the same issue I am running into. I'm using plain C on Ubuntu 12.04 Server. I'm trying to concatenate several strings into a single string, separated by Ns. The string sizes and the space between strings may vary, however. A struct was made to store the positional data as several integers that can be passed to multiple functions:
typedef struct pseuInts {
int pseuStartPos;
int pseuPos;
int posDiff;
int scafStartPos;
} pseuInts;
As well as a string struct:
typedef struct string {
char *str;
int len;
} myString;
Since there are break conditions for the concatenated string multiple nodes of a dynamically linked list were assembled containing an identifier and the concatenated string:
typedef struct entry {
myString title;
myString seq;
struct entry *next;
} entry;
The memset call is as follows:
} else if ((*pseuInts)->pseuPos != (*pseuInts)->scafStartPos) {
(*pseuEntry)->seq.str = realloc ((*pseuEntry)->seq.str, (((*pseuEntry)->seq.len) + (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)))); //realloc the string being extended to account for the Ns
memset (((*pseuEntry)->seq.str + ((*pseuEntry)->seq.len)), 'N', (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos))); //insert the correct number of Ns
(*pseuEntry)->seq.len += (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)); //Update the length of the now extended string
(*pseuInts)->pseuPos += (((*pseuInts)->scafStartPos) - ((*pseuInts)->pseuPos)); //update the position values
}
These are all being dereferenced as this else if decision is in a function being called by a function called from main, but the changes to the pseuEntry struct need to be updated in main so as to be passed to another function for further processing.
I've double checked the numbers being used in pseuInts by inserting some printf commands and they are correct in the positioning of how many Ns need to be added, even as they change between different short strings. However, when the program is run the memset only inserts Ns the first time it's called. IE:
GATTGT and TAATTTGACT are separated by 4 spaces and they become:
GATTGTNNNNTAATTTGACT
The second time it is called on the same concatenated string it doesn't work though. IE:
TAATTTGACT and TCTCC are separated by 6 spaces so the long string should become:
GATTGTNNNNTAATTTGACTNNNNNNTCTCC
but it only shows:
GATTGTNNNNTAATTTGACTTCTCC
I've added printfs to display the concatenated string immediately before and after the memset, and they are identical in output.
Sometimes the insertion is adding extra character spaces, but not initializing them so they print nonsense (as would be expected). IE:
GAATAAANNNNNNNNNNNNNNNNN¬GCTAATG
should be
GAATAAANNNNNNNNNNNNNNNNNGCTAATG
I've replaced the memset with a for or a while loop and I get the same result. I used an intermediate char * for the realloc and still get the same result. I'm looking for suggestions as to where I should look to try and detect the error.
If you are okay with considering a completely different approach, I would like to offer this:
I understand your intent to be: Replace existing spaces between two strings with an equal number of "N"s. memset() (and associated memory allocations) is the primary method to perform the concatenations.
The problems you have described with your current concatenation attempts are:
1) garbage embedded in resulting string.
2) writing "N" in some unintended memory locations.
3) "N" not being written in other intended memory locations.
Different approach:
First: verify that the memory allocated to the string being modified is sufficient to contain the result, including its \0 terminator.
Second: verify that all strings to be concatenated are \0 terminated before attempting concatenation.
Third: use strcat() and a for(;;) loop to append all "N"s and, eventually, the subsequent strings.
eg.
for(i=0;i<numNs;i++)//compute numNs with your existing variables
{
    strcat(firstStr, "N");//Note: "N" is already null terminated, and strcat() also ensures null termination.
}
strcat(firstStr, lastStr); //a null terminated concatenation
I know this approach is vastly different from what you were doing, but it does address at least the issues identified from your problem statement. If this makes no sense, please let me know and I will address questions as I am able to. (currently have other projects going on)
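For comparison, here is a minimal sketch that keeps the question's realloc/memset style but always reserves one extra byte and writes the terminator. The helper names append_ns and append_str are hypothetical (they are not in the original code), and the sketch assumes that len counts characters without the terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the question's myString; len excludes the terminator. */
typedef struct string {
    char *str;
    int len;
} myString;

/* Hypothetical helper: append `count` copies of 'N' to s and keep it
 * '\0'-terminated. Returns 0 on success, -1 if count is negative or
 * realloc fails (s is left unchanged in both failure cases). */
int append_ns(myString *s, int count)
{
    assert(s != NULL);
    if (count < 0)
        return -1;
    /* +1 reserves room for the terminator that printf("%s") relies on. */
    char *grown = realloc(s->str, (size_t)s->len + (size_t)count + 1);
    if (grown == NULL)
        return -1;
    memset(grown + s->len, 'N', (size_t)count);
    s->str = grown;
    s->len += count;
    s->str[s->len] = '\0';
    return 0;
}

/* Hypothetical helper: append a '\0'-terminated C string to s. */
int append_str(myString *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t n = strlen(text);
    char *grown = realloc(s->str, (size_t)s->len + n + 1);
    if (grown == NULL)
        return -1;
    memcpy(grown + s->len, text, n + 1);   /* n + 1 copies the terminator too */
    s->str = grown;
    s->len += (int)n;
    return 0;
}
```

Starting from an empty myString (str NULL, len 0), appending "GATTGT", 4 Ns, "TAATTTGACT", 6 Ns and "TCTCC" in that order builds the string the question expects. Whether the missing terminator is the actual cause of the reported symptoms is an assumption; the full program is not shown.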
Looking at your memset:
memset (((*pseuEntry)->seq.str + ((*pseuEntry)->seq.len)), ...
That's the destination. Shouldn't it be:
memset (((*pseuEntry)->seq.str + ((*pseuEntry)->seq.len) + ((*pseuInts)->pseuStartPos)), ...
Otherwise I'm missing the meaning of pseuInts.
I am trying to port some code written in MATLAB to C, so that I can compile the function and execute it faster (the code is executed very often and it would bring a significant speed increase).
So basically what my MATLAB code does is that it takes a matrix and converts it to a string, adding brackets and commas, so I can write it to a text file. Here's an idea of how this would work for a vector MyVec:
MyVec = rand(1,5);
NbVal = length(MyVec)
VarValueAsText = blanks(2 + NbVal*30 + (NbVal-1));
VarValueAsText([1 end]) = '[]';
VarValueAsText(1 + 31*(1:NbVal-1)) = ',';
for i = 1:NbVal
VarValueAsText(1+(i-1)*31+(1:30)) = sprintf('%30.15f', MyVec(i));
end
Now, how can I achieve a similar result in C? It doesn't seem too difficult, since I can calculate in advance the size of my string (char array) and I know the position of each element that I need to write to my memory area. Also the sprintf function exists in C. However, I have trouble understanding how to set this up, also because I don't have an environment where I can learn easily by trial and error (for each attempt I have to recompile, which often leads to a segmentation fault and MATLAB crashing...).
I hope someone can help even though the problem will probably seem trivial; I have very little experience with C and I haven't been able to find an appropriate example to start from...
Given an offset (in bytes) into a string, retrieving a pointer to this offset is done simply with:
char *ptr = &string[offset];
If you are iterating through the lines of your matrix to print them, your loop might look as follows:
char *ptr = output_buffer;
for (i = 0; i < n_lines; i++) {
    sprintf (ptr, "...", ...);
    ptr = &ptr[line_length];
}
Be sure that you have allocated enough memory for your output buffer though.
Remember that sprintf writes a string terminator at the end of whatever it prints, so if the buffer you print into already holds a longer string, printing directly into the middle of it will cut that string short.
So if you just want to overwrite part of the string, you should probably use sprintf to a temporary buffer, and then use memcpy to copy that buffer into the actual string. Something like this:
char temp[32];
sprintf(temp, "...", ...);
memcpy(&destination[position], temp, strlen(temp));
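Putting the pieces together, a rough C counterpart of the MATLAB snippet could look like the sketch below. The function name vec_to_text and the FIELD_WIDTH macro are hypothetical, and the sketch assumes the caller frees the returned string. It uses snprintf into a temporary buffer so that a value too large for the 30-character field is detected instead of overflowing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_WIDTH 30   /* matches the %30.15f field in the MATLAB code */

/* Hypothetical helper: format n doubles as "[v1,v2,...,vn]", each value
 * right-aligned in a 30-character field. Returns a malloc'ed string that
 * the caller frees, or NULL on failure. */
char *vec_to_text(const double *vec, size_t n)
{
    assert(vec != NULL || n == 0);
    /* '[' + n fields + (n-1) commas + ']'; an empty vector is just "[]". */
    size_t len = 2 + n * FIELD_WIDTH + (n > 0 ? n - 1 : 0);
    char *out = malloc(len + 1);           /* +1 for the terminator */
    if (out == NULL)
        return NULL;

    char *p = out;
    *p++ = '[';
    for (size_t i = 0; i < n; i++) {
        char temp[512];                    /* a huge double can exceed the field */
        int written = snprintf(temp, sizeof temp, "%*.15f", FIELD_WIDTH, vec[i]);
        if (written != FIELD_WIDTH) {      /* value does not fit the fixed layout */
            free(out);
            return NULL;
        }
        memcpy(p, temp, FIELD_WIDTH);      /* copy the field without its terminator */
        p += FIELD_WIDTH;
        if (i + 1 < n)
            *p++ = ',';
    }
    *p++ = ']';
    *p = '\0';
    return out;
}
```

For a two-element vector the result is 63 characters long: the brackets, two 30-character fields and one comma, which is the same layout the MATLAB code computes with 2 + NbVal*30 + (NbVal-1).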
I'm filling a structure with data from a line, the line format could be 3 different forms:
1.- "LD " (just one word)
2.- "LD A " (just 2 words)
3.- "LD A,B " (the second word separated by a comma).
The structure called instruccion has only the 3 pointers to point each part (mnemo, op1 and op2), but when allocating memory for the second word sometimes malloc returns the same value that was given for the first word. Here is the code with the mallocs pointed:
instruccion sepInst(char *linea){
instruccion nueva;
char *et;
while(linea[strlen(linea)-1]==32||linea[strlen(linea)-1]==9)//Eliminating spaces and tabs at the end of the line
linea[strlen(linea)-1]=0;
et=nextET(linea);//Save the direction of the next space or tab
if(*et==0){//If there is not, i save all in mnemo
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
nueva.op1[0]='k';nueva.op1[1]=0;//And set a "K" for op1
nueva.op2=NULL;
return nueva;
}
nueva.mnemo=malloc(et-linea+1);<-----------------------------------
strncpy(nueva.mnemo,linea,et-linea);
nueva.mnemo[et-linea]=0;printf("\nj%xj",nueva.mnemo);
linea=et;
while(*linea==9||*linea==32)//Move pointer to the second word
linea++;
if(strchr(linea,',')==NULL){//Check if there is a coma
nueva.op1=malloc(strlen(linea)+1);//Do this if there wasn't any coma
strcpy(nueva.op1,linea);
nueva.op2=NULL;
}
else{//Do this if there was a coma
nueva.op1=malloc(strchr(linea,',')-linea+1);<----------------------------------
strncpy(nueva.op1,linea,strchr(linea,',')-linea);
nueva.op1[strchr(linea,',')-linea]=0;
linea=strchr(linea,',')+1;
nueva.op2=malloc(strlen(linea)+1);
strcpy(nueva.op2,linea);printf("\n2j%xj2",nueva.op2);
}
return nueva;
}
When I print the pointers it happens to be the same number.
Note: the function char *nextET(char *line) returns the address of the first space or tab in the line; if there is none, it returns the address of the end of the line.
sepInst() is called several times in a program and only after it has been called several times it starts failing. These mallocs across all my program are giving me such a headache.
There are two main possibilities.
Either you are freeing the memory somewhere else in your program (search for calls to free or realloc). In this case the effect that you see is completely benign.
Or, you might be suffering from memory corruption, most likely a buffer overflow. The short term cure is to use a specialized tool (a memory debugger). Pick one that is available on your platform. The tool will require recompilation (relinking) and will eventually tell you exactly where your code is stepping beyond previously defined buffer limits. There may be multiple offending code locations. Treat each one as a serious defect.
Once you get tired of this kind of research, learn to use the const qualifier and use it with all variable/parameter declarations where you can do it cleanly. This cannot completely prevent buffer overflows, but it will restrict them to variables intended to be writable buffers (which, for example, those involved in your question apparently are not).
On a side note, personally, I think you should work harder to call malloc less. It's a good idea for performance, and it also leaves fewer opportunities for corruption.
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
should be
// strlen has to traverse your string to get the length,
// so if you need it more than once, save its value.
size_t cbLineA = strlen(linea);
// malloc for the string, its terminator, and the 2 bytes you need for op1.
nueva.mnemo=malloc(cbLineA + 3);
// strcpy checks for \0 again, so use memcpy
memcpy(nueva.mnemo, linea, cbLineA);
nueva.mnemo[cbLineA] = 0;
// here we avoid a second malloc by pointing op1 to the space we left after linea
// (op1 now lives inside mnemo's block, so never pass op1 to free on its own)
nueva.op1 = nueva.mnemo + cbLineA + 1;
Whenever you can reduce the number of mallocs by pre-calculation....do it. You are using C! This is not some higher level language that abuses the heap or does garbage collection!
EDIT:
Once again, I'll try to explain it more precisely.
I am writing an IRC client, a network application based on sockets. It receives data from the server with the recv() function and stores it in the rBuf[] array one character after another.
It then copies the data to the array rBufh[], which is passed to the parsing function.
The important thing is that the server sends just one character at a time [one char per recv()], and the keys (char arrays stored in the two-dimensional array key1) are strings that the parser must react to when it finds one of them in the received data. The reaction is based on the 3-digit code, whose meaning is specified in the protocol documentation. So much for the basics.
The parsing algorithm/function works as follows:
1) key1 (containing the keys) and rBufh are the arguments passed to it.
2) Take the first char of the first string stored in key1 and compare it with the first char of rBufh.
3) If they match, return int ret = 2 (which means 'part of a key has been found in the received data').
4) Now recv() adds one char (as the server only sends one char each time) to the rBufh array, which is passed to the parsing function once again, and the algorithm starts again until it finds a matching key, returning int ret = 1 (found matching key), or int ret = 0 if in step 3 they did not match.
While the parser is working, printing to stdout is stopped. Based on the value returned by the parser, printing either stays stopped for longer (if part of one of the keys from key1 has been found) or is allowed (if a whole key has been found, or the parser can't match the character passed to it in rBufh to any of the keys stored in key1).
We are getting closer to understanding what I asked for, I hope : )
The number of strings stored in the key1 array is known beforehand, as they are specified in the protocol doc, so it can be initialized right away at its definition.
What I asked is:
Is it possible to pass to key1 (not necessarily using sprintf as I did, it just seemed convenient) a string that contains some piece of code meaning 'any three digits'?
e.g.
Let's mark by XXX the piece of code mentioned 3 lines above:
key1[0] = "irc2.gbatemp.net XXX nickname"
Now, if the parser comes across any string matching key1[0] that contains ANY three digits in place of XXX, it will return 1, so another function will have the possibility to react properly.
This way I don't have to store many strings with different values of XXX in the key1 array, but just one which matches all of them.
Ok, now I really did my best to help you understand it and so gain an answer : )
Greetings
Tom
P.S.
Feel free to change the title of this topic, as I don't really know how I should describe the problem in a few words, as you already noticed. (It is as it is because my first idea was to pass some regex to key1.)
If you just want to insert an arbitrary 3 digit number in your string then do this:
int n = 123; // some integer value that you want to represent as 3 digits
sprintf(key1[1], ":irc2.gbatemp.net %03d %s ", n, nickname);
Note that the %03d format specifier is not a regular expression - it just specifies that the integer parameter will be displayed as a 3 digit decimal value, with leading zeroes if needed.
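If the goal is instead the matching direction described in the question (one stored key that accepts any three digits), a small hand-written matcher may be enough. The sketch below is only an illustration: the function name match_key and the use of '#' as a "any single digit" placeholder are assumptions, not part of the original code. The return values follow the question's convention (0 no match, 1 whole key found, 2 partial match so far).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return values follow the question's convention. */
enum { NO_MATCH = 0, FULL_MATCH = 1, PARTIAL_MATCH = 2 };

/* Hypothetical matcher: '#' in key matches any single decimal digit,
 * every other character must match exactly. buf holds the characters
 * received so far and is '\0'-terminated. */
int match_key(const char *key, const char *buf)
{
    assert(key != NULL && buf != NULL);
    size_t i = 0;
    for (; buf[i] != '\0'; i++) {
        if (key[i] == '\0')
            return NO_MATCH;                 /* buf is longer than the key */
        if (key[i] == '#') {
            if (!isdigit((unsigned char)buf[i]))
                return NO_MATCH;
        } else if (key[i] != buf[i]) {
            return NO_MATCH;
        }
    }
    /* Every received character matched; is the key complete yet? */
    return key[i] == '\0' ? FULL_MATCH : PARTIAL_MATCH;
}
```

With the key "irc2.gbatemp.net ### nickname", the buffer "irc2.gbatemp.net 433 nickname" is a full match, "irc2.gbatemp.net 43" is a partial match, and "irc2.gbatemp.net 4x3 nickname" is rejected. A key that needs a literal '#' would require a different placeholder or an escape mechanism.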
You can't tell the compiler that. Signature is
int sprintf ( char * str, const char * format, ... );
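The language does not require the compiler to look inside the format string or to check that your %d and %s line up with int and char* arguments (some compilers warn about mismatches, but that is a courtesy). The range check will have to happen at runtime. Just something like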
int printKey(char* key, int n) {
if(!(0 <= n && n <= 999)) {
return -1; //C-style throw an error
}
sprintf(key, ":irc2.gbatemp.net %d %s", n, nickname);
return 0;
}
Also use snprintf not sprintf or you're asking for memory troubles.
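A bounded variant along those lines might look like the following sketch. The name print_key and the extra size and nickname parameters are assumptions made for illustration; the original printKey takes only the buffer and the number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded variant of printKey: the buffer size and nickname
 * are passed in explicitly. Returns 0 on success and -1 if n is out of
 * range or the result would not fit. On a fit failure key is reset to an
 * empty string; on a range failure key is not touched. */
int print_key(char *key, size_t size, int n, const char *nickname)
{
    assert(key != NULL && nickname != NULL);
    if (n < 0 || n > 999)
        return -1;
    /* snprintf never writes more than size bytes and returns the length
     * the full result would have had. */
    int needed = snprintf(key, size, ":irc2.gbatemp.net %03d %s", n, nickname);
    if (needed < 0 || (size_t)needed >= size) {
        if (size > 0)
            key[0] = '\0';      /* do not leave a truncated key behind */
        return -1;
    }
    return 0;
}
```

Comparing the return value of snprintf with the buffer size is what detects truncation; simply swapping sprintf for snprintf without that check would silently produce a shortened key.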
I'm looking for a canned C routine that does what glob(3) does, except without matching the results against filenames, e.g.
input: "x[1-4]y"
output: "x1y", "x2y", "x3y", "x4y"
regardless of whether any files with those names happen to exist. EDIT: This doesn't need to produce the list all at once; in fact it would be better if it had an iterator-style "give me the next name now" API, as the list could be enormous.
Obviously this cannot support * and ?, but that's fine; I only need the [a-z] notation. Support for the {foo,bar,baz} notation would be nice too.
Best option is telling me the name of a routine that is already in everybody's C library that does this. Second best would be a pointer to a chunk of BSD-licensed (or more permissively) code. GPL code would be awkward, but I could live with it.
cURL (the command line tool, not the library) contains code that does this job, which is relatively easy to extract:
https://github.com/bagder/curl/blob/master/src/tool_urlglob.c
https://github.com/bagder/curl/blob/master/src/tool_urlglob.h
They'll have to be edited to remove some dependencies on the guts of cURL that are not part of the public library interface. The API is a little confusing, so here's some wrapper code I wrote:
#include <stdio.h>   /* stderr */
#include <stdlib.h>  /* abort */
#include "tool_urlglob.h"
struct url_iter
{
char **upats;
URLGlob *uglob;
int nglob;
};
static inline struct url_iter
url_prep(char **upats)
{
struct url_iter it;
it.upats = upats;
it.uglob = NULL;
it.nglob = -1;
return it;
}
static char *
url_next(struct url_iter *it)
{
char *url;
if (!it->uglob) {
for (;;) {
if (!*it->upats)
return 0;
if (!glob_url(&it->uglob, *it->upats, &it->nglob, stderr))
break;
it->upats++;
}
}
if (glob_next_url(&url, it->uglob))
abort();
if (--it->nglob == 0) {
glob_cleanup(it->uglob);
it->uglob = 0;
it->upats++;
}
return url;
}
Pass an array of strings to url_prep, call url_next on the result until it returns NULL. Strings returned from url_next must be deallocated with free when you're done with them.
Here's a sketch of how I'd write the iterator:
Count the instances of [ in the string. This will be the number of "dimensions" you iterate over.
For each dimension, establish a range of values based on the number of characters in the bracket expression.
Simply iterate an n-tuple of integers over these ranges, and use the resulting values as indices into the bracket expressions to expand the string based on the values.
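The three steps above can be sketched roughly as follows. This is an illustration, not a library routine: struct brace_iter, iter_init, iter_next and MAX_DIMS are all hypothetical names. It handles only [...] expressions with single characters and lo-hi ranges, and does not handle the {foo,bar,baz} notation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIMS 16

/* Hypothetical iterator state: one "dimension" per bracket expression. */
struct brace_iter {
    const char *pattern;
    int ndims;
    char set[MAX_DIMS][256];   /* expanded characters for each bracket */
    int size[MAX_DIMS];        /* number of characters in each set */
    int index[MAX_DIMS];       /* current position in each set (the n-tuple) */
    int done;
};

/* Parse the pattern; returns 0 on success, -1 on malformed or oversized input. */
int iter_init(struct brace_iter *it, const char *pattern)
{
    assert(it != NULL && pattern != NULL);
    memset(it, 0, sizeof *it);
    it->pattern = pattern;
    for (const char *p = pattern; *p; p++) {
        if (*p != '[')
            continue;
        if (it->ndims == MAX_DIMS)
            return -1;
        int d = it->ndims++;
        for (p++; *p && *p != ']'; p++) {
            unsigned char lo = (unsigned char)*p, hi = lo;
            if (p[1] == '-' && p[2] && p[2] != ']') {   /* a range such as 1-4 */
                hi = (unsigned char)p[2];
                p += 2;
            }
            if (hi < lo)
                return -1;
            for (int c = lo; c <= hi; c++) {
                if (it->size[d] == 256)
                    return -1;
                it->set[d][it->size[d]++] = (char)c;
            }
        }
        if (*p != ']' || it->size[d] == 0)
            return -1;                                  /* unterminated or empty */
    }
    return 0;
}

/* Returns the next expansion as a malloc'ed string (caller frees), or NULL
 * when the iteration is exhausted or allocation fails. */
char *iter_next(struct brace_iter *it)
{
    assert(it != NULL);
    if (it->done)
        return NULL;
    char *out = malloc(strlen(it->pattern) + 1);   /* an expansion is never longer */
    if (out == NULL)
        return NULL;
    char *w = out;
    int d = 0;
    for (const char *p = it->pattern; *p; p++) {
        if (*p == '[') {
            *w++ = it->set[d][it->index[d]];
            d++;
            p = strchr(p, ']');                    /* skip the bracket body */
        } else {
            *w++ = *p;
        }
    }
    *w = '\0';
    /* Advance the n-tuple like an odometer, rightmost dimension fastest. */
    int k = it->ndims - 1;
    while (k >= 0 && ++it->index[k] == it->size[k])
        it->index[k--] = 0;
    if (k < 0)
        it->done = 1;
    return out;
}
```

Calling iter_init with "x[1-4]y" and then iter_next repeatedly yields "x1y", "x2y", "x3y", "x4y" and then NULL. Only one expansion exists in memory at a time, which suits the "give me the next name now" requirement for enormous lists.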
Below is a function from a program:
//read the specified file and check for the input ssn
int readfile(FILE *fptr, PERSON **rptr){
int v=0, i, j;
char n2[MAXS+1], b[1]=" ";
for(i=0; i<MAXR; i++){
j=i;
if(fscanf(fptr, "%c\n%d\n%19s %19s\n%d\n%19s\n%d\n%19s\n%19s\n%d\n%d\n%19s\n\n",
&rptr[j]->gender, &rptr[j]->ssn, rptr[j]->name, n2, &rptr[j]->age,
rptr[j]->job, &rptr[j]->income, rptr[j]->major, rptr[j]->minor,
&rptr[j]->height, &rptr[j]->weight, rptr[j]->religion)==EOF) {
i=MAXR;
}
strcat(rptr[j]->name, b);
//strcat(rptr[j]->name, n2);
if(&rptr[MAXR]->ssn==&rptr[j]->ssn)
v=j;
}
return v;
}
The second strcat is commented out because, for some reason, the array 'b' contains the string 'n2' despite an obvious lack of an assignment. This occurs before the first strcat call, but after/during the fscanf call.
It does accomplish the desired goal, but why is n2 concatenated onto the end of b, especially when b only has reserved space for 1 array element?
Here is a snippet of variable definitions after the fscanf call:
*rptr[j]->name = "Rob"
b = " Low"
n2= "Low"
It works, because you got lucky. b and n2 happened to be next to each other in memory, in the right order. C doesn't do boundary checking on arrays and will quite happily let you overflow them. So you can declare an array like this:
char someArray[1] = "lots and lots of characters";
Many C compilers (certainly old ones) accept this with at most a warning, even though there clearly isn't enough space in someArray to store that many characters. Strictly speaking, an initializer that is too long for its array violates a constraint of the C standard, so the compiler is required to diagnose it; in practice a warning is typical, and on my compiler the initializer is truncated to the size of the array, so it doesn't overflow the boundary (someArray=={'l'}).
Your situation is a milder version of this. char b[1] creates an array with room for exactly 1 byte. You're putting a space in that byte, so there's no room for the null terminator (C permits a string literal that fills the array exactly and silently drops the terminator). strcat keeps copying memory until it reaches a null terminator, so it keeps going past the end of b until it finds one, even if that is not until the end of the next string (which is what's happening in your case).
If you had been using a C++ compiler, it would have rejected the declaration with an error, because C++ requires the array to have room for the terminator as well.
b needs to be size 2: 1 for the space, 1 for the null.
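As a small illustration of that fix, the sketch below uses a 2-byte separator and checks the destination capacity before concatenating. The helper name append_last_name and the explicit cap parameter are hypothetical; the original code calls strcat directly on rptr[j]->name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append " " and last to name, where name is a
 * '\0'-terminated string stored in a buffer of cap bytes.
 * Returns 0 on success, -1 (name unchanged) if the result would not fit. */
int append_last_name(char *name, size_t cap, const char *last)
{
    assert(name != NULL && last != NULL);
    const char sep[2] = " ";            /* 2 bytes: the space and its '\0' */
    size_t used = strlen(name);
    size_t extra = strlen(sep) + strlen(last);
    if (used + extra + 1 > cap)         /* +1 for the final terminator */
        return -1;
    strcat(name, sep);                  /* safe: sep is properly terminated */
    strcat(name, last);
    return 0;
}
```

With a 20-byte buffer holding "Rob", appending "Low" gives "Rob Low"; with a 6-byte buffer the call fails and leaves "Rob" untouched instead of writing past the end.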