This is code that should reverse the contents of a file and save the result back into the same file.
However, I am getting a segmentation fault and I don't know why.
#include <stdio.h>
#include <stdlib.h>
#include <termios.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main (int argc,char* argv[])
{
int fd,n,i,j;
char* buf;
if(argc<2)
printf("USAGE: %s file-to-reverse.\n",argv[0]);
fd=open(argv[1], O_RDWR);
if(fd==-1)
printf("ERROR: Cannot reverse %s,file does not exist.\n",argv[1]);
i = 0;
j = n-1;
while(i < j)
{
read(fd,buf,n);
char ib = buf[i];
char jb = buf[j];
jb = i++;
ib = j--;
write(fd,buf,n);
}
free(buf);
close(fd);
}
EDIT1
I tried adding:
#include <sys/stat.h>
struct stat fs;
fstat(fd, &fs);
n= fs.st_size;
buf = malloc(n * sizeof (char));
but now it just duplicates the characters inside the document again and again instead of
reversing them.
You neither allocate nor initialize buf.
You never initialized n, so it could hold anything, even a negative value. Use fstat or some other method to determine the size of the file and store that in n.
Your buffer isn't allocated and n is uninitialized, so the call to read is working with garbage values.
This should repair your code:
buf = malloc(10 * sizeof (char));
n = 10;
Resources:
Wikipedia - malloc()
linux.die.net - malloc()
linux.die.net - read()
Regarding your second EDIT - your loop is wrong.
(1) Take the read & write out of the loop - that's why it keeps writing again & again.
(2) You need to seek back to the beginning of the file, otherwise you will just be appending the new data to the end of the file.
(3) You actually have to reverse the chars in the buffer before writing them out.
read(fd, buf, n);
while (i < j)
{
char t = buf[i];
buf[i] = buf[j];
buf[j] = t;
i++;
j--;
}
lseek(fd, 0, SEEK_SET);
write(fd, buf, n);
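Putting the advice above and the fstat code from the EDIT together, a minimal sketch of the whole program could look like this (error handling kept short, and it assumes the file fits in memory):
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("USAGE: %s file-to-reverse.\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR);
    if (fd == -1) {
        printf("ERROR: Cannot reverse %s, file does not exist.\n", argv[1]);
        return 1;
    }
    struct stat fs;
    fstat(fd, &fs);             /* determine the file size */
    int n = fs.st_size;
    char *buf = malloc(n);      /* allocate the buffer before reading */
    read(fd, buf, n);           /* read once, outside the loop */
    int i = 0, j = n - 1;
    while (i < j) {             /* reverse in memory */
        char t = buf[i];
        buf[i] = buf[j];
        buf[j] = t;
        i++;
        j--;
    }
    lseek(fd, 0, SEEK_SET);     /* rewind before writing the reversed data back */
    write(fd, buf, n);
    free(buf);
    close(fd);
    return 0;
}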
I probably got an easy one for the C programmers out there!
I am trying to create a simple C function that will execute a system command in and write the process output to a string buffer out (which should be initialized as an array of strings of length n). The output needs to be formatted in the following way:
Each line written to stdout should be initialized as a string. Each of these strings has variable length. The output should be an array consisting of each string. There is no way to know how many strings will be written, so this array is also technically of variable length (but for my purposes, I just create a fixed-length array outside the function and pass its length as an argument, rather than going for an array that I would have to manually allocate memory for).
Here is what I have right now:
#define MAX_LINE_LENGTH 512
int exec(const char* in, const char** out, const size_t n)
{
char buffer[MAX_LINE_LENGTH];
FILE *file;
const char terminator = '\0';
if ((file = popen(in, "r")) == NULL) {
return 1;
}
for (char** head = out; (size_t)head < (size_t)out + n && fgets(buffer, MAX_LINE_LENGTH, file) != NULL; head += strlen(buffer)) {
*head = strcat(buffer, &terminator);
}
if (pclose(file)) {
return 2;
}
return 0;
}
and I call it with
#define N 128
int main(void)
{
const char* buffer[N];
const char cmd[] = "<some system command resulting in multi-line output>";
const int code = exec(cmd, buffer, N);
exit(code);
}
I believe the error the above code results in is a seg fault, but I'm not experienced enough to figure out why or how to fix it.
I'm almost positive it is with my logic here:
for (char** head = out; (size_t)head < (size_t)out + n && fgets(buffer, MAX_LINE_LENGTH, file) != NULL; head += strlen(buffer)) {
*head = strcat(buffer, &terminator);
}
What I thought this does is:
Get a mutable reference to out (i.e. the head pointer)
Save the current stdout line to buffer (via fgets)
Append a null terminator to buffer (because I don't think fgets does this?)
Overwrite the data at head pointer with the value from step 3
Move head pointer strlen(buffer) bytes over (i.e. the number of chars in buffer)
Continue until fgets returns NULL or head pointer has been moved beyond the bounds of out array
Where am I wrong? Any help appreciated, thanks!
EDIT #1
Following Barmar's suggestions, I edited my code:
#include <stdio.h>
#include <stdlib.h>
#define MAX_LINE_LENGTH 512
int exec(const char* in, const char** out, const size_t n)
{
char buffer[MAX_LINE_LENGTH];
FILE *file;
if ((file = popen(in, "r")) == NULL) return 1;
for (size_t i = 0; i < n && fgets(buffer, MAX_LINE_LENGTH, file) != NULL; i += 1) out[i] = buffer;
if (pclose(file)) return 2;
return 0;
}
#define N 128
int main(void)
{
const char* buffer[N];
const char cmd[] = "<system command to run>";
const int code = exec(cmd, buffer, N);
for (int i = 0; i < N; i += 1) printf("%s", buffer[i]);
exit(code);
}
While there were plenty of redundancies with what I wrote that are now fixed, this still causes a segmentation fault at runtime.
Focusing on the edited code, this assignment
out[i] = buffer;
has problems.
In this expression, buffer is implicitly converted to a pointer-to-its-first-element (&buffer[0], see: decay). No additional memory is allocated, and no string copying is done.
buffer is rewritten every iteration. After the loop, each valid element of out will point to the same memory location, which will contain the last line read.
buffer is an array local to the exec function. Its lifetime ends when the function returns, so the array in main contains dangling pointers. Utilizing these values is Undefined Behaviour.
Additionally,
for (int i = 0; i < N; i += 1)
always loops to the maximum storable number of lines, when it is possible that fewer lines than this were read.
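If you want to stay close to the original interface (a caller-provided, fixed-size array of pointers), a middle ground is to copy each line with strdup and report how many lines were actually stored. This is only a sketch, assuming the caller frees each stored line; exec_lines and the count parameter are names invented here, not from the code above:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_LENGTH 512
int exec_lines(const char *cmd, char **out, size_t n, size_t *count)
{
    char buffer[MAX_LINE_LENGTH];
    FILE *stream = popen(cmd, "r");
    *count = 0;
    if (stream == NULL)
        return 1;
    while (*count < n && fgets(buffer, sizeof buffer, stream) != NULL) {
        out[*count] = strdup(buffer);   /* copy, so the pointer outlives the function */
        if (out[*count] == NULL)
            break;
        (*count)++;
    }
    return pclose(stream) ? 2 : 0;
}
The caller would then declare char *buffer[N] (not const char *), print only the first count entries, and free each of them when done. The two fuller approaches below are alternatives: one with a fixed two-dimensional array, one with a dynamically grown array of strdup'd lines.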
A rigid solution uses an array of arrays to store the lines read. Here is a cursory example (see: this answer for additional information on using multidimensional arrays as function arguments).
#include <stdio.h>
#include <stdlib.h>
#define MAX_LINES 128
#define MAX_LINE_LENGTH 512
int exec(const char *cmd, char lines[MAX_LINES][MAX_LINE_LENGTH], size_t *lc)
{
FILE *stream = popen(cmd, "r");
*lc = 0;
if (!stream)
return 1;
while (*lc < MAX_LINES) {
if (!fgets(lines[*lc], MAX_LINE_LENGTH, stream))
break;
(*lc)++;
}
return pclose(stream) ? 2 : 0;
}
int main(void)
{
char lines[MAX_LINES][MAX_LINE_LENGTH];
size_t n;
int code = exec("ls -al", lines, &n);
for (size_t i = 0; i < n; i++)
printf("%s", lines[i]);
return code;
}
Using dynamic memory is another option. Here is a basic example using strdup(3), lacking robust error handling.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **exec(const char *cmd, size_t *length)
{
FILE *stream = popen(cmd, "r");
if (!stream)
return NULL;
char **lines = NULL;
char buffer[4096];
*length = 0;
while (fgets(buffer, sizeof buffer, stream)) {
char **reline = realloc(lines, sizeof *lines * (*length + 1));
if (!reline)
break;
lines = reline;
if (!(lines[*length] = strdup(buffer)))
break;
(*length)++;
}
pclose(stream);
return lines;
}
int main(void)
{
size_t n = 0;
char **lines = exec("ls -al", &n);
for (size_t i = 0; i < n; i++) {
printf("%s", lines[i]);
free(lines[i]);
}
free(lines);
}
I'm trying to use C multithreading to find out the frequency of each letter of the alphabet in a text file. The assignment is to: 1) write a function that reads every single sentence in a text, each ended by '.'; 2) write a function that loads a sentence into a two-dimensional array; 3) write a function that spawns a pthread for every letter of every sentence (the pthread function adds 1 to the counter for that letter).
EDIT: I figured out with Valgrind that the problem is in the sentence function, but I don't understand why.
Here's the code:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
char alphabet[26] = "abcdefghijklmnopqrstuvwxyz";
int count[26];
char* sentence(char * s){
char* p;
char* q;
char* arr;
int i;
p = s;
q = malloc(100);
arr = q;
for (i=0; *p != '.'; i++){
*q = *p;
q++;
p++;
}
*q = '\0';
return arr;
}
char** load_sentence(char* p, char** q, int i){
q[i] = malloc(strlen(p)+1);
strcpy(q[i], p);
return q;
}
void* count_letter(void * s){
char* p = (char*) s;
int i;
for (i=0; i<26; i++){
if (*p == alphabet[i]){
count[i]++;
}
}
}
void frequency(char* str){
char* s = str;
int i, j, l;
l = strlen(str);
pthread_t tid[l];
for (i=0; i<l; i++){
pthread_create(&tid[i], NULL, count_letter, (void*) s);
s++;
}
for (j=0; j<l; j++){
pthread_join(tid[j], NULL);
}
}
int main(int argc, char* argv[]){
int fd;
char buff[100];
fd = open(argv[1], O_RDONLY);
char ** text = malloc(10*sizeof(char*));
read(fd, buff, sizeof(buff));
char* start = buff;
int i = 0; //number of phrases!
char* p = NULL;
while (*(p = sentence(start)) != '\0'){
text = load_sentence(p, text, i);
start += strlen(p)+1;
i++;
}
int j, k;
for (k=0; k<i; k++){
frequency(text[k]);
}
for (j=0; j<26; j++){
printf("%c : %d times\n", alphabet[j], count[j]);
}
}
With input like this:
hope it's a good reading. bye.
The output is correct:
a : 2 times
b : 1 times
c : 0 times
d : 2 times
e : 3 times
f : 0 times
g : 3 times
h : 1 times
i : 2 times
j : 0 times
k : 0 times
l : 0 times
m : 0 times
n : 1 times
o : 3 times
p : 1 times
q : 0 times
r : 1 times
s : 1 times
t : 1 times
u : 0 times
v : 0 times
w : 0 times
x : 0 times
y : 1 times
z : 0 times
With other inputs, I get a memory error that begins with free(): invalid next size (normal). The error prints many lines of memory map and ends with the program aborting.
I'm quite new to C, sorry for my inexperience.
Is it necessary to introduce a mutex in this case?
Your previous version with a mutex had undefined behaviour because you initialized the mutex multiple times; according to the reference:
Attempting to initialize an already initialized mutex results in
undefined behavior.
You are accessing count concurrently, so you have to use a mutex to make the code thread-safe. Calling pthread_mutex_init in count_letter is incorrect: that function is the body of your thread, and initializing a mutex multiple times without destroying it leads to UB. You should call pthread_mutex_init only once, for instance as the first line of the main function:
int main() {
pthread_mutex_init(&mtx,NULL);
and before main returns, add
pthread_mutex_destroy(&mtx);
The critical section in your count_letter function is the line
count[i]++;
You should modify it as follows:
pthread_mutex_lock(&mtx);
count[i]++;
pthread_mutex_unlock(&mtx);
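Putting those pieces together, the relevant parts could look like this. This is only a sketch: it reuses the question's alphabet and count globals, and assumes mtx is declared at file scope:
#include <pthread.h>
char alphabet[26] = "abcdefghijklmnopqrstuvwxyz";
int count[26];
pthread_mutex_t mtx;                     /* shared by every counting thread */
void *count_letter(void *s){
    char *p = (char *) s;
    int i;
    for (i = 0; i < 26; i++){
        if (*p == alphabet[i]){
            pthread_mutex_lock(&mtx);    /* protect the shared counter */
            count[i]++;
            pthread_mutex_unlock(&mtx);
        }
    }
    return NULL;
}
int main(int argc, char *argv[]){
    pthread_mutex_init(&mtx, NULL);      /* initialize exactly once */
    /* ... read the file, split it into sentences, call frequency() ... */
    pthread_mutex_destroy(&mtx);         /* destroy exactly once, before exiting */
    return 0;
}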
Now, returning to the sentence implementation: you need to check that *p is not the null terminator before comparing it with '.':
for (i=0; *p && *p != '.'; i++){
^^ added
Without that test, '\0' != '.' is true and your loop keeps running past the end of the string ...
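For reference, a sketch of sentence with that check plus a guard against overrunning the 100-byte buffer could look like this (malloc is left unchecked, as in the original):
char *sentence(char *s){
    char *arr = malloc(100);
    int i = 0;
    /* stop at '.', at the end of the input, or when the buffer is full */
    while (s[i] != '\0' && s[i] != '.' && i < 100 - 1){
        arr[i] = s[i];
        i++;
    }
    arr[i] = '\0';
    return arr;
}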
Erika,
since I don't really know your assignment, please see this as just another way out of a thousand to count characters. I have not checked it for bugs; rewrite it to your needs. Anyhow, this is how I would have solved it. If memory were scarce I would read character by character from the file until ".". I hope it helps you and that you get great grades :-)...
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <stdatomic.h>
#define MAX_THREADS 100
atomic_int threadCount;
#define NCHAR 26
char alphabet[NCHAR] = "abcdefghijklmnopqrstuvwxyz";
atomic_int count[NCHAR];
void* count_letter(void * s){
threadCount++;
char* p = (char*) s;
for (int i=0; i<NCHAR; i++)
if (*p == alphabet[i])
count[i]++;
threadCount--;
return NULL;
}
int main(int argc, char* argv[]){
//Init variables
FILE *file;
char *myText;
unsigned long fileLen;
int deadLockGuard=0;
threadCount=0;
//Open the file
file = fopen(argv[1], "rb");
if (!file) {
fprintf(stderr, "Unable to open file %s", argv[1]);
return EXIT_FAILURE;
}
fseek(file, 0, SEEK_END);
fileLen=ftell(file);
rewind(file);
//reserve memory and read the file
myText=(char *)malloc(fileLen+1);
if (!myText) {
fprintf(stderr, "Memory error!");
fclose(file);
return EXIT_FAILURE;
}
fread(myText, fileLen, 1, file);
myText[fileLen] = '\0'; //Terminate the buffer so strtok() sees a proper C string
fclose(file);
//Get each sentence ending with a . and then for each character look at the count for each character in it's own thread.
char *subString = strtok(myText, "."); //This is your sentence/load_sentence method
while (subString != NULL) {
for (int v = 0;v<strlen(subString);v++) { //This is your frequency method
deadLockGuard=0;
while (threadCount >= MAX_THREADS) {
usleep(100); //Sleep 0.1ms
if(deadLockGuard++ == 10000) {
printf("Dead-lock guard1 triggered.. Call Bill Gates for help!"); //No free threads after a second.. Either the computer is DEAD SLOW or we got some creepy crawler in da house.
return EXIT_FAILURE;
}
}
pthread_t tid; //Yes you can overwrite it.. I use a counter to join the workers.
pthread_create(&tid, NULL, count_letter, (void*) subString+v);
}
subString = strtok(NULL, ".");
}
deadLockGuard=0;
//Wait until all the still working threads have finished
while (threadCount) {
usleep(1000); //sleep a milli
if(deadLockGuard++ == 2*1000) {
printf("Dead-lock guard2 triggered.. Call Bill Gates for help!"); //Threads are running after 2 seconds.. Exit!!
return EXIT_FAILURE;
}
}
//Garbage collect and print the results.
free(myText);
for (int j=0; j<NCHAR; j++)
printf("%c : %d times\n", alphabet[j], count[j]);
return EXIT_SUCCESS;
}
I am looking to create an array of pointers to strings read from a file in C. However, when I try to print the copied strings to stdout, the last line of the file is always left out.
The program also sometimes experiences a segmentation fault, which I haven't been able to completely eliminate. It happens about 2 out of 5 times.
Here is my input.c code:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "input.h"
#define MAXLINES 5000
void writelines(char *arr[], int l);
char *read_lines[MAXLINES];
void get_input(const char *fp) {
FILE *contents;
char *line;
char *temp;
size_t len;
ssize_t read;
int i;
i = 0;
contents = fopen(fp, "r");
if (contents == NULL)
exit(EXIT_FAILURE);
while ((read = getline(&line, &len, contents)) != -1) {
if ((temp = (char *) malloc(strlen(line) + 1)) == NULL) {
printf("Could not allocate required memory.");
exit(EXIT_FAILURE);
}
else {
line[strlen(line) - 1] = '\0';
strcpy(temp, line);
read_lines[i++] = temp;
}
}
fclose(contents);
free(line);
free(temp);
writelines(read_lines, i);
exit(EXIT_SUCCESS);
}
void writelines(char *arr[], int l) {
int i;
for (i = 0; i < l; i++) {
printf("%s\n", arr[i]);
}
}
My main.c file is:
#include <stdio.h>
#include "input.h"
int main(int argc, char *argv[]) {
if (argc == 1)
printf("Please provide a valid source code file.\n");
else
get_input(*(++argv));
return 0;
}
I compile using gcc main.c input.c -Wall with no warnings or errors.
Using gdb I can confirm that the process runs normally.
When it experiences a segmentation fault, the back trace shows a call to strlen that apparently fails.
from the documentation:
If *lineptr is NULL, then getline() will allocate a buffer for storing the line, which should be freed by the user program. (In this case, the value in *n is ignored.)
but in your case you're passing an uninitialized pointer to getline the first time, so getline tries to use (and reallocate) that garbage address, which is undefined behaviour (and explains the "it happens about 2 out of 5 times" behaviour)
The first fix should be to initialize line:
char *line = NULL;
Then: why are you creating a copy of line at all? Store line itself in the array and reset it to NULL, so that the next getline call allocates a fresh buffer instead of reusing (and possibly reallocating) the one you just stored.
The fix is just to store the line:
read_lines[i++] = line;
Then set line = NULL so that getline allocates a buffer of the proper length for the next line, and drop the malloc code; it's no longer needed.
fixed part (when line is NULL the value in len is ignored, but you still have to pass a valid pointer to it):
line = NULL;
while ((read = getline(&line, &len, contents)) != -1) {
read_lines[i++] = line;
line[strcspn(line, "\n")] = 0; // strip off linefeed if there's one
line = NULL;
}
(linefeed strip adapted from Removing trailing newline character from fgets() input)
I've got a problem reading a couple of lines from a read-only FIFO. In particular, I have to read two lines, a number n followed by a \n and then a string str, and my C program should write str to a write-only FIFO n times. This is my attempt.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <ctype.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
char *readline(int fd);
int main(int argc, char** argv) {
int in = open(argv[1], O_RDONLY);
mkfifo(argv[2], 0666);
int out = open(argv[2] ,O_WRONLY);
char *line = (char *) malloc(50);
int n;
while (1) {
sscanf(readline(in), "%d", &n);
strcpy(line, readline(in));
int i;
for (i = 0; i < n; i++) {
write(out, line, strlen(line));
write(out, "\n", 1);
}
}
close(in);
close(out);
return 0;
}
char *readline(int fd) {
char *c = (char *) malloc(1);
char line[50];
while (read(fd, c, 1) != 0) {
if (strcmp(c, "\n") == 0) {
break;
}
strcat(line, c);
}
return line;
}
The code mostly works, but it puts a random number of newlines after the last repetition of the string, and this number changes at each execution.
Could someone please give me any help?
Besides the fact that reading character-wise and comparing two characters using "string" comparison are both far from efficient, readline() returns a pointer to memory declared local to readline(), namely line[50]. That memory is deallocated as soon as readline() returns, so accessing it afterwards invokes undefined behaviour.
One possibility to fix this is to declare the buffer to read the line into outside readline() and pass a reference to it down like so:
char * readline(int fd, char * line, size_t size)
{
if ((NULL != line) && (0 < size))
{
char c = 0;
size_t i = 0;
while (read(fd, &c, 1) >0)
{
if (('\n' == c) || (i >= size)) {
break;
}
line[i] = c;
++i;
}
line [i] = 0;
}
return line;
}
And then call it like this:
char * readline(int fd, char * line, size_t size);
int main(void)
{
...
char line[50] = "";
...
... readline(in, line, sizeof(line) - 1) ...
I have not tried running your code, but in your readline function you have not terminated the line with a null ('\0') character. Once you hit the '\n' character you just break out of the while loop and return the string line. Try adding a '\0' character before returning from readline.
Your code did not work on my machine, and I'd say you're lucky to get any meaningful results at all.
Here are some problems to consider:
readline returns a locally defined char buffer (line), which is destroyed when the function ends; the memory it once occupied is then free to be overwritten by other operations.
If line was not set to null bytes on allocation, strcat would treat its garbage values as characters, and could possibly try to write after its end.
You allocate a 1-byte buffer (c), I suspect, just because you need a char* in read. This is unnecessary (see the code below). What's worse, you do not deallocate it before readline exits, and so it leaks memory.
The while(1) loop would re-read the file and re-print it to the output fifo until the end of time.
You're using some "heavy artillery" - namely, strcat and memory allocation - where there are simpler approaches.
Last, some C standard versions may require that you declare all your variables before using them. See this question.
And here's how I modified your code. Note that, if the second line is longer than 50 characters, this code may also not behave well. There are techniques around the buffer limit, but I don't use any in this example:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <ctype.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
char *readline(int fd, char * buffer);
int main(int argc, char** argv) {
int in = open(argv[1], O_RDONLY);
int out;
int n;
int i;
char line[50];
memset(line, 0, 50);
mkfifo(argv[2], 0666);
out = open(argv[2] ,O_WRONLY);
sscanf(readline(in, line), "%d", &n);
readline(in, line);
for (i = 0; i < n; i++) {
write(out, line, strlen(line));
write(out, "\n", 1);
}
close(in);
close(out);
return 0;
}
char *readline(int fd, char * buffer) {
char c;
int counter = 0;
while (read(fd, &c, 1) != 0) {
if (c == '\n') {
break;
}
buffer[counter++] = c;
}
buffer[counter] = '\0';
return buffer;
}
This works on my box as you described. Compiled with GCC 4.8.2.
I am writing some code that needs to read fasta files, so part of my code (included below) is a fasta parser. As a single sequence can span multiple lines in the fasta format, I need to concatenate multiple successive lines read from the file into a single string. I do this by realloc'ing the string buffer after reading each line, to be the current length of the sequence plus the length of the line read in. I also do some other things, like stripping white space. All goes well for the first sequence, but fasta files can contain multiple sequences.
Similarly, I have a dynamic array of structs holding two strings (the title and the actual sequence), each a "char *". As I encounter a new title (introduced by a line beginning with '>') I increment the number of sequences and realloc the sequence list buffer. The realloc segfaults when allocating space for the second sequence with
*** glibc detected *** ./stackoverflow: malloc(): memory corruption: 0x09fd9210 ***
Aborted
For the life of me I can't see why. I've run it through gdb and everything seems to be working (i.e. everything is initialised, the values seem sane)... Here's the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <math.h>
#include <errno.h>
//a struture to keep a record of sequences read in from file, and their titles
typedef struct {
char *title;
char *sequence;
} sequence_rec;
//string convenience functions
//checks whether a string consists entirely of white space
int empty(const char *s) {
int i;
i = 0;
while (s[i] != 0) {
if (!isspace(s[i])) return 0;
i++;
}
return 1;
}
//substr allocates and returns a new string which is a substring of s from i to
//j exclusive, where i < j; If i or j are negative they refer to distance from
//the end of the s
char *substr(const char *s, int i, int j) {
char *ret;
if (i < 0) i = strlen(s)-i;
if (j < 0) j = strlen(s)-j;
ret = malloc(j-i+1);
strncpy(ret,s,j-i);
return ret;
}
//strips white space from either end of the string
void strip(char **s) {
int i, j, len;
char *tmp = *s;
len = strlen(*s);
i = 0;
while ((isspace(*(*s+i)))&&(i < len)) {
i++;
}
j = strlen(*s)-1;
while ((isspace(*(*s+j)))&&(j > 0)) {
j--;
}
*s = strndup(*s+i, j-i);
free(tmp);
}
int main(int argc, char**argv) {
sequence_rec *sequences = NULL;
FILE *f = NULL;
char *line = NULL;
size_t linelen;
int rcount;
int numsequences = 0;
f = fopen(argv[1], "r");
if (f == NULL) {
fprintf(stderr, "Error opening %s: %s\n", argv[1], strerror(errno));
return EXIT_FAILURE;
}
rcount = getline(&line, &linelen, f);
while (rcount != -1) {
while (empty(line)) rcount = getline(&line, &linelen, f);
if (line[0] != '>') {
fprintf(stderr,"Sequence input not in valid fasta format\n");
return EXIT_FAILURE;
}
numsequences++;
sequences = realloc(sequences,sizeof(sequence_rec)*numsequences);
sequences[numsequences-1].title = strdup(line+1); strip(&sequences[numsequences-1].title);
rcount = getline(&line, &linelen, f);
sequences[numsequences-1].sequence = malloc(1); sequences[numsequences-1].sequence[0] = 0;
while ((!empty(line))&&(line[0] != '>')) {
strip(&line);
sequences[numsequences-1].sequence = realloc(sequences[numsequences-1].sequence, strlen(sequences[numsequences-1].sequence)+strlen(line)+1);
strcat(sequences[numsequences-1].sequence,line);
rcount = getline(&line, &linelen, f);
}
}
return EXIT_SUCCESS;
}
You should use strings that look something like this:
struct string {
    int len;    /* number of bytes currently in use */
    int cap;    /* number of bytes allocated */
    char *ptr;
};
This prevents strncpy bugs like the one it seems you hit, and allows strcat and friends to run faster.
You should also use a doubling array for each string. This prevents too many allocations and memcpys. Something like this:
int sstrcat(struct string *a, struct string *b)
{
    int len = a->len + b->len;
    if (a->cap < len) {
        /* grow geometrically so repeated appends stay cheap */
        while (a->cap < len) {
            a->cap = a->cap ? a->cap * 2 : 16;
        }
        char *tmp = realloc(a->ptr, a->cap);
        if (tmp == NULL) {
            return ENOMEM;
        }
        a->ptr = tmp;
    }
    memcpy(&a->ptr[a->len], b->ptr, b->len);
    a->len = len;
    return 0;
}
I now see you are doing bioinformatics, which means you probably need more performance than I thought. You should use strings like this instead:
struct string {
int len;
char ptr[0];
};
This way, when you allocate a string object, you call malloc(sizeof(struct string) + len) and avoid a second call to malloc. It's a little more work but it should help measurably, in terms of speed and also memory fragmentation.
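For illustration, a constructor for that single-allocation layout might look like this. It is only a sketch: string_new is a hypothetical name, and it uses the C99 flexible array member form (char ptr[];) of the ptr[0] idiom above:
#include <stdlib.h>
#include <string.h>
struct string {
    int len;
    char ptr[];             /* character data lives directly after the header */
};
/* Allocate the header and the character data with a single malloc. */
struct string *string_new(const char *src, int len)
{
    struct string *s = malloc(sizeof(struct string) + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    memcpy(s->ptr, src, len);
    s->ptr[len] = '\0';     /* keep it usable with the str* functions */
    return s;
}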
Finally, if this isn't actually the source of error, it looks like you have some corruption. Valgrind should help you detect it if gdb fails.
One potential issue is here:
strncpy(ret,s,j-i);
return ret;
ret might not get a null terminator. See man strncpy:
char *strncpy(char *dest, const char *src, size_t n);
...
The strncpy() function is similar, except that at most n bytes of src
are copied. Warning: If there is no null byte among the first n bytes
of src, the string placed in dest will not be null terminated.
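For illustration, a version of substr that always terminates could look like this (a sketch, with the negative-index handling omitted; note it also copies starting from s + i, which seems to be what the original intended):
char *substr(const char *s, int i, int j)
{
    char *ret = malloc(j - i + 1);
    if (ret == NULL)
        return NULL;
    strncpy(ret, s + i, j - i);   /* copy just the requested range */
    ret[j - i] = '\0';            /* guarantee a terminator */
    return ret;
}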
There's also a bug here:
j = strlen(*s)-1;
while ((isspace(*(*s+j)))&&(j > 0)) {
What if strlen(*s) is 0? You'll end up reading (*s)[-1].
You also don't check in strip() that the string doesn't consist entirely of spaces. If it does, you'll end up with j < i.
edit: Just noticed that your substr() function doesn't actually get called.
I think the memory corruption problem might be the result of how you're handling the data used in your getline() calls. Basically, line is reallocated via strndup() in the calls to strip(), so the buffer size being tracked in linelen by getline() will no longer be accurate. getline() may overrun the buffer.
while ((!empty(line))&&(line[0] != '>')) {
strip(&line); // <-- assigns a `strndup()` allocation to `line`
sequences[numsequences-1].sequence = realloc(sequences[numsequences-1].sequence, strlen(sequences[numsequences-1].sequence)+strlen(line)+1);
strcat(sequences[numsequences-1].sequence,line);
rcount = getline(&line, &linelen, f); // <-- the buffer `line` points to might be
// smaller than `linelen` bytes
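One way to avoid that mismatch is to strip whitespace in place, without ever reallocating the buffer that getline owns. Here is a sketch (trim is a hypothetical helper, not part of the question's code):
#include <ctype.h>
#include <string.h>
/* Trim leading and trailing whitespace in place; the buffer and its
   allocation are left untouched, so getline's size bookkeeping stays valid. */
void trim(char *s)
{
    size_t start = 0;
    size_t end = strlen(s);
    while (s[start] != '\0' && isspace((unsigned char) s[start]))
        start++;
    while (end > start && isspace((unsigned char) s[end - 1]))
        end--;
    memmove(s, s + start, end - start);
    s[end - start] = '\0';
}
With a helper like this, line keeps pointing at the buffer getline allocated (so linelen stays accurate), and the title and sequence fields can be filled with strdup of the trimmed line.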
}