Concatenation in C with 2D char array - c

I am reading a text file line by line into a 2D array. I want to concatenate the char arrays so that I have one long char array. I can get this to work with two char arrays, but it goes wrong when I try to do many of them.
Currently the char arrays look like this:
AGCTTTTCATTC
I want to get something like this:
AGCTTTTCATTCAGCTTTTCATTC
I have included some of my code.
int counter = 0;
fid = fopen("dna.fna","r");
while(fgets(line, sizeof(line), fid) != NULL && counter!=66283 ) {
if (strlen(line)==70) {
strcpy(dna[counter], line);
counter++;
}
}
int dnaSize = 6628;
//Concatenating the DNA into a single char array.
int i;
char DNA[dnaSize];
for(i = 0; i<66283;i++){
strcpy(DNA[i],dna[i]);
strcat(DNA[i+1],dna[i+1]);
}

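You only need to loop while i < counter, since only counter lines were stored.
Also, are you copying or concatenating? You only need to do one or the other.
I suggest using just strcat in the loop, but make DNA an empty string first. Because dnaSize is a variable, DNA is a variable-length array and cannot take an initialiser, so set its first char instead:
char DNA[dnaSize];
DNA[0] = '\0'; //start with an empty string so it is safe to pass to strcat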
for(i = 0; i<counter;i++)
{
strcat(DNA,dna[i]); //no need for indexer to DNA
}
Also, you need to consider the sizes of your two arrays. I believe (hope) that dna is an array of arrays of char. If it is, I guess it is 66283 long in its first dimension alone, so it is not going to fit into DNA (6628 long), even if each line were only 1 char long.
Here is an idea on how to allocate exactly the right amount of memory:
#define MAXLINELENGTH (70)
#define MAXDNALINES (66283)
//don't copy this line, it will not work because of the sizes involved (over 4MB)
//it will likely stack overflow
//just do what you are currently doing as long as it's a 2-d array.
char dna[MAXDNALINES][MAXLINELENGTH + 1];
int counter = 0;
int totalSize = 0;
fid = fopen("dna.fna","r");
while(fgets(line, sizeof(line), fid) != NULL && counter!=MAXDNALINES ) {
const int lineLength = strlen(line);
if (lineLength==MAXLINELENGTH) {
strcpy(dna[counter], line);
counter++;
totalSize += lineLength;
}
}
//Concatenating the DNA into a single char array (of exactly the right length)
int i;
char *DNA = malloc(totalSize+1); // the + 1 is for the final null; on the heap to avoid a stack overflow
DNA[0] = '\0'; //the initial null is so that the first strcat works
for(i = 0; i<counter;i++){
strcat(DNA,dna[i]);
}
//do work with DNA here
//finally free it
free(DNA);
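One thing to keep in mind with the loop above: each strcat call rescans DNA from the start to find its end, which gets slow with tens of thousands of lines. Also, fgets keeps the newline, so a line that passes the length test may still end in '\n'. The following is only a sketch of an alternative that tracks the write position itself and drops a trailing newline from each row. The names join_lines and LINE_CAP are made up for this example and are not from the question.

```c
#include <stdlib.h>
#include <string.h>

#define LINE_CAP 71  /* 70 characters plus the terminating null (assumed row width) */

/* Joins the first `count` rows of `lines` into one heap-allocated string.
 * Each row is copied up to its first '\n' (the one fgets leaves behind).
 * Returns NULL if allocation fails; the caller frees the result. */
char *join_lines(char lines[][LINE_CAP], size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += strcspn(lines[i], "\n");   /* row length without the newline */
    }

    char *out = malloc(total + 1);
    if (out == NULL) {
        return NULL;
    }

    size_t pos = 0;                          /* next write position in out */
    for (size_t i = 0; i < count; i++) {
        size_t len = strcspn(lines[i], "\n");
        memcpy(out + pos, lines[i], len);
        pos += len;
    }
    out[pos] = '\0';
    return out;
}
```

Called as join_lines(dna, counter), this would replace both the malloc and the strcat loop. If the newlines are wanted in the result, strlen can be used in place of strcspn.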

Related

Counting strings in C, using fgets and strstr

This is part of an assignment, so the instructions are clear and I'm not allowed to use anything other than what is specified.
The idea is simple:
1) Create an array of structs which hold a string and a count
2) Count the occurrence of the string in each struct and store the count in that struct
3) Print the strings and their number of occurrences
I have been explicitly told to use the fgets and strstr functions
Here is what I've got so far,
#define MAX_STRINGS 50
#define LINE_MAX_CHARS 1000
int main(){
int n = argc - 1;
if (n > MAX_STRINGS) {
n = MAX_STRINGS;
}
Entry entries[MAX_STRINGS];
char **strings = argv+1;
prepare_table(n, strings, entries);
count_occurrences(n, stdin, entries);
print_occurrences(n, entries);
}
void prepare_table (int n, char **strings, Entry *entries) {
// n = number of words to find
// entries = array of Entry structs
for (int i = 0; i < n; i++){
Entry newEntry;
newEntry.string = *(strings + 1);
newEntry.count = 0;
*(entries + i) = newEntry;
}
}
void print_occurrences (int n, Entry *entries) {
for (int i = 0; i < n; i++){
printf("%s: %d\n", (*(entries + i)).string, (*(entries + i)).count);
}
}
void count_occurrences (int n, FILE *file, Entry *entries) {
char *str;
while (fgets(str, LINE_MAX_CHARS, file) != NULL){
for (int i = 0; i < n; i++){ // for each word
char *found;
found = (strstr(str, (*(entries + i)).string)); // search line
if (found != NULL){ // if word found in line
str = found + 1; // move string pointer forward for next iteration
i--; // to look for same word in the rest of the line
(*(entries + i)).count = (*(entries + i)).count + 1; // increment occurrences of word
}
}
}
}
I know for a fact that my prepare_table and print_occurrences functions are working perfectly. However, the problem is with the count_occurrences function.
I've been given a test file to run, which only tells me that I'm not producing the correct output. I can't actually see the output to figure out what's wrong.
I'm new to pointers, so I'm expecting this to be a simple error on my part. Where is my program going wrong?
fgets(char * restrict str, int size, FILE * restrict stream) writes into the buffer at str... but you don't have a buffer at str. What is str? It's just a pointer. What's it pointing at? Garbage, because you haven't initialized it to something. So it might work or it might not (edit: by which I mean you should expect it not to work, and be surprised if it did, thank you commenters!).
You could fix that by allocating some memory first:
char *str = malloc(LINE_MAX_CHARS);
// do your stuff
free(str);
str = NULL;
Or even statically allocating:
char str[LINE_MAX_CHARS];
That's one problem I can see anyway. You say you don't have output, but surely you can add some debug statements using fprintf(stderr, "") at the very least..?
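Beyond the missing buffer, the loop in the question also moves str (the pointer handed to fgets) and decrements i before incrementing the count, so the wrong entry gets updated. Below is a rough sketch of how the counting could look with a real line buffer and a separate cursor pointer. The Entry layout is an assumption (the question does not show it), and count_in_line is a helper invented for this example. It counts overlapping matches, as the question's found + 1 step implies.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_CHARS 1000

/* Assumed layout; the question does not show the real definition. */
typedef struct {
    const char *string;
    int count;
} Entry;

/* Counts occurrences of needle in line, overlapping ones included. */
static int count_in_line(const char *line, const char *needle)
{
    int hits = 0;
    if (needle[0] == '\0') {
        return 0;                       /* strstr would match "" everywhere */
    }
    const char *cursor = line;
    while ((cursor = strstr(cursor, needle)) != NULL) {
        hits++;
        cursor++;                       /* step past the start of this match */
    }
    return hits;
}

void count_occurrences(int n, FILE *file, Entry *entries)
{
    char line[LINE_MAX_CHARS];          /* real storage for fgets to write into */
    while (fgets(line, sizeof line, file) != NULL) {
        for (int i = 0; i < n; i++) {
            entries[i].count += count_in_line(line, entries[i].string);
        }
    }
}
```

Keeping the search cursor separate from the buffer means every word is searched from the start of the line, and the buffer passed to fgets never changes.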

Most memory-efficient way to read & store list of strings in C

I'd like to know what's the most memory efficient way to read & store a list of strings in C.
Each string may have a different length, so pre-allocating a big 2D array would be wasteful.
I also want to avoid a separate malloc for each string, as there may be many strings.
The strings will be read from a large buffer into this list data-structure I'm asking about.
Is it possible to store all strings separately with a single allocation of exactly the right size?
One idea I have is to store them contiguously in a buffer, then have a char * array pointing to the different parts in the buffer, which will have '\0's in it to delimit. I'm hoping there's a better way though.
struct list {
char *index[32];
char buf[];
};
The data-structure and strings will be strictly read-only.
Here's a mildly efficient format, assuming you know the length of all the strings in advance:
|| total size | string 1 | string 2 | ........ | string N | len(string N) | ... | len(string 2) | len(string 1) ||
You can store the lengths either in fixed-width integers or in variable-width integers, but the point is that you can jump to the end and scan all the lengths relatively efficiently, and from the running sum of the lengths you can compute the offset of each string. You know you have reached the last string when there is no remaining space.
You can create your single buffer and store them contiguously, expanding the buffer as needed by using realloc(). But then you would need a second array to store string positions and maybe realloc() it as well, so I might simply create a dynamically allocated array and malloc() each string separately.
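For comparison, here is a minimal sketch of the realloc-based variant described above, under the assumption that strings arrive one at a time and their total size is not known in advance. StrPool, pool_add, pool_get and pool_free are hypothetical names. Offsets are stored rather than pointers, because realloc may move the data buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable pool: all strings back to back in data,
 * with offsets[i] giving the start of string i. */
typedef struct {
    char   *data;
    size_t  used, cap;        /* bytes used / allocated in data */
    size_t *offsets;
    size_t  count, slots;     /* strings stored / offsets allocated */
} StrPool;

/* Appends a copy of s, including its '\0'. Returns 0 on success, -1 on failure. */
int pool_add(StrPool *p, const char *s)
{
    size_t need = strlen(s) + 1;
    if (p->used + need > p->cap) {
        size_t cap = p->cap ? p->cap : 64;
        while (cap < p->used + need) {
            cap *= 2;                       /* double until the string fits */
        }
        char *grown = realloc(p->data, cap);
        if (grown == NULL) return -1;
        p->data = grown;
        p->cap = cap;
    }
    if (p->count == p->slots) {
        size_t slots = p->slots ? p->slots * 2 : 16;
        size_t *grown = realloc(p->offsets, slots * sizeof *grown);
        if (grown == NULL) return -1;
        p->offsets = grown;
        p->slots = slots;
    }
    memcpy(p->data + p->used, s, need);
    p->offsets[p->count++] = p->used;
    p->used += need;
    return 0;
}

/* Offsets stay valid when data moves during realloc; raw pointers would not. */
const char *pool_get(const StrPool *p, size_t i)
{
    return p->data + p->offsets[i];
}

void pool_free(StrPool *p)
{
    free(p->data);
    free(p->offsets);
}
```

A pool starts out as StrPool p = {0}; since realloc(NULL, n) behaves like malloc(n), no separate init function is needed. Doubling keeps the number of allocations small, at the cost of some unused capacity at the end.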
Find the number and total-length of all strings:
int num = 0;
int len = 0;
char* string = GetNextString(input);
while (string)
{
num += 1;
len += strlen(string);
string = GetNextString(input);
}
Rewind(input);
Then, allocate the following two buffers:
int* indexes = malloc(num*sizeof(int));
char* strings = malloc((num+len)*sizeof(char));
Finally, fill these two buffers:
int index = 0;
for (int i=0; i<num; i++)
{
indexes[i] = index;
string = GetNextString(input);
strcpy(strings+index,string);
index += strlen(string)+1;
}
After that, you can simply use &strings[indexes[i]] (or strings + indexes[i]) in order to access the ith string; strings[indexes[i]] alone is just its first character.
The most memory-efficient way is a two-pass solution. In the first pass you calculate the total size of all strings, then you allocate the whole memory block at once. In the second pass you read all the strings using large buffers.
You can create a pointer array for the strings and take the difference between adjacent pointers to get each string's size. This way you save the null byte otherwise needed as an end marker.
Here is a complete example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct StringMap
{
char *data;
char **ptr;
long cPos;
} StringMap;
void initStringMap(StringMap *stringMap, long numberOfStrings, long totalCharacters)
{
stringMap->data = (char*)malloc(sizeof(char)*(totalCharacters+1));
stringMap->ptr = (char**)malloc(sizeof(char*)*(numberOfStrings+2));
memset(stringMap->ptr, 0, sizeof(char*)*(numberOfStrings+2));
stringMap->ptr[0] = stringMap->data;
stringMap->ptr[1] = stringMap->data;
stringMap->cPos = 0;
}
void extendString(StringMap *stringMap, char *str, size_t size)
{
memcpy(stringMap->ptr[stringMap->cPos+1], str, size);
stringMap->ptr[stringMap->cPos+1] += size;
}
void endString(StringMap *stringMap)
{
stringMap->cPos++;
stringMap->ptr[stringMap->cPos+1] = stringMap->ptr[stringMap->cPos];
}
long numberOfStringsInStringMap(StringMap *stringMap)
{
return stringMap->cPos;
}
size_t stringSizeInStringMap(StringMap *stringMap, long index)
{
return stringMap->ptr[index+1] - stringMap->ptr[index];
}
char* stringinStringMap(StringMap *stringMap, long index)
{
return stringMap->ptr[index];
}
void freeStringMap(StringMap *stringMap)
{
free(stringMap->data);
free(stringMap->ptr);
}
int main()
{
// The interesting values
long numberOfStrings = 0;
long totalCharacters = 0;
// Scan the input for required information
FILE *fd = fopen("/path/to/large/textfile.txt", "r");
if (fd == NULL) {
perror("fopen");
return 1;
}
int bufferSize = 4096;
char *readBuffer = (char*)malloc(sizeof(char)*bufferSize);
int currentStringLength = 0;
size_t readBytes; // fread returns size_t
while ((readBytes = fread(readBuffer, sizeof(char), bufferSize, fd))>0) {
for (int i = 0; i < readBytes; ++i) {
const char c = readBuffer[i];
if (c != '\n') {
++currentStringLength;
} else {
++numberOfStrings;
totalCharacters += currentStringLength;
currentStringLength = 0;
}
}
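}
// Count a final line that is not terminated by a newline
if (currentStringLength > 0) {
++numberOfStrings;
totalCharacters += currentStringLength;
}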
// Display the found results
printf("Found %ld strings with total of %ld bytes\n", numberOfStrings, totalCharacters);
// Allocate the memory for the resource
StringMap stringMap;
initStringMap(&stringMap, numberOfStrings, totalCharacters);
// read all strings
rewind(fd);
while ((readBytes = fread(readBuffer, sizeof(char), bufferSize, fd))>0) {
char *stringStart = readBuffer;
for (int i = 0; i < readBytes; ++i) {
const char c = readBuffer[i];
if (c == '\n') {
extendString(&stringMap, stringStart, &readBuffer[i]-stringStart);
endString(&stringMap);
stringStart = &readBuffer[i+1];
}
}
if (stringStart < &readBuffer[readBytes]) {
extendString(&stringMap, stringStart, &readBuffer[readBytes]-stringStart);
}
}
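// Close the last string only if the file did not end with a newline
if (stringSizeInStringMap(&stringMap, numberOfStringsInStringMap(&stringMap)) > 0) {
endString(&stringMap);
}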
fclose(fd);
// Ok read the list
numberOfStrings = numberOfStringsInStringMap(&stringMap);
printf("Number of strings in map: %ld\n", numberOfStrings);
for (long i = 0; i < numberOfStrings; ++i) {
size_t stringSize = stringSizeInStringMap(&stringMap, i);
char *buffer = (char*)malloc(stringSize+1);
memcpy(buffer, stringinStringMap(&stringMap, i), stringSize);
buffer[stringSize] = '\0'; // terminate after the last copied character
printf("string %05ld size=%8zu : %s\n", i, stringSize, buffer);
free(buffer);
}
// free the resource
freeStringMap(&stringMap);
}
This example reads a very large text file, splits it into lines and creates an array with a string per line. It only needs two malloc calls for the result: one for the pointer array and one for the string block.
If it's strictly read-only as you've described, you can store the entire list of strings and their offsets in a single chunk of memory and read the whole thing with a single read.
The first sizeof(long) bytes stores the number of strings, n. The next n longs store the offsets into each string from the start of the string buffer which starts at position (n+1)*sizeof(long). You don't have to store the trailing zero for each string, but if you do, you can access each string with &str_buffer[offset[i]]. If you don't store the trailing '\0' then you would have to copy into a temporary buffer and append it yourself.
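As a rough illustration of that layout, the sketch below packs an array of strings into one malloc'ed block: the count, then the offsets, then the string bytes with their trailing zeros. The function names pack_strings, packed_count and packed_get are invented for this example, and the input is assumed to already be an array of C strings.

```c
#include <stdlib.h>
#include <string.h>

/* Packs n strings into one malloc'ed block laid out as
 *   [ long n ][ long offset[0..n-1] ][ string bytes, each '\0'-terminated ]
 * Offsets are relative to the start of the string bytes.
 * Returns NULL on allocation failure; release with a single free(). */
void *pack_strings(const char *const *strs, long n)
{
    size_t header = ((size_t)n + 1) * sizeof(long);
    size_t bytes = 0;
    for (long i = 0; i < n; i++) {
        bytes += strlen(strs[i]) + 1;
    }

    char *block = malloc(header + bytes);
    if (block == NULL) return NULL;

    long *table = (long *)block;     /* malloc memory is suitably aligned for long */
    char *str_buffer = block + header;
    table[0] = n;
    size_t pos = 0;
    for (long i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;
        table[i + 1] = (long)pos;
        memcpy(str_buffer + pos, strs[i], len);
        pos += len;
    }
    return block;
}

long packed_count(const void *block)
{
    return ((const long *)block)[0];
}

const char *packed_get(const void *block, long i)
{
    const long *table = (const long *)block;
    const char *str_buffer =
        (const char *)block + ((size_t)table[0] + 1) * sizeof(long);
    return str_buffer + table[i + 1];
}
```

Because the block holds only offsets and no pointers, it could in principle be written to disk and read back with a single read, as long as the reader uses the same size and byte order for long.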

C, looping array of char* (strings) doesn't work. Why?

I have a problem with my array of char*:
char *original_file_name_list[500];
while(dp=readdir(dir)) != NULL) {
original_file_name = dp->d_name;
original_file_name_list[counter] = original_file_name;
printf("%s\n",original_file_name_list[0]);
printf("%d\n",counter);
counter++;
}
The problem is that it prints every file, one per iteration. It should print only the first file, right?
And if I try printf("%s\n",original_file_name_list[1]); it doesn't work, which suggests that only the first string is being written. Any idea why?
edit: There is no syntax error due to compiler.
You're not copying the strings at all: original_file_name_list only has room for a list of pointers, not for the file names themselves. dp->d_name points into a struct dirent that readdir may overwrite on the next call, so you cannot know how long the memory behind the pointer stays valid. Because of that you have to make your own copy.
#include <stdio.h>
#include <string.h>
#include <dirent.h>
#define MAX_FILES 50
#define MAX_NAME 50
int main(void) {
    char original_file_name_list[MAX_FILES][MAX_NAME];
    size_t counter = 0;
    struct dirent *dp;
    DIR *dir = opendir("."); // "." is only a placeholder for the directory you read
    if (dir == NULL) return 1;
    while (counter < MAX_FILES && (dp = readdir(dir)) != NULL)
    {
        size_t len = strlen(dp->d_name);
        if (len >= MAX_NAME) len = MAX_NAME - 1; // truncate names that do not fit
        memcpy(original_file_name_list[counter], dp->d_name, len);
        original_file_name_list[counter][len] = '\0';
        printf("%zu\n", counter);
        counter++;
    }
    closedir(dir);
    if (counter > 1) printf("%s\n", original_file_name_list[1]); // needs at least 2 entries
    return 0;
}
I'm not sure about the purpose of counter2 (I have replaced it with counter), but I can propose the following code, which uses strdup() to store the file names:
char *original_file_name_list[500] = {0}; // it is better to init it here
while (counter < 500 && (dp = readdir(dir)) != NULL) {
original_file_name_list[counter] = strdup(dp->d_name); // strdup() is ok to use
// here, see the comments
printf("%s\n%d\n",original_file_name_list[counter], counter);
counter++;
}
/* some useful code */
/* don't forget to free the items of list (allocated by strdup(..) )*/
for (int i = 0; i < 500; ++i) {
free(original_file_name_list[i]);
}
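Note that freeing all 500 slots only works because the array was zero-initialised and free(NULL) does nothing. A possible way to package the idea is sketched below, with the copy done by malloc + memcpy so that it does not depend on strdup being available. The names collect_names and free_names are hypothetical, and the caller is assumed to supply the pointer array.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Copies up to max entry names from the directory at path into names[].
 * Each name is a separate heap copy, so it stays valid after later
 * readdir calls. Returns the number stored, or -1 if the directory
 * cannot be opened. */
long collect_names(const char *path, char **names, size_t max)
{
    DIR *dir = opendir(path);
    if (dir == NULL) return -1;

    size_t count = 0;
    struct dirent *dp;
    while (count < max && (dp = readdir(dir)) != NULL) {
        size_t len = strlen(dp->d_name) + 1;   /* include the '\0' */
        char *copy = malloc(len);
        if (copy == NULL) break;               /* keep what was stored so far */
        memcpy(copy, dp->d_name, len);
        names[count++] = copy;
    }
    closedir(dir);
    return (long)count;
}

/* Frees exactly the entries that collect_names stored. */
void free_names(char **names, long count)
{
    for (long i = 0; i < count; i++) {
        free(names[i]);
    }
}
```

Returning the count lets the caller loop over and free only the slots that were actually filled, instead of relying on the unused ones being NULL.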

Initializing an infinite number of char **

I'm making a raytracing engine in C using the minilibX library.
I want to be able to read in a .conf file the configuration for the scene to display:
For example:
(Az#Az 117)cat universe.conf
#randomcomment
obj:eye:x:y:z
light:sun:100
light:moon:test
The number of objects can vary from 1 to any number.
So far, I read the file, copy each line one by one into a char **tab, and malloc according to the number of objects found, like this:
void open_file(int fd, struct s_img *m)
{
int i;
char *s;
int curs_obj;
int curs_light;
i = 0;
curs_light = 0;
curs_obj = 0;
while (s = get_next_line(fd))
{
i = i + 1;
if (s[0] == 'l')
{
m->lights[curs_light] = s;
curs_light = curs_light + 1;
}
else if (s[0] == 'o')
{
m->objs[curs_obj] = s;
curs_obj = curs_obj + 1;
}
else if (s[0] != '#')
{
show_error(i, s);
stop_parsing(m);
}
}
Now I want to store each field of each tab[i] in a new char **tab, one per object, using ':' as the separator.
So I need to initialize and malloc an undetermined number of char **tab. How can I do that?
(PS: I hope my code and my English are good enough for you to understand. I'm only using very basic functions such as read, write, open and malloc, and I'm rebuilding everything else, like printf, get_line, and so on.)
You can't allocate an indeterminate amount of memory; malloc doesn't support it. What you can do is to allocate enough memory for now and revise that later:
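size_t capacity = 10; // number of char* slots, not bytes
char **tab = malloc(capacity * sizeof *tab);
//...
if (indexOfObjectToCreate >= capacity) {
    capacity *= 2;
    char **grown = realloc(tab, capacity * sizeof *tab);
    if (grown == NULL) {
        // handle the error here; tab is still valid and must still be freed
    } else {
        tab = grown;
    }
}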
I'd use an alternative approach (as this is c, not c++) and allocate simply large buffers as we go by:
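char *my_malloc(size_t n) {
    static size_t space_left = 0;
    static char *base = NULL;
    if (base == NULL || space_left < n) {
        base = malloc(BIG_N); // assumes n <= BIG_N; the rest of the old block is abandoned
        space_left = BIG_N;
    }
    base += n;
    space_left -= n; // track what remains in the current block
    return base - n;
}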
Disclaimer: I've omitted the garbage collection stuff and testing return values and all safety measures to keep the routine short.
Another way to approach this is to read the whole file into a large enough malloc'ed array (you can determine the size with ftell), scan the buffer, replace delimiters, line feeds, etc. with '\0' characters, and remember the starting locations of the keywords.
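The "replace delimiters and remember the starting locations" idea could look roughly like the sketch below for a single line. It uses no library functions, in keeping with the constraint in the question. The name split_fields and its parameters are made up for this example.

```c
#include <stddef.h>

/* Splits line in place on sep: each separator is overwritten with '\0'
 * and the start of each field is stored in fields[]. At most max fields
 * are stored; once max is reached, the last field keeps the rest of the
 * line unsplit. Returns the number of fields stored. */
size_t split_fields(char *line, char sep, char **fields, size_t max)
{
    size_t count = 0;
    if (max == 0) {
        return 0;
    }

    fields[count++] = line;              /* the first field starts the line */
    for (char *p = line; *p != '\0'; p++) {
        if (*p == sep && count < max) {
            *p = '\0';                   /* end the current field here */
            fields[count++] = p + 1;     /* the next field starts after it */
        }
    }
    return count;
}
```

Since the fields point into the original line, nothing is copied. To size the fields array exactly, one option is to count the separators in a first pass and malloc (separators + 1) pointers per object.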

Storing text in a char matrix in C

I want to take a text from the standard input and store it into an array of strings. But I want the array of strings to be dynamic in memory. My code right now is the following:
char** readStandard()
{
int size = 0;
char** textMatrix = (char**)malloc(size);
int index = 0;
char* currentString = (char*)malloc(10); //10 is the maximum char per string
while(fgets(currentString, 10, stdin) > 0)
{
size += 10;
textMatrix = (char**)realloc(textMatrix, size);
textMatrix[index] = currentString;
index++;
}
return textMatrix;
}
The result I have while printing is the last string read in all positions of the array.
Example
Reading:
hello
nice
to
meet
you
Printing:
you
you
you
you
you
Why? I've searched over the Internet. But I didn't find this kind of error.
You are storing the same address (currentString) over and over. Try something like
while(fgets(currentString, 10, stdin) != NULL) /* fgets returns a pointer, NULL at end of input */
{
textMatrix[index] = strdup(currentString); /* Make copy, assign that. */
}
The function strdup is not part of ISO C before C23 (it comes from POSIX and is widely available). It is easy to implement yourself with malloc + memcpy.
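A minimal sketch of such a replacement, using a different name (dup_string, chosen here only to avoid clashing with a system strdup):

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if allocation fails. The caller frees it. */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;      /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

Each call returns a distinct block, so overwriting the source buffer on the next fgets call no longer affects the strings already stored.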
currentString always points to the same memory area, so all the pointers in textMatrix end up pointing to it. Copy each line instead:
char** readStandard()
{
    size_t size = 0;
    char** textMatrix = NULL; // realloc(NULL, n) behaves like malloc(n)
    int index = 0;
    char currentString[10];
    while(fgets(currentString, sizeof currentString, stdin) != NULL)
    {
        size += sizeof(char*);
        textMatrix = (char**)realloc(textMatrix, size);
        textMatrix[index] = strdup(currentString);
        index++;
    }
    return textMatrix;
}
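One gap remains in the function above: the caller has no way to learn how many strings were read, and realloc failures go unnoticed. The following is only a sketch of one way to address both, assuming the caller can pass the stream and a size_t to receive the count. The name read_lines is invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 10   /* same small buffer size as in the question */

/* Reads pieces of at most CHUNK - 1 characters from in until end of input,
 * stores a heap copy of each in a growing array, and writes the number
 * stored to *count. If an allocation fails it stops early and returns
 * what was read so far. Returns NULL when nothing was stored. */
char **read_lines(FILE *in, size_t *count)
{
    char **rows = NULL;
    size_t used = 0, cap = 0;
    char piece[CHUNK];

    while (fgets(piece, sizeof piece, in) != NULL) {
        if (used == cap) {
            size_t new_cap = cap ? cap * 2 : 8;
            char **grown = realloc(rows, new_cap * sizeof *grown);
            if (grown == NULL) break;
            rows = grown;
            cap = new_cap;
        }
        size_t len = strlen(piece) + 1;
        rows[used] = malloc(len);
        if (rows[used] == NULL) break;
        memcpy(rows[used], piece, len);   /* copy, so piece can be reused */
        used++;
    }
    *count = used;
    return rows;
}
```

The caller frees each rows[i] and then rows itself. As in the question, a line longer than 9 characters is split across several entries, and each stored piece keeps its trailing newline if it had one.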

Resources