Dynamic Structures And Storing Data without stdlib.h - c

I have tried using Google, but not really sure how to phrase my search to get relevant results. The programming language is C. I was given a (homework) assignment which requires reading a text file and outputting the unique words in the text file. The restriction is that the only allowable import is <stdio.h>. So, is there a way to use dynamic structures without using <stdlib.h>? Would it be necessary to define those dynamic structures on my own? If this has already been addressed on Stack Overflow, then please point me to the question.
Clarification was provided today that the allowable imports now include <stdlib.h> as well as (though not necessary or desirable) the use of <string.h>, which in turn makes this problem easier (and I am tempted to say trivial).

It is telling that you couldn't find anything with Google. Assignments with completely arbitrary restrictions are idiotic. The assignment tells something profound about the quality of the course and the instructor. There is more to be learnt from an assignment that requires the use of realloc and other standard library functions.
You don't need a data structure, only a large enough 2-dimensional char array. Either you know at compile time the maximum word length and the maximum number of words, or you read the file once to measure them, allocate a two-dimensional variable-length array on the stack (and possibly blow the stack), reset the file pointer and read the file again into that array...
Then you read the words into it using fgets and loop over them with 2 nested for loops, comparing the word at the outer index with the word at the inner index (skipping the case where both loops are at the same index, of course). If the inner loop finds no match, you print the word.
Doing the assignment this way doesn't teach anything useful about programming, but the only standard library routine you need to replicate yourself is strcmp, so at least you'll save your energy for something useful instead.
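A minimal sketch of the fixed-array idea, using only <stdio.h>. It is a variation rather than a literal transcription of the description: words are read with fscanf and %31s instead of fgets, and each incoming word is compared against the words already stored, so every distinct word is kept once. The names MAX_WORDS, MAX_LEN, same_word and read_unique are made up for illustration, and the size limits are assumptions; a word longer than 31 characters would be split, and punctuation stays attached to words.

```c
#include <stdio.h>

#define MAX_WORDS 1024   /* assumed upper bound on distinct words */
#define MAX_LEN   32     /* assumed upper bound on word length, including '\0' */

/* Hand-written stand-in for strcmp's equality test: 1 if equal, 0 otherwise. */
static int same_word(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Reads whitespace-separated words from fp into words[][], storing each
   distinct word once. Returns the number of distinct words kept. */
static int read_unique(FILE *fp, char words[][MAX_LEN], int max_words)
{
    char word[MAX_LEN];
    int count = 0;

    while (fscanf(fp, "%31s", word) == 1) {      /* 31 = MAX_LEN - 1 */
        int seen = 0;
        for (int i = 0; i < count && !seen; i++)
            seen = same_word(words[i], word);
        if (!seen && count < max_words) {
            int k = 0;
            while ((words[count][k] = word[k]) != '\0')   /* manual strcpy */
                k++;
            count++;
        }
    }
    return count;
}
```

A caller would declare a static char words[MAX_WORDS][MAX_LEN], pass an opened FILE * to read_unique, and print words[0] through words[count - 1].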

It is not practical to code dynamic data structures in C using only stdio.h, because the allocation functions (malloc, realloc, free) are declared in stdlib.h. That may be one of the reasons your teacher restricted you to using just stdio.h: they didn't want you going down the rabbit hole of trying to make a linked list or something in which to store unique words.
However, if you think about it, you don't need a dynamic data structure. Here's something to try: (1) make a copy of your source file. (2) declare a results text file to store your results. (3) Copy the first word in your source file to the results file. Then run through your source file and delete every copy of that word. Now there can't be any duplicates of that word. Then move on to the next word and copy and delete.
When you're done, your source file should be empty (thus the reason for the backup) and your results file should have one copy of every unique word from the original source file.
The benefit of this approach is that it doesn't require you to know (or guess) the size of the initial source file.

Agreed on the points above on "exercises with arbitrary constraints" mostly being used to illustrate a lecturer's favorite pet peeve.
However, if you are allowed to be naive you could do what others have said and assume a maximum size for your array of unique strings and use a simple buffer. I wrote a little stub illustrating what I was thinking. However, it is shared with the disclaimer that I am not a "real programmer", with all the bad habits and knowledge gaps that follow...
I have obviously also ignored the topics of reading the file and filtering unique words.
#include <stdio.h> // scanf, printf, etc.
#include <string.h> // strncpy, strlen (only for convenience here)
#define NUM_STRINGS 1024 // maximum number of strings
#define MAX_STRING_SIZE 32 // maximum length of a string (in fixed buffer)
char fixed_buff[NUM_STRINGS][MAX_STRING_SIZE];
char * buff[NUM_STRINGS]; // <-- Will only work for string literals OR
// if the strings that populates the buffer
// are stored in a separate location and the
// buffer refers to the permanent location.
/**
* Fixed length of buffer (NUM_STRINGS) and max item length (MAX_STRING_SIZE)
*/
void example_1(char strings[][MAX_STRING_SIZE] )
{
// Note: terminates when first item in the current string is '\0'
// this may be a bad idea(?)
for(size_t i = 0; i < NUM_STRINGS && *strings[i] != '\0'; i++)
printf("strings[%zu] : %s (length %zu)\n", i, strings[i], strlen(strings[i]));
}
/**
* Fixed length of buffer (NUM_STRINGS), but arbitrary item length
*/
void example_2(char * strings[])
{
// Note: Terminating on reaching a NULL pointer as the number of strings is
// "unknown".
for(size_t i = 0; i < NUM_STRINGS && strings[i] != NULL; i++)
printf("strings[%zu] : %s (length %zu)\n", i, strings[i], strlen(strings[i]));
}
int main(int argc, char* argv[])
{
// Populate buffers
strncpy(fixed_buff[0], "foo", MAX_STRING_SIZE - 1);
strncpy(fixed_buff[1], "bar", MAX_STRING_SIZE - 1);
buff[0] = "mon";
buff[1] = "ami";
// Run examples
example_1(fixed_buff);
example_2(buff);
return 0;
}

Related

How to iterate character by character in strings which lengths are larger than INT_MAX, or SIZE_MAX?

C language, How to iterate character by character in strings which lengths are larger than INT_MAX, or SIZE_MAX?
How to find out that string length exceeded the any MAXIMUM SIZE applicable for the code below?
int len = strlen(item);
int i=0;
while (i <= len ) {
//do smth
i++;
}
You can access characters in a string (or elements in an array generally) without integer indices by using pointers:
for (char *p = item; *p; ++p)
{
// Do something.
// *p is the current character.
}
int len = strlen(item);
First, a string can be longer than INT_MAX without any problem; the trouble starts only when you have to deal with its length. Note that strlen() returns a size_t, so storing the result in an int is what breaks for lengths above INT_MAX. If you think about the implementation of strlen(), you'll see that, because of how strings are defined (a sequence of chars in memory terminated by a null char), the only possible implementation is to walk the string, incrementing the length until it reaches the first null char. This makes your code very inefficient, because you first traverse the string searching for its end, then traverse it a second time to do the useful work.
int i=0;
while (i <= len ) {
//do smth
i++;
}
It is better to use a pointer directly, in a for loop like this one:
char *p;
for (p = item; *p; p++) {
// do something, knowing that `*p` is the current char.
}
In this way, you navigate the string and stop when you find the null char, and you will not have to traverse it twice.
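As a small illustration of the single-pass idea, here is a sketch that counts how often a character occurs without ever calling strlen(). The function name count_char is made up for this example; the counter is a size_t, so it does not overflow the way an int index could on a very long string.

```c
#include <assert.h>
#include <stddef.h>

/* Counts occurrences of c in s in a single pass, stopping at the null char. */
static size_t count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    for (const char *p = s; *p != '\0'; ++p) {
        if (*p == c)
            n++;
    }
    return n;
}
```

For example, count_char("banana", 'a') walks the seven bytes of the string once (six letters plus the terminator check) and returns 3.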
By the way, having strings longer than INT_MAX is quite difficult in practice, because normally (and more so on the new 64-bit architectures) you are not allowed to create such a large contiguous block of memory: if you try to create a static array of that size, you will be fighting with the compiler, and if you try to malloc() such a huge amount of memory, you will end up fighting with the operating system.
Developers who have to deal with huge amounts of memory normally use a dedicated structure to hold large amounts of characters. Just imagine that you need to insert one char and this forces you to move one gigabyte of memory by one position because you have no other way to make room for it. It's simply impractical to use a flat array of that size. A simple approach is to use a structure similar to the one used for file data on disk in a Unix system. The data has a series of direct pointers that point to fixed blocks of memory holding characters, but at some point those pointers become double pointers, pointing to an array of simple pointers, then triple pointers, etc. This way you can handle strings as short as one byte (with just one memory page) up to more than INT_MAX bytes, by selecting an appropriate size for the page and the number of pointers.
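A heavily simplified sketch of the paged idea, with only one level of pointers instead of the direct, double and triple pointers described above. All names (PagedText, paged_put, paged_get, paged_free) and both size constants are assumptions made for illustration. Pages are allocated lazily, so text that touches only a few positions uses only a few pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u   /* bytes per page (arbitrary choice) */
#define MAX_PAGES 1024u   /* entries in the pointer table (arbitrary choice) */

typedef struct {
    char *pages[MAX_PAGES];   /* NULL until a page is first written */
} PagedText;

/* Stores c at position pos. Returns 0 on success, -1 if pos is out of
   range or a page could not be allocated. */
static int paged_put(PagedText *t, size_t pos, char c)
{
    assert(t != NULL);
    size_t page = pos / PAGE_SIZE;
    if (page >= MAX_PAGES)
        return -1;
    if (t->pages[page] == NULL) {
        t->pages[page] = calloc(PAGE_SIZE, 1);   /* zero-filled page */
        if (t->pages[page] == NULL)
            return -1;
    }
    t->pages[page][pos % PAGE_SIZE] = c;
    return 0;
}

/* Returns the char at pos, or '\0' if nothing was ever stored there. */
static char paged_get(const PagedText *t, size_t pos)
{
    assert(t != NULL);
    size_t page = pos / PAGE_SIZE;
    if (page >= MAX_PAGES || t->pages[page] == NULL)
        return '\0';
    return t->pages[page][pos % PAGE_SIZE];
}

/* Releases every allocated page. */
static void paged_free(PagedText *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PAGES; i++) {
        free(t->pages[i]);
        t->pages[i] = NULL;
    }
}
```

Because positions are size_t values split into a page number and an offset, no single allocation is ever larger than PAGE_SIZE, and the total capacity is set by the product of the two constants.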
Another approach is the mbuf_t approach used by BSD software to handle networking packets. This is especially appropriate when you have to add to the front of the string (e.g. to add a new protocol header) or to the rear of the packet (to add payload and/or a checksum or trailing data).
One last thing... if you create an array of 5 GB, almost any current operating system will most probably swap parts of it out as soon as you stop using them. This will make your application start swapping as soon as you move through the array, and you will probably not be able to run your application on a computer with a limited address space (as a 32-bit machine is today).

What is the proper way to populate an array of Strings in C, such that each string is a single element in the array

I'm trying to initialize a 2D array of strings in C; which does not seem to work like any other language I've coded in. What I'm TRYING to do, is read input, and take all of the comments out of that input and store them in a 2d array, where each string would be a new row in the array. When I get to a character that is next line, I want to advance the first index of the array so that I can separate each "comment string". ie.
char arr[100][100];
<whatever condition>
arr[i][j] = "//First comment";
Then when I get to a '\n' I want to increment the first index such that:
arr[i+1][j] = "//Second comment";
I just want to be able to access each input as an individual element in my array. In Java I wouldn't need to do this, as each string would already be an individual element in a String array. I've only been working with c for 3 weeks now, and things that I used to take for granted as being simple, have proven to be quite frustrating in C.
My actual code is below. It gives me an infinite loop and prints out a ton of numbers:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const int MAXLENGTH = 500;
int main(){
char comment[MAXLENGTH][MAXLENGTH];
char activeChar;
int cIndex = 0;
int startComment = 0;
int next = 0;
while((activeChar = getchar()) != EOF){
if(activeChar == '/'){
startComment = 1;
}
if(activeChar == '\n'){
comment[next][cIndex] = '\0';
next++;
}
if(startComment){
comment[next][cIndex] = activeChar;
cIndex++;
}
}
for(int x = 0 ; x < MAXLENGTH; x++){
for (int j = 0; j < MAXLENGTH; j++){
if(comment[x][j] != 0)
printf("%s", comment[x][j]);
}
}
return 0;
}
The problem you are having is that C was designed to be essentially a glorified assembler. That means that the only stuff it has 'built-in' are things for which there is an obvious correct way to do it. Strings do not meet this criterion. As such, strings are not a first-class citizen in C.
In particular there are at least three viable ways to deal with strings, C doesn't force you to use any of them, but instead allows you to use what you want for the job at hand.
Method 1: Static Array
This method appears to be similar to what you are trying to do, and is often used by new C programmers. In this method a string is just an array of characters exactly long enough to fit its contents. Assigning arrays becomes difficult, so this promotes using strings as immutables. It feels likely that this is how most JVMs would implement strings. C code: char my_string[] = "Hello";
Method 2: Static Bounded Array
This method is what you are doing. You decide that your strings must be shorter than a specified length, and pre-allocate a large enough buffer for them. In this case it is relatively easy to assign strings and change them, but they must not become longer than the set length. C code: char my_string[MAX_STRING_LENGTH] = "Hello";
Method 3: Dynamic Array
This is the most advanced and risky method. Here you dynamically allocate your strings so that they always fit their content. If they grow too big, you resize. They can be implemented many ways (usually as a single char pointer that is realloc'd as necessary in combination with method 2, occasionally as a linked list).
Regardless of how you implement strings, to C's eyes they are all just arrays of characters. The only caveat is that to use the standard library you need to null terminate your strings manually (although many [all?] of them specify ways to get around this by manually specifying the length).
This is why Java strings are not primitive types, but rather objects of type String.
Interestingly enough, many languages actually use different String types for these solutions. For example Ada has String, Bounded_String, and Unbounded_String for the three methods above.
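To make Method 3 more concrete, here is a minimal sketch of a growable string built on realloc. The type DynStr and the function dynstr_append are hypothetical names for this illustration, and doubling the capacity is just one common growth policy, not the only one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;   /* null-terminated contents, or NULL while empty */
    size_t len;    /* number of chars, not counting the '\0' */
    size_t cap;    /* bytes currently allocated for data */
} DynStr;

/* Appends text to s, growing the buffer as needed.
   Returns 0 on success, -1 if memory could not be allocated
   (in which case s is left unchanged). */
static int dynstr_append(DynStr *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t add = strlen(text);

    if (s->len + add + 1 > s->cap) {
        size_t new_cap = s->cap ? s->cap : 16;
        while (new_cap < s->len + add + 1)
            new_cap *= 2;
        char *tmp = realloc(s->data, new_cap);   /* keep old pointer on failure */
        if (tmp == NULL)
            return -1;
        s->data = tmp;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, text, add + 1);     /* copies the '\0' too */
    s->len += add;
    return 0;
}
```

Starting from DynStr s = {0}, appending "Hello" and then ", world" leaves s.data holding "Hello, world"; the caller releases it with free(s.data).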
Solution
Look at your code again: char arr[100][100]; which method is this, and what is it?
Obviously it is method 2 with MAX_STRING_LENGTH of 100. So you could pretend the line says: my_strings arr[100] which makes your issue apparent, this is not a 2D array of strings, but a 2D array of characters which represents a 1D array of strings. To create a 2D array of strings in C you would use: char arr[WIDTH][HEIGHT][MAX_STRING_LENGTH] which is easy to get wrong. As above, however, you have some logic errors in your code, and you can probably solve this problem with just a 1D array of strings. (2D array of chars)
comment is a 2D array of chars, which are single characters. In C, a string is simply an array of characters, so your definition of comment is one way to define a 1D array of strings.
As far as the loading goes, the only obvious potential problem is that you don't ever reset startComment to zero (but you should use a debugger to make sure it's being loaded correctly), however your code to print it out is wrong.
Using printf() with a %s tells it to start printing the string at whatever address you give it, but you're giving it individual characters, not whole strings, so it's interpreting each character in each string (because C is a horrible, horrible language) as an address in RAM and trying to print that RAM. To print an individual character, use %c instead of %s. Or, just make a 1D for loop over the rows that were actually filled (next counts them):
for(int x = 0; x < next; x++)
printf("%s\n", comment[x]);
It's also a bit confusing that you use the same MAXLENGTH for the number of lines in the array and the length of the string in each line.
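Putting the points from these answers together, here is one possible sketch of collecting "//" comments into a 2D char array, one comment per row. To keep it easy to test it scans a string instead of calling getchar(), which is a change from the question's setup. The names collect_comments, MAX_LINES and MAX_LEN are made up, and treating everything from the first "//" to the end of the line as the comment is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 100   /* number of rows (comments) */
#define MAX_LEN   100   /* bytes per row, including the '\0' */

/* For each line of text that contains "//", copies the part from "//" to
   the end of the line into the next free row. Returns the rows filled. */
static int collect_comments(const char *text, char rows[][MAX_LEN], int max_rows)
{
    assert(text != NULL);
    int count = 0;
    const char *p = text;

    while (*p != '\0' && count < max_rows) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        for (size_t i = 0; i + 1 < len; i++) {
            if (p[i] == '/' && p[i + 1] == '/') {
                size_t n = len - i;
                if (n > MAX_LEN - 1)
                    n = MAX_LEN - 1;            /* truncate overly long comments */
                memcpy(rows[count], p + i, n);
                rows[count][n] = '\0';          /* each row is its own string */
                count++;
                break;
            }
        }
        if (eol == NULL)
            break;
        p = eol + 1;                            /* continue with the next line */
    }
    return count;
}
```

Each row can then be printed as a whole string with printf("%s\n", rows[i]) for i from 0 up to the returned count.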

Trying to read text file data into an array of structs?

Here is what I want to do:
Read my 2 text files into an "array of structs" (this is how it is worded on my assignment).
Dynamically create sufficient memory for each entry read from the file
Here is one of the structs I am working with:
typedef struct {
int eventid;
char eventdate[20];
char venuename[20];
char country[20];
int rockid;
} Venue;
Within my main function I have the array setup to receive the text as:
Venue *(places[20]);
Now comes the more complex part. I need to open the file for reading (I got this to work perfectly) and then dynamically allocate the memory for each entry. I know I need to use malloc to do this but I have never used it before and am sort of at a loss. Here is what I have so far:
void load_data(void)
{
char buffer[20]; //stating that each line can't be longer than 20 chars
int i = 0,len; //declaring 2 int variables
FILE * venuePtr=fopen("venueinfo.txt", "r");
if (venuePtr != NULL)
printf("\n**Venue info file has been opened!**\n");
else{
printf("\nPlease create a file named venueinfo.txt and restart!\n");
} //so far so good...
while (!feof(venuePtr)){ //while we have not found the eof key...
fscanf(venuePtr,"%s",buffer); //we scan each line of text
len = strlen(buffer); //find the length (len) of the string
places[i]=(char*)malloc(len+1); //allocate memory space for the word here
strcpy(places[i],buffer); //copy a word into our array
++i; //finally we move on to the next element in the array
} //end while
The problem resides in the while loop and I have been working this for 2 days straight. I have 5 members in my struct and am thinking that strcpy may not work. This is only part of the problem though I am sure. I just can't wrap my head around reading in everything. The file itself is a super simple txt file and looks like this:
1 Jan10 Citadel Belgium 8
4 May05 Sunrise Belize 6
3 Jun17 Footloose Brazil 4
You are trying to copy a string into the Venue structure; how do you expect that to work?
strcpy(venue[i],buffer);
Please give an example of your file; you probably need to parse each element and write it to the structure members.
Stuff you can skip but should eventually do:
Use function(void), not just empty () for functions that take no arguments. Visual Studio, which you are using, allows it, but it's bad form.
Declaring global variables is likewise bad form. You want to declare them in main and pass them in.
Finally, you want to return: return 0 from a successful run of main. If you have a void function, you can still write return; before the final closing '}' character of the function.
Oh, and fscanf_s isn't portable; it's a Microsoft function.
stuff you actually asked about:
Now, onto your problem. Don't allocate memory and assign it. You have statically allocated memory for your structures, and for your strings, by making them arrays with a given number of characters. If you want to dynamically allocate memory, you need to use a pointer.
If you scan the first number, the rock id, into id, you would assign the first venue's rock id as,
venue[0].rockid = id;
For arrays, you do have to string copy. You already allocated memory for them, so you just have to use strcpy.
But you can't just copy strings into the structure and have it all end up in the right place. You need to get each part and add it separately.
That means you either need to read in each element separately, like "%d%s" to read an int then a string, or whatever, or you need to split up your string after reading in the whole thing.
NOTE: %s won't read the whole line!!! It will stop at the first white-space character (new line, tab, even a space), so if you %s "hi there" you get "hi". You may want to use %[^\n], which keeps reading while characters don't match \n.
My suggestion is to use fscanf with multiple items, but if you need to split the string up, you want to use sscanf, which lets you scan a string again.
Finally, you don't need to test for feof, and that's actually problematic. It's far better to use while (fscanf(parameters go in here) > 0), since fscanf returns EOF (generally -1) at end of file and 0 when no items were scanned. Either way, you are done reading. Stricter still is to compare the return value against the number of items you asked for.
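A minimal sketch of how this advice could be applied to the Venue struct from the question, assuming every line has exactly the five whitespace-separated fields shown in the sample file. The helper names parse_venue and load_venues are hypothetical. Unlike the suggestion above, this version does allocate one Venue per line with malloc, since the assignment asks for that; the strings themselves live in the fixed 20-byte arrays inside the struct.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int  eventid;
    char eventdate[20];
    char venuename[20];
    char country[20];
    int  rockid;
} Venue;

/* Parses one line such as "1 Jan10 Citadel Belgium 8" into a freshly
   malloc'ed Venue. Returns NULL if the line lacks any of the five fields
   or if allocation fails. The caller frees the result. */
static Venue *parse_venue(const char *line)
{
    assert(line != NULL);
    Venue *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    /* %19s leaves room for the '\0' in each 20-byte field */
    if (sscanf(line, "%d %19s %19s %19s %d",
               &v->eventid, v->eventdate, v->venuename,
               v->country, &v->rockid) != 5) {
        free(v);
        return NULL;
    }
    return v;
}

/* Reads up to max lines from fp into places[]; returns how many were stored.
   The loop is driven by fgets' return value rather than by feof(). */
static int load_venues(FILE *fp, Venue *places[], int max)
{
    assert(fp != NULL);
    char line[128];
    int count = 0;
    while (count < max && fgets(line, sizeof line, fp) != NULL) {
        Venue *v = parse_venue(line);
        if (v != NULL)
            places[count++] = v;   /* malformed lines are skipped */
    }
    return count;
}
```

With the declaration Venue *places[20] from the question, a call like load_venues(venuePtr, places, 20) fills the array, and each element is later released with free.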
I suggest you start small. It looks like you are trying to jump ahead without understanding fundamentals, and C is kinda brutal about needing to know those.
Good luck.
P.S. It's very possible I made a small mistake here just now, writing this, because I'm not a C expert, but I'm sure somebody will come along and help me. Finding mistakes is how we learn, so don't get discouraged. :)

Showing the difference between 2 char * strings in C

I am new to C, and I am trying to figure out the best way to approach this problem. I have 2 strings that are both char *'s.
They have multiple \n characters within the strings themselves, and they are usually about 1000 characters in length. I want to display only single lines that are different. Typically only one character (or a relatively small number) would be different in the entire string. So I was hoping to make it so that I could display only that one changed line (the whole string from \n to \n).
I'm not asking for anybody to write the code, or even supply code examples, just in theory what would be the most efficient way to do this?
I've been looking into using strtok, using the '\n' symbol as a delimiter, and then using strcmp to compare the two strings, and if they were not equal then I could add that string to a "old_data" and "new_data" array. Would this be a bad way to do this?
Any advice would be a huge help.
It sounds like you're on the right track: strsep will let you chunk the string up by newline. One thing to keep in mind is that it operates on the original string in place, and doesn't allocate any new memory, which can be both a blessing and a curse.
Probably the most memory efficient way to do this would be to look into allocating arrays of pointers to hold your "old_data" and "new_data" values, and then just save the pointers that point directly into the original string rather than copying the strings themselves over. As long as your original two strings are going to stick around/not get freed from under you, this could save you a decent chunk of memory.
If you aren't ever going to be removing strings from the arrays, one naïve (but effective) way to implement your arrays is to maintain two state variables — a count, and a capacity — and double the capacity each time you're about to overflow the array. E.g.:
char **strArray = NULL;
unsigned int capacity = 10;
unsigned int count = 0;
strArray = malloc(capacity * sizeof(char *));
/* on insert */
if (count == capacity)
{
capacity *= 2;
strArray = realloc(strArray, capacity * sizeof(char *));
}
strArray[count++] = pointerIntoOriginalString;
Good luck!
strtok() is not reentrant. If you were going to do this with strtok, you'd have to iterate over the arrays one after the other. I recommend using strtok_r(), which is the reentrant implementation of strtok.
The other consideration you need to worry about is making sure that your old_data and new_data arrays are big enough, or resizable. Matt's answer shows a simple example of resizing the array, although if you're new to C you might just want to declare something like:
char *new_data[2000];
char *old_data[2000];
Especially since it sounds like you have a good idea of how many lines are in your buffer.
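For comparison, here is a sketch that avoids tokenizing altogether: it walks both strings line by line with strchr and memcmp, so neither input is modified and nothing is copied. The function names line_len and diff_lines are made up for this example, and it assumes the two strings have the same number of lines in the same order (lines are changed, not inserted or deleted), which matches the situation described in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the line starting at s, not counting the '\n' or '\0' ending it. */
static size_t line_len(const char *s)
{
    const char *nl = strchr(s, '\n');
    return nl ? (size_t)(nl - s) : strlen(s);
}

/* Records the 0-based numbers of lines that differ between old_s and new_s.
   Stops when either string runs out. Returns how many numbers were stored
   in changed[] (at most max_changed). */
static size_t diff_lines(const char *old_s, const char *new_s,
                         size_t *changed, size_t max_changed)
{
    assert(old_s != NULL && new_s != NULL && changed != NULL);
    size_t found = 0;

    for (size_t line = 0; *old_s != '\0' && *new_s != '\0'; line++) {
        size_t a = line_len(old_s);
        size_t b = line_len(new_s);

        if ((a != b || memcmp(old_s, new_s, a) != 0) && found < max_changed)
            changed[found++] = line;

        old_s += a;                  /* move to the end of this line */
        new_s += b;
        if (*old_s == '\n') old_s++; /* step over the newline, if present */
        if (*new_s == '\n') new_s++;
    }
    return found;
}
```

Once the changed line numbers are known, the caller can walk the strings again with line_len to print just those lines from each version.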

Why is this C code giving me a bus error?

I have, as usual, been reading quite a few posts on here. I found a particularly useful post on bus errors in general, see here. My problem is that I cannot understand why my particular code is giving me an error.
My code is an attempt to teach myself C. It's a modification of a game I made when I learned Java. The goal in my game is to take a huge 5049 x 1 text file of words. Randomly pick a word, jumble it and try to guess it. I know how to do all of that. So anyway, each line of the text file contains a word like:
5049
must
lean
better
program
now
...
So, I created a string array in C and tried to read the file into it. I didn't do anything else. Once I get the file into C, the rest should be easy. Weirder yet is that it compiles. My problem comes when I run it with the ./blah command.
The error I get is simple. It says:
zsh: bus error ./blah
My code is below. I suspect it might have to do with memory or overflowing the buffer, but that's completely unscientific and a gut feeling. So my question is simple, why is this C code giving me this bus error msg?
#include<stdio.h>
#include<stdlib.h>
//Preprocessed Functions
void jumblegame();
void readFile(char* [], int);
int main(int argc, char* argv[])
{
jumblegame();
}
void jumblegame()
{
//Load File
int x = 5049; //Rows
int y = 256; //Colums
char* words[x];
readFile(words,x);
//Define score variables
int totalScore = 0;
int currentScore = 0;
//Repeatedly pick a random work, randomly jumble it, and let the user guess what it is
}
void readFile(char* array[5049], int x)
{
char line[256]; //This is to to grab each string in the file and put it in a line.
FILE *file;
file = fopen("words.txt","r");
//Check to make sure file can open
if(file == NULL)
{
printf("Error: File does not open.");
exit(1);
}
//Otherwise, read file into array
else
{
while(!feof(file))//The file will loop until end of file
{
if((fgets(line,256,file))!= NULL)//If the line isn't empty
{
array[x] = fgets(line,256,file);//store string in line x of array
x++; //Increment to the next line
}
}
}
}
This line has a few problems:
array[x] = fgets(line,256,file);//store string in line x of array
You've already read the line in the condition of the immediately preceding if statement: the current line that you want to operate on is already in the buffer and now you use fgets to get the next line.
You're using x, which was passed in as the array size, as the array index: instead you'll want to keep a separate variable for the array index that starts at 0 and increments each time through the loop.
Finally, you're trying to copy the strings using =. This will only copy references, it won't make a new copy of the string. So each element of the array will point to the same buffer: line, which will go out of scope and become invalid when your function exits. To populate your array with the strings, you need to make a copy of each one for the array: allocate space for each new string using malloc, then use strncpy to copy each line into your new string. Alternately, if you can use strdup, it will take care of allocating the space for you.
But I suspect that this is the cause of your bus error: you're passing in the array size as x, and in your loop, you're assigning to array[x]. The problem with this is that array[x] doesn't belong to the array, the array only has useable indices of 0 to (x - 1).
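A sketch of a reading loop that addresses the points above: each line is read once, the index starts at 0 and is bounded by the array size, and every slot gets its own heap copy. The function name read_lines and its signature are assumptions, not the asker's original interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads up to max lines from fp and stores a private heap copy of each in
   array[0..count-1]. Returns the number of lines stored; the caller frees
   them. Lines longer than 255 chars are split across several entries. */
static int read_lines(FILE *fp, char *array[], int max)
{
    assert(fp != NULL && array != NULL);
    char line[256];
    int count = 0;                          /* index starts at 0, not at the size */

    while (count < max && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline, if any */
        char *copy = malloc(strlen(line) + 1);
        if (copy == NULL)
            break;                          /* out of memory: keep what we have */
        strcpy(copy, line);
        array[count++] = copy;
    }
    return count;
}
```

Because the copies are allocated with malloc, they stay valid after read_lines returns, unlike pointers into the local line buffer.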
You are passing the value 5049 for x. The first time that the line
array[x] = ...
executes, it's accessing an array location that does not exist.
It looks like you are learning C. Great! A skill you need to master early is basic debugger use. In this case, if you compile your program with
gcc -g myprogram.c -o myprogram
and then run it with
gdb ./myprogram
(I am assuming Linux), type run at the gdb prompt; when the program crashes, the bt command prints a stack trace that shows the line where the bus error occurred. This should be enough to help you figure out the error yourself, which in the long run is much better than asking others.
There are many other ways a debugger is useful, but this is high on the list. It gives you a window into your running program.
You are storing the lines in the line buffer, which is defined inside the readFile function, and storing pointers to it in the array. There are two problems with that: you are overwriting the value every time a new string is read, and the buffer is on the stack, so it is invalid once the function returns.
You have at least a few problems:
array[x] = fgets(line,256,file)
This stores the address of line into each array element. line is no longer valid when readFile() returns, so you'll have an array of useless pointers. Even if line had a longer lifetime, it wouldn't be useful to have all your array elements holding the same pointer (they'd each just point to whatever happened to be written in the buffer last).
while(!feof(file))
This is an antipattern for reading a file. See http://c-faq.com/stdio/feof.html and "Using feof() incorrectly". This antipattern is likely responsible for your program looping more than you might expect when reading the file.
you allocate the array to hold 5049 pointers, but you simply read however much is in the file - there's no checking for whether or not you read the expected number or to prevent reading too many. You should think about allocating the array dynamically as you read the file or have a mechanism to ensure you read the right amount of data (not too little and not too much) and handle the error when it's not right.
I suspect the problem is with (fgets(line,256,file))!=NULL). Another way to read a file is with fread() (see http://www.cplusplus.com/reference/clibrary/cstdio/fread/). Specify the buffer, the size of each element, the number of elements, and the FILE* (a file stream in C). The routine returns the number of elements read. If the return value is smaller than the number requested, then either EOF has been reached or an error occurred.
char buff [256];
fread (buff, sizeof(char), 256, file);

Resources