Read a file that you don't know the length of - C

I have written a program that reads an integer and tells whether it's greater or not. I did it with an array initialized with some numbers (as a test that the program works). Now I have to read the integers from a file containing a lot of them, but I don't know the length of the file. How can I determine the size of the array if I don't know the length?

Either you do it on the fly (read and process one line at a time -> one number at a time), or you need to allocate memory dynamically as you read the file as described here: Dynamically expanding array using realloc
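The realloc approach from that linked answer can be sketched roughly like this (the function name, starting capacity, and doubling growth are my own choices, not from the question):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read every whitespace-separated integer from the file at `path`,
   growing the array with realloc() as needed.  Returns a malloc'd
   array and stores the element count in *count_out; NULL on error. */
int *read_all_ints(const char *path, size_t *count_out)
{
    FILE *fp = fopen(path, "r");
    if (!fp) return NULL;

    size_t cap = 16, count = 0;
    int *nums = malloc(cap * sizeof *nums);
    int value;

    while (nums && fscanf(fp, "%d", &value) == 1) {
        if (count == cap) {                       /* full: double the capacity */
            int *tmp = realloc(nums, 2 * cap * sizeof *tmp);
            if (!tmp) { free(nums); nums = NULL; break; }
            nums = tmp;
            cap *= 2;
        }
        nums[count++] = value;
    }
    fclose(fp);
    *count_out = nums ? count : 0;
    return nums;
}
```

Assigning `realloc`'s result to a temporary first matters: on failure `realloc` returns NULL but leaves the original block alive, so overwriting `nums` directly would leak it.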

Do you need to use an array? I think using a list is your best option here. Then you can just use something like myList.add(newElement) or myList.append(newElement), depending on the language you are using.

Related

how to scan characters in a file until newline and store it in a dynamic array?

For example, the program grabs the line: Hello World! and assigns the string to a dynamic array.
The length of each line is unknown and I want compatibility for all sizes.
getline() is the obvious answer here, as Barmar suggested, but fgets() is also an option (see https://en.wikibooks.org/wiki/C_Programming/stdio.h/fgets).
But from what I understand, you don't know its size, yet you want to put it into a perfectly sized dynamic array right off the bat? That's going to take some crafty thinking and is difficult in a compiled language. The only way I can think of off the top of my head is quite slow: open the file twice, once to measure each line, and a second time to read each line in after malloc'ing the correct number of bytes, storing pointers to these dynamic arrays in a list. This takes a lot longer to execute, so if you're not limited on CPU power, it may be an option.
Normally, you'd just know what maximum size to expect and define the array at that maximum size. In the grand scheme of things, an extra 50 bytes isn't going to hurt anything... which hurts me as an embedded guy to say, but computers have large enough memory these days...
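For what it's worth, the getline() route mentioned above can look like the sketch below. Note that getline() is POSIX, not standard C, and the wrapper name read_line is my own:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from `fp` using POSIX getline(), which
   mallocs and reallocates the buffer for you.  Returns a malloc'd
   string without the trailing newline, or NULL at end of file. */
char *read_line(FILE *fp)
{
    char *line = NULL;   /* getline allocates on the first call */
    size_t cap = 0;      /* current buffer capacity, maintained by getline */
    ssize_t len = getline(&line, &cap, fp);

    if (len == -1) {     /* EOF or read error */
        free(line);
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';   /* strip the newline */
    return line;
}
```

The caller frees each returned line. The buffer is sized by getline() itself, so a line like "Hello World!" ends up in heap storage that fits it, which is essentially the "perfectly sized dynamic array" the question asks for.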

How to represent a random-access text file in memory (C)

I'm working on a project in which I need to read a text (source) file into memory and be able to perform random access into it (say, for instance, retrieve the address corresponding to line 3, column 15).
I would like to know if there is an established way to do this, or data structures that are particularly good for the job. I need to be able to perform a (probably amortized) constant time access. I'm working in C, but am willing to implement higher level data structures if it is worth it.
My first idea was to go with a linked list of large buffers that will hold the character data of the file. I would also make an array whose indices are line numbers and whose contents are the addresses of the beginnings of the lines. This array would be reallocated as needed.
Subsidiary question: does anyone have an idea of the average size of a source file? I was surprised not to find this on Google.
To clarify:
The files I'm concerned about are source files, so their size should be manageable; they should not be modified, and the lines have variable length (though hopefully capped at some maximum).
The problem I'm working on needs mostly a read-only file representation, but I'm very interested in digging around the problem.
Conclusion:
There is a very interesting discussion of the data structures used to maintain a file (with read/insert/delete support) in the paper Data Structures for Text Sequences.
If you just need read-only access, get the file size, read the file into memory with fread(), and then maintain a dynamic array that maps line numbers (indices) to pointers to the first character of each line. Someone below suggested building this array lazily, which seems a good idea in many cases.
I'm not quite sure what the question is here, but there seems to be a bit of both "how do I keep the file in memory" and "how do I index it". Since you need random access to the file's contents, you're probably well advised to memory-map the file, unless you're tight on address space.
I don't think you'll be able to avoid a linear pass through the file once to find the line endings. As you said, you can create an index of the pointers to the beginning of each line. If you're not sure how much of the index you'll need, create it lazily (on demand). You can also store this index to disk (as offsets, not pointers) if you will need it on subsequent runs. You can estimate the size of the index based on the file size and the expected line length.
1) Read (or mmap) the entire file into one chunk of memory.
2) In a second pass, create an array of pointers or offsets into that memory pointing to the beginnings of the lines (hint: one character after each '\n').
Now you can index the array to access a specific line.
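A rough sketch of that second pass, assuming the file already sits in memory as one buffer (the function name and growth policy are made up for illustration):

```c
#include <stdlib.h>

/* Build an index of line-start offsets into `buf` (length `len`).
   Line i starts at buf + index[i]; lookup is then O(1).
   Returns a malloc'd array, count in *nlines; NULL on failure. */
size_t *build_line_index(const char *buf, size_t len, size_t *nlines)
{
    size_t cap = 64, n = 0;
    size_t *idx = malloc(cap * sizeof *idx);
    if (!idx) return NULL;

    idx[n++] = 0;                          /* line 0 starts at offset 0 */
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] != '\n') continue;
        if (n == cap) {                    /* index full: double it */
            size_t *tmp = realloc(idx, 2 * cap * sizeof *tmp);
            if (!tmp) { free(idx); return NULL; }
            idx = tmp;
            cap *= 2;
        }
        idx[n++] = i + 1;                  /* next line begins after '\n' */
    }
    *nlines = n;
    return idx;
}
```

Storing offsets rather than raw pointers also makes the index safe to write to disk for reuse on later runs, as suggested above.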
It's impossible to make insertion, deletion, and reading at a particular line/column/character address all simultaneously O(1). The best you can get is simultaneous O(log n) for all of these operations, and it can be achieved using various sorts of balanced binary trees for storing the file in memory.
Of course, unless your files will be larger than 100 kB or so, you're probably best off not bothering with anything fancy and just using a flat linear buffer...
Solution: if lines are about the same size, make all lines equally long by appending the needed number of padding characters to each line. Then you can simply calculate the fseek() position from the line number, making your lookup O(1).
If lines are sorted, then you can perform a binary search, making your search O(log(numLines)).
If neither, you can store the indexes of line beginnings. But then you have a problem if you modify the file a lot: if you insert, say, X characters somewhere, you have to work out which line that is and then add X to the offsets of all following lines. Similar with deletion. You essentially get O(numLines), and the code gets ugly.
If you want to store the whole file in memory, just create an array of lines, char *lines[]. You then get a line with the first dereference and a character with the second.
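The fixed-width idea above can be sketched like this (REC_LEN is an assumed record length, padding and newline included; any padded format works the same way):

```c
#include <stdio.h>

#define REC_LEN 16   /* assumed fixed record length, '\n' included */

/* With every line padded to exactly REC_LEN bytes, the byte offset
   of line `lineno` is a simple multiplication: O(1), no scanning. */
long line_offset(long lineno)
{
    return lineno * REC_LEN;
}

/* Position `fp` at the start of line `lineno`; 0 on success. */
int seek_to_line(FILE *fp, long lineno)
{
    return fseek(fp, line_offset(lineno), SEEK_SET);
}
```

The trade-off is the one named above: short lines waste the padding bytes, and any line longer than REC_LEN - 1 characters can't be stored at all.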
As an alternative suggestion (although I do not fully understand the question), you might want to consider a struct-based, dynamically linked list of dynamic strings. If you want to be astutely clever, you could build a dynamically linked list of chars which you then export as strings.
You'd have to use an OO-style design for this to be manageable.
So structs you'd likely want to build are:
DynamicArray;
DynamicListOfArrays;
CharList;
So it goes:
CharList(Gets Chars/Size) -> (SetSize)DynamicArray -> (AddArray)DynamicListOfArrays
If you build suitable helper functions for malloc and free, and make it so the structs can delete themselves either automatically or manually, the above combination won't get you O(1) reads (which isn't possible unless the file has a static format), but it will get you good times.
If you know the file's static length (at least per line), i.e. no bigger than 256 chars per line, then all you need is the DynamicListOfArrays: write directly to the array (preset to 256), create a new one, repeat. The downside is that it wastes memory.
Note: you'd have to convert the DynamicListOfArrays into a 'static' ArrayOfArrays before you could get direct pointer access.
If you need source code to give you an idea (although mine is built towards C++ it wouldn't take long to rewrite), leave a comment about it. As with any other code I offer on stackoverflow, it can be used for any purpose, even commercially.
Average size of a source file? Does such a thing exist? A source file can range from 0 bytes to many thousands; like any text file, it depends on the number of characters it contains.

How to write dynamically allocated structure to file

I have a complex structure in a C program which has many members that are allocated memory, dynamically. How do I write this structure to a text / binary file? How will I be able to recreate the entire structure from the data read from the file.
struct parseinfo {
    int varcount;
    int termcount;
    char **variables;
    char **terminals;
    char ***actions;
};
The members variables, terminals and actions are all dynamically allocated, and I need to write this structure to a file so that I can reconstruct it later.
You must write Serialize and Deserialize functions for this structure. You can't write it to a file as raw data, because the pointer members point into the heap, and it makes no sense to save those pointer values to a file.
Short answer: not in an automated way.
In essence, it depends highly on the semantics of your structure.
If the data fields inside specify the lengths of the arrays it contains, you can reconstruct the struct.
But you have to be careful about believing those values (a possible cause of stack overflow (nice word)) if you believe that there are 2^34 entries in an array.
Otherwise it is just a matter of going through every member (a pain).
You could read up a little on ASN.1 and TLV structures.
Here are some suggestions for binary serialization:
You can serialize a string either a la C (write the terminating '\0') or
by writing the size (say, an int) followed by the contents.
You can serialize an array of strings by writing the length of the array
followed by the strings.
If it's possible that you deserialize the file on a different
architecture (different int size, different endianness...), then take
care to carefully specify the binary format of the file. In such a case
you may want to take a look at the XDR serialization standard.
For ASCII serialization, I like the JSON format.
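A minimal sketch of the length-prefixed binary format suggested above, for one char ** member (the function names are mine; the reader must share the writer's int size and endianness, per the caveat above, and error paths leak memory for brevity):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write `n` strings as: n, then (length, bytes) per string.  0 on success. */
int write_strings(FILE *fp, char **strs, int n)
{
    if (fwrite(&n, sizeof n, 1, fp) != 1) return -1;
    for (int i = 0; i < n; i++) {
        int len = (int)strlen(strs[i]);
        if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
        if (len && fwrite(strs[i], 1, len, fp) != (size_t)len) return -1;
    }
    return 0;
}

/* Reverse of write_strings(): returns a malloc'd array of malloc'd
   strings and stores the count in *n_out; NULL on error. */
char **read_strings(FILE *fp, int *n_out)
{
    int n;
    if (fread(&n, sizeof n, 1, fp) != 1 || n < 0) return NULL;
    char **strs = calloc((size_t)n, sizeof *strs);
    if (!strs) return NULL;
    for (int i = 0; i < n; i++) {
        int len;
        if (fread(&len, sizeof len, 1, fp) != 1 || len < 0) return NULL;
        strs[i] = malloc((size_t)len + 1);
        if (!strs[i] || fread(strs[i], 1, len, fp) != (size_t)len) return NULL;
        strs[i][len] = '\0';   /* lengths are explicit, so restore the NUL */
    }
    *n_out = n;
    return strs;
}
```

Serializing the struct in the question would then be a matter of writing varcount and termcount, then applying this pattern to variables, terminals, and (one level deeper) actions.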
The way I would suggest doing it is to think about what it takes to create the items in the first place.
When it comes to char **variables, char **terminals and char ***actions, you're obviously going to have to figure out how to declare those and read them in, but I don't think you can write a literal '\0' into a text file (wouldn't it read as an EOF character?).
How would you like to see it written to the file? Can you provide a sample output how you think it should be stored? Perhaps one item per line in a file? Does it need to be a binary file?

Simple C question

Hi all. I am trying to make a little program that reads data from a file which has a user's name and some data for that user. I am new to C; how can I process this data per user? Reading line by line and adding each char to an array? And how can I read a line? Is there a function for that?
And how can I treat each line's user like an object? I will make calculations for a specific user.
You can use fgets to read a line at a time from the file.
Then you can parse the fields out and add them to an array or some other data structure. Just keep in mind that if you use an array, you need to know ahead of time how many entries the file may contain (for example, no more than 1000). Otherwise you will need a data structure that can allocate memory dynamically, such as a linked list.
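A sketch of that fgets-and-parse approach, assuming a made-up "name value" line format since the question doesn't specify one (the struct, field names, and 1000-entry cap are all illustrative):

```c
#include <stdio.h>
#include <string.h>

#define MAX_USERS 1000   /* upper bound chosen ahead of time, as noted */

struct user {
    char   name[64];
    double value;
};

/* Parse lines of the assumed form "name value" into `users`.
   Returns how many users were read (at most `max`). */
int load_users(FILE *fp, struct user *users, int max)
{
    char line[256];
    int n = 0;

    while (n < max && fgets(line, sizeof line, fp)) {
        /* %63s leaves room for name's terminating NUL */
        if (sscanf(line, "%63s %lf", users[n].name, &users[n].value) == 2)
            n++;                 /* malformed lines are skipped */
    }
    return n;
}
```

Each struct user then plays the role of the "object" the question asks about: find the entry whose name matches and do your calculation on its value.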
Try this site; I often use it for reference.
http://www.cprogramming.com/tutorial/cfileio.html
Play around with file I/O and get used to the functions, and then you will be able to make what you want.
Everything you need is in stdio.h. (Link is to a C++ website but the entire C i/o system is usable in C++; hence the documentation)

Is it possible to read in a string of unknown size in C, without having to put it in a pre-allocated fixed length buffer?

Is it possible to read in a string in C, without allocating an array of fixed size ahead of time?
Every time I declare a char array of some fixed size, I feel like I'm doing it wrong. I'm always guessing at what I think the maximum would be for my use case, but this isn't always easy.
Also, I don't like the idea of having a smaller string sitting in a larger container. It doesn't feel right.
Am I missing something? Is there some other way I should be doing this?
At the point that you read the data, your buffer is going to have a fixed size -- that's unavoidable.
What you can do, however, is read the data using fgets, and check whether the last character is a '\n', (or you've reached the end of file) and if not, realloc your buffer, and read more.
I rarely find that necessary, but do usually allocate a single fixed buffer for the reading, read data into it, and then dynamically allocate space for a copy of it, allocating only as much space as it actually occupies, not the whole size of the buffer I originally used.
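That grow-with-realloc loop might look like the following sketch (the function name is mine, and the 64-byte starting capacity is arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one whole line of unknown length: fgets into a buffer that is
   doubled with realloc until we see the '\n' (or hit end of file).
   Returns a malloc'd string without the newline, or NULL at EOF. */
char *read_whole_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (!buf) return NULL;

    while (fgets(buf + len, (int)(cap - len), fp)) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {   /* got the full line */
            buf[len - 1] = '\0';
            return buf;
        }
        char *tmp = realloc(buf, cap * 2);       /* line continues: grow */
        if (!tmp) break;
        buf = tmp;
        cap *= 2;
    }
    if (len > 0) return buf;   /* final line with no trailing newline */
    free(buf);
    return NULL;               /* EOF before any characters */
}
```

This is essentially what POSIX getline() does for you internally; the hand-rolled version is useful when you're restricted to standard C.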
When you say "ahead of time", do you mean at runtime, or at compile time?
At compile time you do this:
char str[1000];
at runtime you do this:
char *str = malloc(size);
The only way to get exactly the right size is to know how many characters you are going to read in. If you're reading from a file, you can seek to the nearest newline (or some other condition) and then you know exactly how big the array needs to be, i.e.:
int numChars = computeNeededSpace(someFileHandle);
char *readBuffer = malloc(numChars);
fread(readBuffer, 1, numChars, someFileHandle);
There is no other way to do this. Put yourself in the program's perspective: how is it supposed to know how many keys the user is going to press? The best you can do is limit the user, or whatever the input is.
There are some more complex approaches, like creating a linked list of buffers, allocating chunks of buffers and then linking them afterwards. But I think that's not the answer you wanted here.
EDIT: Most languages have string/inputbuffer classes that hide this from you.
You must allocate a fixed buffer. If it becomes too small, then realloc() it to a bigger size and continue.
There's no way of determining the string length until you've read it in, so reading it into a fixed-size buffer is pretty much your only choice.
I suppose you have the alternative to read the string in small chunks, but depending on your application that might not give you enough information at a time to work with.
Perhaps the easiest way to handle this dilemma is by defining maximum lengths for certain string input (#define-ing a constant for this value helps). Use a buffer of this pre-determined size whenever you are reading in a string, but make sure to use the strncpy() form of the string commands so you can specify a maximum number of characters to read. Some commonly-used types of strings (for example, filenames or paths) may have system-defined maximum lengths.
There's nothing inherently 'wrong' about declaring a fixed-size array as long as you use it properly (do proper bounds checking and handle the case where input will overflow the array). It may result in unused memory being allocated, but unfortunately the C language doesn't give us much to work with when it comes to strings.
There is the concept of ropes, which lets you create trees of fixed-size buffers. You still have to have fixed-size buffers (there is no getting around that, really), but this is a pretty neat way of building up strings dynamically.
You can use Chuck Falconer's public domain ggets function to do the buffer management and reallocation for you: http://cbfalconer.home.att.net/download/index.htm
Edit:
Chuck Falconer's website is no longer available. archive.org still has a copy, and I'm hosting a copy too.
