I've seen many questions similar to this one, but I wanted to try solving it my own way. The program compiles fine, but running it prints a huge amount of garbage text.
I'm trying to read a string of unknown size from a file. My idea was to allocate pts with a size of 2 (one char plus the null terminator) and then use realloc to grow the char array each time a character exceeds its size.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char *pts = NULL;
int temp = 0;
pts = malloc(2 * sizeof(char));
FILE *fp = fopen("txtfile", "r");
while (fgetc(fp) != EOF) {
if (strlen(pts) == temp) {
pts = realloc(pts, sizeof(char));
}
pts[temp] = fgetc(fp);
temp++;
}
printf("the full string is a s follows : %s\n", pts);
free(pts);
fclose(fp);
return 0;
}
You probably want something like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define CHUNK_SIZE 1000 // initial buffer size
int main()
{
int ch; // you need int, not char for EOF
int size = CHUNK_SIZE;
char *pts = malloc(CHUNK_SIZE);
FILE* fp = fopen("txtfile", "r");
int i = 0;
while ((ch = fgetc(fp)) != EOF) // read one char until EOF
{
pts[i++] = ch; // add char into buffer
if (i == size) // if buffer full ...
{
size += CHUNK_SIZE; // increase buffer size
pts = realloc(pts, size); // reallocate new size
}
}
pts[i] = 0; // add NUL terminator
printf("the full string is a s follows : %s\n", pts);
free(pts);
fclose(fp);
return 0;
}
Disclaimers:
this is untested code, it may not work, but it shows the idea
there is absolutely no error checking for brevity, you should add this.
there is room for other improvements, it can probably be done even more elegantly
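The error checking mentioned in the disclaimers could look roughly like the sketch below. The helper name read_all and its out_len parameter are made up for illustration; the growth step reuses the CHUNK_SIZE idea from the code above.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 1000 /* growth step, same value as in the answer above */

/* Hypothetical helper: read everything from fp into a NUL-terminated heap
 * buffer. Returns NULL on allocation or read failure; the caller frees it. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t size = CHUNK_SIZE;   /* bytes currently allocated */
    size_t len = 0;             /* bytes currently stored */
    char *buf = malloc(size);
    int ch;                     /* int, so it can hold EOF */

    if (buf == NULL)
        return NULL;

    while ((ch = fgetc(fp)) != EOF) {
        /* Grow before writing, always keeping one spare byte for the NUL. */
        if (len + 1 == size) {
            char *tmp = realloc(buf, size + CHUNK_SIZE);
            if (tmp == NULL) {  /* the old block is still valid, so free it */
                free(buf);
                return NULL;
            }
            buf = tmp;
            size += CHUNK_SIZE;
        }
        buf[len++] = (char)ch;
    }
    if (ferror(fp)) {           /* EOF was caused by a read error */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

A caller would check the result of fopen first, then do something like char *pts = read_all(fp, NULL); and test pts against NULL before printing it. Assigning realloc's result to a temporary avoids leaking the old block when the call fails.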
Leaving aside for now the question of if you should do this at all:
You're pretty close on this solution but there are a few mistakes
while (fgetc(fp) != EOF) {
This line reads one char from the file and then discards it after comparing it against EOF, so the later pts[temp] = fgetc(fp) stores only every other character. You'll need to save that byte to add to your buffer. Something like while ((tmp = fgetc(fp)) != EOF) should work, provided tmp is an int so it can hold EOF.
pts = realloc(pts, sizeof(char));
Check the documentation for realloc: the second parameter is the new total size of the block in bytes, not the amount to add. Passing sizeof(char) shrinks the buffer to a single byte.
pts = malloc(2 * sizeof(char));
You'll need to zero this memory after acquiring it. You probably also want to zero any memory given to you by realloc, or you may lose the null off the end of your string and strlen will be incorrect.
But as I alluded to earlier, using realloc in a loop like this when you've got a fair idea of the size of the buffer already is generally going to be non-idiomatic C design. Get the size of the file ahead of time and allocate enough space for all the data in your buffer. You can still realloc if you go over the size of the buffer, but do so using chunks of memory instead of one byte at a time.
Probably the most efficient way (as mentioned in the comment by Fiddling Bits) is to read the whole file in one go, after first getting the file's size:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/stat.h>
int main()
{
size_t nchars = 0; // Declare here and set to zero...
// ... so we can optionally try using the "stat" function, if the O/S supports it...
struct stat st;
if (stat("txtfile", &st) == 0) nchars = st.st_size;
FILE* fp = fopen("txtfile", "rb"); // Make sure we open in BINARY mode!
if (nchars == 0) // This code will be used if the "stat" function is unavailable or failed ...
{
fseek(fp, 0, SEEK_END); // Go to end of file (NOTE: SEEK_END may not be implemented - but PROBABLY is!)
// If your system doesn't implement SEEK_END, you can do this instead: while (fgetc(fp) != EOF) {}
nchars = (size_t)(ftell(fp)); // File size in bytes; the extra byte for the NUL terminator is added in the calloc call below
}
char* pts = calloc(nchars + 1, sizeof(char));
if (pts != NULL)
{
fseek(fp, 0, SEEK_SET); // Return to start of file...
fread(pts, sizeof(char), nchars, fp); // ... and read one great big chunk!
printf("the full string is a s follows : %s\n", pts);
free(pts);
}
else
{
printf("the file is too big for me to handle (%zu bytes)!", nchars);
}
fclose(fp);
return 0;
}
On the issue of the use of SEEK_END, see this cppreference page, where it states:
Library implementations are allowed to not meaningfully support SEEK_END (therefore, code using it has no real standard portability).
On whether or not you will be able to use the stat function, see this Wikipedia page. (But it is now available in MSVC on Windows!)
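The code above omits return-value checks. A minimal sketch of the fseek/ftell variant with those checks added might look like this; the helper name slurp_seekable is hypothetical, and it assumes the stream is seekable and was opened by the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a seekable stream in one fread() call.
 * Returns a NUL-terminated heap buffer, or NULL on any failure. */
char *slurp_seekable(FILE *fp, size_t *out_len)
{
    long end;
    size_t got;
    char *buf;

    if (fseek(fp, 0, SEEK_END) != 0)    /* fails on pipes and terminals */
        return NULL;
    end = ftell(fp);
    if (end < 0)                        /* ftell() reports errors as -1L */
        return NULL;
    if (fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    buf = malloc((size_t)end + 1);      /* +1 for the NUL terminator */
    if (buf == NULL)
        return NULL;

    got = fread(buf, 1, (size_t)end, fp);
    if (got != (size_t)end && ferror(fp)) {
        free(buf);
        return NULL;
    }
    /* In text mode fread() may return fewer bytes than ftell() reported,
     * so terminate at the number of bytes actually read. */
    buf[got] = '\0';
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```

Terminating at the count returned by fread, rather than at the size reported by ftell, keeps the string valid even when the two differ.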
Related
I am trying to process a 250MB file using a program written in C.
The file is basically a dataset and I want to read just some of the columns and (more importantly) break one of them (which is originally a string) into a sequence of characters.
However, even though I have plenty of RAM available, the program is killed by Konsole (on KDE Neon) every time I run it.
The source is available below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
FILE *arquivo;
char *line = NULL;
size_t len = 0;
int i = 0;
int j;
int k;
char *vetor[500];
int acertos[45];
FILE *licmat = fopen("licmat.csv", "w");
//creating the header
fprintf(licmat,"CO_CATEGAD,CO_UF_CURSO,ACERTO09,ACERTO10,ACERTO11,ACERTO12,ACERTO13,ACERTO14,ACERTO15,ACERTO16,ACERTO17,ACERTO18,ACERTO19,ACERTO20,ACERTO21,ACERTO22,ACERTO23,ACERTO24,ACERTO25,ACERTO26,ACERTO27,ACERTO28,ACERTO29,ACERTO30,ACERTO31,ACERTO32,ACERTO33,ACERTO34,ACERTO35\n");
if ((arquivo = fopen("MICRODADOS_ENADE_2017.csv", "r")) == NULL) {
printf ("\nError");
exit(0);
}
//reading one line at a time
while (getline(&line, &len, arquivo)) {
char *ptr = strsep(&line,";");
j=0;
//breaking the line into a vector based on ;
while(ptr != NULL)
{
vetor[j]=ptr;
j=j+1;
ptr = strsep(&line,";");
}
//filtering based on content
if (strcmp(vetor[4],"702")==0 && strcmp(vetor[33],"555")==0) {
//copying some info
fprintf(licmat,"%s,%s,",vetor[2],vetor[8]);
//breaking the string (32) into isolated characters
for (k=0;k<27;k=k+1) {
fprintf(licmat,"%c", vetor[32][k]);
if (k<26) {
fprintf(licmat,",");
}
}
fprintf(licmat,"\n");
}
i=i+1;
}
free(line);
fclose(arquivo);
fclose(licmat);
}
The output is perfect up to the point when the script is killed. The output file is just 640KB long and has about 10000 lines only.
What could be the issue?
It looks to me like you're mishandling the memory buffer managed by getline() - which allocates/reallocates as needed - by the use of strsep(), which seems to manipulate that same pointer value.
Once line has been updated to reflect some other element on the line, it's no longer pointing to the start of allocated memory, and then boom the next time getline() needs to do anything with it.
Use a different variable to pass to strsep():
while (getline(&line, &len, arquivo) > 0) { // getline() returns -1 at end of file or on error
char *parseline = line;
char *ptr = strsep(&parseline,";");
// pass &parseline (not &line) to the later strsep() call as well
The key thing here: you're not allowed to muck with the value of line other than to free() it at the end (which you do), and you can't let any other routine do it either.
Edit: updated to reflect getline() returning <0 on error (h/t to #user3121023)
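To see the separate-cursor idea in isolation, here is a small sketch that only counts the ';'-separated fields in a stream. The helper name count_fields is made up for this example, and the _GNU_SOURCE define is an assumption for glibc systems, where it makes getline() and strsep() visible.

```c
#define _GNU_SOURCE /* assumption: glibc; exposes getline() and strsep() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the ';'-separated fields on every line of fp. */
long count_fields(FILE *fp)
{
    char *line = NULL;  /* owned by getline(); never pass &line to strsep() */
    size_t cap = 0;
    long fields = 0;

    while (getline(&line, &cap, fp) != -1) {
        char *cursor = line;    /* strsep() advances this copy instead */
        while (strsep(&cursor, ";") != NULL)
            fields++;
    }
    free(line);         /* still the pointer that getline() allocated */
    return fields;
}
```

Because only cursor is modified, line keeps pointing at the start of the allocation, so getline() can safely reuse or grow it and free() receives a valid pointer.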
I'm trying to read and base64-decode a file. For some reason OpenSSL reads only a part of it, then on all subsequent calls it simply returns 0. Here is the code I'm using:
#include <stdio.h>
#include <stdlib.h>
#include <openssl/bio.h>
#include <openssl/buffer.h>
#include <openssl/evp.h>
size_t b64_get_datalen(const char* b64, size_t b64len) {
size_t actual_len = 0;
int padding = 0;
for (int i = 0; i < b64len; i++) {
if (b64[i] != '\n') actual_len++;
}
int last = b64len-1;
if (b64[last] == '\n') {
last--;
}
if (b64[last] == '=') {
padding = b64[last-1] == '=' ? 2 : 1;
}
return (actual_len*3)/4 - padding;
}
int main(int argc, char** argv) {
FILE* fp = fopen("7.txt", "r");
if (fp == NULL) {
perror("fopen");
exit(EXIT_FAILURE);
}
fseek(fp, 0, SEEK_END);
size_t b64_len = ftell(fp);
rewind(fp);
char b64[b64_len];
fread(b64, b64_len, 1, fp);
fclose(fp);
size_t data_len = b64_get_datalen(b64, b64_len);
char data[data_len];
BIO *bio, *base64;
base64 = BIO_new(BIO_f_base64());
bio = BIO_new_mem_buf(b64, -1);
bio = BIO_push(base64, bio);
int read = BIO_read(bio, data, b64_len);
printf("expected %d bytes, got %d bytes\n", (int)data_len, (int)read);
BIO_free_all(bio);
}
The program outputs: expected 2880 bytes, got 2244 bytes
The file I'm trying to read is from cryptopals challenges (http://cryptopals.com/static/challenge-data/7.txt). I solved that challenge in Python, there were no issues with reading and base64-decoding the file.
What could be the reason for this behavior? And a more general question: how should one debug such issues with OpenSSL?
The problem is in this line:
bio = BIO_new_mem_buf(b64, -1);
From the man page for BIO_new_mem_buf:
https://www.openssl.org/docs/man1.1.0/crypto/BIO_s_mem.html
BIO_new_mem_buf() creates a memory BIO using len bytes of data at buf,
if len is -1 then the buf is assumed to be nul terminated and its
length is determined by strlen.
But you have passed -1 and your buffer is not nul terminated! Therefore the strlen will overrun the buffer and keep going until it does find a nul terminator. This will likely include lots of values which are not valid base64, and hence the base64 decode fails.
It appears that the behaviour of the BIO_f_base64() BIO is to read and return what it has successfully read, but give up trying to read any more as soon as it encounters an error - as opposed to failing the entire read. It's not clear to me whether that is intended behaviour or a bug.
And a more general question: how should one debug such issues with OpenSSL?
In general you should inspect the OpenSSL error stack which usually gives good information about the source of a problem. For example by using ERR_print_errors_fp():
https://www.openssl.org/docs/man1.1.0/crypto/ERR_print_errors_fp.html
Not that it would have helped in this particular issue because it seems no error is placed on the error stack for this problem.
First of all, I'm looking for optimization: fast execution time.
I would like to read data from standard input in C. Here is my code (Linux):
int main(void) {
char command_str[MAX_COMMAND_SIZE];
while (!feof(stdin)) {
fgets(command_str, MAX_COMMAND_SIZE, stdin);
// Parse data
}
return EXIT_SUCCESS;
}
According to the post "Read a line of input faster than fgets?", the read() function seems to be the solution.
The data input is like:
100 C
1884231 B
8978456 Z
...
The input comes from a file, so I run my program like ./myapp < mytext.txt
It is not possible to know how many entries there are; it could be 10, 10000 or even more.
From this post
Drop all the casts on malloc and realloc; they aren't necessary and clutter up the code
So if I use a dynamic array my app will be slower I think.
The idea is:
Read the whole input in one go into a buffer.
Process the lines from that buffer.
That's the fastest possible solution.
I'd appreciate it if someone could help me. Thanks in advance.
while (!feof(f)) is always wrong. Use this instead:
#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
int main(void) {
char command_str[MAX_COMMAND_SIZE];
while (fgets(command_str, MAX_COMMAND_SIZE, stdin)) {
// Parse data
}
return EXIT_SUCCESS;
}
Reading file contents faster than fgets() is feasible, but seems beyond your skill level. Learn the simple stuff first. There is an awful lot that can be achieved with standard line by line readers... Very few use cases warrant the use of more advanced approaches.
If you want to read the whole input and parse it as a single string, here is a generic solution that should work for all (finite) input types:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
size_t pos = 0, size = 1025, nread;
char *buf0 = malloc(size);
char *buf = buf0;
for (;;) {
if (buf == NULL) {
fprintf(stderr, "not enough memory for %zu bytes\n", size);
free(buf0);
exit(1);
}
nread = fread(buf + pos, 1, size - pos - 1, stdin);
if (nread == 0)
break;
pos += nread;
/* Grow the buffer size exponentially (Fibonacci ratio) */
if (size - pos < size / 2)
size += size / 2 + size / 8;
buf = realloc(buf0 = buf, size);
}
buf[pos] = '\0';
// parse pos bytes of data in buf as a string
printf("read %zu bytes\n", strlen(buf));
free(buf);
return EXIT_SUCCESS;
}
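Once the whole input sits in one NUL-terminated buffer, the "parse" step could be sketched as below. This assumes every line has the form "<integer> <single character>", as in the sample input from the question; struct entry and parse_entries are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for lines such as "1884231 B". */
struct entry {
    long value;
    char tag;
};

/* Parse up to max entries from the NUL-terminated buffer buf.
 * Returns the number of well-formed lines stored in out;
 * blank or malformed lines are skipped. */
size_t parse_entries(const char *buf, struct entry *out, size_t max)
{
    size_t n = 0;
    const char *p = buf;

    while (*p != '\0' && n < max) {
        const char *eol = strchr(p, '\n');
        char *end;
        long v = strtol(p, &end, 10);

        /* Accept only "<number> <char>" that ends before this line's newline;
         * strtol() skips leading whitespace, so a blank line would otherwise
         * pick up the number from the following line. */
        if (end != p && (eol == NULL || end < eol) &&
            *end == ' ' && end[1] != '\0' && end[1] != '\n') {
            out[n].value = v;
            out[n].tag = end[1];
            n++;
        }
        if (eol == NULL)
            break;              /* last line had no trailing newline */
        p = eol + 1;
    }
    return n;
}
```

With the buffer from the code above, a call such as parse_entries(buf, entries, capacity) walks the data in place without copying any line.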
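Maybe you could use fseek(stdin, 0, SEEK_END) to go to the end of the standard input stream, then ftell(stdin) to get its size in bytes, allocate a buffer of that size, rewind, read everything into it, and then process its contents. Note that this only works when stdin is seekable, for example when it is redirected from a regular file as in ./myapp < mytext.txt; it fails when the input comes from a pipe or a terminal.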
I am trying to open a file (Myfile.txt) and concatenate each line into a single buffer, but I am getting unexpected output. The problem is that my buffer is not being updated with the last concatenated lines. Is anything missing in my code?
Myfile.txt (The file to open and read)
Good morning line-001:
Good morning line-002:
Good morning line-003:
Good morning line-004:
Good morning line-005:
.
.
.
Mycode.c
#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[])
{
/* Define a temporary variable */
char Mybuff[100]; // (i dont want to fix this size, any option?)
char *line = NULL;
size_t len=0;
FILE *fp;
fp =fopen("Myfile.txt","r");
if(fp==NULL)
{
printf("the file couldn't exist\n");
return;
}
while (getline(&line, &len, fp) != -1 )
{
//Any function to concatinate the strings, here the "line"
strcat(Mybuff,line);
}
fclose(fp);
printf("Mybuff is: [%s]\n", Mybuff);
return 0;
}
I'm expecting my output to be:
Mybuff is: [Good morning line-001:Good morning line-002:Good morning line-003:Good morning line-004:Good morning line-005:]
Instead I get a segmentation fault (a run-time error) and garbage values. What can I do? Thanks.
Specify MyBuff as a pointer, and use dynamic memory allocation.
#include <stdlib.h> /* for dynamic memory allocation functions */
char *MyBuff = calloc(1,1); /* allocate one character, initialised to zero */
size_t length = 1;
while (getline(&line, &len, fp) != -1 )
{
size_t newlength = length + strlen(line);
char *temp = realloc(MyBuff, newlength);
if (temp == NULL)
{
/* Allocation failed. Have a tantrum or take recovery action */
}
else
{
MyBuff = temp;
length = newlength;
strcat(MyBuff, line); /* append the line just read, not the buffer itself */
}
}
/* Do whatever is needed with MyBuff */
free(MyBuff);
free(line); /* also release the memory allocated by getline() */
The above will leave newlines in MyBuff for each line read by getline(). I'll leave removing those as an exercise.
Note: getline() is a POSIX function (POSIX.1-2008), not standard C. Standard C offers fgets() for reading lines from a file, although it doesn't allocate memory the way getline() does.
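For reference, a standard-C variant of the same idea, using fgets() and dropping the newlines, might look like the following sketch. The helper name join_lines and the 256-byte chunk size are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate all lines of fp, dropping newlines.
 * Returns a heap string the caller must free, or NULL on failure. */
char *join_lines(FILE *fp)
{
    char chunk[256];                /* fgets() reads at most 255 chars per call */
    size_t length = 0;              /* characters stored, excluding the NUL */
    char *joined = calloc(1, 1);    /* start with an empty string */

    if (joined == NULL)
        return NULL;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strcspn(chunk, "\n");    /* length without the newline */
        char *temp = realloc(joined, length + n + 1);
        if (temp == NULL) {
            free(joined);
            return NULL;
        }
        joined = temp;
        memcpy(joined + length, chunk, n);  /* append at the known end */
        length += n;
        joined[length] = '\0';
    }
    return joined;
}
```

Lines longer than the chunk simply arrive in several fgets() calls and are appended piece by piece. Tracking length explicitly also avoids rescanning the buffer the way strcat() does on every call.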
I have a function loadsets() (short for load settings) which is supposed to load settings from a text file named Progsets.txt. loadsets() returns 0 on success, and -1 when a fatal error is detected. However, the part of the code which actually reads from Progsets.txt, (the three fgets()), seem to all fail and return the null pointer, hence not loading anything at all but a bunch of nulls. Is there something wrong with my code? fp is a valid pointer when I ran the code, and I was able to open it for reading. So what's wrong?
This code is for loading the default text color of my very basic text editor program using cmd.
headers:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <Windows.h>
#define ARR_SIZE 100
struct FINSETS
{
char color[ARR_SIZE + 1];
char title[ARR_SIZE + 1];
char maxchars[ARR_SIZE + 1];
} SETTINGS;
loadsets():
int loadsets(int* pMAXCHARS) // load settings from a text file
{
FILE *fp = fopen("C:\\Typify\\Settings (do not modify)\\Progsets.txt", "r");
char *color = (char*) malloc(sizeof(char*) * ARR_SIZE);
char *title = (char*) malloc(sizeof(char*) * ARR_SIZE);
char *maxchars = (char*) malloc(sizeof(char*) * ARR_SIZE);
char com1[ARR_SIZE + 1] = "color ";
char com2[ARR_SIZE + 1] = "title ";
int i = 0;
int j = 0;
int k = 0;
int found = 0;
while (k < ARR_SIZE + 1) // fill strings with '\0'
{
color[k] = title[k] = maxchars[k] = '\0';
SETTINGS.color[k] = SETTINGS.maxchars[k] = SETTINGS.title[k] = '\0';
k++;
}
if (!fp) // check for reading errors
{
fprintf(stderr, "Error: Unable to load settings. Make sure that Progsets.txt exists and has not been modified.\a\n\n");
return -1; // fatal error
}
if (!size(fp)) // see if Progsets.txt is not a zero-byte file (it shouldn't be)
{
fprintf(stderr, "Error: Progsets.txt has been modified. Please copy the contents of Defsets.txt to Progsets.txt to manually reset to default settings.\a\n\n");
free(color);
free(title);
free(maxchars);
return -1; // fatal error
}
// PROBLEMATIC CODE:
fgets(color, ARR_SIZE, fp); // RETURNS NULL (INSTEAD OF READING FROM THE FILE)
fgets(title, ARR_SIZE, fp); // RETURNS NULL (INSTEAD OF READING FROM THE FILE)
fgets(maxchars, ARR_SIZE, fp); // RETURNS NULL (INSTEAD OF READING FROM THE FILE)
// END OF PROBLEMATIC CODE:
system(strcat(com1, SETTINGS.color)); // set color of cmd
system(strcat(com2, SETTINGS.title)); // set title of cmd
*pMAXCHARS = atoi(SETTINGS.maxchars);
// cleanup
fclose(fp);
free(color);
free(title);
free(maxchars);
return 0; // success
}
Progsets.txt:
COLOR=$0a;
TITLE=$Typify!;
MAXCHARS=$10000;
EDIT: Here is the definition of the size() function. Since I'm just working with ASCII text files, I assume that every character is one byte and the file size in bytes can be worked out by counting the number of characters. Anything suspicious?
size():
int size(FILE* fp)
{
int size = 0;
int c;
while ((c = fgetc(fp)) != EOF)
{
size++;
}
return size;
}
The problem lies in your use of the size() function. It repeatedly calls fgetc() on the file handle until it gets to the end of the file, incrementing a value to track the number of bytes in the file.
That's not a bad approach (though I'm sure there are better ones that don't involve inefficient character-based I/O) but it does have one fatal flaw that you seem to have overlooked.
After you've called it, you've read the file all the way to the end so that any further reads, such as:
fgets(color, ARR_SIZE, fp);
will simply fail since you're already at the end of the file. You may want to consider something like rewind() before returning from size() - that will put the file pointer back to the start of the file so that you can read it again.
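A minimal sketch of size() with that change applied, keeping the original signature and assuming the stream is seekable (a regular file opened with fopen):

```c
#include <stdio.h>

/* Count the bytes in fp, then go back to the start so that later reads
 * (such as the fgets() calls in loadsets()) see the file contents again. */
int size(FILE *fp)
{
    int count = 0;

    while (fgetc(fp) != EOF)
        count++;
    rewind(fp);     /* resets the position and clears the EOF indicator */
    return count;
}
```

With this version the three fgets() calls in loadsets() start reading from the first line again instead of hitting end of file immediately.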