I'm trying to read a text file in C. The text file is a simple language file used on an embedded device, and EACH LINE of the file corresponds to an ENUM value on the code side. Here is a small part of my file:
SAMPLE FROM TEXT FILE :
OPERATION SUCCESS!
OPERATION FAILED!\nRETRY COUNT : %d
ENUM :
typedef enum
{
...
MESSAGE_VALID_OP,
MESSAGE_INVALID_OP_WITH_RETRY_COUNT
...
}
Load Strings :
typedef struct
{
char *str;
} Message;
int iTotalMessageCount = 1012;
void vLoadLanguageStrings()
{
FILE *xStringList;
char tmp_line_message[256]; /* line buffer: an array of char, not of char * */
int message_index = 0;
xStringList = fopen("/home/change/strings.bin", "r");
if (xStringList == NULL)
exit(EXIT_FAILURE);
mMessages = (Message *) malloc(iTotalMessageCount * sizeof(Message));
/* fgets() returns NULL (not -1) at end of file or on error */
while (message_index < iTotalMessageCount && fgets(tmp_line_message, sizeof tmp_line_message, xStringList) != NULL)
{
tmp_line_message[strcspn(tmp_line_message, "\n")] = '\0'; /* drop the trailing newline */
mMessages[message_index].str = (char *) malloc(strlen(tmp_line_message) + 1); /* +1 for the terminating '\0' */
strcpy(mMessages[message_index].str, tmp_line_message);
message_index++;
}
fclose(xStringList);
}
As you can see in the sample from the text file, I have to use the \n line feed character on some of my lines. I can read the file successfully, but when I display a text that contains \n, the line feed is printed on the device screen as the two characters \ and n.
I have already tried the getline(...) function. How can I handle the \n character without raising the complexity, while still reading the file line by line?
As you can see in the sample from the text file, I have to use the \n
line feed character on some of my lines.
No, I don't see that. Or at least, I don't see you doing that. The two-character sequence \n is significant primarily to the C compiler; it has no inherent special significance in data files, whether those files are consumed by a C program or not.
Indeed, if the system recognizes line feeds as line terminators, then by definition, it is impossible to embed a literal line feed in a physical line. What it looks like you are trying to do is to encode line feeds as the "\n" character sequence. That's fine, but it's quite a different thing from embedding a line feed character itself.
I can read the file successfully, but when I display a text that
contains \n, the line feed is printed on the device screen as the two
characters \ and n.
Of course. Those are the characters you read in (not a line feed), so if you write them back out then you reproduce them. If you are encoding line feeds via that character sequence, then your program must decode that sequence if you want it to output literal line feeds in its place.
I have already tried the getline(...) function. How can I handle the \n
character without raising the complexity, while still reading the file line by line?
You need to process each line you read to decode the \n sequences in it. I would write a function for that. Either way, however, your program will become more complex, because the current version simply doesn't do all the things it needs to do.
What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option has the following effects:
Line feeds ('\n') will be translated to "\r\n" sequences on output.
Carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a Ctrl-Z character (character 26) and that character removed, if possible. It will also interpret the presence of that character as being the end of file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the Ctrl-Z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read any binary data in text mode won't work, it will be corrupted. You can read text ok in binary mode though - it just won't do automatic translations of "\n" to "\r\n".
See fopen
Additionally, when you fopen a file with "rt", the input is terminated at a Ctrl-Z character.
Another difference is when using fseek
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support the SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with origin of SEEK_SET).
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main() {
FILE *f;
char string[] = "A\nB";
int len;
len = strlen(string);
printf("As you'd expect string has %d characters... ", len); /* prints 3*/
f = fopen("test.txt", "w"); /* Text mode */
fwrite(string, 1, len, f); /* On Windows "A\r\nB" is written */
printf ("but %ld bytes were written to file", ftell(f)); /* prints 4 on Windows, 3 on Linux*/
fclose(f);
return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
Of course, you can also open the file in a text editor such as Notepad++ and see the characters for yourself.
The inverse conversion is performed on Windows when reading the file in text mode.
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement is that we can store our current position in the file (we used fgetpos), close the file and then later to reopen the file and seek to that position (we used fsetpos).
However, where a file has mixtures of line endings then this process failed to seek to the actual same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
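In "w" mode the file is opened for writing in text mode, so the newline translation applies on Windows.
In "wb" mode the file is opened for writing in binary mode, and the bytes are written exactly as given. Neither mode selects a character encoding such as UTF-8 or UTF-16LE by itself; with MS Visual C that is requested separately through the ccs= flag in the mode string.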
Why does read() on a file in Linux add a newline character at EOF even if the file really does not have a newline character?
my file data is :
1hello2hello3hello4hello5hello6hello7hello8hello9hello10hello11hello12hello13hello14hello15hello
my read() call on this file should hit EOF after reading the last 'o' in "15hello". I use the below :
while( (n = read(fd2, src, read_size-1)) != 0) // read_size = 21
{
//... some code
printf("%s",src);
//... some code
}
where fd2 is the file's descriptor. In the last iteration, n was 17 and I had src[16] = '\n'. So, does the read call in Linux add a newline at EOF?
does the read call in linux add a newline at EOF?
No.
Your input file likely has a terminating newline in it - most well-formatted text files do, so multiple files can be concatenated without lines running together.
You could also be running into a stray newline character that was already in your buffer, because read() does not terminate the data read with a NUL character to create an actual C-style string. And I'd guess your code doesn't either, else you would have posted it. Which means your
printf("%s",src);
is quite likely undefined behavior.
Why does read() on a file in Linux add a newline character at EOF even if the file really does not have a newline character?
No, the read() system call does not add a newline at the end of the file.
You are probably seeing this behavior because the text file was created with vi, which adds a trailing newline by default when it saves a file.
You can check this on your system by creating an empty text file with vi and then running the wc command on it.
If you know the file size (you can find it with the stat() system call), you can also read the file data with read() all at once and avoid the while loop.
This
while( (n = read(fd2, src, read_size-1)) != 0) {
/* some code */
}
Change to
struct stat var;
stat(filename, &var); /* check the return value of stat(); var now holds the file information */
off_t size = var.st_size;
Now that you have the size of the file, create a dynamic or stack array of that size and read the data from the file into it.
char *ptr = malloc(size + 1);
Now read all the data at once:
read(fd,ptr,size); /* ptr now holds the file contents */
Finally, once the work is done, don't forget to free ptr by calling free(ptr).
This function, called histogram, prints the length of each word using '*' characters. How can I save the results into a text file? I tried, but the program does not save the results (and reports no errors).
void histogram(FILE *myinput)
{
FILE *ptr;
printf("\nsaving results...\n");
ptr=fopen("results1.txt","wt");
int j, n = 1, i = 0;
size_t ln;
char arr[100][10];
while(n > 0)
{
n = fscanf(myinput, "%s",arr[i]);
i++;
}
n = i;
for(i = 0; i < n - 1; i++)
{
ln=strlen(arr[i]);
fprintf(ptr,"%s \t",arr[i]);
for(j=0;j<ln;j++)
fprintf(ptr, "*");
fprintf(ptr, "\n");
}
fclose(myinput);
fclose(ptr);
}
I see two ways to take care of this issue:
Open a file in the program and write to it.
If running with command line, change the output location for standard out
$> ./histogram > outfile.txt
Using the '>' will change where standard out will write to. The issue with '>' is that it will truncate a file and then write to the file. This means that if there was any data in that file before, it is gone. Only the new data written by the program will be there.
If you need to keep the data in the file, you can change the standard out to append the file with '>>' as in the following example:
$> ./histogram >> outfile.txt
Also, there does not have to be a space between '>' and the file name. I just do that for preference. It could look like this:
$> ./histogram >outfile.txt
If writing to a file will be a one-time thing, changing standard out is probably the best way to go. If you are going to do it every time, then add it to the code.
You will need to open another FILE. You can do this in the function or pass it in like you did the file being read from.
Use 'fprintf' to write to the file:
int fprintf(FILE *restrict stream, const char *restrict format, ...);
Your program may have these lines added to write to a file:
FILE *myoutput = fopen("output.txt", "w"); // or "a" if you want to append
fprintf(myoutput, "%s \t",arr[i]);
Answer Complete
There may be some other issues as well that I will discuss now.
Your histogram function does not have a return type. C will default it to 'int' and then complain that the function does not return a value. From what you have provided, I would add 'void' before the function name.
void histogram(FILE *myinput) {
The second dimension of arr may be too small. Each slot holds 10 bytes, including the null terminator [\0] at the end of the string, so the code assumes that no token in the input file is longer than 9 characters. A longer token will overflow its slot and potentially corrupt your data.
Edit
The above was written before a change to the provided code that now includes a second file and fprintf statements.
I will point to the line that opens the out file:
ptr=fopen("results1.txt","wt");
I am wondering if you mean to put "w+" where the second character is a plus symbol. According to the man page there are six possibilities:
The argument mode points to a string beginning with one of the
following sequences (possibly followed by additional characters, as
described below):
r Open text file for reading. The stream is positioned at the
beginning of the file.
r+ Open for reading and writing. The stream is positioned at the
beginning of the file.
w Truncate file to zero length or create text file for writing.
The stream is positioned at the beginning of the file.
w+ Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is
positioned at the beginning of the file.
a Open for appending (writing at end of file). The file is
created if it does not exist. The stream is positioned at the
end of the file.
a+ Open for reading and appending (writing at end of file). The
file is created if it does not exist. The initial file
position for reading is at the beginning of the file, but
output is always appended to the end of the file.
As such, it appears you are attempting to open the file for reading and writing.
A friend of mine needs to use MATLAB for one of his classes, so he called me up (a Computer Science Major) and asked if I could teach him C. I am familiar with C++, so I am also familiar with the general syntax, but had to read up on the IO library for C.
I was creating some simple IO programs to show my friend, but my third program is causing me trouble. When I run the program on my machine using Eclipse (with the CDT) Eclipse's console produces a glitchy output where instead of prompting me for the data, it gets the input and then prints it all at once with FAILURE.
The program is supposed to get a filename from user, create the file, and write to it until the user enters a blank line.
When I compile/run it on my machine via console (g++ files2.c) I am prompted for the data properly, but FAILURE shows up, and there is no output file.
I think the error lies with how I am using the char arrays, since using scanf to get the filename will create a functional file (probably since it ignores whitespace), but not enter the while loop.
#include <stdio.h>
#define name_length 20
#define line_size 80
int main() {
FILE * write_file; // pointer to file you will write to
char filename[name_length]; // variable to hold the name of file
char string_buffer[line_size]; // buffer to hold your text
printf("Filename: "); // prompt for filename
fgets(filename, name_length, stdin); // get filename from user
if (filename[name_length-1] == '\n') // if last char in stream is newline,
{filename[name_length-1] = '\0';} // remove it
write_file = fopen(filename, "w"); // create/overwrite file user named
if (!write_file) {printf("FAILURE");} // failed to create FILE *
// inform user how to exit
printf("To exit, enter a blank line (no spaces)\n");
// while getting input, print to file
while (fgets(string_buffer, line_size, stdin) != NULL) {
fputs(string_buffer, write_file);
if (string_buffer[0] == '\n') {break;}
}
fclose(write_file);
return 0;
}
How should I go about fixing the program? I have found next to nothing on user-terminated input being written to file.
Now if you will excuse me, I have a couple of files to delete off of my University's UNIX server, and I cannot specify them by name since they were created with convoluted filenames...
EDIT------
Like I said, I was able to use
scanf("%s", filename);
to get a working filename (without the newline char). But regardless of if I use scanf or fgets for my while loop, if I use them in conjunction with scanf for the filename, I am not able to write anything to file, as it does not enter the while loop.
How should I restructure my writing to file and my while loop?
Your check for the newline is wrong; you're looking at the last character in filename, but the newline may come before that if the user enters a filename that's shorter than the maximum. You're then trying to open a file that has a newline in its name.
These lines seem to be incorrect:
if (filename[name_length-1] == '\n') // if last char in stream is newline,
{filename[name_length-1] = '\0';} // remove it
You test the character at index name_length - 1, which is 19 in your case, regardless of the length of the filename that was actually entered. fgets() stores the '\n' immediately after the last character the user typed, so for a shorter filename the '\n' at the end of your string is never replaced, and you end up asking for a file whose name contains a '\n' character.
You need to get the length of your filename first, for example with strlen().
if (filename[strlen(filename) - 1] == '\n')
{
filename[strlen(filename) - 1] = '\0';
}
(Don't forget to include the string.h header)
I hope I was able to help, despite my weak English.
Is there a function in C to read a file with a custom delimiter like '\n'?
For example, suppose the file contains the following (I write \n here to show where the file has an LF, line feed, '\n', 0x0A):
this is the firstline\n this is the second line\n
I'd like to read the file in parts and split it into two strings:
this is the firstline\n
this is the second line\n
I know that with fgets I can read up to a given number of characters, but not up to an arbitrary pattern. I know there is a method for this in C++, but how do I do it in C?
I'll show another example:
I'm reading a file ABC.txt
abc\n
def\n
ghi\n
With the following code:
FILE* fp = fopen("ABC.txt", "rt");
const int lineSz = 300;
char line[lineSz];
char* res = fgets(line, lineSz, fp); // the res is filled with abc\ndef\nghi\n
fclose(fp);
I expected fgets to stop at abc\n
But the res is filled with: abc\ndef\nghi\n
SOLVED: The problem was that Notepad++ on Windows XP (the only system I tried; I don't know whether it happens on other Windows versions) saved the file with a different encoding.
For fgets to see a newline, the file needs CRLF line endings, not just the CR that Notepad++ wrote when I pressed Enter.
I recreated the file in Windows Notepad and it worked: fgets reads the string only up to abc\n in the second example.
fgets() will read one line at a time, and does include the newline character in the line output buffer. Here's an example of the common usage.
#include <stdio.h>
#include <string.h>
int main()
{
char buf[1024];
while ( fgets(buf,1024,stdin) )
printf("read a line %zu characters long:\n %s", strlen(buf), buf); /* %zu is the conversion for size_t */
return 0;
}
But since you asked about using a "custom" delimiter... getdelim() allows you to specify a different end-of-line delimiter.