Text mode adapter for binary (or text) mode file - file

I have a method (parse) that processes data from an input file, which may have been opened in binary mode. However, in some subclasses it would be easier to process the data if the file were opened in text mode. So my question is whether there is an easy way to wrap any file to get something that acts as a text-mode file.
Note that the solution in "Convert binary input stream to text mode" does not quite suffice, as it only produces an iterator (and not a file-like object). Also note that opening the file in text mode in the first place is not an option.
If it simplifies the solution, one can assume that the input file is indeed opened in binary mode.

It appears that the buffer argument of io.TextIOWrapper is actually expected to be an io.BufferedReader object (i.e. a file opened in binary mode). This is, however, not obvious from reading the documentation.
This seems to work if the file is known to be opened in binary mode (an instance of io.RawIOBase or io.BufferedIOBase):
    srctxt = io.TextIOWrapper(src)
It doesn't work, however, if src is already opened in text mode, but that can be tested by checking whether it is an io.TextIOBase:
    if isinstance(src, io.TextIOBase):
        srctxt = src
    else:
        srctxt = io.TextIOWrapper(src)
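As a self-contained illustration of the check above, here is a minimal sketch that uses in-memory streams in place of real files. The helper name as_text and the utf-8 default are assumptions made for the example, not part of the original code.

```python
import io


def as_text(src, encoding="utf-8"):
    """Return a text-mode view of src (hypothetical helper name)."""
    if isinstance(src, io.TextIOBase):
        # Already a text stream: use it unchanged.
        return src
    # Binary stream: decode on the fly with the assumed encoding.
    return io.TextIOWrapper(src, encoding=encoding)


# In-memory stand-ins for a file opened in binary mode and one in text mode.
binary_src = io.BytesIO("first line\nsecond line\n".encode("utf-8"))
text_src = io.StringIO("already text\n")

wrapped = as_text(binary_src)
lines_from_binary = wrapped.readlines()

same_object = as_text(text_src) is text_src
lines_from_text = as_text(text_src).readlines()
```

One thing to keep in mind: closing the wrapper (or letting it be garbage-collected) also closes the underlying binary stream. If the caller still needs that stream afterwards, calling wrapped.detach() before discarding the wrapper should avoid this.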

Related

Trying to store a string in a file using binary mode in C

I am trying to store a simple string in a file opened in wb mode, as shown in the code below. From what I understand, the content of the string should be stored as 0s and 1s, since the file was opened in binary mode, but when I opened the file manually in Notepad, I was able to see the exact string stored in the file and not some binary data. Out of curiosity I also tried to read this binary file in text mode. Again the string was shown perfectly in the output, without any random characters. The code below illustrates my point:
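#include <stdio.h>

int main(void)
{
    char str[] = "Testing 123";
    FILE *fp;

    fp = fopen("test.bin", "wb");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    /* sizeof(str) includes the terminating '\0', so 12 bytes are written */
    fwrite(str, sizeof(str), 1, fp);
    fclose(fp);
    return 0;
}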
So I have three questions about this:
Why does Notepad show the exact string and not some random characters when I view the file?
Can we read, in text mode, a file which was written in binary mode, and vice versa?
Not exactly related to the above, but can we use functions like fgetc, fputc, fgets, fputs, fprintf and fscanf in binary mode, and functions like fread and fwrite in text mode?
Edit: Forgot to mention that I am working on the Windows platform.
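In binary mode, the file API does not modify the data but just passes it along directly. A string is already a sequence of bytes in memory (its character codes), so writing it in binary mode puts exactly those bytes into the file, which is why a text editor shows the same text.
In text mode, some systems transform the data. For example, Windows changes \n to \r\n when writing in text mode.
On Linux there is no difference between binary and text modes.
Notepad will display whatever is in the file, so even if you write 100% binary data there is a chance that you'll see some readable characters.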
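The difference between the two modes can be seen by looking at the bytes that actually end up on disk. The sketch below uses Python instead of C to keep it short, and it passes newline="\r\n" explicitly to imitate the Windows text-mode translation on any platform (that parameter choice is an assumption made for the demonstration).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.bin")

    # Binary mode: the bytes are written exactly as given.
    with open(path, "wb") as f:
        f.write(b"Testing 123\n")
    with open(path, "rb") as f:
        binary_bytes = f.read()

    # Text mode with Windows-style translation: "\n" becomes "\r\n" on write.
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write("Testing 123\n")
    with open(path, "rb") as f:
        translated_bytes = f.read()

# The readable part is identical in both cases: it is just the ASCII codes.
print(binary_bytes)
print(translated_bytes)
```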

How to interact with an external text editor in C

I am developing a command line application in C (Linux environment) to edit a particular file format. This file format is a plain XML file, which is compressed, then encrypted, then cryptographically signed.
I'd like to offer the user an easy way to edit this kind of file, without the hassle of manually extracting the file, editing it, and then compressing, encrypting and signing it.
Ideally, when called, my application should do the following:
1. Open the encrypted/compressed file and extract it to a temporary location (like /tmp)
2. Call an external text editor like nano or sublime-text or gedit, depending on which is installed and maybe on the user's preferences. Wait until the user has edited the file and closed the text editor.
3. Read the modified temporary file and encrypt/compress it, replacing the old encrypted/compressed file
How can I achieve point no. 2?
I thought about calling nano with system() and waiting for it to return, or placing an inotify watch on the temp file to know when it is modified by the graphical text editor.
Which solution is better?
How can I call the user's default text editor?
Is there anything that can be done in a better way?
First, consider not writing an application or wrapper that calls another editor yourself, but rather writing a plugin for an existing editor that is flexible enough to support additional formats and to pass its input through decompression.
That's not the only solution, of course, but it might be easier for you.
With your particular approach, you could:
Use the EDITOR and/or VISUAL environment variables (as also pointed out by @KamilCuk) to determine which editor to use.
Run the editor as a child process so that you know when it ends execution, rather than having to communicate with it in some other way. Being notified of changes to the file, or even of its opening or closing, is not good enough, since the editor may make changes to multiple files, and some editors don't even keep the file open while you work on it.
Remember to handle the cases of the editor failing to come up, or hanging, or your program being told to stop waiting for the editor, etc.
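The three points above can be sketched as follows. This is a minimal Python illustration, not the C implementation the question asks for; pick_editor, run_editor, the "vi" fallback and the timeout value are all assumptions made for the example. The demonstration at the bottom uses a stand-in "editor" (a Python one-liner that appends a line) so that the sketch runs without user interaction.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def pick_editor(env=os.environ, fallback="vi"):
    """Prefer $VISUAL, then $EDITOR, then a fallback (the fallback is an assumption)."""
    return env.get("VISUAL") or env.get("EDITOR") or fallback


def run_editor(editor, path, timeout=None):
    """Run the editor as a child process and block until it exits.

    Returns True only if the editor started and exited with status 0.
    """
    argv = shlex.split(editor) + [path]  # the variable may carry arguments
    try:
        return subprocess.run(argv, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False  # failed to start, or still running after the timeout


# Stand-in "editor" for the demonstration: appends one line to the file.
script = "import sys; open(sys.argv[1], 'a').write('edited\\n')"
fake_editor = f"{shlex.quote(sys.executable)} -c {shlex.quote(script)}"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "document.xml")
    with open(path, "w") as f:
        f.write("<doc/>\n")

    ok = run_editor(fake_editor, path, timeout=60)
    with open(path) as f:
        edited = f.read()

    # An editor that cannot be started is reported as a failure.
    missing_ok = run_editor("no-such-editor-binary-xyz", path)
```

In a real program the call would be run_editor(pick_editor(), path). In C, the same structure would typically be built from fork, one of the exec functions, and waitpid.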
Call an external text editor like nano or sublime-text or gedit depending on which is installed and maybe the user preferences. Wait until the user has edited the file and closed the text editor.
Interesting question. One way to open the XML file with the user's default editor is to use xdg-open, but it doesn't give you the pid of the application in which the user will edit the file.
You can use xdg-mime query default application/xml to find the .desktop file of the default editor, but then you have to parse this file to figure out the executable path of the program. This is exactly how xdg-open works: in the search_desktop_file() function, the line starting with Exec= is simply extracted from the .desktop file to call the editor executable and pass the target file as an argument. What I am trying to say is that, after you find the editor executable, you can start it, wait until it is closed, and then check whether the file content has changed. Well, this looks like a lot of unnecessary work...
Instead, you can try a fixed, well-known editor, such as gedit, to achieve the desired workflow. You can also give the user a way (e.g. a prompt or a config file) to set a default XML editor, e.g. /usr/bin/sublime_text, which can then be used by your program on the next run.
However, the key here is to open an editor that blocks the calling process until the user closes the editor. After the editor is closed, you can simply check whether the file has been changed and, if so, perform further operations.
To find out whether the file contents have been modified, you can use the stat system call to get the inode change time of the file before you open the editor, and then compare that timestamp with the current one once the editor is closed.
i.e.:
stat -c %Z filename
Output: 1558650334
Wrapping up:
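#include <stdio.h>
#include <stdlib.h>

/* Run cmd and wait for it to exit. If result is non-NULL, store the
   first line of the command's output in it. */
void execute_command(const char *cmd, char *result, int size) {
    FILE *fp = popen(cmd, "r");
    if (fp == NULL) {
        perror("popen");
        exit(EXIT_FAILURE);
    }
    if (result != NULL && fgets(result, size, fp) == NULL)
        result[0] = '\0';
    pclose(fp); /* blocks until the command has finished */
}

/* Return the inode change time of filename in seconds since the epoch. */
long get_changetime(const char *filename) {
    char cmd[4096];
    char output[32]; /* large enough for the timestamp and the terminator */
    snprintf(cmd, sizeof cmd, "stat -c %%Z %s", filename);
    execute_command(cmd, output, sizeof output);
    return atol(output);
}

int main(void) {
    char cmd[4096];
    const char *filename = "path/to/xml-file.xml";
    long ctime_before = get_changetime(filename);

    snprintf(cmd, sizeof cmd, "gedit %s", filename);
    execute_command(cmd, NULL, 0); /* returns once the editor process exits */

    if (ctime_before != get_changetime(filename)) {
        printf("file modified!\n");
        /* do your work here... */
    }
    return 0;
}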
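File timestamps can be coarse, and an editor may rewrite a file without changing its content. As a variation on the ctime comparison, here is a minimal Python sketch that compares a content hash taken before and after the edit. Note that this swaps in a different technique (hashing rather than stat), and that sha256_of and the file name are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file's content (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xml-file.xml")
    with open(path, "wb") as f:
        f.write(b"<doc>old</doc>\n")

    before = sha256_of(path)
    unchanged = sha256_of(path) != before  # nothing edited yet

    # Stand-in for the user editing and saving the file.
    with open(path, "wb") as f:
        f.write(b"<doc>new</doc>\n")
    modified = sha256_of(path) != before
```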

Does file type rely on file extension?

As a general question: What's the role of the file extension when determining file types?
For example, I can change the extension of a .jpeg file to .png or even .txt. Of course, after changing it to .txt, it will neither be opened as a picture nor be readable.
To determine the file type, it seems the safe way is to parse the first few bytes of the file. If the extension cannot be trusted, it is no more than part of the file name.
As a general rule, you should ALWAYS parse the COMPLETE file in order to be sure that the file is what the extension says. As you can easily imagine, it is pretty simple to create a binary file that resembles, e.g., a BMP (with a correct header) but then contains something different.
You should never trust the extension or the header, because otherwise a malicious user could exploit your code to cause, e.g., a buffer overflow, and this is absolutely paramount if you are writing programs that must run with root/admin privileges.
Having said the obvious, the file extension nowadays is mainly used so that the OS can associate a program with that particular file (usually calling the program and passing the selected file as the first parameter), and then it's up to the program to determine the file content.
It is a little bit different when talking about executable files. Under Unix, a file has to have the "x" flag set in order to be executable, otherwise it will not run, regardless of the extension. Under Windows, there is no such thing, and the OS relies on only a few extensions (EXE, COM, BAT, etc.) to determine which files can be executed.
The EXE file, for example, has to start with "MZ" followed by some information about its allocation and size (http://www.delorie.com/djgpp/doc/exe/), and the OS surely checks its internal headers. Other formats (e.g. the COM executable format of the MS-DOS era) are just "pure" machine code, so there is no check done by the OS. It just executes those opcodes, hoping that everything will be fine.
So, to summarize:
The file extension is mainly used so that the OS can call the appropriate program to open the file (passing the filename as the first parameter, e.g. via argc/argv in C)
Windows relies on certain file extensions to know whether a file is executable, while Unix/Mac relies on a particular flag (x) associated with the file
Two things that are not well known about file extensions: directory names can have extensions too, and an extension can be much longer than the usual 3 characters.
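To make the "first few bytes" idea concrete, here is a minimal Python sketch that guesses a type from well-known leading signatures and ignores the file name entirely. The function name sniff and the small signature table are assumptions made for the example. In line with the warning above, a matching header is only a hint: it says nothing about whether the rest of the file is valid.

```python
# Well-known leading signatures ("magic numbers") for a few formats.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),
    (b"MZ", "exe"),
]


def sniff(data):
    """Guess a file type from its leading bytes; return None if unknown."""
    for magic, kind in MAGIC:
        if data.startswith(magic):
            return kind
    return None


# A JPEG header keeps identifying the data as JPEG whatever the file is called.
jpeg_like = b"\xff\xd8\xff\xe0" + b"\x00" * 16
png_like = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
plain_text = b"hello, world\n"

guesses = [sniff(jpeg_like), sniff(png_like), sniff(plain_text)]
```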
With the help of the file extension, you know how to interpret the first few bytes and all the rest. You also know which program to use to read the file. Or, if it is an executable, you know that it is to be executed and not shown as a picture.
Yes, you can change the file extension, but what does that mean? The OS (or any program that tries to read the file) is still working correctly; it is you who are providing bad data to it.
A file extension is not something that some bytes of data inherently have. An extension is given to those bytes according to the format used to write them. After you have encoded letters in binary form, you give that binary form a .txt extension so that the text reader knows that these bytes convert to letters. That's the role of the file extension. With a wrong file extension, this role is not fulfilled, and the data you saved can no longer be interpreted.
As a general question: What's the role of file extension when determining file types?
The file extension usually identifies the application that opens a file.
If you rename a .JPG to a .PNG while JPG and PNG files are opened by the same application (usually an image viewer), that application can read the image stream and process it correctly despite the incorrect file extension.
The problem arises if you rename the file in such a way that it gets routed to an application that cannot handle the file's content.
If you rename a .DOCX (Word) file to an AutoCAD extension (.DWG), opening the Word file in AutoCAD is likely to produce errors (unless by chance AutoCAD can read Word files).

Opening a file in C in write mode

I am writing a C program that converts a text file into a binary file and vice versa. The first question I have is about opening a file in "w" mode. Is there any need to have a check that the output file is opened correctly?
    FILE *output;
    output = fopen("output.bin", "w");
    if (output == NULL) {
        printf("Error opening output file\n");
    }
Basically, my question is whether output would ever actually be NULL. If there were a problem opening the output, wouldn't it just create a new file named "output.bin"?
My other question is how characters are actually saved in a binary file. I know I'm supposed to save each character as an unsigned char so that it can have values between 0 and 255, and then write that char to the output file. The actual logic of how that happens is not making sense to me. If anyone can help me or point me in the right direction, I would appreciate it!
Yes, opening a file in write mode might still fail. Here's a bunch of possible reasons, but certainly not the only ones:
You don't have permission to create or change the file.
The file is read-only, or the directory it would be in is read-only.
The file would be inside another file. (test/foo if test is a file and not a directory)
The filesystem is out of space or inodes (on filesystems that have a fixed number of inodes)
The user has hit their disk space quota.
The file would be on another computer, and the network is down.
The filename is invalid - such as C:/???*\\\\foo on Windows.
The filename is too long.
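Two of the failure cases listed above are easy to reproduce. The sketch below is in Python, not C, but the underlying system call fails in the same way and C's fopen would return NULL in these situations. The helper name try_open_for_write is hypothetical.

```python
import os
import tempfile


def try_open_for_write(path):
    """Try to open path for writing; return the error, or None on success."""
    try:
        with open(path, "wb"):
            return None
    except OSError as exc:
        return exc


with tempfile.TemporaryDirectory() as tmp:
    # "test" is a regular file, so "test/foo" cannot be created inside it.
    plain_file = os.path.join(tmp, "test")
    with open(plain_file, "wb") as f:
        f.write(b"x")
    err_inside_file = try_open_for_write(os.path.join(plain_file, "foo"))

    # The parent directory does not exist; write mode does not create it.
    err_missing_dir = try_open_for_write(
        os.path.join(tmp, "no_such_dir", "output.bin")
    )

    # The ordinary case: the file is created without error.
    err_ok = try_open_for_write(os.path.join(tmp, "output.bin"))
```

So the check on the result of opening the file is worth keeping even in write mode.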

JPG file won't open after editing

When I open a ".jpg" picture file with Notepad and edit it, the file no longer opens after saving. The error says that the file is damaged. Even when I delete a symbol and retype it in the same place, in the same way, and then save the changes, it still won't open. Why?
JPG is a binary format, by which we mean that it represents a series of numbers. Notepad is for editing text files; in a text file those numbers refer to letters (using the ASCII table). Editing a binary file in a text editor is likely to cause corruption, as the text editor may not be able to represent all of the file properly (it's not actually text) and may modify it to force it to be text before storing it.
In particular, many numbers are used as control codes (e.g. the newline character). As JPG is a binary format, those control codes have no textual meaning and will be dispersed throughout the file, so treating them as text creates more havoc than just displaying gobbledygook.
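The effect can be imitated in a few lines of Python. This is only a sketch under assumptions: it does not reproduce what Notepad actually does internally, it merely shows that decoding arbitrary bytes as text and saving them again need not give back the original bytes. The sample bytes are made up for the example, apart from the leading FF D8 FF that JPEG files start with.

```python
# A few bytes as they might appear near the start of a JPEG file.
original = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x0A, 0x41])

# A text editor has to interpret the bytes as characters. Bytes that are not
# valid in the assumed encoding get replaced by a placeholder character.
as_text = original.decode("utf-8", errors="replace")

# Saving the "text" again produces different bytes: the file is corrupted
# even though nothing was visibly changed.
round_tripped = as_text.encode("utf-8")
corrupted = round_tripped != original

# For contrast: an encoding that maps every byte value to one character
# survives the round trip, but only if nothing else (such as newline
# translation) touches the data.
lossless = original.decode("latin-1").encode("latin-1") == original
```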
