Differences between writing/reading binary/text in C

I'm working on a client/server program where the client sends/receives files. The files may be text files or binary files. However, I am not sure what changes I need to make, if any, to accommodate either file type. Basically, I want to read/write a file on the server side without checking or caring what type of file it is. Would code like this work? Why or why not?
Server snippet:
//CREATING/WRITING TO A FILE
//we are ready to begin reading data from the client, and storing it
int fd = open(pathname, O_CREAT | O_WRONLY | O_EXCL, S_IRUSR | S_IWUSR);
while(nbytes < bytes)
{
    //only read the necessary # of bytes: the remaining bytes vs max buffer size
    int min_buffer = (bytes - nbytes) < BUFFER_SIZE ? (bytes - nbytes) : BUFFER_SIZE;
    length = recv( client->client_socket, contents, min_buffer, 0);
    if(length <= 0)
        break;//0 means the client closed the connection, <0 means an error occurred
    if(fd < 0) //the fd is bad, but we need to continue reading bytes anyway
    {
        nbytes += length;
        continue;
    }
    //write only the bytes actually received, which may be fewer than min_buffer
    if(write(fd, contents, length) != length)
    {
        //printf("There was an error writing to the file.\n");
        close(fd);
        fd = -1; //keep draining the socket, but stop writing
    }
    nbytes += length;
}
//READING A STORED FILE AND SENDING THE DATA TO CLIENT
int fd = open(pathname, O_RDONLY);
if(fd >= 0)
{
    //read() returns 0 at end of file and a negative value on error
    while((bytes = read(fd, buffer, BUFFER_SIZE)) > 0)
    {
        //send the client the data (note: write() may send fewer bytes than requested)
        write(client->client_socket, buffer, bytes);
    }
    if(bytes < 0)
    {
        //some error occurred
        write( client->client_socket, "ERROR: Could not read\n", 22);
        close(fd);
        return;
    }
    close(fd);
}
So if the client sends a binary file vs a text file, would this code cause issues? (We can assume the client knows what type of file to expect.)
Note: Another confusing detail about this is that there are tutorials for writing/reading binary files in C that didn't seem to have any real differences from those for regular files, which is what led me here.

Just do everything with "binary" files. Linux makes no distinction between "text" and "binary" files at the OS level; there are just files with bytes in them. I.e., expect that a file can contain every possible byte value, and don't write different code for different kinds of content.
There is a difference in Windows: text mode in Windows means that a line break (\n) in the program gets converted to/from \r\n when writing to / reading from a file. A text file written this way, when read in binary mode, will contain these two bytes instead of the original \n, and vice versa. (Additionally, MS isn't very clear in the documentation that this is the only difference, which can easily confuse beginners.)
If you use standard C fopen and fclose instead of the POSIX open etc., you can specify whether to open a file in binary or text mode (on Linux too). This is because code with fopen should work on Windows and Linux without any OS-specific changes; but what you choose in fopen doesn't matter when running on Linux (which can be verified by reading the source code of fopen etc.)
And about the sockets:
Linux: No difference (again).
Windows: No difference either. There are just bytes, and no strange line break conversions.
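To illustrate the "just bytes" point, here is a minimal sketch of a helper that writes a whole buffer to a descriptor and retries on short writes. The name write_all is hypothetical (it is not part of any library), and the sketch assumes a POSIX system. It behaves the same for a file or a socket and never inspects the content.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from buf to fd.
 * Returns 0 on success, -1 on error (errno is set by write()).
 * The bytes are never interpreted, so text and binary data are
 * handled identically. */
int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        p += n;                     /* skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```

In the server snippet from the question, a call such as write_all(fd, contents, length) could stand in for the bare write(), under the assumption that length holds the value returned by recv().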

I tore my hair out for a day over a binary/text file issue. I was outputting binary data into "files" (apparently text files ... after years and years of C I'd always thought a file was a file) and kept getting spurious characters inserted into the output. I went so far as to download a new compiler but had the same problem. The issue? Whenever I output the byte 0x0A using any of the fprintf family of functions, a 0x0D byte was inserted in front of it. Yes, line feed characters (0x0A) were being replaced by carriage return/line feed pairs (0x0D 0x0A). It's a legacy "end of line" issue based on how different systems have developed. The tough part of finding the problem was realizing that 0x0A was not being treated as just a binary value, but was being recognized as a line feed.
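A small sketch of how the problem described above is usually avoided: open the stream with "b" in the mode string so that 0x0A is written as-is. The helper names save_bytes and file_size are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write len raw bytes to path using a binary stream.
 * Returns the number of bytes written, or 0 if the file cannot be
 * opened or closed cleanly. */
size_t save_bytes(const char *path, const unsigned char *data, size_t len)
{
    assert(path != NULL && data != NULL);
    FILE *fp = fopen(path, "wb");   /* "b": no newline translation */
    if (fp == NULL)
        return 0;
    size_t written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0)
        return 0;
    return written;
}

/* Hypothetical helper: count the bytes in the file at path as seen
 * through a binary stream. Returns -1 if the file cannot be opened. */
long file_size(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long count = 0;
    while (getc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

Writing the four bytes 0x01 0x0A 0x02 0x0A this way gives a four-byte file on any platform. With "w" instead of "wb", a Windows C runtime would store six bytes, because each 0x0A gains a preceding 0x0D.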

Related

How to send file and filename through socket in C?

I am trying to send a file and its name through a socket in C.
The relevant server code is:
char file[18];
memset(file, 0, 18);
file[17] = '\0';
int recvd = recv(newsock, file, 16, 0);
char local_file_path[200];
memset(local_file_path, 0, 200);
if(recvd == -1 || recvd == 0) {
fprintf(stderr, "File name not received");
continue;
}
strcat(local_file_path, "/home/ubuntu/results/");
strcat(local_file_path, file);
FILE* fp = fopen(local_file_path, "wb");
char buffer[4096];
while(1)
{
recvd = recv(newsock, buffer, 4096, 0);
if(recvd == -1 || recvd == 0) {
fclose(fp);
break;
}
fwrite(buffer, sizeof(char), recvd, fp);
}
close(newsock);
}
close(servSock);
The relevant client code is:
char* my_16_long_fname = "filename1234.txt";
int ret = send(sock, my_16_long_fname, strlen(my_16_long_fname), 0);
This code, however, has been creating lots of undefined behaviour such as:
1. Receiving garbage filenames
2. Receiving empty files (so a name with nothing inside - could be some other bug but possibly due to this)
I have thought about a few solutions:
1. Differentiate file types by signature/header and generate a file name on the server side. Besides this being a cheap solution which doesn't teach me how to actually solve the problem, it doesn't work with the logic I'm using, where sometimes I send error codes instead of file names after opening the socket.
2. Iterate over the recv'd buffer on the first call to recv until I encounter a '\0' character. Then write the remainder of the buffer as binary data and keep on receiving data as usual.
Is this the most efficient/simplest and solid solution to this issue, which will prevent any undefined behaviour?
There is no way your current code could possibly work. If the filename is always one character, your code can read too many characters. If your filename is always the same number of characters but more than one character, your code can read too few characters. If the filename is a variable number of characters, your code could read a smaller number than was sent.
So there is no sending protocol for which this could be valid receiving code.
Until you are an expert on writing networking code, always follow these two steps:
1. Document the protocol.
How many bytes does the filename occupy? Is it a fixed number or a variable number? Is it always followed by a zero byte?
2. Implement the protocol.
For example, your code reads up to 16 bytes for the filename. But it never checks if it received the whole file name. What if it only received a single byte?
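As one possible illustration of these two steps, the sketch below first documents a small protocol and then implements the receiving side. The protocol itself (a one-byte length prefix followed by the name) and the names recv_all and recv_filename are assumptions made up for this example, not something from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Receive exactly len bytes. Returns 0 on success, -1 on error or if the
 * peer closes the connection early. recv() may return fewer bytes than
 * requested, so it has to be called in a loop. */
int recv_all(int sock, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = recv(sock, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Assumed protocol (hypothetical, for illustration only):
 *   1 byte  : filename length N (1..255)
 *   N bytes : filename, not NUL-terminated
 *   rest    : file contents until the sender closes the connection
 * name must have room for at least 256 bytes. Returns 0 on success. */
int recv_filename(int sock, char *name)
{
    uint8_t n;

    assert(name != NULL);
    if (recv_all(sock, &n, 1) != 0 || n == 0)
        return -1;
    if (recv_all(sock, name, n) != 0)
        return -1;
    name[n] = '\0';                 /* terminate locally, not on the wire */
    return 0;
}
```

After recv_filename() returns, everything that follows on the socket is file content, so the usual recv/fwrite loop can start without scanning for a '\0'. A real server would probably also want to reject names containing '/' before building a path from them.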

Open non text file without windows line ending

I took over a project that uses the following function to read files:
char *fetchFile(char *filename) {
char *buffer;
int len;
FILE *f = fopen(filename, "rb");
if(f) {
if(verbose) {
fprintf(stdout, "Opened file %s successfully\n", filename);
}
fseek(f, 0, SEEK_END);
len = ftell(f);
fseek(f, 0, SEEK_SET);
if(verbose) {
fprintf(stdout, "Allocating memory for buffer for %s\n", filename);
}
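buffer = malloc(len + 1);
if(!buffer) {
fprintf(stderr, "Out of memory reading file %s\n", filename);
exit(1);
}
/* fread may return fewer bytes than requested; terminate after what was read */
len = fread(buffer, 1, len, f);
fclose(f);
buffer[len] = '\0';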
} else {
fprintf(stderr, "Error reading file %s\n", filename);
exit(1);
}
return buffer;
}
The rb mode is used because sometimes the file can be a spreadsheet and therefore I want the information as in a text file.
The program runs on a Linux machine but the files to read come from both Linux and Windows.
I am not sure which approach is best to keep Windows line endings from messing with my code.
I was thinking of using dos2unix at the start of this function.
I also thought of opening in r mode, but I believe that could potentially mess things up when opening non-text files.
I would like to understand better the differences between using:
dos2unix,
r vs rb mode,
or any other solution which would better fit the problem.
Note: I believe that I understand r vs rb modes, but if you could explain why it is a bad or good solution for this specific situation (I think it wouldn't be good because sometimes it opens spreadsheets but I am not sure of that).
If my understanding is correct the rb mode is used because sometimes the file can be a spreadsheet and therefore the programs just want the information as in a text file.
You seem uncertain, and though perhaps you do understand correctly, your explanation does not give me any confidence in that.
C knows about two distinct kinds of streams: binary streams and text streams. A binary stream is simply an ordered sequence of bytes, written and / or read as-is without any kind of transformation. On the other hand,
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to
conform to differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one correspondence
between the characters in a stream and those in the external
representation. [...]
(C2011 7.21.2/2)
For some implementations, such as POSIX-compliant ones, this is a distinction without a difference. For other implementations, such as those targeting Windows, the difference matters. In particular, on Windows, text streams convert on the fly between carriage-return / line-feed pairs in the external representation and newlines (only) in the internal representation.
The b in your fopen() mode specifies that the file should be opened as a binary stream -- that is, no translation will be performed on the bytes read from the file. Whether this is the right thing to do depends on your environment and the application's requirements. This is moot on Linux or another Unix, however, as there is no observable difference between text and binary streams on such systems.
dos2unix converts carriage-return / line-feed pairs in the input file to single line-feed (newline) characters. This will convert a Windows-style text file or one with mixed Windows / Unix line terminators to Unix text file convention. It is irreversible if there are both Windows-style and Unix-style line terminators in the file, and it is furthermore likely to corrupt your file if it is not a text file in the first place.
If your inputs are sometimes binary files then opening in binary mode is appropriate, and conversion via dos2unix probably is not. If that's the case and you also need translation for text-file line terminators, then you first and foremost need a way to distinguish which case applies for any particular file -- for example, by command-line argument or by pre-analyzing the file via libmagic. You then must provide different handling for text files; your main options are
Perform the line terminator conversion in your own code.
Provide separate versions of the fetchFile() function for text and binary files.
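For the first option, a minimal sketch of doing the conversion in your own code might look like the following. The function name crlf_to_lf is hypothetical; it works in place on a buffer that was read in binary mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert "\r\n" pairs to "\n" in place.
 * A lone '\r' that is not followed by '\n' is kept as-is.
 * Returns the new length, which is always <= len. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* drop the CR, the LF is copied next */
        buf[out++] = buf[in];
    }
    return out;
}
```

A text-only variant of fetchFile() could call this on the buffer before adding the terminating '\0', while the binary variant leaves the bytes untouched.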
The code just copies the contents of a file to an allocated buffer. The UNIX way (YMMV) is to just memory map the file instead of reading it, which is often faster for large files.
// untested code
void* mapfile(const char *name)
{
int fd;
struct stat st;
if ((fd = open(name, O_RDONLY)) == -1)
return NULL;
if (fstat(fd, &st)) {
close(fd);
return NULL;
}
void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
if (p == MAP_FAILED)
p = NULL;
return p;
}
Something along these lines will work. Adjust settings if you want to write to the file as well.

How to duplicate an image file? [duplicate]

I am designing an image decoder and as a first step I tried to just copy the file using C, i.e. open the file and write its contents to a new file. Below is the code that I used.
while((c=getc(fp))!=EOF)
fprintf(fp1,"%c",c);
where fp is the source file and fp1 is the destination file.
The program executes without any error, but the image file(".bmp") is not properly copied. I have observed that the size of the copied file is less and only 20% of the image is visible, all else is black. When I tried with simple text files, the copy was complete.
Do you know what the problem is?
Make sure that the type of the variable c is int, not char. In other words, post more code.
This is because the value of the EOF constant is typically -1, and if you read characters as char-sized values, every byte that is 0xff will look like the EOF constant. With the extra bits of an int, there is room to separate the two.
Did you open the files in binary mode? What are you passing to fopen?
It's one of the most "popular" C gotchas.
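Putting the two hints together (an int for the return value of getc, and binary mode for both streams), a byte-by-byte copy could be sketched as follows. The function name copy_file is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy src to dst one byte at a time.
 * Returns the number of bytes copied, or -1 on error. */
long copy_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int c;              /* int, not char: must hold every byte value plus EOF */
    long count = 0;
    int failed = 0;
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) {
            failed = 1;
            break;
        }
        count++;
    }
    if (ferror(in))
        failed = 1;
    fclose(in);
    if (fclose(out) != 0)
        failed = 1;
    return failed ? -1 : count;
}
```

Because c is an int, a 0xFF byte in the image is read as 255 and is not mistaken for EOF. Because the streams are binary, no bytes are translated or treated as an end-of-file marker on platforms where text mode does that.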
You should use fread and fwrite, copying a block at a time:
FILE *fd1 = fopen("source.bmp", "rb");
FILE *fd2 = fopen("destination.bmp", "wb");
if(!fd1 || !fd2) {
    // handle open error
}
size_t l1;
unsigned char buffer[8192];
//Data to be read
while((l1 = fread(buffer, 1, sizeof buffer, fd1)) > 0) {
    size_t l2 = fwrite(buffer, 1, l1, fd2);
    if(l2 < l1) {
        if(ferror(fd2)) {
            // handle error
        } else {
            // Handle media full
        }
    }
}
fclose(fd1);
fclose(fd2);
It's substantially faster to read in bigger blocks. Note that fread/fwrite do not by themselves prevent newline translation; that depends on the mode passed to fopen, so the files need to be opened in binary mode ("rb" and "wb") to avoid \n being transformed to \r\n in the output (on Windows and DOS) or \r (on old Macs).

how to read and write from .dat file in verifone

I want to read and write a text or .dat file on a Verifone terminal to store data in it.
How can I do this?
Here is my code:
int main()
{
char buf [255];
FILE *tst;
int dsply = open(DEV_CONSOLE , 0);
tst = fopen("test.txt","r+");
fputs("this text should write in file.",tst);
fgets(buf,30,tst);
write(dsply,buf,strlen(buf));
return 0;
}
Chapter 3 of the "Programmers Manual for Vx Solutions" ("23230_Verix_V_Operating_System_Programmers_Manual.pdf") is all about file management and contains all the functions I typically use when dealing with data files on the terminal. Go read through that and I think you'll find everything you need.
To get you started, you'll want to use open() together with the flags you want
O_RDONLY (read only)
O_WRONLY (write only)
O_RDWR (read and write)
O_APPEND (Opens with the file position pointer at the end of the file)
O_CREAT (create the file if it doesn't already exist)
O_TRUNC (truncate/delete previous contents if the file already exists)
O_EXCL (Returns error value if the file already exists)
On success, open will return a positive integer that is a handle which can be used for subsequent access to the file. On failure, it returns -1.
When the file is open, you can use read() and write() to manipulate the contents.
Be sure to call close() and pass in the return value from open when you are done with the file.
Your example above would look something like this:
int main()
{
char buf [255];
int tst;
int dsply = open(DEV_CONSOLE , 0);
//next we will open the file. We will want to read and write, so we use
// O_RDWR. If the file does not already exist, we want to create it, so
// we use O_CREAT. If the file *DOES* already exist, we want to truncate
// and start fresh, so we delete all previous contents with O_TRUNC
tst = open("test.txt", O_RDWR | O_CREAT | O_TRUNC);
// always check the return value.
if(tst < 0)
{
write(dsply, "ERROR!", 6);
return 0;
}
strcpy(buf, "this text should write in file.");
write(tst, buf, strlen(buf));
memset(buf, 0, sizeof(buf));
//rewind to the start of the file before reading back what was written
lseek(tst, 0, SEEK_SET);
read(tst, buf, 30);
//be sure to close when you are done
close(tst);
write(dsply,buf,strlen(buf));
//you'll want to close your devices, as well
close(dsply);
return 0;
}
Your comments also ask about searching. For that, you'll also need to use lseek with one of the following which specifies where you are starting from:
SEEK_SET — Beginning of file
SEEK_CUR — Current seek pointer location
SEEK_END — End of file
example
SomeDataStruct myData;
...
//assume "curPosition" is set to the beginning of the next data structure I want to read
lseek(file, curPosition, SEEK_SET);
result = read(file, (char*)&myData, sizeof(SomeDataStruct));
curPosition += sizeof(SomeDataStruct);
//now "curPosition" is ready to pull out the next data structure.
NOTE that the internal file pointer is already AT "curPosition", but doing it this way allows me to move forward and backward at will as I manipulate what is there. So, for example, if I wanted to move back to the previous data structure, I would simply set "curPosition" as follows:
curPosition -= 2 * sizeof(SomeDataStruct);
If I didn't want to keep track of "curPosition", I could also do the following which would also move the internal file pointer to the correct place:
//sizeof yields an unsigned value, so cast to a signed type before negating
lseek(file, -(long)(2 * sizeof(SomeDataStruct)), SEEK_CUR);
You get to pick whichever method works best for you.
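The record-by-index idea can be sketched as below. This is a sketch under assumptions: it uses the standard POSIX headers and calls (the Verix headers and the exact open() signature may differ), and struct record, open_records, write_record and read_record are hypothetical names standing in for SomeDataStruct and your own code.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size record, standing in for SomeDataStruct. */
struct record {
    int  id;
    char label[12];
};

/* Open (and truncate) a record file for reading and writing. */
int open_records(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
}

/* Move the file position to the start of record number index (0-based). */
static int seek_record(int fd, long index)
{
    off_t pos = (off_t)index * (off_t)sizeof(struct record);
    return lseek(fd, pos, SEEK_SET) == (off_t)-1 ? -1 : 0;
}

/* Write *in as record number index. Returns 0 on success, -1 on error. */
int write_record(int fd, long index, const struct record *in)
{
    assert(in != NULL && index >= 0);
    if (seek_record(fd, index) != 0)
        return -1;
    return write(fd, in, sizeof *in) == (ssize_t)sizeof *in ? 0 : -1;
}

/* Read record number index into *out. Returns 0 on success, -1 if the
 * seek fails or a full record could not be read (e.g. past the end). */
int read_record(int fd, long index, struct record *out)
{
    assert(out != NULL && index >= 0);
    if (seek_record(fd, index) != 0)
        return -1;
    return read(fd, out, sizeof *out) == (ssize_t)sizeof *out ? 0 : -1;
}
```

Seeking with SEEK_SET to index * sizeof(struct record) plays the role of "curPosition" above, so moving backward or forward is just a matter of changing the index.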

Errors when implementing an fwrite to get data from a socket

I see plenty of examples but none addresses what I want to accomplish. I need to read the bytes from a socket and write them into a file. In this Code Project blog I see where, in the client script, a while loop iterates through a read call:
while((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0)
So I modified the code to do fputs(recvBuff, f1), where f1 is a pointer to a pdf file. A pdf file is also what I'm fetching from the server, so I need to reassemble it. However, fputs operates on a string and corrupts the file, so I need a byte "writer". fwrite would have been the choice, but I can't get fwrite to work. I ended up modifying my code to resemble some of the examples to test it out, but to no avail.
If the first parameter of fwrite is the 'data', how would I pass it? I've tried the read() call as in the while loop above, but that seems to return an integer rather than a byte stream. Any ideas?
I'm not new to programming but am new to C and would appreciate a little push in the right direction. Thanks.
You want something more like this. fwrite doesn't return a stream; it returns the number of items (i.e. the 3rd parameter) successfully written. In this case the "item" is a single char and you are attempting to write "bytesRead" number of them. Good form dictates that you should check that the result fwrite returns is the same as the number you requested be written, but this rarely fails on a disk file so many people skip it in non-critical situations. You may want to add that yourself.
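FILE *f1;   /* assumed to be already opened elsewhere with fopen(..., "wb") */
int sockfd; /* assumed to be an already connected socket */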
char recvBuff[4096];
size_t bytesWritten;
ssize_t bytesRead;
while((bytesRead = read(sockfd, recvBuff, sizeof(recvBuff))) > 0)
bytesWritten = fwrite(recvBuff, 1, bytesRead, f1);
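For completeness, a sketch of the same loop with the fwrite check added, wrapped in a function so that the descriptor and stream are passed in already opened. The name copy_fd_to_stream is hypothetical. Unlike fputs, which stops at the first zero byte because it treats the buffer as a NUL-terminated string, fwrite is told exactly how many bytes to write.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until end of stream and write every
 * byte to out. Returns the total number of bytes written, or -1 on a
 * read or write error. */
long copy_fd_to_stream(int fd, FILE *out)
{
    char buf[4096];
    ssize_t nread;
    long total = 0;

    assert(out != NULL);
    while ((nread = read(fd, buf, sizeof buf)) > 0) {
        size_t nwritten = fwrite(buf, 1, (size_t)nread, out);
        if (nwritten != (size_t)nread)
            return -1;              /* short write: disk full or I/O error */
        total += (long)nwritten;
    }
    return nread < 0 ? -1 : total;
}
```

The value returned by read() is the byte count to pass as fwrite's third argument; the data itself is in the buffer that read() filled.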
