zlib 1.2.5 unable to recognize this header - zlib

I have a source text and its supposedly zlib-deflated embedding (with \ escaping) within another text file. I do not have docs on its encoding other than that it uses zlib with nominal escaping for \0, \t, \n, \r, quote, etc.
The unescaped data has:
first four bytes: 1A 9B 02 00
last four bytes: 76 18 23 82
which inflate complains about having an invalid header.
When I deflate/inflate the matching source text myself using 1.2.5, I get:
first four bytes: 78 9C ED 7D
Can someone suggest what compression is being used given the header bytes? I haven't found any magic numbers or header formula that actually uses those.
EDIT: Here are the relevant files...
codedreadbase.cohdemo is the source text file with the escaped embedded section following the BASE verb. Escapes are:
\n = (newline)
\r = (return)
\0 = 0 (NULL)
\t = tab
\q = "
\s = '
\d = $
\p = %
codedreadbase.deflated is what I am passing to zlib inflateInit/inflate*/inflateEnd after unescaping the above within the double quotes.
codedreadbase.txt is the original text of the embedded section.

Your first four bytes, 1A 9B 02 00 are the length of the uncompressed data in little-endian order, 170778 in decimal. You have indeed found the start of a valid zlib stream with the next four bytes: 78 5E ED 7D. You just need to properly extract the binary compressed stream from the escaped format. I had no problem and decompressed codedreadbase.txt exactly.
You didn't mention one obvious escape, which is the backslash itself. \\ should go to \. Maybe that's what you're missing. This simple un-escaper in C worked:
#include <stdio.h>
int main(void)
{
int ch;
while ((ch = getchar()) != EOF) {
if (ch == '\\') {
ch = getchar();
if (ch == EOF)
break;
ch =
ch == 'n' ? '\n' :
ch == 'r' ? '\r' :
ch == '0' ? 0 :
ch == 't' ? '\t' :
ch == 'q' ? '"' :
ch == 's' ? '\'' :
ch == 'd' ? '$' :
ch == 'p' ? '%' : ch;
}
putchar(ch);
}
return 0;
}
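The two observations in this answer (a 4-byte little-endian length prefix followed by a zlib header) can be checked in code before calling inflate. The following is a minimal sketch, not part of the original answer: the helper names read_le32 and looks_like_zlib_header are hypothetical, and the header test follows the zlib format rule that the first two bytes, read as a 16-bit big-endian value, must be a multiple of 31 with compression method 8.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value, such as the length prefix 1A 9B 02 00. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Return 1 if the two bytes form a plausible zlib header:
   compression method 8 (deflate), window size field at most 7,
   and CMF*256 + FLG divisible by 31. Return 0 otherwise. */
int looks_like_zlib_header(unsigned char cmf, unsigned char flg)
{
    if ((cmf & 0x0F) != 8)      /* low nibble: compression method */
        return 0;
    if ((cmf >> 4) > 7)         /* high nibble: log2(window size) - 8 */
        return 0;
    return (((unsigned)cmf << 8) | flg) % 31 == 0;
}
```

With the bytes from the question, read_le32 on 1A 9B 02 00 gives 170778, and both 78 5E and 78 9C pass the header check, while 1A 9B does not. The compressed stream to hand to inflate therefore starts at offset 4 of the unescaped data.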


How to send image or binary data through HTTP POST request in C

I'm trying to POST a binary file to a web server with a client program written in C (Windows). I'm pretty new to socket programming, so tried POST requests using multipart/form-data with plain text messages, and text-based files (.txt, .html, .xml). Those seem to work fine. But when trying to send a PNG file, I'm running into some problems.
The following is how I read the binary file
FILE *file;
char *fileName = "download.png";
long int fileLength;
//Open file, get its size
file = fopen(fileName, "rb");
fseek(file, 0, SEEK_END);
fileLength = ftell(file);
rewind(file);
//Allocate buffer and read the file
void *fileData = malloc(fileLength);
memset(fileData, 0, fileLength);
int n = fread(fileData, 1, fileLength, file);
fclose(file);
I confirmed that all the bytes are getting read properly.
This is how I form my message header and body
//Prepare message body and header
message_body = malloc((int)1000);
sprintf(message_body, "--myboundary\r\n"
"Content-Type: application/octet-stream\r\n"
"Content-Disposition: form-data; name=\"myFile\"; filename=\"%s\"\r\n\r\n"
"%s\r\n--myboundary--", fileName, fileData);
printf("\nSize of message_body is %d and message_body is \n%s\n", strlen(message_body), message_body);
message_header = malloc((int)1024);
sprintf(message_header, "POST %s HTTP/1.1\r\n"
"Host: %s\r\n"
"Content-Type: multipart/form-data; boundary=myboundary\r\n"
"Content-Length: %d\r\n\r\n", path, host, strlen(message_body));
printf("Size of message_header is %d and message_header is \n%s\n", strlen(message_header), message_header);
The connection and sending part also works fine as the request is received properly. But, the received png file is ill-formatted.
The terminal prints out the following for fileData if I use %s in printf
ëPNG
I searched around and came to know that binary data doesn't behave like strings and thus printf/ sprintf/ strcat etc. cannot be used on them. As binary files have embedded null characters, %s won't print properly. It looks like that is the reason fileData only printed the PNG header.
Currently, I send two send() requests to server. One with the header and the other with body and footer combined. That was working for text-based files. To avoid using sprintf for binary data, I tried sending one request for header, one for binary data (body) & one for footer. That doesn't seem to work either.
Also, found that memcpy could be used to append binary data to normal string. That didn't work either. Here is how I tried that (Not sure whether my implementation is correct or not).
sprintf(message_body, "--myboundary\r\n"
"Content-Disposition: form-data; name=\"text1\"\r\n\r\n"
"text default\r\n"
"--myboundary\r\n"
"Content-Type: application/octet-stream\r\n"
"Content-Disposition: form-data; name=\"myFile\"; filename=\"%s\"\r\n\r\n", fileName);
char *message_footer = "\r\n--myboundary--";
char *message = (char *)malloc(strlen(message_body) + strlen(message_footer) + fileLength);
strcat(message, message_body);
memcpy(message, fileData, fileLength);
memcpy(message, message_footer, strlen(message_footer));
I'm stuck at how I could send my payload which requires appending of string (headers), binary data (payload), string (footer).
Any advice/ pointers/ reference links for sending the whole file would be appreciated. Thank You!
How to print binary data
In your question, you stated you were having trouble printing binary data with printf, due to the binary data containing bytes with the value 0. Another problem (that you did not mention) is that binary data may contain non-printable characters.
Binary data is commonly represented in one of the following ways:
in hexadecimal representation
in textual representation, replacing non-printable characters with placeholder characters
both of the above
I suggest that you create your own simple function for printing binary data, which implements option #3. You can use the function isprint to determine whether a character is printable, and if it isn't, you can print some placeholder character (such as 'X') instead.
Here is a small program which does that:
#include <stdio.h>
#include <ctype.h>
#include <string.h>
void print_binary( char *data, size_t length )
{
for ( size_t i = 0; i < length; i += 16 )
{
int bytes_in_line = length - i >= 16 ? 16 : length - i;
//print line in hexadecimal representation
for ( int j = 0; j < 16; j++ )
{
if ( j < bytes_in_line )
printf( "%02X ", (unsigned char)data[i+j] );
else
printf( "   " );
}
//add spacing between hexadecimal and textual representation
printf( " " );
//print line in textual representation
for ( int j = 0; j < 16; j++ )
{
if ( j < bytes_in_line )
{
if ( isprint( (unsigned char)data[i+j] ) )
putchar( data[i+j] );
else
putchar( 'X' );
}
else
{
putchar( ' ' );
}
}
putchar( '\n' );
}
}
int main( void )
{
char *text = "This is a string with the unprintable backspace character \b.";
print_binary( text, strlen( text ) );
return 0;
}
The output of this program is the following:
54 68 69 73 20 69 73 20 61 20 73 74 72 69 6E 67 This is a string
20 77 69 74 68 20 74 68 65 20 75 6E 70 72 69 6E with the unprin
74 61 62 6C 65 20 62 61 63 6B 73 70 61 63 65 20 table backspace
63 68 61 72 61 63 74 65 72 20 08 2E character X.
As you can see, the function print_binary printed the data in both hexadecimal representation and textual representation, 16 bytes per line, and it correctly replaced the non-printable backspace character with the placeholder 'X' character when printing the textual representation.
Wrong printf conversion format specifier
The line
printf("\nSize of message_body is %d and message_body is \n%s\n", strlen(message_body), message_body);
is wrong. The return type of strlen is size_t, not int. The correct printf conversion format specifier for size_t is %zu, not %d. Using the wrong format specifier causes undefined behavior, which means that it may work on some platforms, but not on others.
Concatenating string with binary data
The following lines are wrong:
char *message = (char *)malloc(strlen(message_body) + strlen(message_footer) + fileLength);
strcat(message, message_body);
memcpy(message, fileData, fileLength);
memcpy(message, message_footer, strlen(message_footer));
The function strcat requires both function arguments to point to null-terminated strings. However, message points to freshly allocated, uninitialized memory, so the first function argument is not guaranteed to be null-terminated. I suggest that you use strcpy instead of strcat.
Also, in your question, you correctly stated that the file binary data should be appended to the string. However, that is not what the line
memcpy(message, fileData, fileLength);
is doing. It is instead overwriting the string.
In order to append binary data to a string, you should only overwrite the terminating null character of the string, for example like this:
memcpy( message + strlen(message), fileData, fileLength );
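Putting the pieces together, the header string, the binary payload, and the footer string can be copied into one buffer while tracking offsets explicitly. This is a minimal sketch rather than a drop-in replacement for the code in the question: build_body and its parameters are hypothetical names, and the function returns the total length through out_len because the result is not a C string and strlen must not be used on it.

```c
#include <stdlib.h>
#include <string.h>

/* Concatenate a text header, a binary payload, and a text footer into one
   newly allocated buffer. The total size is stored in *out_len.
   Returns NULL on allocation failure. The caller frees the buffer. */
char *build_body(const char *header, const void *data, size_t data_len,
                 const char *footer, size_t *out_len)
{
    size_t header_len = strlen(header);
    size_t footer_len = strlen(footer);
    size_t total = header_len + data_len + footer_len;

    char *buf = malloc(total ? total : 1);   /* avoid a zero-size request */
    if (buf == NULL)
        return NULL;

    memcpy(buf, header, header_len);                          /* text part */
    memcpy(buf + header_len, data, data_len);                 /* binary part */
    memcpy(buf + header_len + data_len, footer, footer_len);  /* closing boundary */

    *out_len = total;
    return buf;
}
```

The value stored in out_len is what belongs in the Content-Length header and in the length argument of send. Calling strlen on the combined buffer would stop at the first zero byte inside the PNG data.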

A histogram of the length of words in its input. exercise 1_13, k&r pdf

I have just started learning to program, and I'm having trouble with an exercise from the K&R second edition pdf: write a program that prints a histogram of the lengths of words in its input. I imagined my program's output would be something like:
(words number)
1 XXX
2 XXXXX
3 XX
4
5 X
12345 (characters number)
Here is the code I have done so far:
#include <stdio.h>
#define out 0
#define in 1
int main()
{
char X, nc;
int state, nw, i, x_count[10], c;
i = 0;
nc = 0;
nw = 1;
for (i = 0; i < 10; ++i)
x_count[i] = 0;
while ((c = getchar()) != EOF) {
if (state == in && c != '\b' && c != ' ' && c != '\t')
++nc;
else {
++nw;
state = out;
}
if (state == out) {
for (i = 0; i < nc; i++) {
x_count[i] = X;
}
}
state = in;
}
printf("%d: %c", nw, x_count[i]);
return 0;
}
As pointed out by @kaylum in the comment, the immediate problem that breaks the defined behavior of your code is your use of state before it has been assigned a value in:
if (state == in && ...
state is a variable declared with automatic storage duration. Until the variable state is explicitly assigned a value, its value is indeterminate. Using state while its value is indeterminate results in Undefined Behavior. See: C11 Standard - 6.7.9 Initialization(p10) and J.2 Undefined Behavior
Once you invoke Undefined Behavior in your code, the defined execution is over and your program can do anything between appearing to run correctly or SegFault. See: Undefined, unspecified and implementation-defined behavior
The simple fix is to initialize int state = out; to begin with. (you will start in the out state in order to ignore leading whitespace before the first word)
You have similar problems with your variable X, which is not initialized and is used while its value is indeterminate in x_count[i] = X; Moreover, it is unclear what you intend to do with X to begin with. It is clear from your desired output:
(words number)
1 XXX
2 XXXXX
3 XX
4
5 X
12345 (characters number)
That you want to output one 'X' per-character (to indicate the word length for your histogram), but there is no need to store anything in a variable X to do that, you simply need to output one character 'X' for each character in the word. Additionally your output of 4 does not make much sense being empty as your state-variable state should prevent counting empty words. You would never have been in an empty word.
Compounding the confusion is your check for a backspace '\b' character when you check EOF and other whitespace characters for end of word. It looks more likely that you intended a '\n' but through an off-by-one-key typo you have '\b' instead of '\n'. That is conjecture that you will have to add details to clarify...
A Word-Length Histogram
K&R provides very good exercises and the use of a state-loop is a very good place to start. Rather than using nested loops to inch-worm over each word and skip over runs of whitespace, you simply keep a state-variable (state in your case) to track whether you are in a word reading characters, or before the first word, between words or after the last word reading whitespace. While you can simplify the check for whitespace by including ctype.h and using isspace(), a manual check of multiple whitespace characters is fine.
While defining in and out macros of 1/0 is fine, simply using a variable and assigning 0 for out or non-zero for in works as well. Since you are keeping a character-count to output a length number of 'X' characters, you can just use your character count variable as your state-variable. It will be zero until you read the first character in a word, and then you would reset it to zero after outputting your length number of 'X's to prepare for the next word.
Initializing all variables, and reading either from the filename given as the first argument to the program, or from stdin by default if no argument is given, you can do something similar to:
#include <stdio.h>
int main (int argc, char **argv) {
int cc = 0, /* character count (length) */
wc = 0; /* word count */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
for (;;) { /* loop continually */
int c = fgetc(fp); /* read character from input stream */
if (c == EOF || c == ' ' || c == '\t' || c == '\n') { /* check EOF or ws */
if (cc) { /* if characters counted */
printf ("%3d : ", wc++); /* output word count */
while (cc--) /* loop char count times */
putchar ('X'); /* output X */
putchar ('\n'); /* output newline */
cc = 0; /* reset char count */
}
if (c == EOF) /* if EOF -- bail */
break;
}
else /* otherwise, normal character */
cc++; /* add to character count */
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(note: the character-count cc variable is used as the state-variable above. You can use an additional variable like state if that is more clear to you, but think through the way using cc above accomplishes the same thing. Also note the use of '\n' instead of '\b': the literal backspace character is rarely encountered in normal input, though it can be generated, while a '\n' is encountered every time the Enter key is pressed. If you actually want to check for the backspace character, you can add it to the conditional.)
Example Input File
$ cat dat/histfile.txt
my dog has fleas
my alligator has none
Example Use/Output
Using a heredoc for input:
$ cat << eof | ./bin/wordlenhist
> my dog has fleas
> my alligator has none
> eof
0 : XX
1 : XXX
2 : XXX
3 : XXXXX
4 : XX
5 : XXXXXXXXX
6 : XXX
7 : XXXX
Redirecting from a file for input:
$ ./bin/wordlenhist < dat/histfile.txt
0 : XX
1 : XXX
2 : XXX
3 : XXXXX
4 : XX
5 : XXXXXXXXX
6 : XXX
7 : XXXX
Or pass the filename as an argument and let the program open and read the file itself:
$ ./bin/wordlenhist dat/histfile.txt
0 : XX
1 : XXX
2 : XXX
3 : XXXXX
4 : XX
5 : XXXXXXXXX
6 : XXX
7 : XXXX
Lastly, you can input directly on stdin and generate a manual EOF by pressing Ctrl+d on Linux or Ctrl+z on windows. (note: you will have to press the key combination twice -- can you figure out why?) E.g.
$ ./bin/wordlenhist
my dog has fleas my alligator has none 0 : XX
1 : XXX
2 : XXX
3 : XXXXX
4 : XX
5 : XXXXXXXXX
6 : XXX
7 : XXXX
(also note where the first line of output is placed -- this will help you answer the last question)
If you would like to add a comment below and clarify your intent for X and x_count[i] = X; and the use of '\b', I'm happy to help further. Look things over and let me know if you have any questions.
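The program above prints one bar per word. If the exercise is read as asking how many words have each length, the same state-loop can fill an array of counters indexed by word length. The following is only a sketch under that reading: count_word_lengths, print_histogram, and the MAX_LEN cap of 10 are hypothetical choices, and the input is taken from a string rather than a stream to keep the example short.

```c
#include <stdio.h>

#define MAX_LEN 10   /* assumed cap; longer words are counted in the last bucket */

/* Tally how many words of each length occur in text.
   counts must have MAX_LEN + 1 elements; counts[n] is the number of
   words of length n. */
void count_word_lengths(const char *text, int counts[])
{
    int len = 0;                          /* length of the current word; 0 means "outside a word" */

    for (int i = 0; i <= MAX_LEN; i++)
        counts[i] = 0;

    for (;; text++) {
        char c = *text;
        if (c == '\0' || c == ' ' || c == '\t' || c == '\n') {
            if (len > 0) {                /* a word just ended */
                counts[len > MAX_LEN ? MAX_LEN : len]++;
                len = 0;
            }
            if (c == '\0')                /* end of input */
                break;
        } else {
            len++;
        }
    }
}

/* Print one row per word length, with one X per word of that length. */
void print_histogram(const int counts[])
{
    for (int n = 1; n <= MAX_LEN; n++) {
        printf("%2d ", n);
        for (int k = 0; k < counts[n]; k++)
            putchar('X');
        putchar('\n');
    }
}
```

For the sample input used above, the counters come out as two words of length 2, three of length 3, and one each of lengths 4, 5 and 9.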

Can't open a file with unicode chars in the file name using CLion

I'm having trouble opening a file that has Unicode characters in its name.
I created a file on my desktop with just a couple lines of text.
c:\users\james\desktop\你好世界.txt
EDIT: I'm using CLion. CLion is passing parameters in unicode.
When I put that string into the Windows run dialog, it finds the file and opens it.
Something interesting, though, is that I get double L'\\' L'\\' in the folder name from my call to CommandLineToArgvW:
L"c:\\\\users\\\\james\\\\desktop\\\\你好世界.txt"
So I wrote a small routine to copy the filename to another wchar_t * and strip the slashes. Still doesn't work.
errno == 2 and f == NULL.
size_t filename_max_len = wcslen(filename);
//strip double slashes
wchar_t proper_filename[MAX_PATH + 1];
wchar_t previous = L'\0';
size_t proper_filename_location = 0;
for(int x = 0; x < filename_max_len; ++x)
{
if(previous == L'\\' && filename[x] == L'\\')
continue;
previous = filename[x];
proper_filename[proper_filename_location++] = filename[x];
}
proper_filename[proper_filename_location] = L'\0';
//Read in binary mode to prevent the C system from screwing with line endings
FILE *f = _wfopen(proper_filename, L"rb");
int le = errno;
if (f == NULL)
{
perror(strerror(le));
if(le == ERROR_FILE_NOT_FOUND)
{
return DUST_ERR_FILE_NOT_FOUND;
}
else {
return DUST_ERR_COULD_NOT_OPEN_FILE;
}
}
I have figured out the issue. My hunch was correct. CLion appears to be providing unicode as input to the program. Using the Windows run dialog and passing it as a parameter to my program, I was able to open and process the file without an issue.
My first guess is that 228, 189, 160 represents the first character of your filename encoded as a UTF-8 byte sequence since it looks like such a sequence to me. E4 BD A0 (228, 189, 160) decodes as U+4F60, which is indeed the Unicode code point corresponding to the first character.
I modified the output section of main in my sample program here to print each argument as a hex-encoded byte sequence. I copied and pasted your path as an argument to the program, and the Han characters are encoded in UTF-8 as:
E4 BD A0
E5 A5 BD
E4 B8 96
E7 95 8C
Your comment mentions slightly different numbers (specifically 8211/U+2013, 8226/U+2022, and 338/U+0152). Looking at code pages Windows 1250 and Windows 1252, bytes 0x96, 0x95, and 0x8C in both code pages correspond exactly to U+2013, U+2022, and U+0152 respectively. I'm guessing your original program goes wrong somewhere when it encounters Unicode input (you are using GetCommandLineW and passing that to CommandLineToArgvW, right?)
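The decoding of those byte triples can be reproduced by hand. The sketch below handles only the three-byte UTF-8 form (1110xxxx 10xxxxxx 10xxxxxx), which is all that is needed for the four characters in this filename. The function name utf8_decode3 is hypothetical, and the sketch does not reject overlong encodings or surrogates.

```c
#include <stdint.h>

/* Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
   Returns the code point, or 0 if the bytes do not have that shape. */
uint32_t utf8_decode3(unsigned char b0, unsigned char b1, unsigned char b2)
{
    if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
        return 0;

    return ((uint32_t)(b0 & 0x0F) << 12)   /* 4 payload bits from the lead byte */
         | ((uint32_t)(b1 & 0x3F) << 6)    /* 6 payload bits from each continuation byte */
         |  (uint32_t)(b2 & 0x3F);
}
```

Applied to E4 BD A0 this yields 0x4F60, matching the code point mentioned above. If each of those bytes is instead interpreted as a separate character in a single-byte code page, the result is the kind of unrelated code points seen in the comment.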
Here's a screenshot of my output that I've edited to highlight the relevant character sequences (the ¥ glyphs are meant to be \ glyphs, but I use code page 932 for cmd.exe):

strtok() appends some character to my string

I'm using strtok() to parse a string I get from fgets() that is separated by the ~ character
e.g. data_1~data_2
Here's a sample of my code:
fgets(buff, LINELEN, stdin);
pch = strtok(buff, " ~\n");
//do stuff
pch = strtok(NULL, " ~\n");
//do stuff
The first instance of strtok breaks it apart fine, I get data_1 as is, and strlen(data_1) provides the correct length of it. However, the second instance of strtok returns the string, with something appended to it.
With an input of andrewjohn ~ jamessmith, I printed out each character and the index, and I get this output:
a0
n1
d2
r3
e4
w5
j6
o7
h8
n9
j0
a1
m2
e3
s4
s5
m6
i7
t8
h9
10
What is that "11th" value corresponding to?
EDIT:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char buff[100];
char * pch;
int i;
fgets(buff, 100, stdin);
pch = strtok(buff, " ~\n");
printf("FIRST NAME\n");
for(i = 0; i < strlen(pch); i++)
{
printf("%c %d %d\n", *(pch+i), *(pch+i), i);
}
printf("SECOND NAME\n");
pch = strtok(NULL, " ~\n");
for(i = 0; i < strlen(pch); i++)
{
printf("%c %d %d\n", *(pch+i), *(pch+i), i);
}
}
I ran it by:
cat sample.in | ./myfile
Where sample.in had
andrewjohn ~ johnsmith
Output was:
FIRST NAME
a 97 0
n 110 1
d 100 2
r 114 3
e 101 4
w 119 5
j 106 6
o 111 7
h 104 8
n 110 9
SECOND NAME
j 106 0
o 111 1
h 104 2
n 110 3
s 115 4
m 109 5
i 105 6
t 116 7
h 104 8
13 9
So the last character is ASCII value 13, which says it's a carriage return ('\r'). Why is this coming up?
Based on your edit, the input line ends in \r\n. As a workaround you could just add \r to your list of delimiters in strtok.
However, this should be investigated further. \r\n is the line ending in a Windows file, but stdin is a text stream, so on Windows \r\n in a file would be converted to just \n in the fgets result.
Are you perhaps piping in a file that contains something weird like \r\r\n ? Try hex-dumping the file you're piping in to check this.
Another possible explanation might be that your Cygwin (or whatever) environment has somehow been configured not to translate line endings in a file piped in.
edit: Joachim's suggestion is much more likely: you are using a \r\n file on a non-Windows system. If this is the case, you can fix it by running dos2unix on the file. But in accordance with the principle "accept everything, generate correctly", it would be useful for your program to handle this file.
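One way to make the program accept both line endings is to cut the line at the first \r or \n right after fgets, before tokenizing. This is a small sketch, assuming the line sits in a modifiable buffer as in the question. strip_line_ending is a hypothetical helper name, while strcspn is the standard function from string.h.

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n', which removes a trailing
   "\n" or "\r\n" in place. A line with no line ending is left unchanged. */
void strip_line_ending(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

Called on buff immediately after fgets, this leaves strtok with only the space and ~ delimiters to deal with.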

Using strcat to append spaces. Compiles but overwrites string

The language I am working in is C.
I am trying to use a mix of built-in C string functions in order to take a list of tokens (space separated) and "convert" it into a list of tokens that is split by quotations.
A string like
echo "Hello 1 2 3 4" test test2
gets converted to
[echo] ["Hello] [1] [2] [3] [4"] [test] [test2]
I then use my code (at bottom) to attempt to convert it into something like
[echo] [Hello 1 2 3 4] [test] [test2]
For some reason the second 'token' in the quoted statement gets overridden.
Here's a snippet of the code that runs over the token list and converts it to the new one.
for (int i = 0; i < counter; i++) {
    if ( (strstr(tokenized[i],"\"") != NULL) && (inQuotes == 0)) {
        inQuotes = 1;
        tokenizedQuoted[quoteCounter] = tokenized[i];
        strcat(tokenizedQuoted[quoteCounter]," ");
    } else if ( (strstr(tokenized[i],"\"") != NULL) && (inQuotes == 1)) {
        inQuotes = 0;
        strcat(tokenizedQuoted[quoteCounter],tokenized[i]);
        quoteCounter++;
    } else {
        if (inQuotes == 0) {
            tokenizedQuoted[quoteCounter] = tokenized[i];
            quoteCounter++;
        } else if (inQuotes == 1) {
            strcat(tokenizedQuoted[quoteCounter], tokenized[i]);
            strcat(tokenizedQuoted[quoteCounter], " ");
        }
    }

}
In short, appending a space to the string that a char * points to means that the memory it points to needs more bytes. Since you do not provide them, you are overwriting the first byte of the following "word" with \0, so the char * to it is interpreted as the empty string. Note that writing to a location that has not been reserved is undefined behavior, so really ANYTHING could happen (from a segmentation fault to "correct" results with no errors).
Use malloc to create a new buffer for the expanded result with enough bytes for it (do not forget to free the old buffers if they were malloc'd).
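As an illustration of that suggestion, the tokens that make up one quoted group can be measured first and then copied into a buffer of exactly the right size. This is a minimal sketch and not the asker's code: join_tokens is a hypothetical helper, first and last are the indices of the opening and closing tokens of the group, and stripping the quote characters is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Join tokens[first..last] with single spaces into a newly allocated
   string. Returns NULL on allocation failure. The caller frees the result. */
char *join_tokens(char *const tokens[], int first, int last)
{
    size_t total = 1;                        /* room for the terminating null */
    for (int i = first; i <= last; i++)
        total += strlen(tokens[i]) + 1;      /* token plus one separator */

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (int i = first; i <= last; i++) {
        size_t n = strlen(tokens[i]);
        if (i > first)
            *p++ = ' ';                      /* separator between tokens */
        memcpy(p, tokens[i], n);
        p += n;
    }
    *p = '\0';
    return out;
}
```

Because the result lives in its own allocation, the original token strings are never written to, so neighbouring tokens cannot be clobbered.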
