If I analyse multiple PDF files with a hex editor, I see that all of them have two trailers.
That's possible if an object has been changed or updated (https://blog.idrsolutions.com/multiple-trailers-in-a-pdf-file/), but in my case the PDF files were not edited.
Does anyone know why all of the analysed files have two trailers?
This is a PDF file that contains a lot of text and also two images (there are two trailers in this file, which are almost identical to each other):
0001a30bh: 74 72 61 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 ; trailer..<</Size
0001a31bh: 20 34 37 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 ; 47/Root 1 0 R/I
0001a32bh: 6E 66 6F 20 31 35 20 30 20 52 2F 49 44 5B 3C 45 ; nfo 15 0 R/ID[<E
0001a33bh: 42 33 46 46 33 41 31 45 33 37 33 43 36 34 45 39 ; B3FF3A1E373C64E9
0001a34bh: 31 30 45 33 46 42 43 34 45 37 38 39 31 33 43 3E ; 10E3FBC4E78913C>
0001a35bh: 3C 45 42 33 46 46 33 41 31 45 33 37 33 43 36 34 ; <EB3FF3A1E373C64
0001a36bh: 45 39 31 30 45 33 46 42 43 34 45 37 38 39 31 33 ; E910E3FBC4E78913
0001a37bh: 43 3E 5D 20 3E 3E 0D 0A 73 74 61 72 74 78 72 65 ; C>] >>..startxre
0001a38bh: 66 0D 0A 31 30 36 33 32 33 0D 0A 25 25 45 4F 46 ; f..106323..%%EOF
0001a39bh: 0D 0A 78 72 65 66 0D 0A 30 20 30 0D 0A 74 72 61 ; ..xref..0 0..tra
0001a3abh: 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 20 34 37 ; iler..<</Size 47
0001a3bbh: 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 6E 66 6F ; /Root 1 0 R/Info
0001a3cbh: 20 31 35 20 30 20 52 2F 49 44 5B 3C 45 42 33 46 ; 15 0 R/ID[<EB3F
0001a3dbh: 46 33 41 31 45 33 37 33 43 36 34 45 39 31 30 45 ; F3A1E373C64E910E
0001a3ebh: 33 46 42 43 34 45 37 38 39 31 33 43 3E 3C 45 42 ; 3FBC4E78913C><EB
0001a3fbh: 33 46 46 33 41 31 45 33 37 33 43 36 34 45 39 31 ; 3FF3A1E373C64E91
0001a40bh: 30 45 33 46 42 43 34 45 37 38 39 31 33 43 3E 5D ; 0E3FBC4E78913C>]
0001a41bh: 20 2F 50 72 65 76 20 31 30 36 33 32 33 2F 58 52 ; /Prev 106323/XR
0001a42bh: 65 66 53 74 6D 20 31 30 35 39 37 32 3E 3E 0D 0A ; efStm 105972>>..
0001a43bh: 73 74 61 72 74 78 72 65 66 0D 0A 31 30 37 34 32 ; startxref..10742
0001a44bh: 31 0D 0A 25 25 45 4F 46 ; 1..%%EOF
This is a PDF file that contains only some random characters:
000071cbh: 74 72 61 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 ; trailer..<</Size
000071dbh: 20 32 33 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 ; 23/Root 1 0 R/I
000071ebh: 6E 66 6F 20 39 20 30 20 52 2F 49 44 5B 3C 39 46 ; nfo 9 0 R/ID[<9F
000071fbh: 46 31 32 45 31 43 30 41 35 36 44 42 34 38 41 33 ; F12E1C0A56DB48A3
0000720bh: 41 31 43 37 32 30 33 38 32 33 30 32 45 32 3E 3C ; A1C720382302E2><
0000721bh: 39 46 46 31 32 45 31 43 30 41 35 36 44 42 34 38 ; 9FF12E1C0A56DB48
0000722bh: 41 33 41 31 43 37 32 30 33 38 32 33 30 32 45 32 ; A3A1C720382302E2
0000723bh: 3E 5D 20 3E 3E 0D 0A 73 74 61 72 74 78 72 65 66 ; >] >>..startxref
0000724bh: 0D 0A 32 38 36 35 39 0D 0A 25 25 45 4F 46 0D 0A ; ..28659..%%EOF..
0000725bh: 78 72 65 66 0D 0A 30 20 30 0D 0A 74 72 61 69 6C ; xref..0 0..trail
0000726bh: 65 72 0D 0A 3C 3C 2F 53 69 7A 65 20 32 33 2F 52 ; er..<</Size 23/R
0000727bh: 6F 6F 74 20 31 20 30 20 52 2F 49 6E 66 6F 20 39 ; oot 1 0 R/Info 9
0000728bh: 20 30 20 52 2F 49 44 5B 3C 39 46 46 31 32 45 31 ; 0 R/ID[<9FF12E1
0000729bh: 43 30 41 35 36 44 42 34 38 41 33 41 31 43 37 32 ; C0A56DB48A3A1C72
000072abh: 30 33 38 32 33 30 32 45 32 3E 3C 39 46 46 31 32 ; 0382302E2><9FF12
000072bbh: 45 31 43 30 41 35 36 44 42 34 38 41 33 41 31 43 ; E1C0A56DB48A3A1C
000072cbh: 37 32 30 33 38 32 33 30 32 45 32 3E 5D 20 2F 50 ; 720382302E2>] /P
000072dbh: 72 65 76 20 32 38 36 35 39 2F 58 52 65 66 53 74 ; rev 28659/XRefSt
000072ebh: 6D 20 32 38 33 37 34 3E 3E 0D 0A 73 74 61 72 74 ; m 28374>>..start
000072fbh: 78 72 65 66 0D 0A 32 39 32 37 35 0D 0A 25 25 45 ; xref..29275..%%E
0000730bh: 4F 46 ; OF
Those files were most likely created by MS Word. The excerpts you posted look like Word's interpretation of hybrid-reference PDFs.
There are two special constructs in which the PDF specification uses the mechanisms it introduced for incremental updates for something else:
Linearized PDFs (see ISO 32000-2:2020 Annex F) and
Hybrid-reference PDFs (see ISO 32000-2:2020 Section 7.5.8.4).
Your excerpts look like the latter type of PDFs.
Some background:
With PDF 1.5, Adobe introduced the option to collect multiple non-stream indirect objects in a stream, a so-called "object stream". The advantage of doing so is that data in streams can be compressed, while objects outside of streams cannot. At the same time, they also introduced the option to put the cross reference table data into streams, the so-called "cross-reference streams", also to allow compression.
Obviously a new type of cross reference entry was necessary to describe indirect objects in object streams, so they defined entries of that kind, but only for the cross-reference streams, not for the old cross reference tables.
PDFs stored using object and cross-reference streams often are indeed much smaller than the same PDFs stored as regular indirect objects with cross reference tables. On the other hand, PDF processors that were not aware of these techniques couldn't open such PDFs at all.
Thus, Adobe came up with the idea of hybrid files: files that store the objects required to view the PDF at all in the old-fashioned way (as regular indirect objects referenced from cross reference tables), and the objects for newer or optional features in object and cross-reference streams. The trailers of the cross reference tables contain an entry XRefStm pointing to the cross reference stream.
For some reason, though, it was specified that object lookup first had to be attempted in the cross reference table, and only if no entry was found there for the object number in question, the associated cross reference stream was to be searched.
As the first cross reference table is required to cover the complete range of object numbers used, this lookup strategy implied that hybrid-reference files needed a second cross reference table whose trailer could point to the cross reference stream that would be used for lookups before the innermost, first cross reference table.
This is what we see in your example:
trailer
<</Size 47/Root 1 0 R/Info 15 0 R/ID[<EB3FF3A1E373C64E910E3FBC4E78913C><EB3FF3A1E373C64E910E3FBC4E78913C>] >>
startxref
106323
%%EOF
xref
0 0
trailer
<</Size 47/Root 1 0 R/Info 15 0 R/ID[<EB3FF3A1E373C64E910E3FBC4E78913C><EB3FF3A1E373C64E910E3FBC4E78913C>] /Prev 106323/XRefStm 105972>>
startxref
107421
%%EOF
Actually most PDF producers implemented hybrid-reference files (if they did at all) under the impression that the cross reference stream and probably also the object streams should go between the first trailer and the second cross reference table. But there is no requirement for that, and the PDF export of MS Office chose to put all the streams before the first cross reference table. As that's the case for your examples, too, I assume they were produced by MS Office.
Related
I am trying to Base64 encode an RSA key after converting it to a decimal string in OpenSSL. I am able to encode most of the string, except for the last few characters and I really don't know why. I know there is a one-line function which can do this in EVP, but I already tried it outside of a test program, and I get really strange memory errors.
My code:
char *rsa_string = BN_bn2dec(n);
printf("RSA modulus as BigNumber: %s\n", rsa_string);
BIO *base64_bio = BIO_new(BIO_f_base64());
BIO *rsa_bio = BIO_new(BIO_s_mem());
BIO_set_flags(base64_bio, BIO_FLAGS_BASE64_NO_NL);
BIO_push(base64_bio, rsa_bio);
int bytes_wrote = BIO_write(base64_bio, rsa_string, strlen(rsa_string));
int size = (((bytes_wrote / 3) * 4) + 1);
char *base64_encoded_key = malloc(size);
memset(base64_encoded_key, 0, size);
BIO_read(rsa_bio, base64_encoded_key, size);
printf("PEM base64 encoded key: %s\n", base64_encoded_key);
The value of rsa_string:
24313072237482078080153238679657972370698125455764727604271323979485556075246432578166301643940422426917948305745382409095484476466658578953561552837314705043006862929591710426227781122807468842903611989398403414596798482701682344649614368612160626447765390887911656651474459060185299697662496646333059927160952849254412616907773180781994902777839713317482262499105976583975621942282154132092996199104128448823598401942504647814124310345584957610465014752297210548757190951415912761894769725791618353941501561569896413562323669731189309270885282683237856701825718378848399084628386291389996469470694254685203803176643
The Base64 encoding:
MjQzMTMwNzIyMzc0ODIwNzgwODAxNTMyMzg2Nzk2NTc5NzIzNzA2OTgxMjU0NTU3NjQ3Mjc2MDQyNzEzMjM5Nzk0ODU1NTYwNzUyNDY0MzI1NzgxNjYzMDE2NDM5NDA0MjI0MjY5MTc5NDgzMDU3NDUzODI0MDkwOTU0ODQ0NzY0NjY2NTg1Nzg5NTM1NjE1NTI4MzczMTQ3MDUwNDMwMDY4NjI5Mjk1OTE3MTA0MjYyMjc3ODExMjI4MDc0Njg4NDI5MDM2MTE5ODkzOTg0MDM0MTQ1OTY3OTg0ODI3MDE2ODIzNDQ2NDk2MTQzNjg2MTIxNjA2MjY0NDc3NjUzOTA4ODc5MTE2NTY2NTE0NzQ0NTkwNjAxODUyOTk2OTc2NjI0OTY2NDYzMzMwNTk5MjcxNjA5NTI4NDkyNTQ0MTI2MTY5MDc3NzMxODA3ODE5OTQ5MDI3Nzc4Mzk3MTMzMTc0ODIyNjI0OTkxMDU5NzY1ODM5NzU2MjE5NDIyODIxNTQxMzIwOTI5OTYxOTkxMDQxMjg0NDg4MjM1OTg0MDE5NDI1MDQ2NDc4MTQxMjQzMTAzNDU1ODQ5NTc2MTA0NjUwMTQ3NTIyOTcyMTA1NDg3NTcxOTA5NTE0MTU5MTI3NjE4OTQ3Njk3MjU3OTE2MTgzNTM5NDE1MDE1NjE1Njk4OTY0MTM1NjIzMjM2Njk3MzExODkzMDkyNzA4ODUyODI2ODMyMzc4NTY3MDE4MjU3MTgzNzg4NDgzOTkwODQ2MjgzODYyOTEzODk5OTY0Njk0NzA2OTQyNTQ2ODUyMDM4MDMxNzY2
I compare this output to other programs which encode to Base64, and notice that my program is off by 4 characters (NDM=). Why is that?
What I have tried:
OpenSSL's BIO documentation - https://www.openssl.org/docs/manmaster/man3/BIO_f_base64.html
OpenSSL's source code - https://github.com/openssl/openssl/blob/d9f073575fdb07b486cd1b38974cd177687ccc1e/apps/rand.c
Different orderings of the BIOs in the BIO operations, and this is the only configuration that produces a Base64 output.
Providing the Base64 BIO read output buffer much more memory than is needed (just in case size was miscalculated).
I tried calling read() twice, but the second call returns -1, which apparently could mean the BIO thinks there is nothing left to read.
Based on what I saw from the examples, what I have seems right. Why isn't the Base64 filter producing the entire string?
Notes:
This is just a test program to figure out how to use the APIs, thus I don't flush the BIO.
Before I added the BIO_set_flags(base64_bio, BIO_FLAGS_BASE64_NO_NL) code, the encoding was off by 40 characters... I also do not understand this.
Any help or direction is appreciated. Thanks!
Regardless of where the data to be base64-encoded comes from, the basic procedure is as follows:
Generate a random block of 256 bytes of data.
Open a Base64 BIO and configure it.
Open a basic memory BIO.
Chain the two BIOs.
Write data through the BIO chain.
Flush the BIO chain.
Reap the data from the memory BIO.
Close the full BIO chain.
That's pretty much it. Now the code:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <openssl/bio.h>
#include <openssl/rand.h>
int main(void)
{
// generate a random 256 byte block
uint8_t hdata[256] = {0};
if (RAND_bytes(hdata, sizeof hdata) != 1)
return EXIT_FAILURE; // the RNG could not supply the bytes
hdata[0] &= 0x7F; // don't ask.
// display in stdout just to prove it's there
BIO_dump_fp(stdout, hdata, sizeof hdata);
// configure base64 filter
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
// configure a memory bio
BIO *bmem = BIO_new(BIO_s_mem());
// chain bios
BIO *bio = BIO_push(b64, bmem);
// write to target chain and flush
BIO_write(bio, hdata, sizeof hdata);
BIO_flush(bio);
// reap memory buffer
char *ptr = NULL;
long len = BIO_get_mem_data(bmem, &ptr);
// dump the converted data to stdout
fwrite(ptr, (size_t)len, 1, stdout);
fputc('\n', stdout);
fflush(stdout);
// close BIO chain
BIO_free_all(bio);
return EXIT_SUCCESS;
}
Sample Output (varies, obviously)
0000 - 5b 94 cd 8b 1c 45 b9 5a-8c 5d f1 e3 08 31 55 8f [....E.Z.]...1U.
0010 - 8b 3c 29 7f 5d b1 72 82-12 6d 24 3f 04 6e 83 72 .<).].r..m$?.n.r
0020 - 1c 1d 01 5c 54 3b 2e c8-cc 47 5a 2f db ec 47 06 ...\T;...GZ/..G.
0030 - 95 25 13 f4 3b 92 2c c0-b6 88 41 d1 62 f7 f2 e4 .%..;.,...A.b...
0040 - 45 22 14 3a fc 1d a3 3b-b3 79 5b f0 c1 06 cb 56 E".:...;.y[....V
0050 - 87 f8 61 1e 82 9f 1b f0-fa 43 90 a0 12 18 28 40 ..a......C....(@
0060 - 88 32 ff f9 62 9f d6 eb-9c dc 69 fa 3a ca a5 ea .2..b.....i.:...
0070 - 20 bb 62 9d 86 e4 76 8f-20 4f 19 42 a7 0d 15 c7 .b...v. O.B....
0080 - 83 78 79 20 b2 a3 44 64-bf 7f a0 84 11 a4 38 96 .xy ..Dd......8.
0090 - 17 83 18 96 84 6b df 94-e3 66 e2 88 63 58 d8 8f .....k...f..cX..
00a0 - 49 67 b8 78 68 e2 8c 8b-55 cf 27 84 4c 35 91 80 Ig.xh...U.'.L5..
00b0 - 0c a0 63 2c f7 c0 c6 db-30 aa d9 8b 64 cb d2 8d ..c,....0...d...
00c0 - c8 71 f9 0e 93 25 66 b4-c7 65 fc 85 a9 93 93 b7 .q...%f..e......
00d0 - 72 30 e9 72 e0 26 16 2c-a3 02 54 bd f6 d2 4a 8f r0.r.&.,..T...J.
00e0 - 0b 9b 7a 6c 35 cd 2c 80-f8 50 0e 31 a9 0b 39 0c ..zl5.,..P.1..9.
00f0 - b1 cf 67 b8 65 75 a3 98-ee 0b d5 6f a0 61 8b df ..g.eu.....o.a..
W5TNixxFuVqMXfHjCDFVj4s8KX9dsXKCEm0kPwRug3IcHQFcVDsuyMxHWi/b7EcGlSUT9DuSLMC2iEHRYvfy5EUiFDr8HaM7s3lb8MEGy1aH+GEegp8b8PpDkKASGChAiDL/+WKf1uuc3Gn6Osql6iC7Yp2G5HaPIE8ZQqcNFceDeHkgsqNEZL9/oIQRpDiWF4MYloRr35TjZuKIY1jYj0lnuHho4oyLVc8nhEw1kYAMoGMs98DG2zCq2Ytky9KNyHH5DpMlZrTHZfyFqZOTt3Iw6XLgJhYsowJUvfbSSo8Lm3psNc0sgPhQDjGpCzkMsc9nuGV1o5juC9VvoGGL3w==
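The flush is important. Consider how a base64 stream filter works: it has to wait for 3 octets before it can emit 4 characters. It could be more elaborate internally, but mere mortals would likely write it that way. Therefore it doesn't know when you are "done" writing data, and it may be left holding a partial, incomplete triplet until you say so; you do that by flushing the BIO chain. That is exactly what happened in your case: your decimal string is 617 characters long, i.e. 205 complete triplets plus 2 leftover bytes, so the final "43" stayed inside the filter, and that is what the missing NDM= encodes.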
Your Text String, Base64 Encoded
Doing the same for your text string is extremely similar to the prior code. Obviously the source data is different, but the rest will be similar.
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/bio.h>
#include <openssl/evp.h>
int main()
{
const char *str =
"2431307223748207808015323867965797237069812545576472760427132397948555"
"6075246432578166301643940422426917948305745382409095484476466658578953"
"5615528373147050430068629295917104262277811228074688429036119893984034"
"1459679848270168234464961436861216062644776539088791165665147445906018"
"5299697662496646333059927160952849254412616907773180781994902777839713"
"3174822624991059765839756219422821541320929961991041284488235984019425"
"0464781412431034558495761046501475229721054875719095141591276189476972"
"5791618353941501561569896413562323669731189309270885282683237856701825"
"718378848399084628386291389996469470694254685203803176643";
const size_t slen = strlen(str);
// display in stdout
BIO_dump_fp(stdout, str, (int)slen);
// configure base64 filter
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
// configure a memory bio
BIO *bmem = BIO_new(BIO_s_mem());
// chain bios
BIO *bio = BIO_push(b64, bmem);
// write to target chain and flush
BIO_write(bio, str, (int)slen);
BIO_flush(bio);
// reap memory buffer
char *ptr = NULL;
long len = BIO_get_mem_data(bmem, &ptr);
// dump the converted data to stdout
fwrite(ptr, (size_t)len, 1, stdout);
fputc('\n', stdout);
fflush(stdout);
// close BIO chain
BIO_free_all(bio);
return EXIT_SUCCESS;
}
Output
0000 - 32 34 33 31 33 30 37 32-32 33 37 34 38 32 30 37 2431307223748207
0010 - 38 30 38 30 31 35 33 32-33 38 36 37 39 36 35 37 8080153238679657
0020 - 39 37 32 33 37 30 36 39-38 31 32 35 34 35 35 37 9723706981254557
0030 - 36 34 37 32 37 36 30 34-32 37 31 33 32 33 39 37 6472760427132397
0040 - 39 34 38 35 35 35 36 30-37 35 32 34 36 34 33 32 9485556075246432
0050 - 35 37 38 31 36 36 33 30-31 36 34 33 39 34 30 34 5781663016439404
0060 - 32 32 34 32 36 39 31 37-39 34 38 33 30 35 37 34 2242691794830574
0070 - 35 33 38 32 34 30 39 30-39 35 34 38 34 34 37 36 5382409095484476
0080 - 34 36 36 36 35 38 35 37-38 39 35 33 35 36 31 35 4666585789535615
0090 - 35 32 38 33 37 33 31 34-37 30 35 30 34 33 30 30 5283731470504300
00a0 - 36 38 36 32 39 32 39 35-39 31 37 31 30 34 32 36 6862929591710426
00b0 - 32 32 37 37 38 31 31 32-32 38 30 37 34 36 38 38 2277811228074688
00c0 - 34 32 39 30 33 36 31 31-39 38 39 33 39 38 34 30 4290361198939840
00d0 - 33 34 31 34 35 39 36 37-39 38 34 38 32 37 30 31 3414596798482701
00e0 - 36 38 32 33 34 34 36 34-39 36 31 34 33 36 38 36 6823446496143686
00f0 - 31 32 31 36 30 36 32 36-34 34 37 37 36 35 33 39 1216062644776539
0100 - 30 38 38 37 39 31 31 36-35 36 36 35 31 34 37 34 0887911656651474
0110 - 34 35 39 30 36 30 31 38-35 32 39 39 36 39 37 36 4590601852996976
0120 - 36 32 34 39 36 36 34 36-33 33 33 30 35 39 39 32 6249664633305992
0130 - 37 31 36 30 39 35 32 38-34 39 32 35 34 34 31 32 7160952849254412
0140 - 36 31 36 39 30 37 37 37-33 31 38 30 37 38 31 39 6169077731807819
0150 - 39 34 39 30 32 37 37 37-38 33 39 37 31 33 33 31 9490277783971331
0160 - 37 34 38 32 32 36 32 34-39 39 31 30 35 39 37 36 7482262499105976
0170 - 35 38 33 39 37 35 36 32-31 39 34 32 32 38 32 31 5839756219422821
0180 - 35 34 31 33 32 30 39 32-39 39 36 31 39 39 31 30 5413209299619910
0190 - 34 31 32 38 34 34 38 38-32 33 35 39 38 34 30 31 4128448823598401
01a0 - 39 34 32 35 30 34 36 34-37 38 31 34 31 32 34 33 9425046478141243
01b0 - 31 30 33 34 35 35 38 34-39 35 37 36 31 30 34 36 1034558495761046
01c0 - 35 30 31 34 37 35 32 32-39 37 32 31 30 35 34 38 5014752297210548
01d0 - 37 35 37 31 39 30 39 35-31 34 31 35 39 31 32 37 7571909514159127
01e0 - 36 31 38 39 34 37 36 39-37 32 35 37 39 31 36 31 6189476972579161
01f0 - 38 33 35 33 39 34 31 35-30 31 35 36 31 35 36 39 8353941501561569
0200 - 38 39 36 34 31 33 35 36-32 33 32 33 36 36 39 37 8964135623236697
0210 - 33 31 31 38 39 33 30 39-32 37 30 38 38 35 32 38 3118930927088528
0220 - 32 36 38 33 32 33 37 38-35 36 37 30 31 38 32 35 2683237856701825
0230 - 37 31 38 33 37 38 38 34-38 33 39 39 30 38 34 36 7183788483990846
0240 - 32 38 33 38 36 32 39 31-33 38 39 39 39 36 34 36 2838629138999646
0250 - 39 34 37 30 36 39 34 32-35 34 36 38 35 32 30 33 9470694254685203
0260 - 38 30 33 31 37 36 36 34-33 803176643
MjQzMTMwNzIyMzc0ODIwNzgwODAxNTMyMzg2Nzk2NTc5NzIzNzA2OTgxMjU0NTU3NjQ3Mjc2MDQyNzEzMjM5Nzk0ODU1NTYwNzUyNDY0MzI1NzgxNjYzMDE2NDM5NDA0MjI0MjY5MTc5NDgzMDU3NDUzODI0MDkwOTU0ODQ0NzY0NjY2NTg1Nzg5NTM1NjE1NTI4MzczMTQ3MDUwNDMwMDY4NjI5Mjk1OTE3MTA0MjYyMjc3ODExMjI4MDc0Njg4NDI5MDM2MTE5ODkzOTg0MDM0MTQ1OTY3OTg0ODI3MDE2ODIzNDQ2NDk2MTQzNjg2MTIxNjA2MjY0NDc3NjUzOTA4ODc5MTE2NTY2NTE0NzQ0NTkwNjAxODUyOTk2OTc2NjI0OTY2NDYzMzMwNTk5MjcxNjA5NTI4NDkyNTQ0MTI2MTY5MDc3NzMxODA3ODE5OTQ5MDI3Nzc4Mzk3MTMzMTc0ODIyNjI0OTkxMDU5NzY1ODM5NzU2MjE5NDIyODIxNTQxMzIwOTI5OTYxOTkxMDQxMjg0NDg4MjM1OTg0MDE5NDI1MDQ2NDc4MTQxMjQzMTAzNDU1ODQ5NTc2MTA0NjUwMTQ3NTIyOTcyMTA1NDg3NTcxOTA5NTE0MTU5MTI3NjE4OTQ3Njk3MjU3OTE2MTgzNTM5NDE1MDE1NjE1Njk4OTY0MTM1NjIzMjM2Njk3MzExODkzMDkyNzA4ODUyODI2ODMyMzc4NTY3MDE4MjU3MTgzNzg4NDgzOTkwODQ2MjgzODYyOTEzODk5OTY0Njk0NzA2OTQyNTQ2ODUyMDM4MDMxNzY2NDM=
Decoding that base64 line (for example at base64decode.org) gives back exactly your source string.
BIGNUM Base64 Encoding
On the off-chance my suspicion is correct and your ultimate goal is a base64 encoding of the bignum itself, that too is doable with barely any more work. Using the original source string, we can create a BIGNUM, convert it to big-endian binary, and send that through the base64-filtered BIO chain just as before:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <openssl/bio.h>
#include <openssl/bn.h>
#include <openssl/evp.h>
int main()
{
const char *str =
"2431307223748207808015323867965797237069812545576472760427132397948555"
"6075246432578166301643940422426917948305745382409095484476466658578953"
"5615528373147050430068629295917104262277811228074688429036119893984034"
"1459679848270168234464961436861216062644776539088791165665147445906018"
"5299697662496646333059927160952849254412616907773180781994902777839713"
"3174822624991059765839756219422821541320929961991041284488235984019425"
"0464781412431034558495761046501475229721054875719095141591276189476972"
"5791618353941501561569896413562323669731189309270885282683237856701825"
"718378848399084628386291389996469470694254685203803176643";
// convert the really big decimal ASCII string to a BIGNUM
BIGNUM *bn = NULL;
if (BN_asc2bn(&bn, str) == 1)
{
// convert the big number to big-endian binary.
int nbytes = BN_num_bytes(bn);
unsigned char *bin = OPENSSL_malloc(nbytes);
BN_bn2bin(bn, bin);
// dump to stdout
BIO_dump_fp(stdout, bin, nbytes);
// configure base64 filter
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
// configure bio chain
BIO *bmem = BIO_new(BIO_s_mem());
BIO *bio = BIO_push(b64, bmem);
BIO_write(bio, bin, nbytes);
BIO_flush(bio);
// no longer need the BN or buffer
OPENSSL_clear_free(bin, nbytes);
BN_free(bn);
// reap memory buffer
char *ptr = NULL;
long len = BIO_get_mem_data(bmem, &ptr);
// dump the converted data to stdout
fwrite(ptr, (size_t)len, 1, stdout);
fputc('\n', stdout);
fflush(stdout);
// close BIO chain
BIO_free_all(bio);
}
return EXIT_SUCCESS;
}
Output
0000 - c0 98 bc ce e0 43 b1 06-de 88 ff c2 f2 02 6d 08 .....C........m.
0010 - 92 99 30 4d 06 c4 e1 39-18 48 f8 24 bd a6 7e e0 ..0M...9.H.$..~.
0020 - 3d c3 7a 59 1d ff 70 a6-2e 8b 5d c9 c6 3d 38 cb =.zY..p...]..=8.
0030 - aa f7 4a d6 1b 24 54 c4-9f 4f 74 b4 52 2b 9a 89 ..J..$T..Ot.R+..
0040 - 3b 72 d8 ce 60 fa dc 72-36 9d 0a 31 45 32 54 94 ;r..`..r6..1E2T.
0050 - 61 c4 9e 05 32 68 08 d8-d8 41 e3 0c d5 b3 81 13 a...2h...A......
0060 - ec 5c 95 c5 23 a7 71 a3-0d c0 e6 13 04 14 db 6c .\..#.q........l
0070 - 9d f2 10 e0 52 ff 44 be-a9 c4 8a 8e ee 13 3f 4e ....R.D.......?N
0080 - a1 3e 04 72 ea 35 5f 42-04 e7 aa b0 82 df c1 07 .>.r.5_B........
0090 - a6 db 7d d5 81 b7 33 cf-a9 bc 95 76 63 ae 9b 7a ..}...3....vc..z
00a0 - f9 11 79 c0 31 8d e0 53-11 e6 34 3d a9 a6 53 9c ..y.1..S..4=..S.
00b0 - 90 a6 68 da 0a 94 05 b3-0e 79 be a6 8b 82 4c 27 ..h......y....L'
00c0 - 76 3e 76 59 81 db dd dd-27 c4 ea cd b1 d9 b1 86 v>vY....'.......
00d0 - e3 7b f0 9a 7c d5 3c aa-ae a1 8d f0 96 73 2c 96 .{..|.<......s,.
00e0 - 53 01 4e 49 2e 4b ed 86-cd 98 60 99 8b 5a 21 c5 S.NI.K....`..Z!.
00f0 - 1a e6 b7 1d 45 8f d1 bf-83 f9 bc e6 46 14 7e c3 ....E.......F.~.
wJi8zuBDsQbeiP/C8gJtCJKZME0GxOE5GEj4JL2mfuA9w3pZHf9wpi6LXcnGPTjLqvdK1hskVMSfT3S0UiuaiTty2M5g+txyNp0KMUUyVJRhxJ4FMmgI2NhB4wzVs4ET7FyVxSOncaMNwOYTBBTbbJ3yEOBS/0S+qcSKju4TP06hPgRy6jVfQgTnqrCC38EHptt91YG3M8+pvJV2Y66bevkRecAxjeBTEeY0PammU5yQpmjaCpQFsw55vqaLgkwndj52WYHb3d0nxOrNsdmxhuN78Jp81TyqrqGN8JZzLJZTAU5JLkvths2YYJmLWiHFGua3HUWP0b+D+bzmRhR+ww==
The console dump above shows the 256 bytes that represent the number in big-endian binary form. That is the data that gets base64-encoded to produce the final line of output. The code should look very familiar, as it is basically identical to the first sample in this answer; only the source of the data is different.
I suspect this is what you will eventually need, though your posted code suggests you're already starting with a BIGNUM, so the part of the above code that creates a BIGNUM from the string is already done.
I have 2 unsigned char arrays, when I print them both using BIO_dump_fp, I see that array1 has the same value in the second column as array2 in the third column (but in lower case):
array1:
0000 - ff 1d 43 5f 99 24 a8 60-bb 09 3b 83 ca 4d 7d 50 ..C_.$.`..;..M}P
0010 - 73 cb 98 24 9d 55 39 e8-dc 2b d2 90 f0 c2 db d5 s..$.U9..+......
array2:
0000 - 46 46 31 44 34 33 35 46-39 39 32 34 41 38 36 30 FF1D435F9924A860
0010 - 42 42 30 39 33 42 38 33-43 41 34 44 37 44 35 30 BB093B83CA4D7D50
0020 - 37 33 43 42 39 38 32 34-39 44 35 35 33 39 45 38 73CB98249D5539E8
0030 - 44 43 32 42 44 32 39 30-46 30 43 32 44 42 44 35 DC2BD290F0C2DBD5
Can somebody please explain to me what columns 2 and 3 actually represent, and how to convert array2 to the same format as array1 (as they seem to contain the same data)?
I am reading text from a file and outputting it as binary. I have modified the binary conversion as follows:
Each capital letter shall start with 01 and will be followed by 5 bits.
The 5 bits shall hold the value of the letter.
The letters will have the value as A-2,B-3,C-4,D-5...
For example: HI-> (0101001)(0101010)
My code snippet is as follows:
#include <stdio.h>
#include <string.h>
void printinbits(int n)
{
for (int c = 4; c >= 0; c--)
{
long int k = n >> c;
if (k & 1)
printf("1");
else
printf("0");
}
}
int main()
{
//first letter is being repeated
char check[200];
FILE*fin= fopen("/Users/priya/Desktop/test.txt.rtf","r");
if(fin==NULL)
{
perror("fopen");
return 1;
}
while((fscanf(fin,"%199s",check))==1)
{
for(int i=0;i<strlen(check);++i)
{
if(check[i]>=65&&check[i]<=90)
{
printf("01");
int n=check[i];
n-=63;
printinbits(n);
}
}
}
return 0;
}
My input->
HELLO
My output->
(0101001)(0101001)(0100110)(0101101)(0101101)(0110000)
(As you can see, the first letter H is being repeated)(Various letters are separated by brackets)
Here's a hex dump of a file hello.rtf containing the word HELLO in upper case. It was generated by TextEdit on a Mac.
0x0000: 7B 5C 72 74 66 31 5C 61 6E 73 69 5C 61 6E 73 69 {\rtf1\ansi\ansi
0x0010: 63 70 67 31 32 35 32 5C 63 6F 63 6F 61 72 74 66 cpg1252\cocoartf
0x0020: 31 34 30 34 5C 63 6F 63 6F 61 73 75 62 72 74 66 1404\cocoasubrtf
0x0030: 34 36 30 0A 7B 5C 66 6F 6E 74 74 62 6C 5C 66 30 460.{\fonttbl\f0
0x0040: 5C 66 73 77 69 73 73 5C 66 63 68 61 72 73 65 74 \fswiss\fcharset
0x0050: 30 20 48 65 6C 76 65 74 69 63 61 3B 7D 0A 7B 5C 0 Helvetica;}.{\
0x0060: 63 6F 6C 6F 72 74 62 6C 3B 5C 72 65 64 32 35 35 colortbl;\red255
0x0070: 5C 67 72 65 65 6E 32 35 35 5C 62 6C 75 65 32 35 \green255\blue25
0x0080: 35 3B 7D 0A 5C 6D 61 72 67 6C 31 34 34 30 5C 6D 5;}.\margl1440\m
0x0090: 61 72 67 72 31 34 34 30 5C 76 69 65 77 77 31 30 argr1440\vieww10
0x00A0: 38 30 30 5C 76 69 65 77 68 38 34 30 30 5C 76 69 800\viewh8400\vi
0x00B0: 65 77 6B 69 6E 64 30 0A 5C 70 61 72 64 5C 74 78 ewkind0.\pard\tx
0x00C0: 37 32 30 5C 74 78 31 34 34 30 5C 74 78 32 31 36 720\tx1440\tx216
0x00D0: 30 5C 74 78 32 38 38 30 5C 74 78 33 36 30 30 5C 0\tx2880\tx3600\
0x00E0: 74 78 34 33 32 30 5C 74 78 35 30 34 30 5C 74 78 tx4320\tx5040\tx
0x00F0: 35 37 36 30 5C 74 78 36 34 38 30 5C 74 78 37 32 5760\tx6480\tx72
0x0100: 30 30 5C 74 78 37 39 32 30 5C 74 78 38 36 34 30 00\tx7920\tx8640
0x0110: 5C 70 61 72 64 69 72 6E 61 74 75 72 61 6C 5C 70 \pardirnatural\p
0x0120: 61 72 74 69 67 68 74 65 6E 66 61 63 74 6F 72 30 artightenfactor0
0x0130: 0A 0A 5C 66 30 5C 66 73 32 34 20 5C 63 66 30 20 ..\f0\fs24 \cf0
0x0140: 48 45 4C 4C 4F 7D HELLO}
0x0146:
Note the H of 'Helvetica': it is the only other capital letter in the file, which would account for output corresponding to HHELLO. It looks like you might be on a Mac too, so you'd probably see the same result, or at least an equivalent one. (I used a homebrew hex dump program; you'd probably use xxd -g 1 test.txt.rtf, which would produce the hex with lower-case letters and wouldn't include the final byte-count line.)
You could, and should, print the data that your program reads in the loop, at least while debugging it, so that you can see what the program is processing. This is a very basic debugging technique.
In TextEdit, you can switch between rich text and plain text with the 'Make Plain Text' or 'Make Rich Text' option under the Format menu, or using ⇧⌘T (shift command T) to toggle between the two modes. Note how the file name changes as you do that.
Community Wiki since M Oehm pointed out the likely problem.
I have a problem with reading a string of triples (digit, digit, space).
Relevant code (the included libraries may or may not be needed for this particular code):
#include <stdio.h>
#include <stdlib.h>
#define N 20
#define M 20
#define n (3*N*M)
int main()
{
char str_t[n];
scanf("%s", str_t);
for(int i=0;i<n;i++)
printf("%c", str_t[i]);
return 0;
}
As mentioned, the input is a set of triples repeated 399 times and finished with a final digit-digit pair, saved to char array[1200].
I assume that pasting into console is okay since I did it before. When it comes to printing back the array, I get random mumbo jumbo like: 3�X2���W2��#M!�
Input:
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08 49 49 99
40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00 81 49 31 73 55 79
14 29 93 71 40 67 53 88 30 03 49 13 36 65 52 70 95 23 04 60 11 42 69
24 68 56 01 32 56 71 37 02 36 91 22 31 16 71 51 67 63 89 41 92 36 54
22 40 40 28 66 33 13 80 24 47 32 60 99 03 45 02 44 75 33 53 78 36 84
20 35 17 12 50 32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38
64 70 67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21 24
55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72 21 36 23 09
75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95 78 17 53 28 22 75 31
67 15 94 03 80 04 62 16 14 09 53 56 92 16 39 05 42 96 35 31 47 55 58
88 24 00 17 54 24 36 29 85 57 86 56 00 48 35 71 89 07 05 44 44 37 44
60 21 58 51 54 17 58 19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77
04 89 55 40 04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98
66 88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69 04 42
16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36 20 69 36 41 72
30 23 88 34 62 99 69 82 67 59 85 74 04 36 16 20 73 35 29 78 31 90 01
74 31 49 71 48 86 81 16 23 57 05 54 01 70 54 71 83 51 54 69 16 92 33
48 61 43 52 01 89 19 67 48
From http://linux.die.net/man/3/scanf, concerning the %s format:
Matches a sequence of non-white-space characters... The input string stops at white space
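So scanf("%s", str_t) stores only the first number, "08", and your loop then prints all n elements of str_t, including the uninitialized bytes after the string terminator; that is the garbage you see. If the input consists of a single line, you can use fgets instead of scanf: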
fgets(str_t, n, stdin);
I'm new to C and system programming. I want to open an archive file and print out the names of the files inside it (e.g., my archive file is weds.a; inside weds.a, I have thurs.txt and fri.txt). I want to create output that shows:
thurs.txt
fri.txt
EDITED: It should work like the ar -t command.
Can someone give me some tips on how to do it? I've been reading the man page and looking for examples online, but I'm getting nowhere. I believe I'm missing something. The code I have below only prints the link count. Can someone help? Thanks in advance for your help!
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/utsname.h>
#include <ctype.h>
#include <string.h>
int main (int argc, char **argv)
{
int in_fd;
struct stat sb;
if (argc != 2) {
fprintf(stderr, "Usage: %s archive-file\n", argv[0]);
exit(EXIT_FAILURE);
}
if (stat(argv[1], &sb) == -1) {
perror("stat");
exit(EXIT_FAILURE); //change from EXIT_SUCCESS to EXIT_FAILURE
}
//open the archive file (e.g., hw.a)
in_fd = open(argv[1], O_RDONLY);
if (in_fd == -1)
{
perror("Can't open input file\n");
exit(-1);
}
printf("Link Count: %ld\n", (long)sb.st_nlink);
return 0;
}
The easiest way is to use the ar program to list the names:
ar -tv weds.a
The - is optional; the v means you'll get size and time information.
If you want to get into reading the archive file itself, you'll have to be aware of the differences in the formats on different systems. The relevant header is (normally) <ar.h>. I have information for a number of systems, many of them obsolete, and there are a variety of different tricks used for handling long file names (and other even more basic file format issues) but you may have a more limited scope in mind. Any such work based on <ar.h> will be non-trivial; you're best off reusing what already exists (the ar program) if at all possible.
This is an archive from a Mac OS X 10.8.4 machine.
$ cat thurs.txt
0123456789:;<=>?@ABCDEFGHIJKLMNO
$ cat fri.txt
PQRSTUVWXYZ[\]^_`abcdefghijklmno
$ odx weds.a
0x0000: 21 3C 61 72 63 68 3E 0A 74 68 75 72 73 2E 74 78 !<arch>.thurs.tx
0x0010: 74 20 20 20 20 20 20 20 31 33 37 34 30 39 36 30 t 13740960
0x0020: 31 32 20 20 32 38 37 36 20 20 35 30 30 30 20 20 12 2876 5000
0x0030: 31 30 30 36 34 34 20 20 33 33 20 20 20 20 20 20 100644 33
0x0040: 20 20 60 0A 30 31 32 33 34 35 36 37 38 39 3A 3B `.0123456789:;
0x0050: 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B <=>?@ABCDEFGHIJK
0x0060: 4C 4D 4E 4F 0A 0A 66 72 69 2E 74 78 74 20 20 20 LMNO..fri.txt
0x0070: 20 20 20 20 20 20 31 33 37 34 30 39 36 30 30 35 1374096005
0x0080: 20 20 32 38 37 36 20 20 35 30 30 30 20 20 31 30 2876 5000 10
0x0090: 30 36 34 34 20 20 33 33 20 20 20 20 20 20 20 20 0644 33
0x00A0: 60 0A 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D `.PQRSTUVWXYZ[\]
0x00B0: 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D ^_`abcdefghijklm
0x00C0: 6E 6F 0A 0A no..
0x00C4:
$
Fortunately for you, the same files produce essentially the same archive on Linux too. In the Linux header <ar.h> you find:
/* Archive files start with the ARMAG identifying string. Then follows a
`struct ar_hdr', and as many bytes of member file data as its `ar_size'
member indicates, for each member file. */
#define ARMAG "!<arch>\n" /* String that begins an archive file. */
#define SARMAG 8 /* Size of that string. */
#define ARFMAG "`\n" /* String in ar_fmag at end of each header. */
struct ar_hdr
{
char ar_name[16]; /* Member file name, sometimes / terminated. */
char ar_date[12]; /* File date, decimal seconds since Epoch. */
char ar_uid[6], ar_gid[6]; /* User and group IDs, in ASCII decimal. */
char ar_mode[8]; /* File mode, in ASCII octal. */
char ar_size[10]; /* File size, in ASCII decimal. */
char ar_fmag[2]; /* Always contains ARFMAG. */
};
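Every field in struct ar_hdr is a fixed-width, blank-padded ASCII string with no terminating NUL, so numbers have to be copied out before conversion. A minimal sketch, assuming a C11 compiler: struct ar_hdr_copy and ar_field are hypothetical names, and the struct is a copy of the layout above so the code does not depend on <ar.h> being installed. The 60-byte size check holds on common ABIs because a struct made only of char arrays normally has no padding, although the C standard does not strictly promise that.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy of the struct ar_hdr layout quoted above (hypothetical name). */
struct ar_hdr_copy {
    char ar_name[16];   /* member name, blank or '/' terminated */
    char ar_date[12];   /* decimal seconds since the Epoch */
    char ar_uid[6], ar_gid[6];
    char ar_mode[8];    /* octal */
    char ar_size[10];   /* decimal byte count */
    char ar_fmag[2];    /* always "`\n" */
};

_Static_assert(sizeof(struct ar_hdr_copy) == 60, "ar header must be 60 bytes");

/* Convert one blank-padded, non-terminated header field to a number.
 * Use base 10 for ar_date, ar_uid, ar_gid and ar_size; base 8 for ar_mode.
 * Returns -1 if the field is wider than this helper supports. */
long ar_field(const char *field, size_t width, int base)
{
    char buf[16];
    if (width >= sizeof buf)
        return -1;
    memcpy(buf, field, width);
    buf[width] = '\0';                  /* strtol needs a C string */
    return strtol(buf, NULL, base);     /* stops at the first trailing blank */
}
```

For the thurs.txt header in the dump, ar_field(h.ar_size, sizeof h.ar_size, 10) gives 33 and ar_field(h.ar_mode, sizeof h.ar_mode, 8) gives 0100644.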
The Mac OS X header has the same structure and ARMAG and ARFMAG values, but one extra macro:
#define AR_EFMT1 "#1/" /* extended format #1 */
You can see the ARMAG string at the start of the file. Each file is then preceded by a struct ar_hdr. Note that the example names here are blank terminated, not slash terminated.
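Checking the magic string and turning a short ar_name into a plain C string can be sketched as follows. The names ar_check_magic and ar_short_name are illustrative, not from any library. The code strips the trailing blanks seen in the Mac OS X archive and a single trailing / as used on Linux, and deliberately leaves names that begin with / (such as /, // or /18) untouched, because those are special entries or long-name references.

```c
#include <stdio.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"   /* same bytes as ARMAG in <ar.h> */
#define AR_MAGIC_LEN 8             /* same value as SARMAG */

/* Return 1 if fp starts with the archive magic string, leaving fp at the
 * first struct ar_hdr; return 0 otherwise. */
int ar_check_magic(FILE *fp)
{
    char buf[AR_MAGIC_LEN];
    return fread(buf, 1, AR_MAGIC_LEN, fp) == AR_MAGIC_LEN
        && memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) == 0;
}

/* Copy the 16-byte ar_name field into out (17 bytes) as a C string:
 * drop trailing blanks (Mac OS X style), then one trailing '/' (Linux
 * style). Names starting with '/' are left as they are. */
void ar_short_name(const char field[16], char out[17])
{
    size_t n = 16;
    while (n > 0 && field[n - 1] == ' ')
        n--;
    if (n > 0 && field[0] != '/' && field[n - 1] == '/')
        n--;
    memcpy(out, field, n);
    out[n] = '\0';
}
```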
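After that, you find the data for the file: ar_size bytes, padded with a newline to an even length so that every header starts at an even offset (that is why the 33-byte thurs.txt data ends in 0A 0A: the file's own newline plus the pad byte). You can read each header in its entirety with a single 60-byte read. Note that on Linux, if any of the names is longer than 15 characters, or if a name contains spaces, you get an extra entry at the start of the archive file that contains the file name strings, and you also get a modified name entry in the per-file header that identifies the relevant string in that string table.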
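Putting the pieces together, a walk over the archive reads the 8-byte magic, then repeatedly reads a 60-byte header, converts ar_size, and skips that many bytes plus one pad byte when the size is odd. In the Mac OS X dump, the first header starts at offset 8, its data at 8 + 60 = 0x44, and 33 data bytes plus 1 pad byte put the next header at 0x44 + 34 = 0x66. The function ar_list is a hypothetical sketch: it works on raw bytes at the offsets of struct ar_hdr, only trims trailing blanks from names, and stops quietly at a truncated header. Long-name entries appear as their raw markers (//, /18, #1/28 and so on).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Walk an ar archive, storing up to max member names (trailing blanks
 * removed, otherwise exactly as in ar_name) and their ar_size values.
 * Returns the number of members seen, or -1 if the file is malformed. */
int ar_list(FILE *fp, char names[][17], long sizes[], int max)
{
    char magic[8];
    char hdr[60];                       /* one struct ar_hdr as raw bytes */
    int count = 0;

    if (fread(magic, 1, 8, fp) != 8 || memcmp(magic, "!<arch>\n", 8) != 0)
        return -1;

    while (fread(hdr, 1, sizeof hdr, fp) == sizeof hdr) {
        char sizebuf[11];
        if (memcmp(hdr + 58, "`\n", 2) != 0)    /* ar_fmag must be ARFMAG */
            return -1;
        memcpy(sizebuf, hdr + 48, 10);          /* ar_size is bytes 48..57 */
        sizebuf[10] = '\0';
        long size = strtol(sizebuf, NULL, 10);
        if (size < 0)
            return -1;

        if (count < max) {
            size_t n = 16;                      /* ar_name is bytes 0..15 */
            while (n > 0 && hdr[n - 1] == ' ')
                n--;
            memcpy(names[count], hdr, n);
            names[count][n] = '\0';
            sizes[count] = size;
        }
        count++;

        /* Skip the data, plus the pad byte when size is odd. */
        if (fseek(fp, size + (size & 1), SEEK_CUR) != 0)
            return -1;
    }
    return count;
}
```

With the question's code, you could fopen(argv[1], "rb") instead of calling open(), then print names[i] for each member returned (up to the array size).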
Linux archive with long names etc
0x0000: 21 3C 61 72 63 68 3E 0A 2F 2F 20 20 20 20 20 20 !<arch>.//
0x0010: 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
* (1)
0x0030: 20 20 20 20 20 20 20 20 34 36 20 20 20 20 20 20 46
0x0040: 20 20 60 0A 66 69 6C 74 65 72 2E 73 74 64 65 72 `.filter.stder
0x0050: 72 2E 73 68 2F 0A 6C 6F 6E 67 20 6E 61 6D 65 20 r.sh/.long name
0x0060: 77 69 74 68 20 73 70 61 63 65 73 2E 74 78 74 2F with spaces.txt/
0x0070: 0A 0A 74 68 75 72 73 2E 74 78 74 2F 20 20 20 20 ..thurs.txt/
0x0080: 20 20 31 33 37 34 30 39 36 32 31 31 20 20 31 39 1374096211 19
0x0090: 39 34 38 34 35 30 30 30 20 20 31 30 30 36 34 30 94845000 100640
0x00A0: 20 20 33 33 20 20 20 20 20 20 20 20 60 0A 30 31 33 `.01
0x00B0: 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 23456789:;<=>?@A
0x00C0: 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 0A 0A BCDEFGHIJKLMNO..
0x00D0: 66 72 69 2E 74 78 74 2F 20 20 20 20 20 20 20 20 fri.txt/
0x00E0: 31 33 37 34 30 39 36 31 39 37 20 20 31 39 39 34 1374096197 1994
0x00F0: 38 34 35 30 30 30 20 20 31 30 30 36 34 30 20 20 845000 100640
0x0100: 33 33 20 20 20 20 20 20 20 20 60 0A 50 51 52 53 33 `.PQRS
0x0110: 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 61 62 63 TUVWXYZ[\]^_`abc
0x0120: 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 0A 0A 2F 30 defghijklmno../0
0x0130: 20 20 20 20 20 20 20 20 20 20 20 20 20 20 31 33 13
0x0140: 37 31 31 34 35 35 38 34 20 20 31 39 39 34 38 34 71145584 199484
0x0150: 35 30 30 30 20 20 31 30 30 36 34 30 20 20 32 33 5000 100640 23
0x0160: 30 20 20 20 20 20 20 20 60 0A 23 21 2F 62 69 6E 0 `.#!/bin
0x0170: 2F 62 61 73 68 0A 73 65 74 20 2D 78 0A 72 6D 20 /bash.set -x.rm
0x0180: 2D 66 20 6F 75 74 2E 5B 31 32 33 5D 0A 2E 2F 67 -f out.[123]../g
0x0190: 65 6E 6F 75 74 65 72 72 2E 73 68 20 31 3E 2F 64 enouterr.sh 1>/d
0x01A0: 65 76 2F 6E 75 6C 6C 0A 2E 2F 67 65 6E 6F 75 74 ev/null../genout
0x01B0: 65 72 72 2E 73 68 20 32 3E 2F 64 65 76 2F 6E 75 err.sh 2>/dev/nu
0x01C0: 6C 6C 0A 28 20 2E 2F 67 65 6E 6F 75 74 65 72 72 ll.( ./genouterr
0x01D0: 2E 73 68 20 32 3E 26 31 20 31 3E 26 33 20 7C 20 .sh 2>&1 1>&3 |
0x01E0: 67 72 65 70 20 27 5B 30 2D 39 5D 30 27 20 3E 26 grep '[0-9]0' >&
0x01F0: 32 29 20 33 3E 6F 75 74 2E 33 20 32 3E 6F 75 74 2) 3>out.3 2>out
0x0200: 2E 32 20 31 3E 6F 75 74 2E 31 0A 6C 73 20 2D 6C .2 1>out.1.ls -l
0x0210: 20 6F 75 74 2E 5B 31 32 33 5D 0A 28 20 2E 2F 67 out.[123].( ./g
0x0220: 65 6E 6F 75 74 65 72 72 2E 73 68 20 32 3E 26 31 enouterr.sh 2>&1
0x0230: 20 31 3E 26 33 20 7C 20 67 72 65 70 20 27 5B 30 1>&3 | grep '[0
0x0240: 2D 39 5D 30 27 20 3E 26 32 29 20 33 3E 26 31 0A -9]0' >&2) 3>&1.
0x0250: 2F 31 38 20 20 20 20 20 20 20 20 20 20 20 20 20 /18
0x0260: 31 33 37 34 30 39 36 35 37 37 20 20 31 39 39 34 1374096577 1994
0x0270: 38 34 35 30 30 30 20 20 31 30 30 36 34 30 20 20 845000 100640
0x0280: 33 33 20 20 20 20 20 20 20 20 60 0A 30 31 32 33 33 `.0123
0x0290: 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 456789:;<=>?@ABC
0x02A0: 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 0A 0A DEFGHIJKLMNO..
0x02AE:
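In the Linux dump, the 46 data bytes of the // member are the string table, and the later headers named /0 and /18 point at byte offsets 0 and 18 in it. Resolving such a reference can be sketched as below; gnu_long_name is a hypothetical helper that assumes you have already read the // member's data into memory and trimmed the blanks from the referencing ar_name.

```c
#include <stdlib.h>
#include <string.h>

/* Resolve a GNU-style long name reference such as "/18".
 * strtab/strtab_len: the data bytes of the "//" member.
 * Each table entry ends with "/\n". Copies the name into out (out_len bytes)
 * and returns 0, or returns -1 if ref is not a valid reference. */
int gnu_long_name(const char *strtab, size_t strtab_len,
                  const char *ref, char *out, size_t out_len)
{
    char *end;
    if (ref[0] != '/' || ref[1] < '0' || ref[1] > '9')
        return -1;                      /* "/", "//" etc. are not references */
    unsigned long off = strtoul(ref + 1, &end, 10);
    if (*end != '\0' || off >= strtab_len)
        return -1;

    size_t n = 0;
    while (off + n < strtab_len && strtab[off + n] != '\n')
        n++;
    if (n > 0 && strtab[off + n - 1] == '/')    /* drop the '/' terminator */
        n--;
    if (n >= out_len)
        return -1;
    memcpy(out, strtab + off, n);
    out[n] = '\0';
    return 0;
}
```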
Mac OS X archive with long names etc
0x0000: 21 3C 61 72 63 68 3E 0A 74 68 75 72 73 2E 74 78 !<arch>.thurs.tx
0x0010: 74 20 20 20 20 20 20 20 31 33 37 34 30 39 36 30 t 13740960
0x0020: 31 32 20 20 32 38 37 36 20 20 35 30 30 30 20 20 12 2876 5000
0x0030: 31 30 30 36 34 34 20 20 33 33 20 20 20 20 20 20 100644 33
0x0040: 20 20 60 0A 30 31 32 33 34 35 36 37 38 39 3A 3B `.0123456789:;
0x0050: 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B <=>?@ABCDEFGHIJK
0x0060: 4C 4D 4E 4F 0A 0A 66 72 69 2E 74 78 74 20 20 20 LMNO..fri.txt
0x0070: 20 20 20 20 20 20 31 33 37 34 30 39 36 30 30 35 1374096005
0x0080: 20 20 32 38 37 36 20 20 35 30 30 30 20 20 31 30 2876 5000 10
0x0090: 30 36 34 34 20 20 33 33 20 20 20 20 20 20 20 20 0644 33
0x00A0: 60 0A 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D `.PQRSTUVWXYZ[\]
0x00B0: 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D ^_`abcdefghijklm
0x00C0: 6E 6F 0A 0A 66 69 6C 74 65 72 2E 73 74 64 65 72 no..filter.stder
0x00D0: 72 2E 73 68 31 33 37 34 30 39 37 37 39 34 20 20 r.sh1374097794
0x00E0: 32 38 37 36 20 20 35 30 30 30 20 20 31 30 30 36 2876 5000 1006
0x00F0: 34 34 20 20 32 33 30 20 20 20 20 20 20 20 60 0A 44 230 `.
0x0100: 23 21 2F 62 69 6E 2F 62 61 73 68 0A 73 65 74 20 #!/bin/bash.set
0x0110: 2D 78 0A 72 6D 20 2D 66 20 6F 75 74 2E 5B 31 32 -x.rm -f out.[12
0x0120: 33 5D 0A 2E 2F 67 65 6E 6F 75 74 65 72 72 2E 73 3]../genouterr.s
0x0130: 68 20 31 3E 2F 64 65 76 2F 6E 75 6C 6C 0A 2E 2F h 1>/dev/null../
0x0140: 67 65 6E 6F 75 74 65 72 72 2E 73 68 20 32 3E 2F genouterr.sh 2>/
0x0150: 64 65 76 2F 6E 75 6C 6C 0A 28 20 2E 2F 67 65 6E dev/null.( ./gen
0x0160: 6F 75 74 65 72 72 2E 73 68 20 32 3E 26 31 20 31 outerr.sh 2>&1 1
0x0170: 3E 26 33 20 7C 20 67 72 65 70 20 27 5B 30 2D 39 >&3 | grep '[0-9
0x0180: 5D 30 27 20 3E 26 32 29 20 33 3E 6F 75 74 2E 33 ]0' >&2) 3>out.3
0x0190: 20 32 3E 6F 75 74 2E 32 20 31 3E 6F 75 74 2E 31 2>out.2 1>out.1
0x01A0: 0A 6C 73 20 2D 6C 20 6F 75 74 2E 5B 31 32 33 5D .ls -l out.[123]
0x01B0: 0A 28 20 2E 2F 67 65 6E 6F 75 74 65 72 72 2E 73 .( ./genouterr.s
0x01C0: 68 20 32 3E 26 31 20 31 3E 26 33 20 7C 20 67 72 h 2>&1 1>&3 | gr
0x01D0: 65 70 20 27 5B 30 2D 39 5D 30 27 20 3E 26 32 29 ep '[0-9]0' >&2)
0x01E0: 20 33 3E 26 31 0A 23 31 2F 32 38 20 20 20 20 20 3>&1.#1/28
0x01F0: 20 20 20 20 20 20 31 33 37 34 30 39 37 38 32 32 1374097822
0x0200: 20 20 32 38 37 36 20 20 35 30 30 30 20 20 31 30 2876 5000 10
0x0210: 30 36 34 34 20 20 36 31 20 20 20 20 20 20 20 20 0644 61
0x0220: 60 0A 6C 6F 6E 67 20 6E 61 6D 65 20 77 69 74 68 `.long name with
0x0230: 20 73 70 61 63 65 73 2E 74 78 74 00 00 00 30 31 spaces.txt...01
0x0240: 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 23456789:;<=>?@A
0x0250: 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 0A 0A BCDEFGHIJKLMNO..
0x0260:
Differences
The Linux archive has a list of strings (the // entry) at the top of the file that have to be remembered; a member with a long name then gets a name such as /0 or /18, the byte offset of its name in that list. The Mac OS X archive has the special entry #1/28, which identifies the header as being followed by a 28-byte field containing the file name (null padded to a multiple of 4 bytes; the length given includes the null padding). The ar_size of that member (61) counts both the 28 name bytes and the 33 data bytes. The Mac archive has no space after the name when it is exactly 16 characters long; the Linux archive puts the 16-character name into the string table.
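For the Mac OS X form, the name has to be read from the start of the member's data and its length subtracted from ar_size. A minimal sketch, assuming the caller has just read a header whose trimmed ar_name starts with "#1/" (the AR_EFMT1 prefix) and has converted ar_size to a number; bsd_long_name is a hypothetical helper name. Terminating the buffer after N bytes and then treating it as a C string drops the NUL padding automatically, since the string ends at the first NUL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* raw_name: trimmed ar_name such as "#1/28"; ar_size: value from the header.
 * fp must be positioned just after the 60-byte header; on success it is left
 * at the start of the member's real data. Returns 0, or -1 on error. */
int bsd_long_name(FILE *fp, const char *raw_name, long ar_size,
                  char *out, size_t out_len, long *data_size)
{
    char *end;
    if (strncmp(raw_name, "#1/", 3) != 0)
        return -1;
    long n = strtol(raw_name + 3, &end, 10);
    if (end == raw_name + 3 || *end != '\0' || n < 0 || n > ar_size
        || (size_t)n >= out_len)
        return -1;
    if (fread(out, 1, (size_t)n, fp) != (size_t)n)
        return -1;
    out[n] = '\0';                  /* the name ends at the first NUL pad byte */
    *data_size = ar_size - n;       /* ar_size counts name bytes + data bytes */
    return 0;
}
```

For the example above, the #1/28 member with ar_size 61 yields the name "long name with spaces.txt" and 33 bytes of real data.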