Transferring external images & files to Google Cloud Storage - google-app-engine

If I use the Google Cloud Storage File Transfer console (https://console.cloud.google.com/storage/transfer?project=XXXX), how do I generate an MD5 string for my image? Say, for example, my image is located at https://www.planwallpaper.com/static/images/desktop-year-of-the-tiger-images-wallpaper.jpg.
I can easily get the bytes value, but how would I generate the MD5 for this?
The docs were a bit vague. Any ideas?

An MD5 hash is used to ensure the data transferred into GCS is imported correctly. HTTPS data transfers include a variety of built-in checksums, but for very large imports of many, many files, errors can and do show up, and so GCS wants to be sure that each object that it downloads is exactly what you think it is.
An MD5 is a 128-bit number that is the result of running the MD5 algorithm on an object. This number can be represented in a variety of ways (the popular md5sum command uses hexadecimal strings). GCS asks that you represent this number in base64 encoding. Here's a command that can generate an MD5 sum in the right format:
openssl md5 -binary NameOfSourceFile | openssl enc -base64
There's a standard GCS object that can be used to validate your MD5 logic. The object https://storage.googleapis.com/md5-test/md5-test has a base64'd MD5 string of BfnRTwvHpofMOn2Pq7EVyQ==.
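If openssl is not at hand, the same value can be computed in a few lines of Python. This is a minimal sketch that mirrors the command above; the function names md5_base64 and md5_base64_of_file are made up for illustration and are not part of any GCS tooling.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the MD5 of `data` as a base64 string (not the usual hex form)."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes, i.e. the 128-bit number
    return base64.b64encode(digest).decode("ascii")


def md5_base64_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Same as md5_base64, but streams the file so large objects fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")
```

Downloading the md5-test object mentioned above and passing its bytes to md5_base64 is a reasonable way to check the logic against the published value.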

Related

Storing sensitive data within the source of compiled programs

In this example usage of libcurl, the program "logs in" to an IMAP server using a username and secret password which, for the sake of brevity, are stored as two separate strings in the source. Most real email clients (gnus, mutt) read these from a plain-text config file or an encrypted file. If I were to compile the example file with a real username and secret key, would it be possible to decompile the resulting binary and extract the key? I imagine a key entered into a macro before compilation is more secure than one read from a config file after compiling; is this understanding correct?
If the username and secret are strings and are simply included in the source, they can be extracted from the binary executable without even decompiling. See the strings command. You can take steps to include the username and secret in the source such that they are not readily findable by strings (e.g., by XORing the string with some other bit pattern in the program before adding it, then XORing in the executable to recover the original). This is "security through obscurity," however, and is not recommended.
Placing the username and secret in a configuration file lets you use the system's file permissions so that people using the binary may not necessarily have permission to read the file. You may also be able to set up a PKI authentication arrangement, or possibly use Kerberos key authorization.
The amount of effort to go to depends on the value of what you're trying to protect.
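To make the point about the strings command concrete, here is a small Python sketch. It imitates what strings does (scanning for runs of printable ASCII) on a made-up byte blob standing in for a compiled binary; the blob, the secret, and the one-byte XOR key are all invented for the illustration.

```python
import re


def printable_runs(blob: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII, roughly what `strings` would report."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, blob)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR `data` with a repeating `key`; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


secret = b"s3cret-imap-password"  # hypothetical secret
key = b"\xa5"                     # hypothetical obfuscation key

# Stand-ins for a binary with the secret embedded as-is and XOR-obfuscated.
plain_binary = b"\x7fELF\x00\x01" + secret + b"\x00\x02"
xored_binary = b"\x7fELF\x00\x01" + xor_bytes(secret, key) + b"\x00\x02"

print(printable_runs(plain_binary))  # the secret shows up in the scan
print(printable_runs(xored_binary))  # nothing readable here
print(xor_bytes(xor_bytes(secret, key), key) == secret)  # but it is trivially reversible
```

The XOR variant only hides the secret from a casual scan; anyone who finds the key in the same binary can undo it, which is why it counts as obscurity rather than security.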

Is there a tool to search differences in hex files?

I have two hex files, and want to search for values that have changed from 9C in one file to 9D in the next one.
Even a tool that synchronizes the windows would work, as I can't seem to find any that enable me to do this. All I've found are tools that let me search for the same value and same address between two files.
Thanks!
"There's no such thing as a hex file."
Oh well perhaps I should delete all my hex files.
I use the Motorola and Intel hex file formats. They are just text files representing bytes, but each line includes a record structure with address, checksum, etc. These files need to be treated as binary (in .gitattributes) since they cannot be merged. I use KDiff3 (or any other text diff tool) to see the differences.
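For the original question (finding where 9C became 9D), a diff tool may be more than is needed. The sketch below assumes both files have already been read as raw bytes of equal layout (for Intel or Motorola text records, the data bytes would have to be decoded first); the function name changed_offsets is made up for the example.

```python
def changed_offsets(old: bytes, new: bytes,
                    before: int = 0x9C, after: int = 0x9D) -> list[int]:
    """Return offsets where `old` holds `before` and `new` holds `after`.

    Only the overlapping prefix of the two inputs is compared.
    """
    return [
        offset
        for offset, (a, b) in enumerate(zip(old, new))
        if a == before and b == after
    ]


if __name__ == "__main__":
    import sys

    # Usage: python hexdiff.py old.bin new.bin
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as f_old, open(sys.argv[2], "rb") as f_new:
            for offset in changed_offsets(f_old.read(), f_new.read()):
                print(f"0x{offset:08X}")
```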

HDF gzip compression vs. ASCII gzip compression

I have a 2D matrix with 1100x1600 data points. Initially, I stored it in an ascii-file which I tar-zipped using the command
tar -cvzf ascii_file.tar.gz ascii_file
Now, I wanted to switch to hdf5 files, but they are too large, at least in the way I am using them... First, I write the array into an hdf5-file using the c-procedures
H5Fcreate, H5Screate_simple, H5Dcreate, H5Dwrite
in that order. The data is not compressed within the hdf-file and it is relatively large, so I compressed it using the command
h5repack --filter=GZIP=9 hdf5_file hdf5_file.gzipped
Unfortunately, this hdf file with the zipped content is still larger than the compressed ascii file by a factor of about 4, see the following table:
file                 size (bytes)
---------------------------------
ascii_file           5721600
ascii_file.tar.gz    287408
hdf5_file            7042144
hdf5_file.gzipped    1117033
Now my question(s): Why is the gzipped ascii-file so much smaller and is there a way to make the hdf-file smaller?
Thanks.
Well, after reading Mark Adler's comment, I realized that this question is somewhat stupid: in the ascii case, the values are truncated after a certain number of digits, whereas in the hdf case the "real" values ("real" = whatever precision the data type I am using has) are stored.
There was, however, one possibility to further reduce the size of my hdf file: by applying the shuffle filter using the option
--filter=SHUF
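The effect can be reproduced without HDF5 at all. The following Python sketch uses only the standard library: it compresses the same synthetic values once as truncated ASCII text, once as raw 8-byte doubles, and once as doubles after a byte shuffle (all first bytes of every value, then all second bytes, and so on, which is the idea behind the shuffle filter). The data and the function names are invented for the illustration and are not the HDF5 API.

```python
import math
import struct
import zlib


def byte_shuffle(raw: bytes, item_size: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    return b"".join(raw[i::item_size] for i in range(item_size))


def byte_unshuffle(shuffled: bytes, item_size: int) -> bytes:
    """Inverse of byte_shuffle; len(shuffled) must be a multiple of item_size."""
    count = len(shuffled) // item_size
    out = bytearray(len(shuffled))
    for i in range(item_size):
        out[i::item_size] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


# Synthetic smooth data standing in for the 2D matrix.
values = [math.sin(i / 50.0) for i in range(20000)]
ascii_text = " ".join(f"{v:.4f}" for v in values).encode()  # truncated digits
binary = struct.pack(f"<{len(values)}d", *values)           # full precision

sizes = {
    "ascii + gzip": len(zlib.compress(ascii_text, 9)),
    "binary + gzip": len(zlib.compress(binary, 9)),
    "binary + shuffle + gzip": len(zlib.compress(byte_shuffle(binary, 8), 9)),
}
for label, size in sizes.items():
    print(f"{label:26s}{size}")
```

The exact numbers depend on the data, but the full-precision binary form carries many more low-order bits that look random to gzip, and shuffling typically helps by putting the slowly varying high-order bytes next to each other.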

Hexadecimal virus signatures database

Over the past couple of weeks, I was in the process of developing a simple virus scanner. It works great but my question is does anybody know where I can get a database (a single file) that contains 8000 or more virus signatures WITH their names, and possibly risk meter (high, low, unknown)?
Try the ClamAV database. This also includes some more complex signatures, but some are just byte sequences.
The CVD file format is a compressed tar file with a header block attached; see here for header information, or this PDF for the real details.
As I understand it, you should be able to decompress it with
dd if=file.cvd bs=512 skip=1 | tar zxvf -
This will unpack to a collection of various files; for files that have simple hex signatures, these will be found in a file with the extension .db. Not all of these signatures are pure hex -- many of them contain wildcards such as ?? for "allow any byte here", * for "allow any number of intervening bytes here", {-4096} for "allow up to 4k of intervening bytes here", and so forth.
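As a rough illustration of how such wildcard signatures can be matched, here is a Python sketch that translates a hex signature into a bytes regular expression. It handles only plain hex pairs, ?? and *; the other wildcard forms are deliberately left out, and the function name and the sample signature are made up for the example rather than taken from the ClamAV database.

```python
import re


def signature_to_regex(signature: str) -> "re.Pattern[bytes]":
    """Translate a hex signature with ?? and * wildcards into a bytes regex."""
    parts = []
    i = 0
    while i < len(signature):
        if signature[i] == "*":
            parts.append(b".*?")  # any number of intervening bytes
            i += 1
        elif signature[i:i + 2] == "??":
            parts.append(b".")    # exactly one arbitrary byte
            i += 2
        else:
            # A literal byte written as two hex digits; int() raises
            # ValueError on anything this sketch does not understand.
            literal = bytes([int(signature[i:i + 2], 16)])
            parts.append(re.escape(literal))
            i += 2
    # DOTALL so that "." also matches the newline byte.
    return re.compile(b"".join(parts), re.DOTALL)


pattern = signature_to_regex("4d5a??00*ff")  # hypothetical signature
print(pattern.search(b"xxMZ\x01\x00abc\xffyy") is not None)
```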

MD5 implementation in C for a XML file

I need to implement an MD5 checksum to verify the MD5 checksum embedded in an XML file that we have received from our client. The checksum covers the whole file, including all XML tags. The received MD5 checksum is 32 hexadecimal digits long.
The MD5 checksum field in the received XML file must be set to 0 prior to checksum calculation, and we have to independently calculate and verify the MD5 checksum value of the received XML file.
Our application is implemented in C. Please assist me on how to implement this.
Thanks
This depends directly on the library used for XML parsing. It is tricky, however, because you can't simply embed the MD5 in the XML file itself: embedding the checksum changes the file and therefore its checksum, unless the checksum is computed only over specific elements. As I understand it, you receive the MD5 independently? Is it calculated from the whole file, or only the tags/content?
MD5 Public Domain code link - http://www.fourmilab.ch/md5/
XML library for C - http://xmlsoft.org/
Exact solutions depend on the code used.
Based on your comment you need to do the following steps:
1. Load the XML file (possibly even as plain text) and read the MD5.
2. Substitute the MD5 in the file with zero and write the file out (or, better, keep it in memory).
3. Run MD5 on the pure file data and compare it with the value stored before.
There are public-domain implementations of MD5 that you should use, instead of writing your own. I hear that Colin Plumb's version is widely used.
Don't reinvent the wheel, use a proven existing solution: http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html
Incidentally that was the first link that came up when I googled "md5 c implementation".
This is rather nasty. The approach suggested seems to imply you need to parse the XML document into something like a DOM tree, find the MD5 checksum and store it for future reference. Then you would replace the checksum with 0 before re-serializing the document and calculating its MD5 hash. This all sounds doable but potentially tricky. The major difficulty I see is that your new serialization of the document may not be the same as the original one, and differences that are irrelevant to XML, like the use of single or double quotes around attribute values, added line breaks or even a different encoding, will cause the hashes to differ. If you go down this route you'll need to make sure your app and the procedure used to create the document in the first place make the same choices. For this sort of problem canonical XML is the standard solution (http://www.w3.org/TR/xml-c14n).
However, I would do something different. With any luck it should be quite easy to write a regular expression to locate the MD5 hash in the file and replace it with 0. You can then use this to grab the hash and replace it with 0 in the XML file before recalculating the hash. This sidesteps all the possible issues with parsing, changing and re-serializing the XML document. To illustrate I'm going to assume the hash '33d4046bea07e89134aecfcaf7e73015' lives in the XML file like this:
<docRoot xmlns='some-irrelevant-uri'>
<myData>Blar blar</myData>
<myExtraData number='1'/>
<docHash MD5="33d4046bea07e89134aecfcaf7e73015" />
<evenMoreOfMyData number='34'/>
</docRoot>
(which I've called hash.xml), that the MD5 should be replaced by 32 zeros (so the hash is correct) and illustrate the procedure on a shell command line using perl, md5 and bash. (Hopefully translating this into C won't be too hard given the existence of regular expression and hashing libraries.)
Breaking down the problem, you first need to be able to find the hash that is in the file:
perl -p -e'if (m#<docHash.+MD5="([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml
(this works by looking for the start of the MD5 attribute of the docHash element, allowing for possible other attributes, and then grabbing the next 32 hex characters. If it finds them it bungs them in the magic $_ variable, if not it sets $_ to be empty, then the value of $_ gets printed for each line. This results in the string "33d4046bea07e89134aecfcaf7e73015" being printed.)
Then you need to calculate the hash of the file with the hash replaced by zeros:
perl -p -e's#(<docHash.+MD5=)"([a-fA-F0-9]{32})#$1"00000000000000000000000000000000#' hash.xml | md5
(where the regular expression is almost the same, but this time the hex characters are replaced by 32 zeros and the whole file is printed. Then the MD5 of this is calculated by piping the result through an md5 hashing program.) Putting this together with a bit of bash gives:
if [ "$(perl -p -e'if (m#<docHash.+MD5="([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml)" = "$(perl -p -e's#(<docHash.+MD5=)"([a-fA-F0-9]{32})#$1"00000000000000000000000000000000#' hash.xml | md5)" ] ; then echo OK; else echo ERROR; fi
which executes those two small commands, compares the output and prints "OK" if the outputs match or "ERROR" if they don't. Obviously this is just a simple prototype, and is in the wrong language, but I think it illustrates the most straightforward solution.
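The same find, zero and re-hash procedure can be sketched in Python as a stepping stone towards a C version. It assumes, as in the shell prototype, that the hash sits in an MD5 attribute of a docHash element and that it was computed over the file with 32 zeros in that position; the function name verify_embedded_md5 is made up for the example.

```python
import hashlib
import re

# Group 1: everything up to and including the opening quote of the attribute.
# Group 2: the 32 hex digits of the embedded hash.
HASH_ATTR = re.compile(rb'(<docHash\b[^>]*\bMD5=["\'])([0-9a-fA-F]{32})')


def verify_embedded_md5(document: bytes) -> bool:
    """Check the MD5 embedded in `document` against the zeroed-out document."""
    match = HASH_ATTR.search(document)
    if match is None:
        return False  # no embedded hash found
    claimed = match.group(2).decode("ascii").lower()
    # Replace only the hex digits with zeros; every other byte stays untouched,
    # so there is no re-serialization step that could alter the file.
    zeroed = HASH_ATTR.sub(lambda m: m.group(1) + b"0" * 32, document, count=1)
    return hashlib.md5(zeroed).hexdigest() == claimed
```

Working on the raw bytes rather than a parsed tree is the point of the approach: quoting style, line breaks and encoding are hashed exactly as they were received.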
Incidentally, why do you put the hash inside the XML document? As far as I can see it doesn't have any advantage compared to passing the hash along on a side channel (even something as simple as in a second file called documentname.md5) and makes the hash validation more difficult.
Check out these examples for how to use the XMLDSIG standard with .net
How to: Sign XML Documents with Digital Signatures
How to: Verify the Digital Signatures of XML Documents
You might also consider changing the setting for preserving whitespace.
