I'm using MD5 hashing to encrypt passwords for a program, but it is not producing all the characters, and some of them are unreadable.
Here is a screenshot: http://i46.tinypic.com/2qvf2o2.jpg
Any help is appreciated
Thanks
Presumably you want to convert the array of bytes returned by MD5 to a hexadecimal string for display. Something like d131dd02c5e6eec4.
Here's how you can do that:
In Java, how do I convert a byte array to a string of hex digits while keeping leading zeros?
You're interpreting the bytes returned by MD5 as raw character data.
Since MD5 does not return bytes that represent characters, you get meaningless results.
What you're getting back is a binary value. So it's a bunch of raw bytes that may or may not map to valid characters in your default codepage. What you should do is convert the byte[] to hex. You can use something like Apache Commons Codec to encode this. http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Hex.html#encodeHex(byte[])
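If you want to see what that conversion amounts to without a library, here is a minimal sketch in C of turning raw digest bytes into a hex string; to_hex is a made-up helper and the sample bytes are just for illustration (in Java, Commons Codec's Hex class does this in one call).

#include <stdio.h>

/* Minimal sketch: turn a raw digest into a lowercase hex string.
   'digest' and 'len' stand in for whatever your hash call returns. */
static void to_hex(const unsigned char *digest, size_t len, char *out)
{
    static const char hex[] = "0123456789abcdef";
    size_t i;
    for (i = 0; i < len; i++) {
        out[2 * i]     = hex[digest[i] >> 4];   /* high nibble */
        out[2 * i + 1] = hex[digest[i] & 0x0f]; /* low nibble  */
    }
    out[2 * len] = '\0';
}

int main(void)
{
    unsigned char digest[4] = { 0xd1, 0x31, 0xdd, 0x02 }; /* fake digest bytes */
    char out[2 * sizeof digest + 1];
    to_hex(digest, sizeof digest, out);
    printf("%s\n", out); /* prints d131dd02 */
    return 0;
}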
I saw the other questions on this subject, but all of them were missing important details:
I want to convert \u00252F\u00252F\u05de\u05e8\u05db\u05d6 to UTF-8. I understand that you look through the stream for \u followed by four hex digits, which you convert to bytes. The problems are as follows:
I heard that sometimes you look at the 4 bytes after it and sometimes at 6; is this correct? If so, how do you determine which it is? E.g., is \u00252F 4 or 6 bytes?
In the case of \u0025, this maps to one byte (0x25) instead of two. Why? Are the four hex digits supposed to represent UTF-16, which I am then supposed to convert to UTF-8?
How do I know whether the text is supposed to be the literal characters \u0025 or the Unicode escape sequence? Does that mean all backslashes in the stream must be escaped?
Lastly, am I being stupid in doing this by hand when I can use iconv to do this for me?
If you have the iconv interfaces at your disposal, you can simply convert the \u0123\uABCD etc. sequences to an array of bytes 01 23 AB CD ..., replacing any unescaped ASCII characters with a 00 byte followed by the ASCII byte, then run the array through iconv with a conversion descriptor obtained by iconv_open("UTF-8", "UTF-16BE").
Of course you can also do it much more efficiently working directly with the input yourself, but that requires reading and understanding the Unicode specification of UTF-16 and UTF-8.
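Here is a rough sketch of that iconv step, assuming a POSIX iconv; the input bytes and buffer sizes are purely illustrative and error handling is minimal.

#include <iconv.h>
#include <stdio.h>

int main(void)
{
    /* "%" (U+0025) followed by Hebrew mem (U+05DE), already unpacked
       from the \uXXXX escapes into big-endian UTF-16 bytes */
    char in[] = { 0x00, 0x25, 0x05, (char)0xde };
    char out[16];
    char *inp = in, *outp = out;
    size_t inleft = sizeof in, outleft = sizeof out;

    iconv_t cd = iconv_open("UTF-8", "UTF-16BE");
    if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        perror("iconv");
    iconv_close(cd);

    fwrite(out, 1, sizeof out - outleft, stdout); /* prints %מ on a UTF-8 terminal */
    putchar('\n');
    return 0;
}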
In some conventions (like C++11 string literals), you parse a specific number of hex digits: four after \u and eight after \U. That may or may not be the convention of the input you provided, but it seems a reasonable guess. In other styles, like C++'s \x, you parse as many hex digits as you can find after the \x, which means you have to jump through some hoops if you want to put a literal hex digit immediately after one of these escaped characters.
Once you have all the values, you need to know what encoding they're in (e.g., UTF-16 or UTF-32) and what encoding you want (e.g., UTF-8). You then use a function to create a new string in the new encoding. You can write such a function (if you know enough about both encoding formats), or you can use a library. Some operating systems may provide such a function, but you might want to use a third-party library for portability.
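For example, here is a sketch of the "fixed number of digits" convention, reading exactly four hex digits after \u into a 16-bit code unit; parse_u_escape is a hypothetical helper, not a standard function.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse "\uXXXX" at *p into a 16-bit code unit; returns 1 on success
   and advances *p past the escape. Purely illustrative. */
static int parse_u_escape(const char **p, unsigned *unit)
{
    const char *s = *p;
    char digits[5];
    int i;
    if (s[0] != '\\' || s[1] != 'u')
        return 0;
    for (i = 0; i < 4; i++) {
        if (!isxdigit((unsigned char)s[2 + i]))
            return 0;                /* fewer than four hex digits: not a \u escape */
        digits[i] = s[2 + i];
    }
    digits[4] = '\0';
    *unit = (unsigned)strtoul(digits, NULL, 16);
    *p = s + 6;                      /* skip the backslash, 'u', and four digits */
    return 1;
}

int main(void)
{
    const char *s = "\\u0025";
    unsigned unit;
    if (parse_u_escape(&s, &unit))
        printf("U+%04X\n", unit);    /* prints U+0025 */
    return 0;
}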
I have to decrypt an encrypted message in a C program; it was encrypted with 3DES in CBC mode. It looks quite different from the ciphertext I normally get from 3DES. Normally the ciphertext after encryption is a hexadecimal string of length 16 that only contains the characters 0-9 and A-F. But the ciphertext I need to decrypt is more than 20 characters long and includes characters from A-Z, a-z and 0-9, as well as a few special characters like '+' and '='.
Which library would help with this?
The ciphertext you get from any normal block cipher consists of raw bytes (binary data), not hex. In the case of DES, that is 8 bytes per block.
You can then encode it if you prefer text over binary data. It looks like your ciphertext is Base64-encoded rather than hex-encoded, but that is a choice independent of the choice of cipher.
Base64 uses all ASCII letters and digits as well as + and / to encode the data, and = as padding when the input length isn't a multiple of 3 bytes.
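As a sketch of that first step, assuming OpenSSL is available (the answer above doesn't prescribe a library, and the Base64 text here is made up): decode the Base64 back to raw bytes before handing them to the 3DES/CBC decryption routine.

#include <openssl/evp.h>   /* compile with -lcrypto */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* made-up Base64 text standing in for the real ciphertext */
    const char *b64 = "q83vASNFZ4k=";
    unsigned char raw[64];
    size_t b64len = strlen(b64);

    /* EVP_DecodeBlock turns each group of 4 input chars into 3 bytes; the
       returned length includes bytes for the '=' padding, so subtract them */
    int n = EVP_DecodeBlock(raw, (const unsigned char *)b64, (int)b64len);
    if (n < 0) { fprintf(stderr, "not valid Base64\n"); return 1; }
    if (b64[b64len - 1] == '=') n--;
    if (b64[b64len - 2] == '=') n--;

    printf("%d raw bytes; hand these to the 3DES-CBC decrypt call\n", n);
    return 0;
}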
I'm reading out text from a website with Curl. All the raw data is being returned character by character with
return memEof(mp) ? EOF : (int)(*(unsigned char *)(mp->readptr++));
My problem is that all the special characters such as ÄÖÜäöüß etc. come out wrong and look garbled. I'm currently correcting them manually by adjusting their values using this table:
http://www.barcoderesource.com/barcodeasciicharacters.shtml
I was wondering now, if there is a more elegant way to do this and how others approach these kinds of issues.
This is an encoding issue. If you read data byte by byte, you can only handle single-byte encodings (like the ISO-8859 family and many more) correctly and easily, provided you have a way to convert them to a target encoding if needed. With encodings like UTF-8 you are less lucky, since to get the right code point you may need to read 1 byte, or 2, or 3, or 4. If you stream the bytes into a string and print the string as a whole, and the encoding the output device expects matches the input encoding, the characters display correctly anyway.
If that does not happen, and you are not printing each byte as if it were a character on its own, then the encoding the output device expects does not match the one the string is written in.
If the output looks fine once you print the string as a whole, then the problem is that you are interpreting each byte as a single character when it is not (the special characters you mention use a multibyte encoding, most likely UTF-8, though it could be another one).
If you get the same result in both cases (printing each byte one by one and printing the whole byte sequence as a string), then the output device expects a single-byte encoding like the input one, but the two do not match.
To say more, one would need to know how you collect the bytes you read, and how you print them when they look garbled.
An example:
#include <stdio.h>

int main(void)
{
    const char *string = "\xc3\xa8\xc3\xb2\xc3\xa0"; /* "èòà" encoded in UTF-8 */
    int i;
    for (i = 0; string[i] != 0; i++)
    {
        /* the \n is important: printed one per line, the bytes of each
           multibyte sequence can no longer be reassembled by the terminal */
        printf("%c\n", string[i]);
    }
    printf("%s\n", string); /* the whole byte sequence printed at once */
    return 0;
}
You obtain different results if the output device's encoding is UTF-8; if it is a single-byte encoding, you obtain the same output (apart from the newlines), but it is "wrong" with respect to the intended text, i.e. èòà.
The "same" text in Latin-1 is "\xe8\xf2\xe0". Latin-1 is a single-byte encoding, so the reasoning above applies. If it is printed on a terminal that expects UTF-8, you get something like �� ...
So encodings matter, and the output device's or format's encoding matters too; you must be aware of both in order to handle and display the text properly. (Regarding formats: an example is HTML, where you can specify the encoding of the content. You must be consistent, and then everything displays fine.)
I guess you have to use an external library like iconv to create a wchar_t string which contains the data. This depends on the character encoding used.
I was thinking about this problem a while ago and came up with the following question:
"Theoretically, isn't it possible that a hash of, say, X random bytes would be considered vulnerable, since NUL bytes in a character array are interpreted as the end of a string in C? And therefore, as an attacker, could we ignore that character (and possibly others) within our initial string when brute-forcing?"
Sorry if I'm not very clear on this.
The output of hash functions is typically a fixed-length array of bytes (or machine words); it should be interpreted as a fixed-length array, not as a zero-terminated string.
The solution is to use memcmp(3) in your code instead of strcmp(3).
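A minimal sketch of the difference (the digest values are made up):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* two 8-byte "digests" that differ only after an embedded 0x00 byte */
    unsigned char a[8] = { 0x12, 0x00, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF };
    unsigned char b[8] = { 0x12, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66 };

    /* strcmp stops at the 0x00 byte and wrongly reports them as equal */
    printf("strcmp: %d\n", strcmp((char *)a, (char *)b));
    /* memcmp compares all 8 bytes and correctly reports a difference */
    printf("memcmp: %d\n", memcmp(a, b, sizeof a));
    return 0;
}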
Hash functions do not produce string (character-based) output; they produce fixed-length byte arrays. You can convert that to Base64 or Base16 (hex) if you want; then you have a string, and it will never contain a NUL byte, only the characters "00".
Is it possible to know whether a file contains Unicode (16 bits per character) or 8-bit ASCII content?
You may be able to read a byte-order-mark, if the file has this present.
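A sketch of that BOM check (bom_name is a made-up helper; the UTF-8 BOM is included as well, purely for illustration):

#include <stddef.h>

/* Returns a short description of any byte-order mark at the start of the
   buffer, or NULL if none is present. */
static const char *bom_name(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16 little-endian";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16 big-endian";
    return NULL;
}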
UTF-16 characters are all at least 16 bits, and some take 32 bits as a surrogate pair (two code units in the range 0xD800 to 0xDFFF). So simply scanning each byte to see if it is less than 128 won't work. For example, the two bytes 0x20 0x20 encode two spaces in ASCII and UTF-8, but a single character, 0x2020 (dagger), in UTF-16. If the text is known to be English with the occasional non-ASCII character, then almost every other byte will be zero. But without some a priori knowledge about the text and/or its encoding, there is no reliable way to distinguish a general ASCII string from a general UTF-16 string.
Ditto to what Brian Agnew said about reading the byte order mark, a special pair of bytes that might appear at the beginning of the file.
You can also tell whether it is ASCII by scanning every byte in the file and checking that they are all less than 128. If they all are, then it's just an ASCII file. If any of them are 128 or above, there is some other encoding in there.
First off, ASCII is 7-bit, so if any byte has its high bit set you know the file isn't ASCII.
The various "common" character sets such as ISO-8859-x, Windows-1252, etc. are 8-bit, so if every other byte is 0, you know that you're dealing with UTF-16 text that only uses characters from the ISO-8859-1 range.
You'll run into problems when you're trying to distinguish between UTF-16 and a multi-byte encoding such as UTF-8. In that case, almost every byte will have a non-zero value, so you can't make an easy decision. You can, as Pascal says, do some sort of statistical analysis of the content: Arabic and Ancient Greek probably won't be in the same file. However, this is probably more work than it's worth.
Edit in response to OP's comment:
I think it will be sufficient to check for the presence of 0-value bytes (ASCII NUL) within your content, and make the choice based on that. The reason is that JavaScript keywords are ASCII, and ASCII is a subset of Unicode, so any UTF-16 representation of those keywords will consist of one byte containing the ASCII character (the low byte) and another containing 0 (the high byte).
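A sketch of that check (looks_like_utf16 is a made-up helper; the sample buffer is just for illustration):

#include <stdio.h>

/* Returns 1 if the buffer contains a 0x00 byte, which for this use case
   suggests UTF-16 rather than plain ASCII text. */
static int looks_like_utf16(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (buf[i] == 0)
            return 1;
    return 0;
}

int main(void)
{
    const unsigned char sample[] = { 'v', 0, 'a', 0, 'r', 0 }; /* "var" in UTF-16LE */
    printf(looks_like_utf16(sample, sizeof sample) ? "UTF-16\n" : "ASCII\n");
    return 0;
}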
My one caveat is that you should read the documentation carefully to ensure that their use of the word "Unicode" is correct (I looked at this page to understand the function and did not look any further).
If the file you have to check is long enough each time, and you have some idea of what it's supposed to be (say, English text in Unicode or English text in ASCII), you can do a simple frequency analysis on the bytes and see whether the distribution looks like that of ASCII or of Unicode.
Unicode is a character set, not an encoding; you probably mean UTF-16. There are lots of libraries around (python-chardet comes to mind instantly) to autodetect the encoding of text, though they all use heuristics.
To programmatically discern the type of a file, including but not limited to its encoding, the best bet is to use libmagic. BSD-licensed, it is part of just about every Unix system you are likely to encounter, and for the lesser ones you can bundle it with your application.
Detecting the mime-type from C, for example, is as simple as:
magic_t Magic = magic_open(MAGIC_MIME | MAGIC_ERROR);
magic_load(Magic, NULL);  /* load the default magic database */
const char *mimetype = magic_buffer(Magic, buf, bufsize);
magic_close(Magic);
Other languages have their own modules wrapping this library.
Back to your question, here is what I get from file(1) (the command-line interface to libmagic(3)):
% file /tmp/*rdp
/tmp/meow.rdp: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
For your specific use case it's very easy to tell: just scan the file, and if you find any NUL byte ("\0"), it must be UTF-16. JavaScript has to contain ASCII characters, and in UTF-16 each of them is represented by a 0 byte paired with the ASCII byte.