Computing the MD5 checksum of a string - md5

How do I compute the MD5 checksum of a string in Standard ML?
What I'd like is preferably a function md5 : string -> string.

Since there is no MD5 checksum function in the standard library, use one of the following libraries found online.
md5.sml (Tom 7's public domain implementation from 2001):
- MD5.bintohex (MD5.md5 "Hello World!")
val it = "ED076287532E86365E841E92BFC50D8C" : string
md5.sml (Daniel Wang's implementation from 2001 found in MLton's tests):
- MD5.toHexString (MD5.final (MD5.update (MD5.init, Byte.stringToBytes "Hello World!")))
> val it = "ed076287532e86365e841e92bfc50d8c" : string
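The two results above differ only in the case of the hex digits. As a cross-check outside SML, here is a minimal Python sketch (Python's hashlib is used purely for comparison and is not part of either SML library; md5_hex is a hypothetical helper name):

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of text, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


digest = md5_hex("Hello World!")
print(digest)          # lowercase, like MD5.toHexString above
print(digest.upper())  # uppercase, like MD5.bintohex above
```

If a library hands back uppercase digits and you need to compare against a lowercase reference, normalise the case on one side before comparing.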

Related

OCaml type constructors in C = <unknown constructor>

I've written some OCaml bindings for some C code; they seem to work fine, except that, at the interpreter, the type constructor appears to be opaque to OCaml. So, for example:
# #load "foo.cma" ;;
# open Foo ;;
# let b = barbie "bar";;
val b : Foo.kung = <unknown constructor>
# let k = ken "kenny" ;;
val k : Foo.tza = <abstr>
I'm trying to get rid of the <unknown constructor> and replace it with a meaningful print.
The contents of foo.ml are:
type kung = Tsau of string [@@boxed] ;;
type tza ;; (* Obviously, an abstract type *)
external barbie : string -> kung = "barbie_doll" ;;
external ken : string -> tza = "ken_doll" ;;
The C code is the minimal amount of code to get this to work:
CAMLprim value barbie_doll(value vstr) { ... }
CAMLprim value ken_doll(value vstr) { ... }
The actual C code uses caml_alloc_custom() to hold its data, and it has to return the block that caml_alloc_custom() allocated so that it doesn't get lost. That's the core reason for using custom types: so I can return my own malloc'd data.
Even though these types are abstract or opaque, I'd like to have the let expression print something meaningful. For example, perhaps this?
val b : Foo.kung = "Hi my name is bar"
val k : Foo.tza = "Yo duude it's kenny"
The second question would be: if it's possible to print something meaningful, what should it be? Obviously, the constructors were invoked with strings, so whatever is printed should include those string values...
The third question is: is it possible to specify types and type constructors in C? I suspect that the answer is "obviously no", because OCaml types are static, compile-time types, and are not dynamically constructable. But it never hurts to ask. I mean, the ocaml interpreter is able to deal with brand new type declarations just fine, so somehow, OCaml types aren't entirely static; there's some kind of 'dynamic' aspect to them. I've not found any documentation on this.
As suggested in a comment, the answer is to use #install_printer, which the toplevel documentation touches on only briefly. Since that is just a hint, here is example code (continuing the earlier example).
In foostubs.ml:
(** Signature declarations for C functions *)
external kung_prt : kung -> string = "c_kung_str" ;;
(* Need to #install_printer kung_pretty ;;
* FYI, the OCaml documentation is confusing
* as to what a Format.formatter actually is:
* it is used like a C stream or outport. *)
let kung_pretty : Format.formatter -> kung -> unit =
function oport ->
fun x -> Format.fprintf oport "Hi %s" (kung_prt x) ;;
The printer in C would look like this:
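CAMLprim value c_kung_str(value vkung)
{
    CAMLparam1(vkung);
    /* String_val is only valid if vkung really is an OCaml string;
     * adapt this to the actual representation (e.g. a custom block). */
    const char* name = String_val(vkung);
    char buff[200];
    strcpy(buff, "my name is ");
    /* strncat's limit is the space left in buff, not its total size. */
    strncat(buff, name, sizeof(buff) - strlen(buff) - 1);
    CAMLreturn(caml_copy_string(buff));
}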
Then, either you have to tell the user to #install_printer kung_pretty ;; or, better yet, provide the user with a setupfoo.ml that does this, and tell them to #use "setupfoo.ml".

TextEncodings.Base64Url.Decode vs Convert.FromBase64String

I was working on creating a method that would generate a JWT token. Part of the method reads a value from my web.config that serves as the "secret" used to generate the hash used to create the signature for the JWT token.
<add key="MySecret" value="j39djak49H893hsk297353jG73gs72HJ3tdM37Vk397" />
Initially I tried using the following to convert the "secret" value to a byte array.
byte[] key = Convert.FromBase64String(ConfigurationManager.AppSettings["MySecret"]);
However, an exception was thrown when this line was reached ...
The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
So I looked into the OAuth code and saw another method being used to change a base64 string into a byte array:
byte[] key = TextEncodings.Base64Url.Decode(ConfigurationManager.AppSettings["MySecret"]);
This method worked without issue. To me it looks like both calls do the same thing: convert a Base64 text value into an array of bytes. However, I must be missing something. Why does Convert.FromBase64String fail while TextEncodings.Base64Url.Decode works?
I came across the same thing when I migrated our authentication service to .NET Core. I had a look at the source code for the libraries we used in our previous implementation, and the difference is actually in the name itself.
The TextEncodings class has two types of text encoders, Base64TextEncoder and Base64UrlTextEncoder. The latter modifies the string slightly so the base64 string can be used in a URL.
My understanding is that it is quite common to replace + and / with - and _. As a matter of fact we have been doing the same with our handshake tokens. Additionally the padding character(s) at the end can also be removed. This leaves us with the following implementation (this is from the source code):
public class Base64UrlTextEncoder : ITextEncoder
{
    public string Encode(byte[] data)
    {
        if (data == null)
        {
            throw new ArgumentNullException("data");
        }
        return Convert.ToBase64String(data).TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }
    public byte[] Decode(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        return Convert.FromBase64String(Pad(text.Replace('-', '+').Replace('_', '/')));
    }
    private static string Pad(string text)
    {
        var padding = 3 - ((text.Length + 3) % 4);
        if (padding == 0)
        {
            return text;
        }
        return text + new string('=', padding);
    }
}
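To see what the padding step does to the secret from the question, here is a minimal Python sketch of the same decode logic. It is an illustration, not the library's code: base64url_decode is a hypothetical helper, and Python's base64 module stands in for the .NET calls. The secret is 43 characters long, which is not a multiple of 4, and that is the likely reason the strict decoder rejected it.

```python
import base64


def base64url_decode(text: str) -> bytes:
    """Decode unpadded base64url text, mirroring Decode/Pad above."""
    # Restore the '=' padding that base64url encoders usually strip.
    padded = text + "=" * (-len(text) % 4)
    # urlsafe_b64decode maps '-' and '_' back to '+' and '/'.
    return base64.urlsafe_b64decode(padded)


secret = "j39djak49H893hsk297353jG73gs72HJ3tdM37Vk397"
print(len(secret) % 4)                # 3, so one '=' is missing
print(len(base64url_decode(secret)))  # 32 bytes of key material
```

Appending a single '=' to the configured value should therefore let a plain Base64 decoder accept it as well.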

DigestUtils.md5Hex() generates wrong hash value when passing String object

I'm trying to generate an MD5 hash in Kotlin using the DigestUtils class from org.apache.commons.codec. Here's the test code:
@Test
fun md5Test() {
    val userPassword: String = "123"
    val md5Hash: String = "202cb962ac59075b964b07152d234b70"
    assertEquals(md5Hash, DigestUtils.md5Hex(userPassword))
}
The problem is that when I run this test it fails, reporting that the generated MD5 hash is 28c1a138574866e9c2e5a19dca9234ce.
But when I pass the String literal instead of the variable:
assertEquals(md5Hash, DigestUtils.md5Hex("123"))
The test passes without errors.
Why is this happening?
Here is a complete solution to get the MD5 hash encoded as Base64:
fun getMd5Base64(encTarget: ByteArray): String? {
    return try {
        val mdEnc = MessageDigest.getInstance("MD5")
        // Encode the raw 16-byte digest directly. Going through BigInteger
        // can add a sign byte or drop leading zero bytes.
        // Base64 is assumed to be android.util.Base64; NO_WRAP avoids the trailing newline.
        Base64.encodeToString(mdEnc.digest(encTarget), Base64.NO_WRAP)
    } catch (e: NoSuchAlgorithmException) {
        e.printStackTrace()
        null
    }
}
This is the simplest complete solution in Kotlin:
val hashedStr = String.format("%032x", BigInteger(1, MessageDigest.getInstance("MD5").digest("your string value".toByteArray(Charsets.UTF_8))))
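The "%032x" width in the one-liner matters because the digest is formatted via a big integer, which drops leading zeros. The following Python sketch illustrates this under stated assumptions: hex_via_integer is a hypothetical helper, and sample is a made-up 16-byte value chosen to start with a zero byte, not a real digest. It also confirms the expected hash of "123" from the question.

```python
import hashlib

# Reference value for the test string from the question.
print(hashlib.md5("123".encode("utf-8")).hexdigest())


def hex_via_integer(raw: bytes, pad: bool) -> str:
    """Format raw bytes as hex by going through a big integer."""
    number = int.from_bytes(raw, "big")
    return format(number, "032x" if pad else "x")


# Hypothetical 16-byte digest that starts with a zero byte.
sample = bytes([0x00, 0x0A]) + bytes(range(1, 15))
print(hex_via_integer(sample, pad=False))  # leading zeros are lost
print(hex_via_integer(sample, pad=True))   # always 32 hex digits
```

Without the zero padding, roughly one digest in sixteen would come out shorter than 32 characters and fail a string comparison against the expected hash.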

Salted sha512 in C, cannot synchronise with Symfony2's FOSUserBundle

My development is separated into two components:
The website, a Symfony application using FOSUserBundle, which hashes passwords using SHA512 and a salt.
An authentication module, programmed in C, which should be able to reproduce the salted SHA512 hash once it's given the salt and the cleartext password.
Some information about my environment
I'm using Linux Ubuntu 12.04.
ldd --version answers EGLIBC 2.15-0ubuntu10.4 2.15 (maybe I need 2.7 ? But apt-get is a real PAIN when it comes to upgrading packages correctly).
The crypt.h header file mentions @(#)crypt.h 1.5 12/20/96
The problem itself
My problem occurs in the authentication module : I'm unable to get the same hash as the one produced by Symfony's FOSUserBundle. Here's my example :
The password salt, used by Symfony, is bcccy6eiye8kg44scw0wk8g4g0wc0sk.
The password itself is test
With this information, Symfony stores this final hash :
fH5vVoACB4e8h1GX81n+aYiRkSWxeu4TmDibNChtLNZS3jmFKBZijGCXcfzCSJFg+YvNthxefHOBk65m/U+3OA==
Now, in my C authentication module, I run this piece of code (crypt.h is included) :
char* password = "test";
char* salt = "$6$bcccy6eiye8kg44scw0wk8g4g0wc0sk";
char* hash = malloc(256);
memset(hash, 0, 256);
encode64(crypt(password, salt), hash, strlen(password));
fprintf(stdout, "%s\n", hash);
(here is my base64 encoder : http://libremail.tuxfamily.org/sources/base64-c.htm)
And this outputs...
JDYkYg==
Which is completely different from my Symfony2 hash.
Browsing Stack Overflow, I found this question (Symfony2 (FOSUserBundle) SHA512 hash doesn't match C# SHA512 hash) written by someone encountering the same issue (with C# though). So I decided to run this test...
char* password = "test{bcccy6eiye8kg44scw0wk8g4g0wc0sk}";
char* salt = "$6$bcccy6eiye8kg44scw0wk8g4g0wc0sk"; // I tried without salt, or with "$6$" as well.
char* hash = malloc(256);
memset(hash, 0, 256);
encode64(crypt(password, salt), hash, strlen(password));
fprintf(stdout, "%s\n", hash);
Of course, it was a complete failure, I got :
JDYkYmNjY3k2ZWl5ZThrZzQ0cyRycmN6TnpJUXFOYU1VRlZvMA==
I've tried mixing the password and the salt in various ways, but I could never reproduce Symfony's hash in the authentication module. Is there something I've missed along the way? Have I misunderstood the way Symfony's FOSUserBundle stores passwords?
Not really an answer but I'm guessing you have not looked into how Symfony encodes passwords in any great detail? The encoding process is tucked away into an encoder object. For SHA512 we use:
namespace Symfony\Component\Security\Core\Encoder;
class MessageDigestPasswordEncoder extends BasePasswordEncoder
{
    /**
     * Constructor.
     *
     * @param string  $algorithm          The digest algorithm to use
     * @param Boolean $encodeHashAsBase64 Whether to base64 encode the password hash
     * @param integer $iterations         The number of iterations to use to stretch the password hash
     */
    public function __construct($algorithm = 'sha512', $encodeHashAsBase64 = true, $iterations = 5000)
    {
        $this->algorithm = $algorithm;
        $this->encodeHashAsBase64 = $encodeHashAsBase64;
        $this->iterations = $iterations;
    }
    public function encodePassword($raw, $salt)
    {
        if (!in_array($this->algorithm, hash_algos(), true)) {
            throw new \LogicException(sprintf('The algorithm "%s" is not supported.', $this->algorithm));
        }
        $salted = $this->mergePasswordAndSalt($raw, $salt);
        $digest = hash($this->algorithm, $salted, true);
        // "stretch" hash
        for ($i = 1; $i < $this->iterations; $i++) {
            $digest = hash($this->algorithm, $digest.$salted, true);
        }
        return $this->encodeHashAsBase64 ? base64_encode($digest) : bin2hex($digest);
    }
    public function isPasswordValid($encoded, $raw, $salt)
    {
        return $this->comparePasswords($encoded, $this->encodePassword($raw, $salt));
    }
    protected function mergePasswordAndSalt($password, $salt)
    {
        if (empty($salt)) {
            return $password;
        }
        if (false !== strrpos($salt, '{') || false !== strrpos($salt, '}')) {
            throw new \InvalidArgumentException('Cannot use { or } in salt.');
        }
        return $password.'{'.$salt.'}';
    }
}
As you can see, one immediate problem is that hashing is repeated 5000 times by default. This, as well as the other inputs, can be adjusted in your app/config/security.yml file.
You can also see where the salt and password get merged together, which explains the other Stack Overflow answer.
It would be trivial to make a Symfony command that just runs this encoding algorithm from the Symfony console for testing. After that, it is just a question of adjusting the inputs or tweaking your C code until the results match.
If you are lucky, all you will have to do is add the iteration loop.
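For testing outside Symfony, the encodePassword() logic quoted above can be ported almost line for line. The following Python sketch is one such port, assuming the defaults shown (sha512, base64 output, 5000 iterations) and UTF-8 input; symfony_digest is a hypothetical name. If the port is faithful and those defaults apply to your configuration, it should produce the hash stored by Symfony, but that has not been verified here.

```python
import base64
import hashlib


def symfony_digest(raw: str, salt: str, iterations: int = 5000,
                   algorithm: str = "sha512") -> str:
    """Sketch of encodePassword() as quoted above, with base64 output."""
    if "{" in salt or "}" in salt:
        raise ValueError("Cannot use { or } in salt.")
    # mergePasswordAndSalt: password{salt}, or just the password if no salt.
    salted = (raw + "{" + salt + "}" if salt else raw).encode("utf-8")
    digest = hashlib.new(algorithm, salted).digest()
    # "Stretch" the hash: each round hashes the previous binary digest
    # followed by the salted password again.
    for _ in range(1, iterations):
        digest = hashlib.new(algorithm, digest + salted).digest()
    return base64.b64encode(digest).decode("ascii")


print(symfony_digest("test", "bcccy6eiye8kg44scw0wk8g4g0wc0sk"))
```

Note that each round works on the raw binary digest, not on its hex or base64 text, and that base64 is applied once at the very end. A C implementation has to follow the same order.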

Token return values in ANTLR 3 C

I'm new to ANTLR, and I'm attempting to write a simple parser using C language target (antler3C). The grammar is simple enough that I'd like to have each rule return a value, eg:
number returns [long value]
:
( INT {$value = $INT.ivalue;}
| HEX {$value = $HEX.hvalue;}
)
;
HEX returns [long hvalue]
: '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+ {$hvalue = strtol((char*)$text->chars,NULL,16);}
;
INT returns [long ivalue]
: '0'..'9'+ {$ivalue = strtol((char*)$text->chars,NULL,10);}
;
Each rule collects the return value of its child rules until the topmost rule returns a nice struct full of my data.
As far as I can tell, ANTLR allows lexer rules (tokens, eg 'INT' & 'HEX') to return values just like parser rules (eg 'number'). However, the generated C code will not compile:
error C2228: left of '.ivalue' must have class/struct/union
error C2228: left of '.hvalue' must have class/struct/union
I did some poking around, and the errors make sense - the tokens end up as generic ANTLR3_COMMON_TOKEN_struct, which doesn't allow for a return value. So maybe the C target just doesn't support this feature. But like I said, I'm new to this, and before I go haring off to find another approach I want to confirm that I can't do it this way.
So the question is this: 'Does antler3C support return values for lexer rules, and if so what is the proper way to use them?'
Not really any new information, just some details on what @bemace already mentioned.
No, lexer rules cannot have return values. See section 4.3, Rules, of The Definitive ANTLR Reference:
Rule Arguments and Return Values
Just like function calls, ANTLR parser and tree parser rules can have
arguments and return values. ANTLR lexer rules cannot have return
values [...]
There are two options:
Option 1
You can do the transforming to a long in the parser rule number:
number returns [long value]
: INT {$value = Long.parseLong($INT.text);}
| HEX {$value = Long.parseLong($HEX.text.substring(2), 16);}
;
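The conversion itself is independent of the target language: strip the 0x prefix for HEX tokens and parse with the matching radix (in the C target, strtol from the question does the same job on the token text). A minimal Python sketch of that logic, with token_to_long as a hypothetical helper name:

```python
def token_to_long(text: str) -> int:
    """Convert the text of an INT or HEX token to an integer."""
    # HEX tokens in the grammar always start with a lowercase "0x".
    if text.startswith("0x"):
        return int(text[2:], 16)
    return int(text, 10)


print(token_to_long("0xCafE"))  # 51966
print(token_to_long("42"))      # 42
```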
Option 2
Or create your own token that has, say, a toLong(): long method:
import org.antlr.runtime.*;
public class YourToken extends CommonToken {
    public YourToken(CharStream input, int type, int channel, int start, int stop) {
        super(input, type, channel, start, stop);
    }
    // your custom method
    public long toLong() {
        String text = super.getText();
        int radix = text.startsWith("0x") ? 16 : 10;
        if(radix == 16) text = text.substring(2);
        return Long.parseLong(text, radix);
    }
}
Then tell the grammar to use this token type in its options {...} section, and override the emit() method in your lexer class:
grammar Foo;
options {
  TokenLabelType=YourToken;
}
@lexer::members {
  public Token emit() {
    YourToken t = new YourToken(input, state.type, state.channel,
        state.tokenStartCharIndex, getCharIndex()-1);
    t.setLine(state.tokenStartLine);
    t.setText(state.text);
    t.setCharPositionInLine(state.tokenStartCharPositionInLine);
    emit(t);
    return t;
  }
}
parse
  : number {System.out.println("parsed: "+$number.value);} EOF
  ;
number returns [long value]
  : INT {$value = $INT.toLong();}
  | HEX {$value = $HEX.toLong();}
  ;
HEX
  : '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+
  ;
INT
  : '0'..'9'+
  ;
When you generate a parser and lexer, and run this test class:
import org.antlr.runtime.*;
import java.io.*;
public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("0xCafE");
        FooLexer lexer = new FooLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        FooParser parser = new FooParser(tokens);
        parser.parse();
    }
}
it will produce the following output:
parsed: 51966
The first option seems the more practical in your case.
Note that, as you can see, the examples given are in Java. I have no idea if option 2 is supported in the C target/runtime. I decided to still post it to be able to use it as a future reference here on SO.
Lexer rules must return Token objects, because that's what the Parser expects to work with. There may be a way to customize the type of token object used, but it's easier just to convert tokens to values in the lowest-level parser rules.
social_title returns [Name.Title title]
: SIR { title = Name.Title.SIR; }
| 'Dame' { title = Name.Title.DAME; }
| MR { title = Name.Title.MR; }
| MS { title = Name.Title.MS; }
| 'Miss' { title = Name.Title.MISS; }
| MRS { title = Name.Title.MRS; };
There is a third option: You can pass an object as argument to the lexer rule. This object contains a member that represents the lexer's return value. Within the lexer rule, you can set the member. Outside the lexer rule, at the point you call it, you can get the member and do whatever you want with this 'return value'.
This way of parameter passing corresponds to the 'var' parameters in Pascal or the 'out' parameters in C++ and other programming languages.

Resources