SML doesn't recognize ord or chr

I'm using this example from "Gentle Intro to ML"
fun incFirst s = chr(ord s + 1) ^ substring(s, 1, size s -1);
But my "Standard ML of New Jersey v110.76 [built: Tue Oct 22 14:04:11 2013]" doesn't like it:
Error: operator and operand don't agree [tycon mismatch]
I can't even do this:
> ord "c";
without getting an error:
Error: operator and operand don't agree [tycon mismatch]
It doesn't recognize
load "Char";
either. What am I doing wrong?

You need to do ord #"c" because "c" is a string and #"c" is a character.
Your function has two problems:
1. Trying to glue together a character and a string using ^, which operates only on strings
2. Trying to use ord on a string
An ugly solution is this:
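fun incFirst s =
  (* String.str turns the char into a one-character string;
     Char.toString would escape non-printable results such as #"\127" *)
  String.str (chr (ord (String.sub (s, 0)) + 1)) ^ substring (s, 1, size s - 1);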
I think you could probably make a prettier solution using explode and implode and a let block with a pattern match, but I don't remember enough SML syntax off the top of my head to do it.
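For comparison, here is the same "split off the first character, bump its code, glue the rest back on" idea sketched in Python rather than SML. Python has no separate character type, so ord accepts a one-character string and chr returns one; that is exactly the distinction SML makes explicit with #"c" versus "c". The name inc_first and the empty-string guard are assumptions of this sketch.

```python
def inc_first(s: str) -> str:
    """Increment the code point of the first character and keep the rest."""
    if not s:
        # The SML solution above would raise Subscript on ""; here we return it unchanged.
        return s
    first, rest = s[0], s[1:]
    # ord takes a one-character str and chr returns a one-character str,
    # so the result can be concatenated with the rest directly.
    return chr(ord(first) + 1) + rest


print(inc_first("cat"))  # dat
```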

Related

Regular expression unexpected pattern matching

I am trying to create a syntax parser using C-Bison and Flex. In Flex I have a regular expression which matches integers based on the following:
Must start with a digit in the range 1-9, followed by any number of digits in the range 0-9 (ex. Correct: 1, 12, 11024 | Incorrect: 012).
Can be signed (ex. +2, -5).
The number 0 must not be followed by any digit (0-9) and must not be signed (ex. Correct: 0 | Incorrect: 012, +0, -0).
Here is the regex I have created to perform the matching:
[^+-]0[^0-9]|[+-]?[1-9][0-9]*
Here is the expression I am testing:
(1 + 1 + 10)
The matches:
1
1
10)
And here is my question, why does it match '10)'?
The reason I used the above expression, instead of the much simpler one, (0|[+-]?[1-9][0-9]*), is the parser's inability to recognise incorrect expressions such as 012.
The problem seems to occur only when the ')' is preceded by the digit '0'. However, if the '0' is preceded by two or more digits (e.g. 100), the ')' is not matched.
I know for a fact that if I remove [^0-9] from the regex, it doesn't match the ')'.
It matches 10) because 1 matches [^+-], 0 matches 0 and ) matches [^0-9].
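To see this concretely, the question's pattern can be run through Python's re module (a stand-in for flex, used here only for illustration). Python tries alternatives left to right while flex takes the longest match, but at the second 1 in 10) both end up with 10), since the first alternative matches there and is also the longer candidate. With 100, the second 0 fails [^0-9], so only the plain integer alternative applies.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"[^+-]0[^0-9]|[+-]?[1-9][0-9]*")

print(PATTERN.findall("(1 + 1 + 10)"))  # ['1', '1', '10)']
print(PATTERN.findall("(1 + 100)"))     # ['1', '100']
```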
"The reason I used the above expression, instead of the much simpler one, (0|[+-]?[1-9][0-9]*), is the parser's inability to recognise incorrect expressions such as 012."
How so? Using the above regex, 012 would be recognized as two tokens: 0 and 12. Would this not cause an error in your parser?
Admittedly, this would not produce a very good error message, so a better approach might be to just use [0-9]+ as the regex and then use the action to check for a leading zero. That way 012 would be a single token and the lexer could produce an error or warning about the leading zero (I'm assuming here that you actually want to disallow leading zeros - not use them for octal literals).
Instead of a check in the action, you could also keep your regex and then add another one for integers with a leading zero (like 0[0-9]+ { warn("Leading zero"); return INT; }), but I'd go with the check in the action since it's an easy check and it keeps the regex short and simple.
PS: If you make - and + part of the integer token, something like 2+3 will be seen as the integer 2, followed by the integer +3, rather than the integers 2 and 3 with a + token in between. Therefore it is generally a better idea to not make the sign a part of the integer token and instead allow prefix + and - operators in the parser.
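A minimal sketch of that approach, written in Python with the re module rather than flex: the integer rule is just [0-9]+, the leading-zero check happens in the "action", and + and - are always separate operator tokens. The token names INT and OP, the operator set, and the warning text are assumptions for illustration.

```python
import re
import warnings

# Hypothetical token rules: whitespace (skipped), unsigned integers, operators.
TOKEN_RE = re.compile(r"(?P<ws>\s+)|(?P<int>[0-9]+)|(?P<op>[-+*/()])")


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        lexeme = m.group()
        if m.lastgroup == "int":
            # The "action" for [0-9]+: one token for the whole digit run,
            # then a separate check for a leading zero.
            if len(lexeme) > 1 and lexeme.startswith("0"):
                warnings.warn(f"leading zero in integer literal {lexeme!r}")
            tokens.append(("INT", lexeme))
        elif m.lastgroup == "op":
            # Signs are ordinary operator tokens; the parser handles prefix + and -.
            tokens.append(("OP", lexeme))
        pos = m.end()
    return tokens
```

With this tokenizer, 2+3 comes out as INT 2, OP +, INT 3, and 012 is a single INT token plus a warning, so the error can mention the leading zero instead of complaining about two adjacent integers.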

Special meaning of <> and anonymous arrays inside regex in Perl 6

Outside regex, <> behaves more or less like single quotes. My shallow understanding tells me that, inside a regex, <> allows evaluation and interpolation of code:
# Outside regex, <> acts like single quotes:
> my $x = <{"one"}>
{"one"}
> $x.WHAT
(Str)
# Inside regex, <> evaluates and interpolates:
> my $b="one";
one
> say "zonez" ~~ m/ <{$b}> / # Evaluates {$b} then quotes: m/ one /
「one」
> say "zonez" ~~ m/ <$b> / # Interpolates and quotes without {}
「one」
Because an array variable is allowed inside a regex, I suspect that the Perl 6 regex engine expands the array into ORs when the array is surrounded by <> inside a regex.
I also suspect that in a user-defined character class, <[ ]>, the [] inside <> works more or less like an anonymous array, similar to @a below, and the contents of the array (the chars in the character class) are expanded into ORs.
> my @a = $b, "two";
[one two]
> so "zonez" ~~ m/ @a /;
True
> say "ztwoz" ~~ m/ <{[$b, "two"]}> / # {} to eval array, then <> quotes
「two」
> say "ztwoz" ~~ m/ <{@a}> /
「two」
> say "ztwoz" ~~ m/ <@a> /
「two」
> say "ztwoz" ~~ m/ one || two / # expands @a into ORs: [||] @a;
# [||] is a reduction operator;
「two」
And char class expansion:
> say "ztwoz" ~~ m/ <[onetw]> / # like [||] [<o n e t w>];
「t」
> say "ztwoz" ~~ m/ o|n|e|t|w /
「t」
> my @m = < o n e t w >
[o n e t w]
> say "ztwoz" ~~ m/ @m /
「t」
I have not looked into the Rakudo source code, and my understanding is limited. I have not been able to construct anonymous arrays inside regex to prove that <> indeed constructs arrays inside regex.
So, is <> inside regex something special? Or should I study the Rakudo source code (which I really try not to do at this time)?
Outside of a regex <> acts like qw<>, that is it quotes and splits on spaces.
say <a b c>.perl;
# ("a", "b", "c")
It can be expanded to
q :w 'a b c'
Q :q :w 'a b c'
Q :single :words 'a b c'
I recommend reading Language: Quoting Constructs as this is a more broad topic than can be discussed here.
This has almost nothing to do with what <> does inside of a regex.
The use of <> in regexes is not useful in base Perl 6 code, and qw is not that useful in regexes. So these characters are doing double duty, mainly because there are very few non-letter and non-number characters in ASCII. The only time it acts like qw is if the character immediately following < is a whitespace character.
Inside of a regex it can be thought of as injecting some code into the regex; sort of like a macro, or a function call.
/<{ split ';', 'a;b;c' }>/;
/ [ "a" | "b" | "c" ] /;
(Note that | tries all alternatives at the same time, while || tries the leftmost one first, then the next one, and so on. That is, || works basically the way | does in Perl 5 and PCRE.)
/<:Ll - [abc]>/
/ [d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z] / # plus other lowercase letters
/ <@a> /
/ [ "one" | "two" ] /
Note that / @a / also dissolves into the same construct.
/ <?{ 1 > 0 }> /
# null regex always succeeds
/ [ '' ] /
/ <?{ 1 == 0 }> /
# try to match a character after the end of the string
# (can never succeed)
/ [ $ : . ] /
Those last two aren't quite accurate, but may be a useful way to think about it.
It is also used to call regex "methods".
grammar Foo {
token TOP { <alpha> } # calls Grammar.alpha and uses it at that point
}
You may have noticed that I always surrounded the substitution with [], because it always acts like an independent subexpression.
Technically none of these are implemented in the way I've shown them, it is just a theoretical model that is easier to explain.
Within regex <> are used for what I tend to call "generalized assertions". Whenever you match something with regex, you're making a series of assertions about what the string should look like. If all of the assertions are true, the entire regex matches. For example, / foo / asserts that the string "foo" appears within the string being matched; / f o* / asserts that the string should contain an "f" followed by zero or more "o", etc.
In any case, for generalized assertions, Rakudo Perl 6 uses the character immediately after the < to determine what kind of assertion is being made. If the character after < is alphabetic (e.g. <foo>), it is taken to mean a named subrule; if it is {, the assertion contains code that is to be interpolated into the pattern (e.g. <{ gen_some_regex(); }>); if it is [, it's a character class; if it is :, it expects to match a Unicode property (e.g. <:Letter>); if it is ? or !, you get positive and negative zero-width assertions respectively; etc.
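The dispatch on the first character can be summarized as a small lookup. This is a toy Python model of the rules described in this thread, not Rakudo's actual parser; the function name assertion_kind, the labels, and the $ and @ entries (taken from the <$b> and <@a> examples in the question) are assumptions for illustration. The whitespace case corresponds to the "quote words" behaviour mentioned earlier.

```python
# Hypothetical mapping from the character after '<' to the assertion kind,
# based on the descriptions in this thread; Rakudo's real grammar is richer.
KINDS = {
    "{": "interpolated code",
    "[": "character class",
    ":": "Unicode property",
    "?": "positive zero-width assertion",
    "!": "negative zero-width assertion",
    "$": "interpolated variable",
    "@": "interpolated array (alternation)",
}


def assertion_kind(body: str) -> str:
    """Classify the text right after '<' inside a Perl 6 regex (toy model)."""
    if not body:
        return "unknown"
    first = body[0]
    if first.isspace():
        return "quote words"    # < foo bar > acts like foo | bar
    if first.isalpha():
        return "named subrule"  # e.g. <alpha>
    return KINDS.get(first, "other")
```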
And finally, outside of regex, <> acts as "quote words". Within a regex, if the character immediately following the < is whitespace, it also acts as a kind of "quote words":
> "I'm a bartender" ~~ / < foo bar > /
「bar」
This is matched as if it were an alternation, that is < foo bar > will match one of foo or bar as if you'd written foo | bar.
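The "dissolves into an alternation" model can be imitated with Python's re module. This is an analogy only, and the helper alternation is hypothetical. One difference matters: Python's | tries alternatives left to right, so it behaves like Perl 6's || rather than its longest-match |. For these inputs the results are the same.

```python
import re


def alternation(words):
    """Turn a list of literal strings into one alternation, mirroring how
    <@a> or < foo bar > is described as dissolving into [ "one" | "two" ]."""
    # re.escape keeps each word literal, like the quoted strings in the model.
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


a = ["one", "two"]
print(re.search(alternation(a), "ztwoz").group())                            # two
print(re.search(alternation("foo bar".split()), "I'm a bartender").group())  # bar
print(re.search(r"[onetw]", "ztwoz").group())                                # t
```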

Why is arithmetic + preferred over textual one?

If I run
select '1' + '1'
the result is 11, since I have added one string to another.
If I run
select 1 + '1'
the result is 2. I assume the arithmetic operator is chosen over the concatenator because of the type of the first operand. If my reasoning was valid, then the result of
select '1' + 1
would be 11. But instead, it is 2. So it seems that + is first tried as an arithmetic operator, and only if neither operand is numeric does it fall back to concatenation. If that is true, it would explain why I got the error
Conversion failed when converting the varchar value 'customer_' to data type int.
instead of customer_<somenumber> when I ran a select with 'customer_' + <somenumber>.
Long story short: I think I observed that arithmetic + is preferred over concatenation in SQL Server. Am I right? If so, is there an official reason for this behavior?
What you're running into is a matter of data type precedence. In SQL Server, character data types have lower precedence than numeric types. So regardless of the ordering of your operands (1 + '1' vs '1' + 1), it attempts to convert your operands to numerics, and succeeds.
The same happens with your second attempt - it's trying to convert the string customer_ to an integer because you're using an arithmetic operator along with an integer.
Yes, the precedence is arithmetic first when compared to concatenation.
https://msdn.microsoft.com/en-us/library/ms190276.aspx
Your error, as you know, is because it won't implicitly attempt to convert INT to VARCHAR.
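The rule can be captured in a toy Python model restricted to int and varchar operands. This is not SQL Server itself, and the function name sql_plus is hypothetical; real SQL Server has many more types and conversion rules. If either operand is an int, int wins on precedence, the other operand is implicitly converted, and + means addition; only two strings are concatenated.

```python
def sql_plus(left, right):
    """Toy model of SQL Server's + for int and varchar (str) operands only."""
    if isinstance(left, str) and isinstance(right, str):
        return left + right  # varchar + varchar -> concatenation
    try:
        # int has higher precedence, so both sides are converted to int.
        return int(left) + int(right)
    except ValueError:
        bad = left if isinstance(left, str) else right
        raise ValueError(
            f"Conversion failed when converting the varchar value '{bad}' to data type int."
        ) from None


print(sql_plus("1", "1"))  # 11
print(sql_plus(1, "1"))    # 2
print(sql_plus("1", 1))    # 2
```

In T-SQL itself, the usual way to get customer_<somenumber> is to convert the number explicitly, for example 'customer_' + CAST(<somenumber> AS varchar(20)).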

How to differentiate '-' operator from a negative number for a tokenizer

I am creating an infix expression parser, and so I have to create a tokenizer. It works well, except for one thing: I do not know how to differentiate a negative number from the "-" operator.
For example, if I have:
23 / -23
The tokens should be 23, / and -23, but if I have an expression like
23-22
Then the tokens should be 23, - and 22.
I found a dirty workaround which is if I encounter a "-" followed by a number, I look at the previous character and if this character is a digit or a ')', I treat the "-" as an operator and not a number.
Apart from being kind of ugly, it doesn't work for expressions like
--56
where it produces the tokens - and -56, when it should get --56.
Any suggestion?
In the first example the tokens should be 23, /, - and 23.
The solution then is to evaluate the tokens according to the rules of associativity and precedence. For example, - cannot bind to /, but it can bind to 23.
If you encounter --56, it is split into -, -, 56 and the rules take care of the problem. There is no need for special cases.
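A minimal Python sketch of that idea, assuming a small grammar with + - * / and parentheses (the grammar, the Parser class and the evaluate function are illustrative, not from the question): the tokenizer always emits - on its own, and the recursive-descent parser treats a - in prefix position as unary negation.

```python
import re

TOKEN_RE = re.compile(r"\s*([0-9]+|[-+*/()])")


def tokenize(text):
    """Split into numbers and single-character operators; '-' is always its own token."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """expr := term (('+'|'-') term)*   term := unary (('*'|'/') unary)*
    unary := ('-'|'+') unary | primary   primary := NUMBER | '(' expr ')'"""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            value = value + self.term() if self.take() == "+" else value - self.term()
        return value

    def term(self):
        value = self.unary()
        while self.peek() in ("*", "/"):
            value = value * self.unary() if self.take() == "*" else value / self.unary()
        return value

    def unary(self):
        # A '-' where an operand is expected (start, after an operator or '(')
        # is unary; the grammar decides this, not the tokenizer.
        if self.peek() == "-":
            self.take()
            return -self.unary()
        if self.peek() == "+":
            self.take()
            return self.unary()
        return self.primary()

    def primary(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(tokenize("--56"))       # ['-', '-', '56']
print(evaluate("23 / -23"))  # -1.0
print(evaluate("--56"))       # 56
```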

SSIS How to get part of a string by separator

I need an SSIS expression to get the left part of a string before the separator, and then put the new string in a new column. I checked the Derived Column transformation, and there seems to be no such expression. SUBSTRING can only return a part of the string with a fixed length.
For example, with separator string - :
Art-Reading should return Art
Art-Writing should return Art
Science-chemistry should return Science
P.S.
I knew this could be done in MySQL with SUBSTRING_INDEX(), but I'm looking for an equivalent in SSIS, or at least in SQL Server
Better late than never, but I wanted to do this too and found this.
TOKEN(character_expression, delimiter_string, occurrence)
TOKEN("a little white dog"," ",2)
returns "little". The source is below:
http://technet.microsoft.com/en-us/library/hh213216.aspx
Of course you can.
Just configure your derived columns like this:
Here is the expression to make your life easier:
SUBSTRING(name,1,FINDSTRING(name,"-",1) - 1)
FYI, the second "1" (the last argument of FINDSTRING) means to get the first occurrence of the string "-".
EDIT:
Expression to deal with strings that contain no "-":
FINDSTRING(name,"-",1) != 0 ? (SUBSTRING(name,1,FINDSTRING(name,"-",1) - 1)) : name
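To make the index arithmetic concrete, here is a Python sketch that mimics the two SSIS functions with 1-based positions. The helpers findstring, substring and left_of_dash are hypothetical stand-ins, not SSIS APIs, and counting non-overlapping occurrences is an assumption. FINDSTRING yields 0 when the separator is missing, which is why the unguarded expression ends up with a length of -1.

```python
def findstring(s: str, sub: str, occurrence: int) -> int:
    """1-based position of the n-th (non-overlapping) occurrence of sub, or 0 if absent."""
    if occurrence < 1:
        raise ValueError("occurrence must be >= 1")
    pos, start = -1, 0
    for _ in range(occurrence):
        pos = s.find(sub, start)
        if pos == -1:
            return 0
        start = pos + len(sub)
    return pos + 1


def substring(s: str, start: int, length: int) -> str:
    """1-based SUBSTRING that rejects a negative length, as SSIS does."""
    if length < 0:
        raise ValueError(f"The length {length} is not valid for function SUBSTRING.")
    return s[start - 1:start - 1 + length]


def left_of_dash(name: str) -> str:
    # FINDSTRING(name,"-",1) != 0 ? SUBSTRING(name,1,FINDSTRING(name,"-",1) - 1) : name
    pos = findstring(name, "-", 1)
    return substring(name, 1, pos - 1) if pos != 0 else name


for value in ["Art-Reading", "Art-Writing", "Science-chemistry", "NoDash"]:
    print(value, "->", left_of_dash(value))
```

Without the guard, substring("NoDash", 1, findstring("NoDash", "-", 1) - 1) asks for a length of -1 and fails, matching the SSIS error quoted in this thread.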
You can specify the length to copy in the SUBSTRING function and check for the location of the dash using CHARINDEX
SELECT SUBSTRING(@sString, 1, CHARINDEX('-', @sString) - 1)
For the SSIS expression it is pretty much the same code:
SUBSTRING(@[User::String], 1, FINDSTRING(@[User::String], "-", 1) - 1)
If the SUBSTRING length parameter evaluates to -1 (because FINDSTRING returns 0 when there is no "-"), it results in an error:
"The length -1 is not valid for function "SUBSTRING". The length parameter cannot be negative. Change the length parameter to zero or a positive value."
