Ruby C API string and symbol equality? - c

Writing a C extension for a Ruby gem, I need to test a parameter value for equality with a known symbol and string.
I realize you can intern a string with
char *foo = "foo";
VALUE foo_string_value = rb_intern(foo);
and then convert it into a symbol:
VALUE foo_sym_value = ID2SYM(foo_string_value);
My question is whether I can now reliably test the parameter, which may be a symbol or string, :foo or 'foo', with these using C pointer equality:
if (param == foo_string_value || param == foo_sym_value) {
...
}
In other words, does the interning guarantee that pointer equality is also value equality for symbols and strings?
If this is the wrong approach, then what's the right one to do this comparison?

I figured this out. A reasonable way to do what I want appears to be
VALUE square_sym = ID2SYM(rb_intern("square"));
VALUE kind_as_sym = rb_funcall(kind_value, rb_intern("to_sym"), 0);
if (square_sym == kind_as_sym) {
...
}
Symbols are always interned and have unique values. Strings aren't and don't. They can be interned to get IDs that are integers unique for each string, but that doesn't help in this case.

Related

Implementation of Array.length in OCaml

I want to understand how Array.length is implemented. I managed to write it with Array.fold_left:
let length a = Array.fold_left (fun x _ -> x + 1) 0 a
However in the standard library, fold_left uses length so that can't be it. For length there is just this line in the stdlib which I don't understand:
external length : 'a array -> int = "%array_length"
How can I write length without using fold_left?
EDIT:
I tried to do it with pattern matching, however it is not exhaustive, how can I make the matching more precise? (The aim is to remove the last element and return i+1 when only one element is left)
let length a =
let rec aux arr i =
match arr with
| [|h|] -> i+1
| [|h;t|] -> aux [|h|] (i+1)
in aux a 0;;
The array type is a primitive type, like the int type.
The implementation of many primitive functions on those primitive types is done with either C functions or compiler primitives.
The Array.length function belongs to the compiler primitive category and it is defined in the standard library by:
external length : 'a array -> int = "%array_length"
Here, this declaration binds the value length to the compiler primitive %array_length (compiler primitive names start with a % symbol) with type 'a array -> int. The compiler translates such a compiler primitive to a lower-level implementation when it compiles the source code to either native code or bytecode.
In other words, you cannot reimplement Array.length or the array type in general in an efficient way yourself because this type is a basic building block defined by the compiler itself.
For length there is just this line in the stdlib which I don't understand:
The external keyword indicates that this function is implemented outside of OCaml code. Ordinarily the string names a C function, but a name beginning with %, such as "%array_length", refers to a primitive built into the compiler instead. The OCaml runtime is implemented in C and some types, like arrays, are built-in (also called primitives).
See for example Implementing primitives in Chapter 20: Interfacing C with OCaml
I tried to do it with pattern matching, however it is not exhaustive, how can I make the matching more precise
Note that OCaml tells you which pattern is not matched:
Here is an example of a case that is not matched:
[| |]
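So you have to account for empty arrays as well. Note also that [|h;t|] matches only arrays of exactly two elements, so array patterns cannot express "first element and the rest" the way list patterns can.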

Passing Swift strings as C strings in variable argument lists

I am trying to use the old printf-style string formatters to define a new Swift string from an old Swift string.
I notice this works fine as long as I am starting with a Swift string literal, but not a string variable.
// OK!
String(format:"%s", "hello world".cStringUsingEncoding(NSUTF8StringEncoding))
// ERROR: argument type '[CChar]?' does not conform to expected type 'CVarArgType'
let s = "hello world"
String(format:"%s", s.cStringUsingEncoding(NSUTF8StringEncoding))
Why does this happen?
And what's the best workaround?
(Please note that I am aware of but do not want to use the Cocoa string formatter %@. I want the printf-style formatting code because what I'm actually trying to do is get quick and dirty tabular alignment with codes like %-10s.)
This question concerns Swift 2.2.
Don't create a C string just for alignment. There is a method stringByPaddingToLength for that.
// Swift 2
s.stringByPaddingToLength(10, withString: " ", startingAtIndex: 0)
// Swift 3
s.padding(toLength: 10, withPad: " ", startingAt: 0)
Note that it will truncate the string if it is longer than 10 UTF-16 code units.
The problem here is, there are two methods in Swift named cStringUsingEncoding:
func cStringUsingEncoding(encoding: UInt) -> UnsafePointer<Int8>, as a method of NSString
func cStringUsingEncoding(encoding: NSStringEncoding) -> [CChar]?, as an extension of String
If we want a pointer, we need to ensure we are using an NSString, not a String.
In the first example, a string literal can be a String or NSString, so the compiler chooses NSString since the other one won't work.
But in the second example, the type of s is already set to String, so the method that returns [CChar]? is chosen.
This could be worked around by forcing s to be an NSString:
let s: NSString = "hello world"
String(format:"%s", s.cStringUsingEncoding(NSUTF8StringEncoding))
kennytm's answer is clearly the best way.
But for anyone else who is still wondering how to do it the wrong way and get access to a C string based on a Swift String without going through NSString, this also seems to work (Swift 2.2):
var s:String = "dynamic string"
s.nulTerminatedUTF8.withUnsafeBufferPointer { (buffptr) -> String in
let retval:String = String(format:"%s",buffptr.baseAddress)
return retval
}

Find longest suffix of string in given array

Given a string and an array of strings, find the longest string in the array that is a suffix of the given string.
For example:
string = google.com.tr
array = tr, nic.tr, gov.nic.tr, org.tr, com.tr
returns com.tr
I have tried to use binary search with specific comparator, but failed.
C-code would be welcome.
Edit:
I should have said that I'm looking for a solution where I can do as much work as possible in a preparation step (when I only have the array of suffixes, and I can sort it in any way, build any data structure around it, etc.), and then, for a given string, find its suffix in this array as fast as possible. I also know that I can build a trie out of this array, and that this will probably give me the best possible performance, BUT I'm very lazy, and keeping a trie in raw C in a huge piece of tangled enterprise code is no fun at all. So some binsearch-like approach would be very welcome.
Assuming constant-time addressing of characters within strings, this problem is isomorphic to finding the longest prefix (reverse every string first).
1. Let i = 0.
2. Let S = null.
3. Let c = prefix[i].
4. Remove every string a from A for which a[i] != c. Replace S with a if a.Length == i + 1. If A is now empty, return S.
5. Increment i.
6. Go to step 3.
Is that what you're looking for?
Example:
prefix = rt.moc.elgoog
array = rt.moc, rt.gro, rt.cin.vog, rt.cin, rt
Pass 0: prefix[0] is 'r' and array[j][0] == 'r' for all j so nothing is removed from the array. i + 1 -> 0 + 1 -> 1 is our target length, but none of the strings have a length of 1, so S remains null.
Pass 1: prefix[1] is 't' and array[j][1] == 't' for all j so nothing is removed from the array. However there is a string that has length 2, so S becomes rt.
Pass 2: prefix[2] is '.' and array[j][2] == '.' for every string except rt, which has no third character and is removed (it is already recorded in S).
Pass 3: prefix[3] is 'm' and array[j][3] != 'm' for rt.gro, rt.cin.vog, and rt.cin so those strings are removed.
etc.
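The passes above can be sketched in C roughly as follows. This is only an illustration of the elimination idea, assuming the caller has already reversed the query and every candidate; the function name longest_prefix and the MAX_CANDIDATES limit are made up for the example and are not part of the original answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CANDIDATES 64   /* hypothetical upper bound, for the sketch only */

/*
 * Returns the longest candidate that is a prefix of `prefix`, or NULL.
 * All strings are assumed to be reversed already (e.g. "rt.moc").
 */
static const char *longest_prefix(const char *prefix,
                                  const char *const *a, size_t n)
{
    bool alive[MAX_CANDIDATES];
    size_t remaining = n;
    const char *best = NULL;            /* S in the description above */

    assert(prefix != NULL);
    if (n > MAX_CANDIDATES)
        return NULL;
    for (size_t j = 0; j < n; j++)
        alive[j] = true;

    for (size_t i = 0; prefix[i] != '\0' && remaining > 0; i++) {
        for (size_t j = 0; j < n; j++) {
            if (!alive[j])
                continue;
            if (a[j][i] != prefix[i]) {
                /* mismatch, or the candidate ended before position i */
                alive[j] = false;
                remaining--;
            } else if (a[j][i + 1] == '\0') {
                best = a[j];            /* candidate has length i + 1 */
            }
        }
    }
    return best;
}
```

With the example data, longest_prefix("rt.moc.elgoog", ...) ends with rt.moc as the last recorded match. Note that this compares character by character, so it does not check that a match ends on a '.' boundary.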
Another naïve, pseudo-answer.
Set boolean "found" to false. While "found" is false, iterate over the array comparing the source string to the strings in the array. If there's a match, set "found" to true and break. If there's no match, use something like strchr() to get to the segment of the string following the first period. Iterate over the array again. Continue until there's a match, or until the last segment of the source string has been compared to all the strings in the array and failed to match.
Not very efficient....
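Since the question asks for a binsearch-like approach, one possible variation on the idea above is to sort the suffix array once in the preparation step and replace the inner iteration with bsearch. The sketch below assumes that suffixes always begin on a '.' boundary, as in the example data; prepare_suffixes and longest_suffix are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort/bsearch over an array of `const char *`. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Preparation step: sort the suffix table once. */
static void prepare_suffixes(const char **table, size_t n)
{
    qsort(table, n, sizeof *table, cmp_str);
}

/*
 * Returns the longest entry of the sorted table that equals `s` or a tail
 * of `s` starting right after a '.', or NULL if nothing matches.
 */
static const char *longest_suffix(const char *s, const char **table, size_t n)
{
    const char *tail = s;

    assert(s != NULL);
    while (tail != NULL) {
        /* tails are tried from longest to shortest, so the first hit wins */
        const char **hit = bsearch(&tail, table, n, sizeof *table, cmp_str);
        if (hit != NULL)
            return *hit;
        tail = strchr(tail, '.');   /* move to the next label boundary */
        if (tail != NULL)
            tail++;                 /* skip the '.' itself */
    }
    return NULL;
}
```

Each lookup costs one binary search per label of the input string, so the work per query is proportional to the number of dots times log of the table size.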
Naive, pseudo-answer:
Sort array of suffixes by length (yes, there may be strings of same length, which is a problem with the question you are asking I think)
Iterate over array and see if suffix is in given string
If it is, exit the loop because you are done! If not, continue.
Alternatively, you could skip the sorting and just iterate, assigning the biggestString if the currentString is bigger than the biggestString that has matched.
Edit 0:
Maybe you could improve this by looking at your array before hand and considering "minimal" elements that need to be checked.
For instance, if .com appears in 20 members you could just check .com against the given string to potentially eliminate 20 candidates.
Edit 1:
On second thought, in order to compare elements in the array you will need to use a string comparison. My feeling is that any gain you get out of an attempt at optimizing the list of strings for comparison might be negated by the expense of comparing them before doing so, if that makes sense. Would appreciate if a CS type could correct me here...
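If your array of strings is something along the following:
char string[STRINGS][MAX_STRING_LENGTH];
strcpy(string[0], "google.com.tr");   /* needs <string.h> */
strcpy(string[1], "nic.tr");
etc, then you can simply do this:
int x, longest = 0;
size_t max = 0;
/* find the index of the longest string */
for (x = 0; x < STRINGS; x++) {
    if (strlen(string[x]) > max) {
        max = strlen(string[x]);
        longest = x;
    }
}
/* advance to the first '.' (or to the end of the string) */
x = 0;
while (string[longest][x] != '\0' && string[longest][x] != '.') {
    x++;
}
if (string[longest][x] == '.') {
    x++;   /* skip the '.' itself */
}
/* copy everything after the '.' into output */
char output[MAX_STRING_LENGTH];
int y = 0;
while (string[longest][x] != '\0') {
    output[y++] = string[longest][x++];
}
output[y] = '\0';
(The above code may not actually work (errors, etc.), but you should get the general idea.)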
Why don't you use suffix arrays? They work well when you have a large number of suffixes.
The complexity is O(n (log n)^2), and there are O(n log n) versions too.
Implementation in C here. You can also try googling suffix arrays.

Array of strings as hash function key?

Is it possible, in any language (it doesn't matter), to have a hash function which uses an array of strings as keys?
I mean something like this:
hash(["word1", "word2", ...]) = "element"
instead of the classical:
hash("word") = "element"
I need something like that because each word I would like to use as key could change the output element of the function. I've got a sequence of words and I want a specific output for that sequence (also the order may change the result).
Sure. Any data structure at all can be hashed. You only need to come up with a strict definition of equality and then ensure that hash(A) == hash(B) if A == B. Suppose your definition is that [s1, s2, ..., sm] == [t1, t2, ..., tn] if and only if m == n and si == ti for i = 1..m and further string s == t if and only if |s|==|t| and s[i]==t[i] for 0<=i<|s|. You can build a hash in many, many ways:
Concatenate all the strings in the list and hash the result with any string hash function.
Do the same, adding separators such as commas (,)
Hash each string individually and xor the results.
Hash each string individually, shift the previous hash value, and xor the new value into the hash.
Infinitely many more possibilities...
A rigorous definition of equality is important. If for example order doesn't matter in the lists or the string comparison is case-insensitive, then the hash function must still be designed to ensure hash(A) == hash(B) if A == B. Getting this wrong will cause lookups to fail.
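As a rough illustration of the "shift the previous hash and xor in the new one" option, here is a minimal C sketch. It assumes FNV-1a as the per-string hash and a 5-bit rotation as the shift; both are arbitrary choices for the example, and hash_string and hash_string_list are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-string hash: 64-bit FNV-1a (one of many possible choices). */
static uint64_t hash_string(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Order-sensitive hash of a list of strings. */
static uint64_t hash_string_list(const char *const *words, size_t n)
{
    uint64_t h = 0;

    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        h = (h << 5) | (h >> 59);            /* rotate previous hash left by 5 */
        h ^= hash_string(words[i]);          /* mix in the next word */
    }
    return h;
}
```

Because the running value is rotated before each xor, swapping two words generally changes the result, which suits the case where the order of the words matters. A plain xor of the per-string hashes would instead give the same value for every ordering.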
Java is one language that lets you define a hash function for any data type. And in fact a library list of strings will work just fine as a key using the default hash function.
HashMap<ArrayList<String>, String> map = new HashMap<ArrayList<String>, String>();
ArrayList<String> key = new ArrayList<String>();
key.add("Hello");
key.add("World");
map.put(key, "It's me.");
// map now contains mapping ["Hello", "World"] -> "It's me."
Yes, it is possible, but in most cases you will have to define your own hash function that translates an array into a hash key. For example, in Java, array.hashCode() is inherited from Object.hashCode(), which is based on the reference itself and not on the contents of the object.
You may also have a look at the Arrays.deepHashCode() function in Java if you are interested in an implementation of a hashing function built on top of an array.

Is Array.toString() guaranteed to remain as is in ActionScript 3?

Is it fine to display the output of Array.toString() to the user, or is there a possibility that the string format could change in future versions of ActionScript 3 or other compilers?
Here's an excerpt describing Array.toString from ECMA-262, which ActionScript 3 follows very closely:
15.4.4.2 Array.prototype.toString ( )
When the toString method is called, the following steps are taken:
1. Let array be the result of calling ToObject on the this value.
2. Let func be the result of calling the [[Get]] internal method of array with argument "join".
3. If IsCallable(func) is false, then let func be the standard built-in method Object.prototype.toString (15.2.4.2).
4. Return the result of calling the [[Call]] internal method of func providing array as the this value and an empty arguments list.
And Array.join:
15.4.4.5 Array.prototype.join (separator)
The elements of the array are converted to Strings, and these Strings are then concatenated, separated by occurrences of the separator. If no separator is provided, a single comma is used as the separator. The join method takes one argument, separator, and performs the following steps:
1. Let O be the result of calling ToObject passing the this value as the argument.
2. Let lenVal be the result of calling the [[Get]] internal method of O with argument "length".
3. Let len be ToUint32(lenVal).
4. If separator is undefined, let separator be the single-character String ",".
5. Let sep be ToString(separator).
6. If len is zero, return the empty String.
7. Let element0 be the result of calling the [[Get]] internal method of O with argument "0".
8. If element0 is undefined or null, let R be the empty String; otherwise, Let R be ToString(element0).
9. Let k be 1.
10. Repeat, while k < len
a. Let S be the String value produced by concatenating R and sep.
b. Let element be the result of calling the [[Get]] internal method of O with argument ToString(k).
c. If element is undefined or null, Let next be the empty String; otherwise, let next be ToString(element).
d. Let R be a String value produced by concatenating S and next.
e. Increase k by 1.
11. Return R.
So, the default behavior is very well defined, and won't change.
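To make the quoted join steps concrete, here is a small C sketch of the same logic over an array of C strings. It is only an analogy, not ActionScript: a NULL separator stands in for undefined (step 4), a NULL element stands in for undefined or null (steps 8 and 10c), and join_strings is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Joins `len` strings with `sep` between them.
 * Returns a malloc'd string that the caller must free, or NULL on
 * allocation failure.
 */
static char *join_strings(const char *const *items, size_t len,
                          const char *sep)
{
    assert(items != NULL || len == 0);
    if (sep == NULL)
        sep = ",";                       /* default separator */
    size_t sep_len = strlen(sep);

    /* first pass: compute the total length, including the terminator */
    size_t total = 1;
    for (size_t i = 0; i < len; i++) {
        if (items[i] != NULL)
            total += strlen(items[i]);
        if (i + 1 < len)
            total += sep_len;
    }

    char *result = malloc(total);
    if (result == NULL)
        return NULL;

    /* second pass: copy elements, with the separator between them */
    char *p = result;
    for (size_t i = 0; i < len; i++) {
        if (i > 0) {
            memcpy(p, sep, sep_len);
            p += sep_len;
        }
        if (items[i] != NULL) {          /* missing elements become "" */
            size_t n = strlen(items[i]);
            memcpy(p, items[i], n);
            p += n;
        }
    }
    *p = '\0';
    return result;                       /* "" when len is zero */
}
```

Calling it with a NULL separator corresponds to what toString() does, while passing "," explicitly corresponds to join(',').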
It is safe to use it as it is. Array.toString() has been the same since AS3 came out.
The return value of Array.toString() is currently identical to that of Array.join().
If you are that concerned about this behaviour not changing, use Array.join() (or, to be completely pedantic, Array.join(',')) explicitly, and you'll be safe. Joining arrays has worked this way for as long as ActionScript has existed, and it is highly unlikely that Adobe will change it and lose backwards compatibility (and, well, sanity).
