Does anyone know what the following code does? - c

/* utf-8: 0xc0, 0xe0, 0xf0, 0xf8, 0xfc */
static unsigned char _mblen_table_utf8[] =
{
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 1, 1
};
I bet it has something to do with encodings,
but how exactly does it work?
UPDATE
while (str < ptr)
{
j = mblen[(*str)];
tree_nput(r->tree, cr, sizeof(struct rule_item), str, j);
str += j;
}
}

Because a character in a multibyte string has a variable length, this table maps each possible first byte of a character to the length of that character in bytes.
Most of the last 64 byte values (0xC0 to 0xFD) start characters wider than one byte, with lengths of 2 to 6.
The usage would be something like this:
unsigned char current_char = *mbstr;
for (i = 0; i < _mblen_table_utf8[current_char]; i++) {
/* treat *mbstr++ as a part of the current character */
}

Historically, each character was coded on 7 bits (then 8 bits), which was enough to encode the alphabets of European languages.
Only the first 128 characters were common to everyone; the remaining 128 were standardized through code pages (ISO-8859-1 is an example).
The need to encode languages with much larger character sets, such as Chinese, resulted in the Unicode effort, where a character can take several bytes.
UTF-8 is an efficient, variable-length way to encode Unicode characters. This means that the first byte you read determines the length of the character's byte sequence.
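To see why the first byte alone is enough, here is a minimal encoder sketch (utf8_encode is a hypothetical name, not part of the original code). It follows the current UTF-8 definition (RFC 3629), which stops at 4 bytes; the 5- and 6-byte entries in the table (lead bytes 0xF8 to 0xFD) come from the original UTF-8 design, which allowed values up to 31 bits. The first byte it writes always has one of the patterns 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx, which are exactly the ranges the table maps to 1, 2, 3 and 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (RFC 3629 limits: at most
   U+10FFFF, so at most 4 bytes). Returns the number of bytes written
   to out, or 0 for surrogates and values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                     /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                        /* UTF-16 surrogates are not characters */
    if (cp < 0x10000) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+20AC (the euro sign) encodes as E2 82 AC, and 0xE2 falls in the 0xE0 row of the table, which gives 3.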
Basically, your table is a lookup table that tells you how many bytes long a character is, using the character's first byte as the table index. You will see another version of this table here with explanations.
I added the table indexes as comments to make it clearer:
/* utf-8: 0xc0, 0xe0, 0xf0, 0xf8, 0xfc */
static unsigned char _mblen_table_utf8[] =
{
/*0x00*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x10*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x20*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x30*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x40*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x50*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x60*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x70*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x80*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0x90*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0xA0*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0xB0*/ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
/*0xC0*/ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
/*0xD0*/ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
/*0xE0*/ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
/*0xF0*/ 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 1, 1
};
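The values in the comment above the table (0xc0, 0xe0, 0xf0, 0xf8, 0xfc) are the points where the number of leading 1 bits in the first byte changes. As a sketch, the same 256 values can be computed instead of typed out; mblen_from_lead and build_mblen_table are hypothetical names, not from the original code.

```c
#include <stddef.h>

/* Reproduces _mblen_table_utf8 from the bit pattern of the lead byte.
   Bytes that cannot start a sequence (10xxxxxx continuation bytes,
   0xFE, 0xFF) get 1, which is also what the table does. */
static unsigned char mblen_from_lead(unsigned char b) {
    if (b < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (b < 0xC0) return 1;  /* 10xxxxxx: continuation byte, table says 1 */
    if (b < 0xE0) return 2;  /* 110xxxxx */
    if (b < 0xF0) return 3;  /* 1110xxxx */
    if (b < 0xF8) return 4;  /* 11110xxx */
    if (b < 0xFC) return 5;  /* 111110xx: only in the pre-RFC 3629 definition */
    if (b < 0xFE) return 6;  /* 1111110x: likewise */
    return 1;                /* 0xFE and 0xFF never appear in UTF-8 */
}

/* Fill a 256-entry table at startup with the same values as the literal table. */
static void build_mblen_table(unsigned char table[256]) {
    for (int i = 0; i < 256; i++)
        table[i] = mblen_from_lead((unsigned char)i);
}
```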

The array appears to be a lookup table for determining the number of bytes in a UTF-8 character, given the first byte. Basically the first byte (as an unsigned value) is used as an index into the array, and the element at that index gives the length of the byte sequence for the UTF-8 character.
Invalid and mid-sequence bytes seem to map to 1-byte in this table, so if encountered out of place the code using this table would probably treat them as single characters (unless it specifically ignores them).
One use for a table like this is for counting characters in a UTF-8 string (not bytes, but Unicode characters). Each time you count a character, you look up the length and move ahead by the length of the character's byte sequence instead of moving ahead one byte... it works well as long as you start at the beginning of a character and the string is valid UTF-8 all the way through.
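A minimal sketch of that counting loop, with one safety addition the plain table loop lacks: it never steps past the end of the buffer when the last sequence is truncated. lead_len and utf8_count_chars are hypothetical names; lead_len produces the table's values by counting leading 1 bits instead of indexing a table. Like the table, it does not check that the following bytes really are 10xxxxxx continuation bytes.

```c
#include <stddef.h>

/* Sequence length implied by a lead byte, matching _mblen_table_utf8:
   the count of leading 1 bits, except that ASCII (0 ones), continuation
   bytes (1 one) and 0xFE/0xFF (7 or 8 ones) all count as length 1. */
static size_t lead_len(unsigned char b) {
    size_t n = 0;
    while (n < 8 && (b & (0x80u >> n)))
        n++;
    if (n < 2 || n > 6)
        return 1;
    return n;
}

/* Count characters (not bytes) in s[0..len). A sequence cut off by the
   end of the buffer counts as one character and ends the loop. */
static size_t utf8_count_chars(const unsigned char *s, size_t len) {
    size_t count = 0, i = 0;
    while (i < len) {
        size_t n = lead_len(s[i]);
        if (n > len - i)   /* truncated sequence at the end */
            n = len - i;
        i += n;
        count++;
    }
    return count;
}
```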

Without any further details, the code above does exactly this: it declares a static unsigned char array and initializes it with the values inside the curly brackets.

Related

Looping through a collection and deleting things on the way

I want to go through a collection and find the first pair of matching elements, but my current approach is having trouble with the indexing going out of bounds all the time.
Here's a simplified MWE:
function processstuff(stuff)
for pointer1 in 1:length(stuff)
for pointer2 in pointer1:length(stuff)
println("$(stuff)")
pointer1 == pointer2 && continue
if stuff[pointer1] == stuff[pointer2]
# items match, remove them
deleteat!(stuff, pointer1)
deleteat!(stuff, pointer2)
end
end
end
end
processstuff(collect(rand(1:5, 20)))
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 3, 2, 1, 1]
[4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 3, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
ERROR: LoadError: BoundsError: attempt to access 16-element Array{Int64,1} at index [17]
(Obviously this example is just comparing two numbers, the real comparison isn't.)
The idea of updating the collection of stuff by removing both elements that have been processed looks like it works, because I think Julia updates the iteration thing each time through. But only for a while...?
You can use the following approach (assuming you want to remove pairs):
function processstuff!(stuff)
pointer1 = 1
while pointer1 < length(stuff)
for pointer2 in pointer1+1:length(stuff)
if stuff[pointer1] == stuff[pointer2]
deleteat!(stuff, (pointer1, pointer2))
pointer1 -= 1 # correct pointer location as we later add 1 to it
break
end
end
pointer1 += 1
end
end
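In your code there were several problems:
the range 1:length(stuff) in a for loop is evaluated once, before the loop starts, so it does not shrink when you delete elements; that is how an index can end up past the end of the array
you called deleteat! twice: after the first call the remaining elements shift left, so pointer2 no longer refers to the element you meant to remove
your inner loop kept going after a match and tried to delete pointer1 several times
In the outer loop I use while instead, so that the condition is re-checked against the changing size of stuff.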
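For comparison, here is the same find-and-remove-pairs idea sketched in C on a plain int array (remove_at and remove_pairs are hypothetical names, and == stands in for the real comparison). Deleting the higher index first keeps the lower index valid, and the outer index only advances when nothing was removed at it, which is what the pointer1 -= 1 correction achieves in the Julia version.

```c
#include <stddef.h>

/* Remove arr[idx] from arr[0..*len) by shifting the tail left. */
static void remove_at(int *arr, size_t *len, size_t idx) {
    for (size_t k = idx + 1; k < *len; k++)
        arr[k - 1] = arr[k];
    (*len)--;
}

/* Repeatedly remove the first matching pair (i < j, arr[i] == arr[j]). */
static void remove_pairs(int *arr, size_t *len) {
    size_t i = 0;
    while (i + 1 < *len) {          /* re-checked against the shrinking length */
        int found = 0;
        for (size_t j = i + 1; j < *len; j++) {
            if (arr[i] == arr[j]) {
                remove_at(arr, len, j);   /* higher index first ... */
                remove_at(arr, len, i);   /* ... so i is still correct */
                found = 1;
                break;                    /* restart the search at the same i */
            }
        }
        if (!found)
            i++;
    }
}
```

With {1, 4, 3, 3, 2, 4, 5, 2} this removes the pairs of 4s, 3s and 2s and leaves {1, 5}.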

Python Numpy repeating an arange array

so say I do this
x = np.arange(0, 3)
which gives
array([0, 1, 2])
but what can I do like
x = np.arange(0, 3)*repeat(N=3)times
to get
array([0, 1, 2, 0, 1, 2, 0, 1, 2])
I've seen several recent questions about resize. It isn't used often, but here's one case where it does just what you want:
In [66]: np.resize(np.arange(3),3*3)
Out[66]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
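np.resize fills the requested size by cycling through the input, so a size of 3*3 gives three full copies. The same indexing, sketched in C with a hypothetical cyclic_fill helper:

```c
#include <stddef.h>

/* Fill out[0..out_len) by cycling through in[0..in_len),
   the way np.resize repeats its input to fill a larger size. */
static void cyclic_fill(const int *in, size_t in_len, int *out, size_t out_len) {
    if (in_len == 0)
        return;                   /* nothing to repeat; avoid i % 0 */
    for (size_t i = 0; i < out_len; i++)
        out[i] = in[i % in_len];
}
```

If the output length is not a multiple of the input length, the last copy is cut short, e.g. 7 elements give 0 1 2 0 1 2 0.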
There are many other ways of doing this.
In [67]: np.tile(np.arange(3),3)
Out[67]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
In [68]: (np.arange(3)+np.zeros((3,1),int)).ravel()
Out[68]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
np.repeat doesn't repeat in the way we want
In [70]: np.repeat(np.arange(3),3)
Out[70]: array([0, 0, 0, 1, 1, 1, 2, 2, 2])
but even that can be reworked (this is a bit advanced):
In [73]: np.repeat(np.arange(3),3).reshape(3,3,order='F').ravel()
Out[73]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
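The difference between np.tile and np.repeat comes down to the index formula used to fill the output. A C sketch with hypothetical helper names tile_ints and repeat_ints:

```c
#include <stddef.h>

/* tile:   out[k] = a[k % n]     -> 0 1 2 0 1 2 0 1 2
   repeat: out[k] = a[k / reps]  -> 0 0 0 1 1 1 2 2 2 */
static void tile_ints(const int *a, size_t n, size_t reps, int *out) {
    for (size_t k = 0; k < n * reps; k++)
        out[k] = a[k % n];
}

static void repeat_ints(const int *a, size_t n, size_t reps, int *out) {
    for (size_t k = 0; k < n * reps; k++)
        out[k] = a[k / reps];
}
```

Reading the repeat output as a reps-by-n matrix stored column by column (order='F') and then flattening it row by row turns the second formula into the first, which is why the reshape trick works.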
EDIT: Refer to hpaulj's answer. It is frankly better.
The simplest way is to convert back into a list and use:
list(np.arange(0,3))*3
Which gives:
>> [0, 1, 2, 0, 1, 2, 0, 1, 2]
Or if you want it as a numpy array:
np.array(list(np.arange(0,3))*3)
Which gives:
>> array([0, 1, 2, 0, 1, 2, 0, 1, 2])
how about this one?
arr = np.arange(3)
res = np.hstack((arr, ) * 3)
Output
array([0, 1, 2, 0, 1, 2, 0, 1, 2])
Not much overhead I would say.

How to create a random mask array?

I've an array with 128 values, each value is 1:
length = 128
partials = Array.new length
partials.each_index do |i|
partials[i] = 1
end
I want to set value 0 on some (random) position (for example, on pos 1,6,50,70,100,112,120).
Of course, the number of position could be different every time, and if I choose 7 different position, I want to end with 7 different pos changed.
What's the fastest way to do this in Ruby?
Assuming you want to have n elements with value 0, you can do the below:
n = 5
partials[0,n] = [0]*n
partials.shuffle!
Alternatively, can also be written as:
partials.tap{|p| p[0,n] = [0]*n}.shuffle
You can incorporate the zeros into the array creation:
length = 128
zeros = 7
partials = Array.new(length) { |i| i < zeros ? 0 : 1 }.shuffle
#=> [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
A way:
array = 128.times.map{1}
Or with randomly sprayed 0s:
array = 128.times.map{rand(2)}
or put a number of 0s later:
10.times{array[rand(128)]=0}  # rand can pick the same index twice, so you may get fewer than 10 zeros
etc... Play with it and see what you need
Another alternative:
length = 10
zeros = 2
([1]*(length-zeros)+[0]*zeros).shuffle
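If you need exactly k distinct positions set to 0, as the question asks, a partial Fisher-Yates shuffle picks them without repeats. A C sketch (random_mask is a hypothetical name; rand() % n has a small modulo bias, acceptable for a sketch but not for anything security-sensitive):

```c
#include <stdlib.h>
#include <stddef.h>

/* Set mask[0..len) to 1, then set exactly `zeros` distinct positions to 0.
   Returns 0 on success, -1 if zeros > len or allocation fails. */
static int random_mask(unsigned char *mask, size_t len, size_t zeros) {
    if (zeros > len)
        return -1;
    size_t *idx = malloc(len * sizeof *idx);
    if (idx == NULL && len > 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        mask[i] = 1;
        idx[i] = i;
    }
    /* After step i, idx[0..i] holds i+1 distinct random positions. */
    for (size_t i = 0; i < zeros; i++) {
        size_t j = i + (size_t)rand() % (len - i);   /* j in [i, len) */
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
        mask[idx[i]] = 0;
    }
    free(idx);
    return 0;
}
```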

Why is Flash doing array operations wrongly?

It was running fine and then it threw me this error:
Error #1125: The index 117 is out of range 115.
It doesn't list a row number but the function below is the only place where a long array is referred to.
The error means it's trying to access an index past the end of the vector, which shouldn't be possible.
Relevant code parts (the rest, including the public functions and other functions not included here, all work fine):
public class Main extends Sprite
{
internal var oneoff:Boolean = true;
internal var kanaList:Vector.<String> = new <String>["あ/ア", "あ/ア", "え/え", "え/え", "い/イ", "い/イ", "お/オ", "お/オ", "う/ウ", "う/ウ", "う/ウ", "う/ウ", "か/カ", "か/カ", "け/ケ", "け/ケ", "き/キ", "き/キ", "く/ク", "く/ク", "こ/コ", "こ/コ", "さ/サ", "さ/サ", " し/シ", " し/シ", "す/ス", "す/ス", "そ/ソ", "そ/ソ", "す/ス", "す/ス", "た/タ", "た/タ", "て/テ", "て/テ", " ち/チ", " ち/チ", "と/ト", "と/ト", "つ/ツ", "つ/ツ", "ら/ラ", "ら/ラ", "れ/レ", "れ/レ", "り/リ", "り/リ", "ろ/ロ", "ろ/ロ", "る/ル", "る/ル", "だ/ダ", "で/デ", "じ/ジ", "ど/ド", "ず/ズ", "ざ/ザ", "ぜ/ゼ", "ぞ/ゾ", "な/ナ", "ね/ネ", "に/二", "の/ノ", "ぬ/ヌ", "じゃ/ジャ", "じゅ/ジュ", "じょ/ジョ", "ん/ン", "しゃ/シャ", "しゅ/シュ", "しょ/ショ", "や/ヤ", "ゆ/ユ", "よ/ヨ", "は/ハ", "ひ/ヒ", "ふ/フ", "へ/ヘ", "ほ/ホ", "ば/バ", "ば/バ", "ぶ/ブ", "ぶ/ブ", "び/ビ", "び/ビ", "ぼ/ボ", "ぼ/ボ", "べ/ベ", "べ/ベ", "ぱ/パ", "ぴ/ピ", "ぷ/プ", "ぺ/ペ", "ぽ/ポ", "ま/マ", "み/ミ", " む/ム", "め/メ", "も/モ", "を/ヲ", "みゃ/ミャ", "みゅ/ミャ", "みょ/ミョ", "きゃ/キャ", "きゅ/キュ", "きょ/キョ", "にゃ/ニャ", "にゅ/ニュ", "にょ/ニョ", "びゃ/びゃ", "びゅ/ビュ", "びょ/ビョ", "  ひゃ/ヒャ", "ひゅ/ヒュ", "ひょ/ヒョ", "ぴゃ/ピャ", "ぴゅ/ピュ", "ぴょ/ピョ", "っ/ッ", "っ/ッ"];
internal var valueList:Vector.<uint>= new <uint>[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 10, 10, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 20, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1, 1];
// Lists of Kana that can be replaced in the replace mode and the substitute Kana and Values
internal var selectghostList:Vector.<String>=new<String>["ま/マ","む/ム","も/モ","か/カ","く/ク","こ/コ","な/ナ","ぬ/ヌ","の/ノ","ば/バ","ぶ/ブ","ぼ/ボ","は/ハ","ふ/フ","ほ/ホ","ぱ/パ","ぷ/プ","ぽ/ポ"];
internal var selectkanaList:Vector.<String>=new <String>["みゃ/ミャ", "みゅ/ミャ", "みょ/ミョ", "きゃ/キャ", "きゅ/キュ", "きょ/キョ", "にゃ/ニャ", "にゅ/ニュ", "にょ/ニョ", "びゃ/びゃ", "びゅ/ビュ", "びょ/ビョ", "  ひゃ/ヒャ", "ひゅ/ヒュ", "ひょ/ヒョ", "ぴゃ/ピャ", "ぴゅ/ピュ", "ぴょ/ピョ"];
internal var selectghostvalueList:Vector.<uint>=new <uint>[2, 2, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2];
//Start list of playerHand contents as I don't know if Null is 0
internal var playernumber:uint;
internal var allplayersHand:Array = [[0], [0], [0], [0],[0], [0]];
internal var playerRound:uint = 1;
internal var round:uint = 1;
internal var aplayersHand:Array;
internal function create():void
{ var listLength:uint;
var row:uint
listLength = kanaList.length;
aplayersHand = allplayersHand[playerRound];
for (var i:uint = (aplayersHand.length); i <= 7; i+=1)
{row = int(Math.random() * listLength);  
trace (row);
trace(i);
aplayersHand[i] = [0, kanaList[row], valueList[row],]
trace (aplayersHand);
trace (aplayersHand[i]);
kanaList.splice(row,1);
valueList.splice(row,1);
}
deal();
}
I'm assuming it's throwing the error intermittently. The reason I think it's happening is that you stored the long array's length in listLength, but didn't decrement its value after
kanaList.splice(row,1);
valueList.splice(row,1);
which is why, I think, the row value calculated like
row = int(Math.random() * listLength);
would sometimes be greater than or equal to the array's length at that iteration, i.e. past the last valid index.
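A C sketch of the fix described above, drawing the random index from the current length on every pass (draw_random is a hypothetical name; the shift loop plays the role of splice(row, 1)):

```c
#include <stdlib.h>
#include <stddef.h>

/* Remove and return a random element of arr[0..*len).
   The index comes from the CURRENT length, so it is always in range;
   a length cached before earlier removals would eventually overshoot.
   Caller must ensure *len > 0. */
static int draw_random(int *arr, size_t *len) {
    size_t row = (size_t)rand() % *len;
    int picked = arr[row];
    for (size_t k = row + 1; k < *len; k++)   /* like splice(row, 1) */
        arr[k - 1] = arr[k];
    (*len)--;
    return picked;
}
```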
On a side note, it would help to see everything that was traced up to the point where you got the error. Also, the exception should show a stack trace if you compile a debug version of the swf and run it in a debug Flash Player. The stack trace is very useful for tracking down bugs like these.

Leading zeros calculation with intrinsic function

I'm trying to optimize some code working in an embedded system (FLAC decoding, Windows CE, ARM 926 MCU).
The default implementation uses a macro and a lookup table:
/* counts the # of zero MSBs in a word */
#define COUNT_ZERO_MSBS(word) ( \
(word) <= 0xffff ? \
( (word) <= 0xff? byte_to_unary_table[(word)] + 24 : \
byte_to_unary_table[(word) >> 8] + 16 ) : \
( (word) <= 0xffffff? byte_to_unary_table[(word) >> 16] + 8 : \
byte_to_unary_table[(word) >> 24] ) \
)
static const unsigned char byte_to_unary_table[] = {
8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
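To check that the macro's approach agrees with a plain bit loop, here is a C sketch. It builds the same 256-entry table at startup instead of typing it out; lz_table, build_lz_table, count_zero_msbs and count_zero_msbs_slow are hypothetical names, and uint32_t stands in for the 32-bit word type.

```c
#include <stdint.h>

/* Leading zero bits of each byte value: 8 for 0, 0 for 0x80..0xFF.
   Should match byte_to_unary_table entry for entry. */
static unsigned char lz_table[256];

static void build_lz_table(void) {
    for (int b = 0; b < 256; b++) {
        int n = 8;
        for (int v = b; v != 0; v >>= 1)
            n--;                      /* one fewer leading zero per significant bit */
        lz_table[b] = (unsigned char)n;
    }
}

/* Same branch structure as COUNT_ZERO_MSBS: look up the highest
   non-zero byte and add 8 for every all-zero byte above it. */
static unsigned count_zero_msbs(uint32_t word) {
    if (word <= 0xffff)
        return word <= 0xff ? lz_table[word] + 24u : lz_table[word >> 8] + 16u;
    return word <= 0xffffff ? lz_table[word >> 16] + 8u : lz_table[word >> 24];
}

/* Bit-by-bit reference implementation for checking the table version. */
static unsigned count_zero_msbs_slow(uint32_t word) {
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (word & mask) == 0; mask >>= 1)
        n++;
    return n;
}
```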
However, most CPUs already have a dedicated instruction, bsr on x86 (which returns the index of the highest set bit, so for a non-zero 32-bit value the count is 31 - bsr) and clz on ARM (http://www.devmaster.net/articles/fixed-point-optimizations/), that should be more efficient.
On Windows CE we have the intrinsic function _CountLeadingZeros, which should compile to that instruction. However, it is 4 times slower than the macro (measured over 10 million iterations).
How is it possible that an intrinsic function, which (should) rely on a dedicated ASM instruction, is 4 times slower?
Check the disassembly. Are you sure that the compiler inserted the instruction? In the Remarks section there is this text:
This function can be implemented by
calling a runtime function.
I suspect that's what's happening in your case.
Note that the CLZ instruction is only available in ARMv5 and later. You need to tell the compiler if you want ARMv5 code:
/QRarch5 ARM5 Architecture
/QRarch5T ARM5T Architecture
(Microsoft incorrectly uses "ARM5" instead of "ARMv5")
