Pre-allocate logical array with unassigned elements (not true or false) - arrays

I'm looking for the most efficient method of pre-allocating a logical array in MATLAB without specifying true or false at the time of pre-allocation.
When pre-allocating e.g. a 1×5 numeric array I can use nan(1,5). To my mind, this is better than using zeros(1,5), since I can easily tell which slots have been filled with data versus those that are yet to be filled. If using the zeros() solution it's hard to know whether any 0s are intentional 0s or just unfilled slots in the array.
I'm aware that I can pre-allocate a logical array using true(1,5) or false(1,5). The problem with these is similar to the use of zeros() in the numeric example; there's no way of knowing whether a slot is filled or not.
I know that one solution to this problem is to treat the array as numeric, pre-allocate using nan(1,5), and only convert to a logical array later once all the slots are filled. But this strikes me as inefficient.
Is there some smart way to pre-allocate a logical array in MATLAB and remain agnostic as to the actual content of that array until it is ready to be filled?

The short answer is no: each element of a logical array takes a single byte, and the implementation can store only two states (true = 1 or false = 0). You might assume that logicals need only a single bit, but in fact they occupy a full 8 bits (one byte) so as not to compromise performance.
If memory is a concern, you could use a single array instead of a double array, moving from 64-bit to 32-bit numbers while still being able to store NaN. You can then cast to logical whenever required (assuming there are no NaNs left by that point, otherwise it will error).
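For instance, a minimal sketch of that approach (assuming you only cast once every slot has been filled):
x = nan(1,5,'single');       % 4 bytes per element instead of 8
x(2) = true;                 % fill slots as results arrive
x(4) = false;
if ~any(isnan(x))
    x = logical(x);          % only valid once no NaNs remain
end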
If it were really important to track whether a value was ever assigned while also reducing memory, you could keep a second logical array, updated at the same time as the first, which simply records whether each element has ever been assigned. That gives you a check on whether any default values remain after your assignments. We've now dropped from 32-bit singles to two 8-bit logicals, which is worse than a single logical array but still twice as memory-efficient as using floating-point numbers just for the sake of the NaN. Obviously each assignment now touches two arrays instead of one; I don't know how that compares to float assignments.
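For example (a sketch of the two-array idea; the variable names are just placeholders):
value    = false(1,5);       % the data itself
assigned = false(1,5);       % tracks which slots have been written
value(3)    = true;          % update both arrays on every write
assigned(3) = true;
allFilled = all(assigned);   % false while any default values remain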
Going off-piste, you could make your own class to do this assignment-tracking for you, and display the logical array as if it was capable of storing NaNs. This isn't really recommended but I've written the below code to complete the thought experiment.
Note you originally asked for "the most efficient method"; in terms of execution time this is definitely not going to be as efficient as the native implementation of logical arrays.
classdef nanBool
    properties
        assigned % Tracks whether each element of "value" was ever assigned
        value    % The logical (boolean) array itself
    end
    methods
        function obj = nanBool(varargin)
            % Constructor: initialise main and tracking arrays to false.
            % Handles the same inputs as calling "false()" normally.
            obj.value = false(varargin{:});
            obj.assigned = false(size(obj.value));
        end
        function b = subsref(obj,S)
            % Override the indexing operator so that indexing works like it
            % would for a logical array unless accessing object properties
            if strcmp(S.type,'.')
                b = obj.(S.subs);
            else
                b = builtin('subsref',obj.value,S);
            end
        end
        function obj = subsasgn(obj,S,B)
            % Override the assignment operator so that the value array is
            % updated when normal array indexing is used. In sync, update
            % the assigned state for the corresponding elements
            obj.value = builtin('subsasgn',obj.value,S,B);
            obj.assigned = builtin('subsasgn',obj.assigned,S,true(size(B)));
        end
        function disp(obj)
            % Override the disp function so printing to the command window
            % renders NaN for elements which haven't been assigned
            a = double(obj.value);
            a(~obj.assigned) = NaN;
            disp(a);
        end
    end
end
Test cases:
>> a = nanBool(3,1)
a =
NaN
NaN
NaN
>> a(2) = true
a =
NaN
1
NaN
>> a(3) = false
a =
NaN
1
0
>> a(:) = true
a =
1
1
1
>> whos a
  Name      Size            Bytes  Class      Attributes
  a         1x1                 6  nanBool
>> b = false(3,1); whos b
  Name      Size            Bytes  Class      Attributes
  b         3x1                 3  logical
Note the whos test shows this custom class has the same memory footprint as two logical arrays of the same size. It also shows that the size is reported incorrectly, indicating we'd also have to override the size function in our custom class; I'm sure there are lots of other similar edge cases you'd want to handle.
You could check whether there are any "logical NaNs" (unassigned values) with something like this, or add a function to the class which does it for you:
fullyAssigned = all(a.assigned);
In R2021b and newer you can use the more controlled indexing overrides for custom classes instead of subsref and subsasgn, but I can't test this:
https://uk.mathworks.com/help/matlab/customize-object-indexing.html

Related

Matrices multiplication [duplicate]

Why does the indexing in an array start with zero in C and not with 1?
In C, the name of an array is essentially a pointer [but see the comments], a reference to a memory location, and so the expression array[n] refers to a memory location n elements away from the starting element. This means that the index is used as an offset. The first element of the array is exactly contained in the memory location that array refers (0 elements away), so it should be denoted as array[0].
For more info:
http://developeronline.blogspot.com/2008/04/why-array-index-should-start-from-0.html
This question was posted over a year ago, but here goes...
About the above reasons
While Dijkstra's article (previously referenced in a now-deleted answer) makes sense from a mathematical perspective, it isn't as relevant when it comes to programming.
The decision taken by the language-specification and compiler designers is based on the decision made by computer-system designers to start counting at 0.
The probable reason
Quoting from On Holy Wars and a Plea for Peace by Danny Cohen:
IEEE Link
IEN-137
For any base b, the first b^N non-negative integers are represented by exactly N digits (including leading zeros) only if numbering starts at 0.
This can be tested quite easily. In base 2, take 2^3 = 8.
The 8th number is:
8 (binary: 1000) if we start counting at 1
7 (binary: 111) if we start counting at 0
111 can be represented using 3 bits, while 1000 requires an extra bit (4 bits).
Why is this relevant
Computer memories have 2^N cells addressed by N bits. If we started counting at 1, those 2^N cells would need N+1 address lines; the extra bit is needed to access exactly one address (1000 in the above case). The other way to solve it would be to leave the last address inaccessible and keep N address lines.
Both are sub-optimal solutions compared to starting the count at 0, which keeps all addresses accessible using exactly N address lines!
Conclusion
The decision to start counting at 0 has since permeated all digital systems, including the software running on them, because it makes it simpler for code to translate to what the underlying system can interpret. If it weren't so, there would be one unnecessary translation operation between the machine and the programmer for every array access. It makes compilation easier.
Quoting from the paper:
Who's on first? Zero or one?
People start counting from the number one. The very word first is abbreviated as 1st, which indicates one. This, however, is a very modern notation. The older concepts do not necessarily support this relationship. In English and French the word first is not derived from the word one, but from an old word for prince, which means foremost. Similarly, the English word second is not derived from the number two but from an old word which means "to follow." Obviously, there is a close relation between third and three, fourth and four, and so on. These relationships occur in other language families, also. In Hebrew, for example, first is derived from the word head, meaning "the foremost." The Hebrew word for second is derived from the word two, and this relationship of ordinal and cardinal names holds for all the other numbers. For a very long time, people have counted from one, not from zero. As a matter of fact, the inclusion of zero as a full-fledged member of the set of all numbers is a relatively modern concept, even though it is one of the most important numbers mathematically. It has many important properties, such as being a multiple of any integer. A nice mathematical theorem states that for any basis b the first bⁿ positive integers are represented by exactly n digits (leading zeros included). This is true if and only if the count starts with zero (hence, 0 through bⁿ-1), not with one (for 1 through bⁿ). This theorem is the basis of computer memory addressing. Typically, 2ⁿ cells are addressed by an n-bit addressing scheme. A count starting from one rather than zero would cause the loss of either one memory cell or an additional address line. Since either price is too expensive, computer engineers agree to use the mathematical notation that starts with zero. Good for them! This is probably the reason why all memories start at address-0, even those of systems that count bits from B1 up. The designers of the 1401 were probably ashamed to have address-0. They hid it from the users and pretended that the memory starts at address-1. Communication engineers, like most people, start counting from one. They never have to suffer the loss of a memory cell, for example. Therefore, they happily count one-to-eight, not zero-to-seven, as computer people do.
Because 0 is how far the array's first element is from the pointer to the head of the array.
Consider:
int foo[5] = {1,2,3,4,5};
To access the element at offset 0 we do:
foo[0]
But foo decays to a pointer, and the above access has an analogous pointer-arithmetic form:
*(foo + 0)
These days pointer arithmetic isn't used as frequently. Way back when though, it was a convenient way to take an address and move X "ints" away from that starting point. Of course if you wanted to just stay where you are, you just add 0!
Because 0-based index allows...
array[index]
...to be implemented as...
*(array + index)
If the index were 1-based, the compiler would need to generate *(array + index - 1), and this "-1" would hurt performance.
Because it made the compiler and linker simpler (easier to write).
Reference:
"...Referencing memory by an address and an offset is represented directly in hardware on virtually all computer architectures, so this design detail in C makes compilation easier"
and
"...this makes for a simpler implementation..."
The array index always starts at zero. Let's assume the base address is 2000. Now arr[i] = *(arr+i), so if i = 0, *(2000+0) is the base address, i.e. the address of the first element of the array. The index is treated as an offset, so by default the index starts from zero.
For the same reason that, when it's Wednesday and somebody asks you how many days til Wednesday, you say 0 rather than 1, and that when it's Wednesday and somebody asks you how many days until Thursday, you say 1 rather than 2.
I am from a Java background. I have presented the answer to this question in the diagram below, which I drew on a piece of paper and which is self-explanatory.
Main Steps:
Creating Reference
Instantiation of Array
Allocation of Data to array
Also note that when the array is just instantiated, zero is allocated to all the blocks by default until we assign values to them.
The array starts at zero because the first address points to the reference (i.e. X102+0 in the image).
Note: the blocks shown in the image are a memory representation.
The most elegant explanation I've read for zero-based numbering is the observation that values aren't stored at the marked places on the number line, but rather in the spaces between them. The first item is stored between zero and one, the next between one and two, and so on; the Nth item is stored between N-1 and N. A range of items may be described using the numbers on either side. Individual items are, by convention, described using the number below them. Given a range (X,Y), identifying individual items by the number below means one can identify the first item without any arithmetic (it's item X), but one must subtract one from Y to identify the last item (Y-1). Identifying items by the number above would make it easier to identify the last item in a range (it would be item Y), but harder to identify the first (X+1).
Although it wouldn't be horrible to identify items by the number above them, defining the first item in the range (X,Y) as the one just above X (item X) generally works out more nicely than having to call it item X+1.
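For instance, a small C sketch of that half-open convention (the names X and Y simply mirror the range (X,Y) above):
#include <stdio.h>

int main(void)
{
    int array[] = {10, 20, 30, 40, 50};
    int X = 1, Y = 4;               /* half-open range [1, 4): items 1, 2 and 3 */

    for (int i = X; i < Y; i++)     /* Y - X == 3 items, no +1/-1 fixups        */
        printf("%d\n", array[i]);

    return 0;
}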
It is because the address has to point to the right element in the array. Let us assume the below array:
let arr = [10, 20, 40, 60];
Let us now consider the start of the address being 12 and the size of the element be 4 bytes.
address of arr[0] = 12 + (0 * 4) => 12
address of arr[1] = 12 + (1 * 4) => 16
address of arr[2] = 12 + (2 * 4) => 20
address of arr[3] = 12 + (3 * 4) => 24
If it were not zero-based, technically our first element's address in the array would be 16, which is wrong since its location is 12.
The technical reason might derive from the fact that the array name is a pointer to the memory location of its first element. If indexing started at one, programs would normally add that one to the pointer before accessing the content, which is not what you want, of course.
Try to access a screen pixel using X,Y coordinates on a 1-based matrix. The formula is utterly complex. Why is it complex? Because you end up converting the X,Y coordinates into a single number, the offset. Why do you need to convert X,Y to an offset? Because that's how memory is organized inside computers, as a contiguous stream of memory cells (an array). How do computers deal with array cells? Using offsets (displacements from the first cell, a zero-based indexing model).
So at some point in the code you (or the compiler) need to convert the 1-based formula to a 0-based formula, because that's how computers deal with memory.
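For instance, a sketch of that X,Y-to-offset conversion in C (WIDTH, HEIGHT and the screen buffer here are assumptions for illustration):
#include <stdio.h>

#define WIDTH  640
#define HEIGHT 480

int main(void)
{
    static unsigned char screen[WIDTH * HEIGHT];  /* one byte per pixel, row-major */
    int x = 10, y = 20;

    screen[y * WIDTH + x] = 255;               /* 0-based coords map straight to the offset */
    /* with 1-based coords the same pixel would need fixups:                                */
    /* screen[(y - 1) * WIDTH + (x - 1)] = 255;                                             */

    printf("%u\n", screen[20 * WIDTH + 10]);
    return 0;
}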
In an array, the index tells the distance from the starting element. The first element is at distance 0 from the starting element, and that's why arrays start from 0.
Suppose we want to create an array of size 5:
int array[5] = {2, 3, 5, 9, 8};
Let the 1st element of the array be at location 100, and suppose indexing starts from 1 rather than 0.
Now we have to find the location of the 1st element with the help of the index (remember, the location of the 1st element is 100).
Since the size of an integer is 4 bytes, with index 1 the offset would be
index (1) * size of integer (4) = 4
so the computed position is
100 + 4 = 104,
which is not true, because the first element actually lives at 100.
It should be pointing at 100, not 104, so this is wrong.
Now suppose we take the indexing from 0. Then the offset of the 1st element is
index (0) * size of integer (4) = 0,
therefore the location of the 1st element is 100 + 0 = 100,
which is the actual location of the element.
This is why indexing starts at 0.
First of all, you need to know that arrays are internally treated as pointers, because the name of the array itself contains the address of the first element of the array.
ex. int arr[2] = {5, 4};
Consider that the array starts at address 100, so the first element is at address 100 and the second is at 104.
Now consider what would happen if the array index started from 1:
arr[1] can be written as a pointer expression like this:
arr[1] = *(arr + 1 * (size of a single element of the array));
Considering the size of int is 4 bytes:
arr[1] = *(arr + 1 * 4);
arr[1] = *(arr + 4);
As we know, the array name contains the address of its first element, so arr = 100:
arr[1] = *(100 + 4);
arr[1] = *(104);
which gives
arr[1] = 4;
Because of this expression we would be unable to access the element at address 100, which is officially the first element.
Now consider the array index starting from 0:
arr[0] is resolved as
arr[0] = *(arr + 0 * (size of the array's element type));
arr[0] = *(arr + 0 * 4);
arr[0] = *(arr + 0);
arr[0] = *(arr);
Since the array name contains the address of its first element,
arr[0] = *(100);
which gives the correct result:
arr[0] = 5;
Therefore the array index always starts from 0 in C.
Reference: all the details are in the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie.
The array name is a constant pointer pointing to the base address. When you use arr[i], the compiler manipulates it as *(arr+i). Since an 8-bit signed integer's range is -128 to 127, the compiler treats -128 to -1 as negative numbers and 0 to 127 as non-negative numbers. So the array index always starts with zero.

What am I doing wrong here? Why the code is not giving the required output? [duplicate]


Is there a more lightweight alternative to array?

I need to create an array with 3 billion boolean variables. My memory is only 4GB, therefore I need this array to be very tight (at most one byte per variable). Theoretically this should be possible. But I found that Ruby uses way too much space for one boolean variable in an array.
ObjectSpace.memsize_of(Array.new(100, false)) #=> 840
That's more than 8 bytes per variable. I would like to know if there's a more lightweight implementation of C-arrays in Ruby.
Apart from a small memory profile, I also need each boolean in this array to be quickly accessible, because I need to flip them as fast as possible on demand.
Ruby isn't a well-performing language, especially in memory use. As others have said, you should pack your booleans into numbers; you lose a lot of memory to Ruby's object overhead. If that is still too costly for you, you can pack the bits into long strings and store the strings in an array, wasting less memory.
http://calleerlandsson.com/2014/02/06/rubys-bitwise-operators/
You can also implement your own gem in C++, which can work with bits and doubles natively, wasting less memory. An array of doubles gives you 64 booleans per position, more than sufficient for your application.
Extremely large collections are always a problem and will require you to implement quite a bit of machinery to work with them comfortably. At the very least you'll have to implement some kind of method to access a given position in an array whose elements each store more than one boolean, and another to flip them.
The following class may not be exactly what you're looking for. It will store 1's or 0's into an array using bits and shifting. Entries default to 0. If you need three states for each entry, 0, 1, or nil, then you'd need to change it to use two bits for each entry, rather than one.
class BitArray < Array
  BITS_PER_WORD = 0.size * 8   # bits in one native integer

  def []=(n, value_0_or_1)
    word = word_at(n / BITS_PER_WORD) || 0
    word &= ~(1 << (n % BITS_PER_WORD))   # clear just this bit, leaving the others intact
    super(n / BITS_PER_WORD, (value_0_or_1 << (n % BITS_PER_WORD)) | word)
  end

  def [](n)
    return 0 if word_at(n / BITS_PER_WORD).nil?
    (super(n / BITS_PER_WORD) >> (n % BITS_PER_WORD)) & 1
  end

  def word_at(n)
    Array.instance_method('[]').bind(self).call(n)
  end
end
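A quick usage sketch of the class above (illustrative values only):
b = BitArray.new
b[5] = 1
b[9] = 1
b[5]    #=> 1
b[6]    #=> 0 (entries default to 0)
b[9] = 0
b[9]    #=> 0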

Matlab wrong in array multiplication?

I have this simple program:
% Read Image:
I = imread('Bureau.bmp');
% calculate Hist:
G = unique(I);           % Calculate the different gray values
Hist = zeros(size(G));   % initialize an array with the same size as G
% For each different gray value, loop over all the image, and each time you
% find a value that equals the gray value, increment the hist by 1
for j = 1:numel(G)
    for i = 1:numel(I)
        if G(j) == I(i)
            Hist(j) = Hist(j)+1;
        end
    end
end
Now look at this multiplication:
>> G(2)
ans =
1
>> Hist(2)
ans =
550
>> Hist(2)*G(2)
ans =
255
And it's giving me 255 not only for the index 2, but for any combination of indexes!
Two things for your problem.
First, here is the reason for your multiplication problem: different types. I, and therefore G, are of type uint8, while Hist is of type double. When you perform the multiplication, MATLAB uses the most restrictive type, here uint8, so the result of Hist(2)*G(2) is of type uint8, constrained to the range 0 to 255.
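You can check this directly at the command line (a quick illustration, not from the original post):
x = uint8(1) * 550    % double * uint8 gives a uint8 result, saturated at 255
class(x)              % 'uint8'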
Second: please DON'T compute a histogram this way. MATLAB has numerous functions for that (hist and histc are the most common ones), so please read the doc and use them instead of writing your own code. If you nevertheless want to write your own function (for learning purposes), this code is far too slow: you go through the image about 256 times, which is useless. Instead, a classic way would be:
Hist = zeros(1,256);
for i = 1:numel(I)
    Hist(int32(I(i))+1) = Hist(int32(I(i))+1) + 1;
end
You use the value of the current pixel directly (+1 because indices start at 1 in MATLAB) to access the corresponding slot of your histogram. Also, you must cast the pixel value to a wider type such as int32 before adding 1: with uint8 variables, 255+1 saturates to 255, so pixel values 254 and 255 would end up in the same bin.
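For reference, a vectorized equivalent of the loop above (a sketch, assuming I is a uint8 image):
Hist = accumarray(double(I(:)) + 1, 1, [256 1]);   % bin k counts the pixels with gray value k-1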
I don't want to be pedantic here, but MATLAB comes with thousands of functions (without mentioning the dozens of toolboxes) and a very well-written doc, so please read it and use every suitable function you can find inside; that's the best advice I could give to anybody starting to learn MATLAB.

How to define 2-bit numbers in C, if possible?

For a university project I'm simulating a process called random sequential adsorption.
One of the things I have to do involves randomly depositing squares (which cannot overlap) onto a lattice until there is no more room left, repeating the process several times in order to find the average 'jamming' coverage %.
Basically I'm performing operations on a large array of integers, of which 3 possible values exist: 0, 1 and 2. The sites marked with '0' are empty, the sites marked with '1' are full. Initially the array is defined like this:
int i, j;
int n = 1000000000;
int array[n][n];
for(j = 0; j < n; j++)
{
    for(i = 0; i < n; i++)
    {
        array[i][j] = 0;
    }
}
Say I want to deposit 5*5 squares randomly on the array (that cannot overlap), so that the squares are represented by '1's. This would be done by choosing the x and y coordinates randomly and then creating a 5*5 square of '1's with the topleft point of the square starting at that point. I would then mark sites near the square as '2's. These represent the sites that are unavailable since depositing a square at those sites would cause it to overlap an existing square. This process would continue until there is no more room left to deposit squares on the array (basically, no more '0's left on the array)
Anyway, to the point. I would like to make this process as efficient as possible, by using bitwise operations. This would be easy if I didn't have to mark sites near the squares. I was wondering whether creating a 2-bit number would be possible, so that I can account for the sites marked with '2'.
Sorry if this sounds really complicated, I just wanted to explain why I want to do this.
You can't create a datatype that is 2-bits in size since it wouldn't be addressable. What you can do is pack several 2-bit numbers into a larger cell:
struct Cell {
    unsigned int a : 2;
    unsigned int b : 2;
    unsigned int c : 2;
    unsigned int d : 2;
};
This specifies that each of the members a, b, c and d should occupy two bits in memory.
EDIT: This is just an example of how to create 2-bit variables, for the actual problem in question the most efficient implementation would probably be to create an array of int and wrap up the bit fiddling in a couple of set/get methods.
Instead of a two-bit array you could use two separate 1-bit arrays. One holds filled squares and one holds adjacent squares (or available squares if this is more efficient).
I'm not really sure that this has any benefit though over packing 2-bit fields into words.
I'd go for byte arrays unless you are really short of memory.
The basic idea
Unfortunately, there is no way to do this in C. You can create arrays of 1 byte, 2 bytes, etc., but you can't create arrays of bits.
The best thing you can do, then, is to write a new library for yourself, which makes it look like you're dealing with arrays of 2 bits, but in reality does a lot of hard work. The same way that the string libraries give you functions that work on "strings" (which in C are just arrays), you'll be creating a new library which works on "bit arrays" (which in reality will be arrays of integers, with a few special functions to deal with them as-if they were arrays of bits).
NOTE: If you're new to C, and haven't learned the ideas of "creating a new library/module", or the concept of "abstraction", then I'd recommend learning about them before you continue with this project. Understanding them is IMO more important than optimizing your program to use a little less space.
How to implement this new "library" or module
For your needs, I'd create a new module called "2-bit array", which exports functions for dealing with the 2-bit arrays, as you need them.
It would have a few functions that deal with setting/reading bits, so that you can work with it as if you have an actual array of bits (you'll actually have an array of integers or something, but the module will make it seem like you have an array of bits).
Using this module would like something like this:
// This is just an example of how to use the functions in the twoBitArray library.
twoB my_array = Create2BitArray(size); // This will "create" a twoBitArray and return it.
SetBit(twoB, 5, 1); // Set bit 5 to 1 //
bit b = GetBit(twoB, 5); // Where bit is typedefed to an int by your module.
What the module will actually do is implement all these functions using regular-old arrays of integers.
For example, the function GetBit(), for GetBit(my_arr, 17), will calculate that it's the 1st bit in the 4th integer of your array (depending on sizeof(int), obviously), and you'd return it by using bitwise operations.
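A minimal C sketch of such a module (the type and function names below are illustrative, not an established API; each entry uses two bits packed into 32-bit words):
#include <stdlib.h>
#include <stdint.h>

typedef struct {
    uint32_t *words;      /* packed storage: 16 two-bit entries per word */
    size_t    nEntries;
} TwoBitArray;

TwoBitArray Create2BitArray(size_t nEntries)
{
    TwoBitArray a;
    a.nEntries = nEntries;
    a.words = calloc((nEntries + 15) / 16, sizeof(uint32_t));  /* zero-initialised */
    return a;
}

void Set2Bit(TwoBitArray *a, size_t n, unsigned value)   /* value is 0, 1 or 2 */
{
    size_t   word  = n / 16;
    unsigned shift = (unsigned)(n % 16) * 2;
    a->words[word] = (a->words[word] & ~(3u << shift)) | ((value & 3u) << shift);
}

unsigned Get2Bit(const TwoBitArray *a, size_t n)
{
    return (a->words[n / 16] >> ((n % 16) * 2)) & 3u;
}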
You can compact one dimension of the array into sub-integer cells. To convert a coordinate (let's say x, for example) to a position inside a byte, with four 2-bit cells per byte:
typedef unsigned char byte;
byte cell  = array[i][x / 4];          /* the byte holding four 2-bit cells */
byte shift = 2 * (x % 4);              /* each cell is 2 bits wide          */
byte mask  = 0x03 << shift;
byte data  = (cell & mask) >> shift;   /* 0, 1 or 2                         */
To write data back, do the reverse: clear the cell with the mask, then OR in the new value shifted into place, as sketched below.
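A one-line sketch of that write-back (newValue is an assumed name for the new 2-bit value, 0, 1 or 2):
array[i][x / 4] = (byte)((cell & ~mask) | ((newValue & 0x03) << shift));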
