How many ways are there to write a 15-bit string with exactly three 1s, and why is the answer C(15, 3)?

I was going over my textbook to review permutations and combinatorics, which I have great difficulty comprehending despite them seeming simple, and came across this problem.
How many ways are there to write a length-15 binary string if there must be exactly 3 "1"s and 12 "0"s?
The answer to the problem was C(15, 3) or C(15, 12). I understand why there are two equivalent forms of the answer, but I'm puzzled as to why the answer is C(15, 12) or C(15, 3) at all.
From my understanding, we're choosing three (or twelve) of the digits to be 1 (or 0), which is all well and good, but how does that ensure that the remaining digits are the remaining 0's (or 1's)?
tl;dr: By using C(15,3) we ensure that we have the # of ways three digits will be 1, but how does that guarantee the remaining 12 will be 0s?

Go back to first principles:
Start with all 15 bits set to 0 [1 way to do this]
Choose 1 bit and flip it [15 ways to do this]
Choose a different bit and flip it [14 ways to do this]
Choose yet another bit and flip it [13 ways to do this]
It should be clear that exactly 3 bits are now 1's and the remaining 12 are still 0's: the bits we never touched stay 0 by construction, which is what guarantees the rest of the string.
Total number of ways to make the three choices in order: 1 x 15 x 14 x 13 = 2730. Since the order in which we picked the three bits doesn't matter, each final string has been counted 3! = 6 times, so the number of distinct strings is 2730 / 6 = 455 = C(15, 3).
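As a sanity check (a small Python sketch, not part of the original argument), brute-force enumeration agrees with both counts:

from itertools import product
from math import comb, factorial

# Count length-15 binary strings with exactly three 1s by enumeration.
count = sum(1 for bits in product("01", repeat=15) if bits.count("1") == 3)

print(count)                          # 455
print(comb(15, 3), comb(15, 12))      # 455 455
print(15 * 14 * 13 // factorial(3))   # 455, the first-principles count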

Related

Integer compression method

How can I compress a row of integers into something shorter?
Like:
Input: '1 2 4 5 3 5 2 3 1 2 3 4' -> Algorithm -> Output: 'X Y Z'
and can get it back the other way around? ('X Y Z' -> '1 2 4 5 3 5 2 3 1 2 3 4')
Note: the input will only contain numbers between 1 and 5, and the total count of numbers will be between 10 and 16.
Is there any way I can compress it to 3-5 numbers?
Here is one way. First, subtract one from each of your little numbers. For your example input that results in
0 1 3 4 2 4 1 2 0 1 2 3
Now treat that as the base-5 representation of an integer. (You can choose either most significant digit first or last.) Calculate the number in binary that means the same thing. Now you have a single integer that "compressed" your string of little numbers. Since you have shown no code of your own, I'll just stop here. You should be able to implement this easily.
Since you will have at most 16 little numbers, the maximum resulting value from that algorithm will be 5^16, which is 152,587,890,625. This fits into 38 bits. If you need to store it as smaller numbers than that, convert the resulting value into another, larger number base, such as 2^16 or 2^32. The former would result in 3 numbers, the latter in 2.
@SergGr points out in a comment that this method does not record the number of integers encoded. If that is not stored separately, it can be a problem, since the method does not distinguish between leading zeros and coded zeros. There are several ways to handle that, if you need the number of integers included in the compression. You could require the most significant digit to be 1 (whether that is first or last depends on where you put the most significant digit). This increases the number of bits by one, so you may now need 39 bits.
Here is a toy example of variable-length encoding. Assume we want to encode two strings: 1 2 3 and 1 2 3 0 0. How will the results differ? Let's consider the two base-5 numbers 321 and 00321. They represent the same value, but let's convert them into base 2 while preserving the padding.
1 + 2*5 + 3*5^2 = 86 dec = 1010110 bin
1 + 2*5 + 3*5^2 + 0*5^3 + 0*5^4 = 000001010110 bin
Those additional 0s in the second line are there because the biggest 5-digit base-5 number, 44444, has the 12-bit base-2 representation 110000110100, so the binary representation of our number is padded to that same size.
Note that there is no need to pad the first line because the biggest 3-digit base-5 number 444 has a base-2 representation of 1111100 i.e. of the same length. For an initial string 3 2 1 some padding will be required in this case as well, so padding might be required even if the top digits are not 0.
Now let's add the most significant 1 to the binary representations, and that will give our encoded values:
1 2 3 => 11010110 binary = 214 dec
1 2 3 0 0 => 1000001010110 binary = 4182 dec
There are many ways to decode those values back. One of the simplest (but not the most efficient) is to first calculate the number of base-5 digits by calculating floor(log5(encoded)) and then remove the top bit and fill the digits one by one using mod 5 and divide by 5 operations.
Obviously, such variable-length encoding always adds exactly 1 bit of overhead.
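To make the scheme concrete, here is a minimal Python sketch that packs the numbers into one integer with a length marker; it uses a leading base-5 digit of 1 rather than the padded leading bit described above (a simpler variant), and the function names are mine:

def encode(nums):
    # Pack numbers 1-5 into one integer, least significant digit first.
    value = 0
    for i, n in enumerate(nums):
        value += (n - 1) * 5 ** i     # subtract 1 and treat as a base-5 digit
    return value + 5 ** len(nums)     # leading sentinel digit records the length

def decode(value):
    nums = []
    while value >= 5:                 # stop when only the sentinel digit remains
        value, digit = divmod(value, 5)
        nums.append(digit + 1)
    return nums

data = [1, 2, 4, 5, 3, 5, 2, 3, 1, 2, 3, 4]
packed = encode(data)
print(packed)                  # one integer holding the whole sequence
print(decode(packed) == data)  # True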
It's called polidatacompressor.js, but the license will cost you; you have to ask the author about prices LOL
https://github.com/polidatacompressor/polidatacompressor
Ncomp(65535) will output: 255, 255, and when you store this in the database as bytes you get 2 chars.
Another way is to use hexadecimal, aka base 16: in JavaScript, (1231).toString(16) gives you '4cf'; in about 60% of situations it shortens the number by one character.
Or convert from base 10 to base 62 with https://github.com/base62/base62.js/:
4131 --> 14D
413131 --> 1Jtp
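For completeness, a small Python sketch of that base-62 conversion; the digit alphabet below is an assumption (digits, then lowercase, then uppercase), chosen because it reproduces the two examples above:

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(n):
    # Convert a non-negative integer to a base-62 string.
    out = ""
    while n:
        n, r = divmod(n, 62)
        out = ALPHABET[r] + out
    return out or "0"

print(to_base62(4131))    # 14D
print(to_base62(413131))  # 1Jtp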

How should I selectively sum multiple axes of an array?

What is the preferred approach in J for selectively summing multiple axes of an array?
For instance, suppose that a is the following rank 3 array:
]a =: i. 2 3 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
My goal is to define a dyad "sumAxes" to sum over multiple axes of my choosing:
0 1 sumAxes a NB. 0+4+8+12+16+20 ...
60 66 72 78
0 2 sumAxes a NB. 0+1+2+3+12+13+14+15 ...
60 92 124
1 2 sumAxes a NB. 0+1+2+3+4+5+6+7+8+9+10+11 ...
66 210
The way that I am currently trying to implement this verb is to use the dyad |: to first permute the axes of a, and then ravel the items of the necessary rank using ,"n (where n is the number of axes I want to sum over) before summing the resulting items:
sumAxes =: dyad : '(+/ @ ,"(#x)) x |: y'
This appears to work as I want, but as a beginner in J I am unsure if I am overlooking some aspect of rank or particular verbs that would enable a cleaner definition. More generally I wonder whether permuting axes, ravelling and summing is idiomatic or efficient in this language.
For context, most of my previous experience with array programming is with Python's NumPy library.
NumPy does not have J's concept of rank and instead expects the user to explicitly label the axes of an array to reduce over:
>>> import numpy
>>> a = numpy.arange(2*3*4).reshape(2, 3, 4) # a =: i. 2 3 4
>>> a.sum(axis=(0, 2)) # sum over specified axes
array([ 60, 92, 124])
As a footnote, my current implementation of sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified (as rank is not interchangeable with "axis").
Motivation
J has incredible facilities for handling arbitrarily-ranked arrays. But there's one facet of the language which is almost universally useful and well justified, yet also somewhat antithetical to this dimensionality-agnostic nature.
The major axis (in fact, leading axes in general) is implicitly privileged. This is the concept that underlies, e.g., # being the count of items (i.e. the dimension of the first axis), the understated elegance and generality of +/ without further modification, and a host of other beautiful parts of the language.
But it's also what accounts for the obstacles you're meeting in trying to solve this problem.
Standard approach
So the general approach to solving the problem is just as you have it: transpose or otherwise rearrange the data so the axes that interest you become leading axes. Your approach is classic and unimpeachable. You can use it in good conscience.
Alternative approaches
But, like you, it niggles me a bit that we are forced to jump through such hoops in similar circumstances. One clue that we're kind of working against the grain of the language is the dynamic argument to the rank conjunction, "(#x); usually arguments to conjunctions are fixed, and calculating them at runtime often forces us to use either explicit code (as in your example) or dramatically more complicated code. When the language makes something hard to do, it's usually a sign you're cutting against the grain.
Another clue is that ravel (,). It's not just that we want to transpose some axes; it's that we want to focus on one specific axis, and then run all the elements trailing it into a flat vector. Though I actually think this reflects a constraint imposed by how we're framing the problem rather than one inherent in the notation. More on that in the final section of this post.
With that, we might feel justified in our desire to address a non-leading axis directly. And, here and there, J provides primitives that allow us to do exactly that, which might be a hint that the language's designers also felt the need to include certain exceptions to the primacy of leading axes.
Introductory examples
For example, dyadic |. (rotate) has ranks 1 _, i.e. it takes a vector on the left.
This is sometimes surprising to people who have been using it for years, never having passed more than a scalar on the left. That, along with the unbound right rank, is another subtle consequence of J's leading-axis bias: we think of the right argument as a vector of items, and the left argument as a simple, scalar rotation value of that vector.
Thus:
3 |. 1 2 3 4 5 6
4 5 6 1 2 3
and
1 |. 1 2 , 3 4 ,: 5 6
3 4
5 6
1 2
But in this latter case, what if we didn't want to treat the table as a vector of rows, but as a vector of columns?
Of course, the classic approach is to use rank, to explicitly denote the axis we're interested in (because leaving it implicit always selects the leading axis):
1 |."1 ] 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Now, this is perfectly idiomatic, standard, and ubiquitous in J code: J encourages us to think in terms of rank. No one would blink an eye on reading this code.
But, as described at the outset, in another sense it can feel like a cop-out, or manual adjustment. Especially when we want to dynamically choose the rank at runtime. Notationally, we are now no longer addressing the array as a whole, but addressing each row.
And this is where the left rank of |. comes in: it's one of those few primitives which can address non-leading axes directly.
0 1 |. 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Look ma, no rank! Of course, we now have to specify a rotation value for each axis independently, but that's not only ok, it's useful, because now that left argument smells much more like something which can be calculated from the input, in true J spirit.
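For NumPy readers, a rough analog of that per-axis rotation is np.roll with tuple arguments; note that np.roll's sign convention is the reverse of J's |., so the amounts are negated (a small sketch, not part of the original answer):

import numpy as np

m = np.array([[1, 2], [3, 4], [5, 6]])

# J's  0 1 |. m  rotates axis 0 by 0 and axis 1 by 1 (to the left);
# np.roll shifts the other way, so we negate the amounts.
print(np.roll(m, shift=(0, -1), axis=(0, 1)))
# [[2 1]
#  [4 3]
#  [6 5]]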
Summing non-leading axes directly
So, now that we know J lets us address non-leading axes in certain cases, we simply have to survey those cases and identify one which seems fit for our purpose here.
The primitive I've found most generally useful for non-leading-axis work is ;. with a boxed left-hand argument. So my instinct is to reach for that first.
Let's start with your examples, slightly modified to see what we're summing.
]a =: i. 2 3 4
sumAxes =: dyad : '(< @ ,"(#x)) x |: y'
0 1 sumAxes a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
0 2 sumAxes a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
1 2 sumAxes a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
The relevant part of the definition of ;. for dyads derived from ;.1 and friends is:
The frets in the dyadic cases 1, _1, 2 , and _2 are determined by the 1s in boolean vector x; an empty vector x and non-zero #y indicates the entire of y. If x is the atom 0 or 1 it is treated as (#y)#x. In general, boolean vector >j{x specifies how axis j is to be cut, with an atom treated as (j{$y)#>j{x.
What this means is: if we're just trying to slice an array along its dimensions with no internal segmentation, we can simply use dyad cut with a left argument consisting solely of 1s and a:s. The number of 1s in the vector (ie. the sum) determines the rank of the resulting array.
Thus, to reproduce the examples above:
('';'';1) <@:,;.1 a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
('';1;'') <@:,;.1 a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
(1;'';'') <@:,;.1 a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
Et voilà. Also, notice the pattern in the left-hand argument? The two aces are exactly at the indices you passed in your original calls to sumAxes. See what I mean about providing a value for each dimension smelling like a good thing, in the J spirit?
So, to use this approach to provide an analog to sumAxes with the same interface:
sax =: dyad : 'y +/@:,;.1~ (1;a:#~r-1) |.~ - {. x -.~ i. r=.#$y' NB. Explicit
sax =: ] +/@:,;.1~ ( (] (-@{.@] |. 1 ; a: #~ <:@[) (-.~ i.) ) #@$ ) NB. Tacit
Results elided for brevity, but they're identical to your sumAxes.
Final considerations
There's one more thing I'd like to point out. The interface to your sumAxe call, calqued from Python, names the two axes you'd like "run together". That's definitely one way of looking at it.
Another way of looking at it, which draws upon the J philosophies I've touched on here, is to name the axis you want to sum along. That this is our actual focus is confirmed by the fact that we ravel each "slice": we do not care about its shape, only its values.
This change in perspective, to talk about the thing you're interested in, has the advantage that the focus is always a single thing, and this singularity permits certain simplifications in our code (again, especially in J, where we usually talk about the [new, i.e. post-transpose] leading axis)¹.
Let's look again at our ones-and-aces vector arguments to ;., to illustrate what I mean:
('';'';1) <@:,;.1 a
('';1;'') <@:,;.1 a
(1;'';'') <@:,;.1 a
Now consider the three parenthesized arguments as a single matrix of three rows. What stands out to you? To me, it's the ones along the anti-diagonal. They are less numerous, and have values; by contrast the aces form the "background" of the matrix (the zeros). The ones are the true content.
Which is in contrast to how our sumAxes interface stands now: it asks us to specify the aces (the axes to sum, i.e. the zeros). How about instead we specify the 1, i.e. the axis that actually interests us?
If we do that, we can rewrite our functions thus:
xas =: dyad : 'y +/@:,;.1~ (-x) |. 1 ; a: #~ _1 + #$y' NB. Explicit
xas =: ] +/@:,;.1~ -@[ |. 1 ; a: #~ <:@#@$@] NB. Tacit
And instead of calling 0 1 sax a, you'd call 2 xas a, instead of 0 2 sax a, you'd call 1 xas a, etc.
The relative simplicity of these two verbs suggests J agrees with this inversion of focus.
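For comparison, the same inversion of focus expressed on the NumPy side (a small sketch; the helper names are mine, not from any library):

import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # same data as i. 2 3 4

def sum_axes(axes, arr):
    # analog of sumAxes / sax: name the axes to sum over
    return arr.sum(axis=tuple(axes))

def sum_keeping(axis, arr):
    # analog of xas: name only the axis to keep, sum over all the others
    return arr.sum(axis=tuple(i for i in range(arr.ndim) if i != axis))

print(sum_axes((0, 1), a))    # [60 66 72 78]
print(sum_keeping(2, a))      # [60 66 72 78]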
¹ In this code I'm assuming you always want to collapse all axes except one. This assumption is encoded in the approach I use to generate the ones-and-aces vector, using |..
However, your footnote ("sumAxes has the disadvantage of working 'incorrectly' compared to NumPy when just a single axis is specified") suggests that sometimes you want to collapse only one axis.
That's perfectly possible, and the ;. approach can take arbitrary (orthotopic) slices; we'd only need to alter the method by which we instruct it (i.e. how we generate the 1s-and-aces vector). If you provide a couple of examples of the generalizations you'd like, I'll update the post here. It's probably just a matter of using (<1) x} a: #~ #$y or ((1;'') {~ (e.~ i.@#@$)) instead of (-x) |. 1 ; a:#~<:#$y.

Can big-endian order be associated with the way Englishmen say the numbers under 100 and little-endian with the way Germans say those numbers?

I was thinking that for me and most people around me big-endian order of the bytes in memory seems the most natural way of arranging numbers.
You start with the most significant bytes, just like you write the numbers down and just like you spell them, e.g. twenty-eight.
The most significant digit is written first, and then you continue to write the next digits from the next most significant to the least significant. This is the same way you say the numbers.
But German people say this number in reverse. They say the number beginning with the least significant digit and then continue with the most significant digit.
I think this is a good analogy to endianness.
"I was thinking that for me... Big-endian order of the bytes in memory seems the most natural way of arranging numbers... You start with the most significant bytes, just like you write the numbers down"
Actually all binary data (zero/one bits) is written in MSB format. We always write the value starting with the MSD (Most Significant Digit) on the left side, just like in real life.
However, having 8 slots within a byte to fill, we write the value itself starting from the right side and growing leftwards as it gets bigger. PS: Endianness only applies at the multi-byte level.
In summary: in a single byte (holding a value < 100, like 28 or even 99):
The value 28 is written as 28 (but since it's a binary format, it looks like 11100).
To write the value we start at the right side: x x x 1 1 1 0 0 (where the left-most 1 is the MSD).
So the value itself is written in MSB style, but placed within the byte using an LSB style of writing.
There is no concept of endianness within a single-byte value.
Example : Imagine bits were slots for holding 0-9 digits...
We still write 28 as : [0 0 0 0 0 0 2 8] so the twenties part is placed like MSB but the whole value starts from the right as if written in LSB style.
Since a single byte does not have endianness, writing value 28 is never going to look like : [0 0 0 0 0 0 8 2] and never as [2 8 0 0 0 0 0 0] since that would give an incorrect 82 or incorrect 28 million values.
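If it helps to see the multi-byte point concretely, here is a small Python sketch using the struct module (not from the original answer):

import struct

# A single byte has no byte order: 28 packs identically either way.
print(struct.pack(">B", 28).hex())      # 1c
print(struct.pack("<B", 28).hex())      # 1c

# Endianness only shows up once a value spans multiple bytes.
print(struct.pack(">H", 0x1234).hex())  # 1234 (big-endian: most significant byte first)
print(struct.pack("<H", 0x1234).hex())  # 3412 (little-endian: least significant byte first)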
"You start with the most significant bytes, just like you write the numbers down and just like you spell them e.g twenty-eight... But the German people say this number in reverse. They say the number beginning with the least significant digit and then continue with the most significant digit. I think this is a good analogy to endianness."
Sorry, no it isn't. It stopped being a good analogy as soon as you mentioned that it involves one byte. A verbally spoken eight-twenty phrase could mean a different thing compared to the written decimal value 820.
What about the English eight-ten (aka eight-teen) for value 18? By your logic the Germans also say eight-ten, right? What happens to eight-ten when a machine is told to simply "reverse" the input when converting between English and German style?

Different bases for radix sort in C

I am having a difficult time understanding radix sort. I have no problems implementing code to work with bases of 2 or 10. However, I have an assignment that requires a command line argument to specify the radix. The radix can be anywhere from 2 - 100,000. I have spent around 10 hours trying to understand this problem. I am not asking for a direct answer, because this is homework. However, if anyone can shed some light on this, please do.
A few things I don't understand. What is the point of having base 100,000? How would that even work? I understand having a base for every letter of the alphabet, or every number 1-9. I just can't seem to wrap my head around this concept.
I'm sorry if I haven't been specific enough.
A number N in any base B is just a series of digits in the range [0, B-1]. Since a "normal" human writing system doesn't have enough symbols to represent all the digits, don't think about how it's written in characters. You'll just need to know that the digits are stored/written separately.
For example, 255 in base 177 is a 2-digit number in which the first digit has value 1 and the second digit has value 78, since 255 (base 10) = 1×177^1 + 78×177^0. If some culture used this base they'd have 177 distinct symbols for the digits and would write it in only 2 digits. Since we only have 10 symbols, we'll need to define some symbol to delimit the digits, which is often :. As you can see from Wolfram Alpha, 255 (base 10) is 1:78 in base 177.
Note that not all people count in base 10. There exist cultures that count in base 4, 5, 6, 8, 12, 15, 16, 20, 24, 27, 32, 36, 60..., so they'll have more or fewer symbols than most of us. However, among the non-decimal bases, only bases 20, 12 and 60 are still in common use nowadays.
In base 100000 it's the same. 1234567890987654321 will be a 4-digit number written with digit values 1234, 56789, 9876, 54321, in order.
I was about to explain it in a comment, but basically you're talking about what we sometimes call "modular arithmetic." Each digit is in {0...n-1} and represents that value times n^k, where k is the position. 255 in decimal is 5×10^0 + 5×10^1 + 2×10^2.
So, your 255 in base 177 is hard to represent, but there's a 1 in the 177s place (177^1) and a 78 in the 1s place (177^0).
As a general pseudocode algorithm, you want something like...
n = input value
digits = []
while n > 0
quotient = n / base (as an integer)
remainder = n - quotient * base
digits += remainder
n = quotient
Note that this produces the digits from least significant to most significant.
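Here is a minimal Python sketch of the same idea (assuming non-negative integers), extended with an LSD radix-sort pass over those digits, since that is what the assignment is about:

def digits(n, base):
    # Least-significant-first digits of n in the given base (n >= 0).
    if n == 0:
        return [0]
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(remainder)
    return out

def radix_sort(values, base):
    # LSD radix sort: bucket by each base-`base` digit, least significant first.
    width = max(len(digits(v, base)) for v in values)
    for pos in range(width):
        buckets = [[] for _ in range(base)]
        for v in values:
            buckets[(v // base ** pos) % base].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values

print(digits(255, 177))                    # [78, 1]  -> 1:78 in base 177
print(radix_sort([170, 45, 75, 90], 177))  # [45, 75, 90, 170]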
Of course, how you represent those digits is another story. MIME contains a semi-standard way of handling bases up through Base64, for example.
If it was me, I'd just delimit the digits and make it clear that's the representation, but there's all of Unicode, if you want to mess around with hexadecimal-like extensions...

Bit Hack - Round off to multiple of 8

Can anyone please explain how this works: (asz + 7) & ~7; It rounds asz up to the next higher multiple of 8.
It is easy to see that ~7 produces 11111000 (8-bit representation) and hence switches off the last 3 bits; thus any number produced is a multiple of 8.
My question is: how does adding 7 to asz before masking [edit] produce the next higher [end edit] multiple of 8? I tried writing it down on paper,
like:
1 + 7 = 8 = 1|000 (& ~7) -> 1000
2 + 7 = 9 = 1|001 (& ~7) -> 1000
3 + 7 = 10 = 1|010 (& ~7) -> 1000
4 + 7 = 11 = 1|011 (& ~7) -> 1000
5 + 7 = 12 = 1|100 (& ~7) -> 1000
6 + 7 = 13 = 1|101 (& ~7) -> 1000
7 + 7 = 14 = 1|110 (& ~7) -> 1000
8 + 7 = 15 = 1|111 (& ~7) -> 1000
A pattern clearly seems to emerge, which has been exploited. Can anyone please help me out?
Thank you all for the answers. It helped confirm what I was thinking. I continued writing out the pattern above, and when I crossed 10 I could clearly see that the numbers are promoted to the next "block of 8", if I can say so.
Thanks again.
Well, if you were trying to round down, you wouldn't need the addition. Just doing the masking step would clear out the bottom bits and you'd get rounded to the next lower multiple.
If you want to round up, first you have to add enough to "get past" the next multiple of 8. Then the same masking step takes you back down to the multiple of 8. The reason you choose 7 is that it's the only number guaranteed to be "big enough" to get you from any number up past the next multiple of 8 without going up an extra multiple if your original number were already a multiple of 8.
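A quick check of that argument (a small Python sketch):

def round_up_to_8(asz):
    # Add 7 to step past the next multiple of 8 (unless asz already is one),
    # then clear the low three bits to land exactly on that multiple.
    return (asz + 7) & ~7

# Multiples of 8 stay put; everything else is bumped up to the next one.
print([(n, round_up_to_8(n)) for n in (1, 7, 8, 9, 15, 16)])
# [(1, 8), (7, 8), (8, 8), (9, 16), (15, 16), (16, 16)]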
In general, to round up to a multiple of a power of two:
unsigned int roundTo(unsigned int value, unsigned int multiple)
{
    /* multiple must be a power of two */
    return (value + (multiple - 1)) & ~(multiple - 1);
}
It's actually adding 7 to the number and rounding down.
This has the desired effect of rounding up to the next multiple of 8. (Adding +8 instead of +7 would bump a value of 8 to 16.)
The +7 isn't to produce an exact multiple of 8, it's to make sure you get the next highest multiple of eight.
edit: Beaten by 16 seconds and several orders of quality. Oh well, back to lurking.
Well, the mask would produce an exact multiple of 8 by itself. Adding 7 to asz ensures that you get the next higher multiple.
Without the +7, it will be the biggest multiple of 8 less than or equal to your original number.
Adding 7 does not produce a multiple of 8. The multiple of 8 is produced by ANDing with ~7. ~7 is the complement of 7, which is 0xFFFFFFF8 (using however many bits are in an int). This truncates, or rounds down.
Adding 7 before doing that ensures that no value lower than asz is returned. You've already worked out how that works.
Uhh, you just answered your own question. By adding 7, you guarantee the result reaches the multiple of 8 you want without passing it; truncating then gives you that multiple.

Resources