What are the chances to generate a v4 UUID with all numbers? - uuid

When generating a v4 UUID, what are the chances that the UUID is all numbers and contains no letters? I've only seen a handful of these in the wild and so was wondering.

I'll assume it's a variant 1 UUID, since a variant 2 UUID will never be all-numbers (the 17th hex-digit can only be c or d).
For variant 1, version 4, the 13th hex-digit will always be 4 (which is good), the 17th will be 8, 9, a, or b with equal probability (1/2 chance of being numeric), and the remaining 30 hex-digits are completely random, which gives each one a 10/16, or 5/8 chance of being numeric.
Since we need everything to be numeric, the probabilities simply multiply: 1/2 * (5/8)^30. And to get "1 in N" odds we can invert that fraction: 2 * (8/5)^30 ≅ 2,658,456. So not astronomically rare, but rarer than one in a million.
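A few lines of Python can double-check this: the exact fraction below matches the odds above, and the sampling loop is an optional empirical spot check.

import uuid
from fractions import Fraction

# Exact odds: the version nibble is always 4 (numeric), the variant nibble is
# 8/9/a/b (numeric with probability 1/2), and the other 30 nibbles are uniform
# over 0..f (numeric with probability 10/16 each).
p = Fraction(1, 2) * Fraction(10, 16) ** 30
print(float(p), round(1 / float(p)))      # ~3.76e-07, i.e. 1 in ~2,658,456

# Optional empirical spot check (sampling, so don't expect many hits):
hits = sum(uuid.uuid4().hex.isdigit() for _ in range(1_000_000))
print(hits, "all-numeric UUIDs in 1,000,000 samples")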

Related

Integer compression method

How can I compress a row of integers into something shorter ?
Like:
Input: '1 2 4 5 3 5 2 3 1 2 3 4' -> Algorithm -> Output: 'X Y Z'
and can get it back the other way around? ('X Y Z' -> '1 2 4 5 3 5 2 3 1 2 3 4')
Note: the input will only contain numbers between 1 and 5, and there will be 10-16 numbers in total.
Is there any way I can compress it to 3-5 numbers?
Here is one way. First, subtract one from each of your little numbers. For your example input that results in
0 1 3 4 2 4 1 2 0 1 2 3
Now treat that as the base-5 representation of an integer. (You can choose either most significant digit first or last.) Calculate the number in binary that means the same thing. Now you have a single integer that "compressed" your string of little numbers. Since you have shown no code of your own, I'll just stop here. You should be able to implement this easily.
Since you will have at most 16 little numbers, the maximum resulting value from that algorithm will be 5^16 which is 152,587,890,625. This fits into 38 bits. If you need to store smaller numbers than that, convert your resulting value into another, larger number base, such as 2^16 or 2^32. The former would result in 3 numbers, the latter in 2.
@SergGr points out in a comment that this method does not show the number of integers encoded. If that is not stored separately, it can be a problem, since the method does not distinguish between leading zeros and coded zeros. There are several ways to handle that, if you need the number of integers included in the compression. You could require the most significant digit to be 1 (whether that is the first or last digit depends on which end you chose as most significant). This increases the number of bits by one, so you may now need 39 bits.
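For concreteness, here is a minimal Python sketch of the scheme (the helper names are mine; passing the digit count explicitly is the simplest way around the length issue just mentioned):

def pack(nums):
    # Subtract one from each number and treat the result as base-5 digits,
    # least significant digit first.
    value = 0
    for i, n in enumerate(nums):
        value += (n - 1) * 5 ** i
    return value

def unpack(value, count):
    # The number of digits must be known (stored separately or fixed).
    nums = []
    for _ in range(count):
        nums.append(value % 5 + 1)
        value //= 5
    return nums

nums = [1, 2, 4, 5, 3, 5, 2, 3, 1, 2, 3, 4]
packed = pack(nums)
print(packed, packed.bit_length(), "bits")   # at most 38 bits for 16 numbers
assert unpack(packed, len(nums)) == nums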
Here is a toy example of variable-length encoding. Assume we want to encode two strings: 1 2 3 and 1 2 3 0 0. How will the results differ? Consider the two base-5 numbers 321 and 00321. They represent the same value, but let's convert them into base 2 while preserving the padding.
1 + 2*5 + 3*5^2 = 86 dec = 1010110 bin
1 + 2*5 + 3*5^2 + 0*5^3 + 0*5^4 = 000001010110 bin
Those additional 0s in the second line are there because the biggest 5-digit base-5 number, 44444, has the base-2 representation 110000110100, so the binary representation of the number is padded out to that same width.
Note that there is no need to pad the first line, because the biggest 3-digit base-5 number, 444, has the base-2 representation 1111100, i.e. one of the same length. For an initial string such as 3 2 1 some padding would be required in this case as well, so padding can be needed even when the top digits are not 0.
Now let's add the most significant 1 to the binary representations, and those will be our encoded values:
1 2 3 => 11010110 binary = 214 dec
1 2 3 0 0 => 1000001010110 binary = 4182 dec
There are many ways to decode those values back. One of the simplest (but not the most efficient) is to first calculate the number of base-5 digits by calculating floor(log5(encoded)) and then remove the top bit and fill the digits one by one using mod 5 and divide by 5 operations.
Obviously, such a variable-length encoding always adds exactly 1 bit of overhead.
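A Python sketch of that toy encoding (the helper names are mine); it reproduces the 214 and 4182 values above:

def encode_vl(digits):
    # digits are base-5 digits, least significant first (as in the example above)
    k = len(digits)
    value = 0
    for d in reversed(digits):
        value = value * 5 + d
    width = (5 ** k - 1).bit_length()   # bits needed by the largest k-digit number
    return value | (1 << width)         # prepend the extra 1 bit

def decode_vl(encoded):
    k = 0                               # k = floor(log5(encoded))
    power = 5
    while power <= encoded:
        k += 1
        power *= 5
    width = (5 ** k - 1).bit_length()
    value = encoded ^ (1 << width)      # strip the leading 1 bit
    return [(value // 5 ** i) % 5 for i in range(k)]

print(encode_vl([1, 2, 3]))             # 214
print(encode_vl([1, 2, 3, 0, 0]))       # 4182
print(decode_vl(4182))                  # [1, 2, 3, 0, 0]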
It's called polidatacompressor.js, but the license will cost you; you have to ask the author about prices.
https://github.com/polidatacompressor/polidatacompressor
Ncomp(65535) will output 255, 255, and when you store that in the database as bytes you get 2 characters.
Another way is to use hexadecimal (base 16) in JavaScript: (1231).toString(16) gives you '4cf', which shortens the number by one character in roughly 60% of cases.
Or convert from base 10 to base 62: https://github.com/base62/base62.js/
4131 --> 14D
413131 --> 1Jtp
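For reference, base-62 conversion is only a few lines; with a digits-lowercase-uppercase alphabet (library alphabets vary, so treat this ordering as an assumption) this Python sketch reproduces the two examples above:

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(n):
    # Repeatedly take the remainder mod 62 and map it to a symbol.
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

print(to_base62(4131))      # 14D
print(to_base62(413131))    # 1Jtp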

Total Comparisons of Sorting Algorithms to Complete Sort

I want to find the total number of comparisons made when sorting n elements of an array with different sorting algorithms. I don't want to do it manually (in case the number of elements in the array is considerably large). Is there a "formula" to calculate the comparisons for each of the sorting algorithms listed below if, for example, there are 8 elements in an array containing the elements [3,24,66,34,8,-5,42,80]? How can I find the comparisons for each?
1) Merge Sort
For example, if I trace merge sort manually to find the total number of comparisons for 8 elements, this is what I get:
[3, 24, 66, 34, 8, -5, 42, 80]
[3, 24, 66, 34] [8, -5, 42, 80]
[3, 24] [66, 34] [8, -5] [42, 80]
[3] [24] [66] [34] [8] [-5] [42] [80]
[3, 24] [34, 66] [-5, 8] [42, 80]
[3, 24, 34, 66] [-5, 8, 42, 80]
[-5, 3, 8, 24, 34, 42, 66, 80]
Total number of comparisons needed to sort this array = 15
I would like to be able to do this using a formula, if possible, not manually.
2) Insertion sort
This is not an easy task, as it can depend on details of the algorithm implementation, and also is not a pure function of n.
Actually, what you get is a distribution of values of the number of comparisons, depending on the permutation of the input. Usually, one distinguishes the best case (least number of comparison), the worst case (largest number) and the average case (mathematical expectation when you assume the respective probabilities of the input permutations).
These numbers can be obtained by reasoning on the program, but this is usually a difficult task (even daunting for the average case), often solved with approximations.
Anyway, you can obtain it empirically by instrumenting your program: declare a counter variable and increment it every time a comparison is made.
I recommend doing the following as an exercise:
instrument the code as I said,
take the sequence of the n first integers;
generate all possible permutations of the input (there will be exactly n! possibilities - as long as n remains small, say n up to 10, this remains manageable, 10!=3628800) and run the algorithm on each;
(alternatively you can fill the array with random numbers and repeat many times);
accumulate the histogram of the number of comparisons (for every possible number of comparisons count how many permutations achieve it),
observe and compare the histograms of the different algorithms.
Even though n will remain modest, you will observe the best and worst cases, and with more care, the central trend and the spread. This should be instructive.
Using the same methodology, you can also observe the number of element displacements.
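A minimal Python sketch of that instrumentation (the insertion sort and counting wrapper here are just one possible choice):

import itertools

def insertion_sort(a, cmp):
    # Plain insertion sort that routes every comparison through cmp().
    for i in range(1, len(a)):
        j = i
        while j > 0 and cmp(a[j - 1], a[j]) > 0:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a

def count_comparisons(data):
    counter = [0]
    def cmp(x, y):
        counter[0] += 1
        return (x > y) - (x < y)
    insertion_sort(list(data), cmp)
    return counter[0]

# Histogram over all permutations of the first n integers (n kept small).
n = 5
histogram = {}
for perm in itertools.permutations(range(n)):
    c = count_comparisons(perm)
    histogram[c] = histogram.get(c, 0) + 1
print(dict(sorted(histogram.items())))   # comparisons -> number of permutations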
It is impossible in general, as the exact number depends on the input. This is why you have optimistic (best-case) complexity, pessimistic (worst-case), and average, sometimes also called expected. The prominent example is the basic implementation of quicksort, which has pessimistic complexity O(n^2). On the other hand, the optimistic case for bubble sort is O(n). More examples: http://en.wikipedia.org/wiki/Best,_worst_and_average_case#Sorting_algorithms.
The only thing you can do is to compute it per problem instance, for example by instrumenting the comparison function. However, I am not sure per-instance values are very meaningful.
Usually people do not make this kind of calculation. They are interested in the complexity of the algorithm, i.e., asymptotically: how the number of comparisons grows with the size of the input.
For instance, merge sort grows (on average) as O(n log n). This means that the number of comparisons merge sort makes is no worse than proportional to n log n, where n is the size of the input. There are standard methods to arrive at this expression, namely the master theorem or the recursion-tree method.
Actually, one can prove that no algorithm based only on comparisons can make fewer than on the order of n log n comparisons in the worst case. This is the so-called comparison model.
However, sorting can be done in linear time, depending on the type of your data, for instance using counting sort, which is essentially a histogram; see the sketch below.
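A minimal counting-sort sketch in Python, assuming the value range is known and passed in explicitly:

def counting_sort(values, lo, hi):
    # Build a histogram of how many times each value in [lo, hi] occurs...
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[v - lo] += 1
    # ...then replay the histogram in order; no comparisons are made.
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)
    return out

print(counting_sort([3, 24, 66, 34, 8, -5, 42, 80], -5, 80))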

Different bases for radix sort in C

I am having a difficult time understanding radix sort. I have no problems implementing code to work with bases of 2 or 10. However, I have an assignment that requires a command line argument to specify the radix. The radix can be anywhere from 2 - 100,000. I have spent around 10 hours trying to understand this problem. I am not asking for a direct answer, because this is homework. However, if anyone can shed some light on this, please do.
A few things I don't understand. What is the point of having base 100,000? How would that even work. I understand having a base for every letter of the alphabet, or every number 1-9. I just can't seem to wrap my head around this concept.
I'm sorry if I haven't been specific enough.
A number N in any base B is just a series of digits in the range [0, B-1]. Since we don't have enough symbols to represent all the digits in a "normal" human writing system, don't think about how it's written in characters. You'll just need to know that the digits are stored/written separately
For example 255 in base 177 is a 2-digit number in which the first digit has value 1 and the second digit has value 78, since 255 (base 10) = 1×177^1 + 78×177^0. If some culture uses this base they'll have 177 distinct symbols for the digits and they write it in only 2 digits. Since we only have 10 symbols we'll need to define some symbol to delimit the digits, which is often :. As you can see from Wolfram Alpha, 255 (base 10) = 1:78 (base 177)
Note that not all people count in base 10. There exist cultures that count in base 4, 5, 6, 8, 12, 15, 16, 20, 24, 27, 32, 36, 60..., so they have more or fewer symbols than most of us. However, among the non-decimal bases, only bases 12, 20 and 60 are in common use nowadays.
In base 100000 it's the same. 1234567890987654321 will be a 4-digit number whose digits have the values 1234, 56789, 9876, 54321, in order (for a base that is a power of 10, the digits are just the decimal representation grouped in blocks of five from the right).
I was about to explain it in a comment, but basically you're talking about what we sometimes call "modular arithmetic." Each digit is in {0...n-1} and represents that value times n^k, where k is the position. 255 in decimal is 5×10^0 + 5×10^1 + 2×10^2.
So, your 255 base 177 is hard to represent, but there's a 1 in the 177s place (177^1) and 78 in the 1s place (177^0).
As a general pseudocode algorithm, you want something like...
n = input value
digits = []
while n > 0
    quotient = n / base (as an integer)
    remainder = n - quotient * base
    digits += remainder   (least significant digit first)
    n = quotient
And reverse the digits at the end if you want the most significant digit first.
Of course, how you represent those digits is another story. MIME contains semi-standard ways of handling digits up through base 64 (Base64 encoding), for example.
If it was me, I'd just delimit the digits and make it clear that's the representation, but there's all of Unicode, if you want to mess around with hexadecimal-like extensions...
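To tie this back to the assignment: a least-significant-digit radix sort never needs printable symbols for the digits, only the values n // base**k % base. A rough Python sketch (the per-pass bucket approach is one of several ways to keep each pass stable):

def radix_sort(values, base):
    # LSD radix sort for non-negative integers, any base >= 2.
    if not values:
        return values
    largest = max(values)
    place = 1                                    # current digit weight: base**k
    while largest // place > 0:
        buckets = [[] for _ in range(base)]
        for v in values:
            buckets[(v // place) % base].append(v)       # digit in the current place
        values = [v for bucket in buckets for v in bucket]  # stable concatenation
        place *= base
    return values

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66], 7))
print(radix_sort([255, 177, 1000, 3], 177))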

Making Minimal Changes to Change Range of the Array

Consider having an array filled with elements a0,a1,a2,....,a(n-1).
Consider that this array is sorted already; it will be easier to describe the problem.
Now the range of the array is defined as the biggest element - smallest element.
Say this range is some value x.
Now the problem I have is that I want to change the elements in such a way that the range becomes less than or equal to some target value y.
I also have the additional constraint that I want the changes to cost as little as possible. Consider an element a(i) that has value z. If I change it by an amount r, this costs r^2.
Thus, what is an efficient algorithm to update this array so that the range becomes less than or equal to the target range y, while minimizing the total cost?
An example:
Array = [ 0, 3, 19, 20, 23 ] Target range is 17.
I would make the new array [ 3, 3, 19, 20, 20 ] . The cost is (3)^2 + (3)^2 = 18.
This is the minimal cost.
If you are adding to/removing from some element a(i), you must add/remove that quantity q all at once. You cannot remove 1 unit from an element three times; you must remove a quantity of 3 units once.
I think you can build two heaps from the array - one min-heap, one max-heap. Now you will take the top elements of both heaps and peek at the ones right under them and compare the differences. The one that has the bigger difference you will take and if that difference is bigger than you need, you will just take the required size and add the cost.
Now, if you had to take the whole difference and still didn't achieve your goal, you will need to repeat this step. However, if you choose from the same heap again, you have to remember to add the cost for the element you are taking out of the heap in this step AND also for those that have already been taken out of that heap in earlier steps.
This yields an O(N*logN) algorithm, I'm not sure if it can be done faster.
Example:
Array [2,5,10,12] , I want difference 4.
The first heap has 2 on top, the second one 12. The 2 is 3 away from 5 and the 12 is 2 away from 10, so I take the min-heap, and the 2 will have to be changed by 3. So now we have a new situation:
[5, 10, 12]
The 12 is 2 away from 10, so we take it, subtract 2 and get a new situation:
[5,10]
Now we can choose either heap; both differences are the same (the same numbers :-) ). We just need to change by 1, so we subtract 1 from 10 and get the right result. Now, because the element that was originally 12 has already been pulled down into this heap, it has to move once more as well, down to 9, so the resulting cost is:
[2 - changed to 5, 5 - unchanged, 10 - changed to 9, 12 - changed to 9].
Here is a linear-time algorithm that minimizes the piecewise quadratic objective function. Probably it can be simplified.
Let the range be [x, x + y], where x is a variable. For different choices of x, there are at most 2n + 1 possibilities for which points lie in the range, arising from 2n critical values a0 - y, a1 - y, ..., a(n-1) - y, a0, a1, ..., a(n-1). One linear-time merge yields the critical values in sorted order. For each of the 2n - 1 intervals [w, z] between critical values where the range contains at least one point, we can construct and minimize a quadratic function consisting of a sum where every point aj less than w yields a term (x - aj)^2 and every point aj greater than z + y yields a term (x + y - aj)^2. The global minimum lies at the mean of aj (for terms of the first type) or aj - y (for terms of the second type); the endpoints of the interval must be checked as well. Naively, this gives a quadratic-time algorithm.
To get down to linear time, it suffices to update the sum preceding the mean computation incrementally. Each of the critical values has an associated event indicating whether the point responsible for it is entering or leaving the interval, meaning that that point's term should enter or leave the sum.
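Not the linear-time sweep itself, but a short Python sanity check under the same cost model: each term of the objective is convex in x, so the sum is convex and a ternary search over x finds the (real-valued) minimum, which can be compared against the output of a faster implementation.

def min_range_cost(a, y, iters=200):
    def cost(x):
        # Squared cost of moving every point into the window [x, x + y].
        total = 0.0
        for v in a:
            if v < x:
                total += (x - v) ** 2
            elif v > x + y:
                total += (v - (x + y)) ** 2
        return total

    lo, hi = min(a), max(a)
    for _ in range(iters):                 # ternary search on a convex function
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return cost((lo + hi) / 2)

print(min_range_cost([0, 3, 19, 20, 23], 17))   # ~18, matching the example above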

Find all possible row-wise sums in a 2D array

Ideally I'm looking for a c# solution, but any help on the algorithm will do.
I have a 2-dimension array (x,y). The max columns (max x) varies between 2 and 10 but can be determined before the array is actually populated. Max rows (y) is fixed at 5, but each column can have a varying number of values, something like:
1 2 3 4 5 6 7...10
A 1 1 7 9 1 1
B 2 2 5 2 2
C 3 3
D 4
E 5
I need to come up with the total of all possible row-wise sums for the purpose of looking for a specific total. That is, a row-wise total could be the cells A1 + B2 + A3 + B5 + D6 + A7 (any combination of one value from each column).
This process will be repeated several hundred times with different cell values each time, so I'm looking for a somewhat elegant solution (better than what I've been able to come with). Thanks for your help.
The Problem Size
Let's first consider the worst case:
You have 10 columns and 5 (full) rows per column. It should be clear that you will be able to get (with the appropriate number population for each place) up to 5^10 ≈ 9.8×10^6 different results (the solution space).
For example, the following matrix will give you the worst case for 3 columns:
| 1 10 100 |
| 2 20 200 |
| 3 30 300 |
| 4 40 400 |
| 5 50 500 |
resulting in 5^3 = 125 different results. Each result corresponds to a choice {a1, a2, a3} with ai ∈ {1, ..., 5} (one row picked from each column).
It's quite easy to show that such a matrix will always exist for any number n of columns.
Now, to get each numerical result, you will need to do n-1 sums, adding up to a problem size of O(n 5^n). So, that's the worst case and I think nothing can be done about it, because to know the possible results you NEED to effectively perform the sums.
More benign incarnations:
The problem complexity may be cut off in two ways:
Less numbers (i.e. not all columns are full)
Repeated results (i.e. several partial sums give the same result, and you can join them into one thread). Much more on this below.
Let's see a simplified example of the latter with a three-row, three-column matrix:
| 7 6 100 |
| 3 4 200 |
| 1 2 200 |
at first sight you will need to do 2·3^3 sums. But that's not really the case: as you add up the first two columns you don't get the expected 9 different results, but only 6 ({13, 11, 9, 7, 5, 3}).
So you don't have to carry nine partial results forward to the third column, but only 6.
Of course, that comes at the expense of deleting the repeated numbers from the list. The "removal of repeated integer elements" has been discussed on SO before and I'll not repeat the discussion here; just note that a mergesort, O(m log m) in the list size m, will remove the duplicates. If you want something easier, a double loop, O(m^2), will do.
Anyway, I'll not try to calculate the size of the (mean) problem this way, for several reasons. One of them is that the "m" in the mergesort is not the size of the problem but the size of the vector of results after adding up any two columns, and that operation is repeated (n-1) times ... and I really don't want to do the math :(.
The other reason is that, since I implemented the algorithm, we will be able to use some experimental results and save ourselves from my surely leaky theoretical considerations.
The Algorithm
With what we said before, it is clear that we should optimize for the benign cases, as the worst case is a lost one.
For doing so, we need to use lists (or variable dim vectors, or whatever can emulate those) for the columns and do a merge after every column add.
The merge may be replaced by several other algorithms (such as an insertion on a BTree) without modifying the results.
So the algorithm (procedural pseudocode) is something like:
Set result_vector to Column 1
For column i in (2 to n)
    Remove repeated integers in result_vector
    Add every element of result_vector to every element of column i,
    giving a new result_vector
Next column
Remove repeated integers in result_vector
Or as you asked for it, a recursive version may work as follows:
function genResVector(a:list, b:list): returns list
local c:list
{
Set c = CartesianProduct (a x b)
Set c = Sum up each element {a[i], b[j]} of c
Drop repeated elements of c
Return(c)
}
function RecursiveAdd(a:matrix, i integer): returns list
{
genResVector[Column i from a, RecursiveAdd[a, i-1]];
}
function RecursiveAdd(a:matrix, i==0 integer): returns list = {0}
Algorithm Implementation (Recursive)
I chose a functional language (Mathematica); I guess it's no big deal to translate it to any procedural one.
Our program has two functions:
genResVector, which sums two lists giving all possible results with repeated elements removed, and
recursiveAdd, which recurses on the matrix columns adding up all of them.
The code is:
genResVector[x__, y__] := (* Header: A function that takes two lists as input *)
Union[ (* remove duplicates from resulting list *)
Apply (* distribute the following function on the lists *)
[Plus, (* "Add" is the function to be distributed *)
Tuples[{x, y}],2] (*generate all combinations of the two lists *)];
recursiveAdd[t_, i_] := genResVector[t[[i]], recursiveAdd[t, i - 1]];
(* Recursive add function *)
recursiveAdd[t_, 0] := {0}; (* With its stop pit *)
Test
If we take your example list
| 1 1 7 9 1 1 |
| 2 2 5 2 2 |
| 3 3 |
| 4 |
| 5 |
And run the program the result is:
{11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
The maximum and minimum are very easy to verify since they correspond to taking the Min or Max from each column.
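For readers who would rather follow the idea in a procedural style, here is a minimal Python sketch of the same merge-as-you-go algorithm (storing the matrix column-wise; using a set makes the duplicate removal implicit):

def row_wise_sums(columns):
    # Keep the set of achievable partial sums, folding in one column at a time;
    # duplicates collapse as soon as they appear, which is the whole optimization.
    sums = {0}
    for col in columns:
        sums = {s + v for s in sums for v in col}
    return sorted(sums)

# The test matrix above, one inner list per column (columns may differ in length).
columns = [[1, 2, 3, 4, 5], [1, 2, 3], [7, 5], [9, 2], [1, 2], [1]]
print(row_wise_sums(columns))    # 11 .. 27, as in the result above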
Some interesting results
Let's consider what happens when the numbers in each position of the matrix are bounded. For that we will take a full (10 x 5) matrix and populate it with random integers.
In the extreme case where the integers are only zeros or ones, we may expect two things:
A very small result set
Fast execution, since there will be a lot of duplicate intermediate results
If we increase the Range of our Random Integers we may expect increasing result sets and execution times.
Experiment 1: 5x10 matrix populated with varying range random integers
It's clear enough that for a result set near the maximum result set size (5^10 ≈ 9.8×10^6) the calculation time and the "number of distinct results" approach an asymptote. The fact that we still see increasing functions just means we are far from that point.
Moral: the smaller your elements are, the better your chances of getting the result fast. This is because you are likely to have a lot of repetitions!
Note that our MAX calculation time is near 20 secs for the worst case tested.
Experiment 2: Optimizations that aren't
Having a lot of memory available, we can calculate by brute force, not removing the repeated results.
The result is interesting ... 10.6 secs! ... Wait! What happened? Our little "remove repeated integers" trick is eating up a lot of time: when there are not many results to remove, there is no gain, only a loss from trying to get rid of the repetitions.
But we may get a lot of benefit from the optimization when the max numbers in the matrix are well under 5×10^5. Remember that I'm doing these tests with the 5x10 matrix fully loaded.
The moral of this experiment is: the repeated-integer removal algorithm is critical.
HTH!
PS: I have a few more experiments to post, if I get the time to edit them.
