I'm implementing Karatsuba's method as part of an exercise. The method itself isn't terribly difficult, but one part of it is confusing me. Both numbers being multiplied have to be split into two halves, the high bits and the low bits, but I can't find much information about how this split is done.
I noticed most Karatsuba implementations use strings to represent huge numbers, but I'm doing something a bit different. I'm representing them as an array of ints, where each element is the next 30 bits of the huge number. Note that this means these arrays may be odd-length. If the huge number's size is not a multiple of 30, it gets leading zeros so it can still be represented as such.
So how can this be split into high and low halves? The main problem I'm running into is that the arrays can be odd-length, so I can't just cut them evenly by element count. Basically, how can I select the high and low bit halves of these int arrays so I can continue recursing in Karatsuba's method?
As long as I can retrieve the bits, I can create two smaller int arrays from them.
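One possible way to approach the split, as a minimal sketch: Karatsuba does not need the two halves to be the same size. It only needs both operands to be cut at the same limb index m, so that x = x_high * 2^(30*m) + x_low. With an odd-length array the high half simply gets the extra element. The sketch assumes the least-significant 30-bit limb is stored first; `limb_span` and `split_limbs` are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view type: a pointer into an existing limb array plus a length.
   Assumption: limbs are stored least-significant first, 30 bits per limb. */
typedef struct {
    const uint32_t *limbs;
    size_t len;
} limb_span;

/* Split x (n limbs) at limb index m = n / 2.
   low  covers limbs [0, m), high covers limbs [m, n).
   For odd n the high half is one limb longer, which is fine for Karatsuba
   as long as the other operand is split at the same index m. */
void split_limbs(const uint32_t *x, size_t n, limb_span *low, limb_span *high) {
    assert(n >= 2);              /* a single limb is the recursion's base case */
    size_t m = n / 2;
    low->limbs = x;
    low->len = m;
    high->limbs = x + m;
    high->len = n - m;
}
```

No bits have to be moved: both halves are views into the original array, and the factor 2^(30*m) is accounted for when the partial products are recombined.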
Suppose I have an array of strings of different lengths.
It can be assumed that the strings have no repeating characters.
Using a brute-force algorithm, I can find the pair of strings that have the most number of identical letters (order does not matter - for example, "ABCDZFW" and "FBZ" have 3 identical letters) in n-squared time.
Is there a more efficient way to do this?
Attempt: I've tried to think of a solution using the trie data structure, but this won't work since a trie would only group together strings with similar prefixes.
I can find the pair of strings that have the most number of identical
letters (order does not matter - for example, "ABCDZFW" and "FBZ" have
3 identical letters) in n-squared time.
I don't think you can avoid it in general, because comparing two strings is itself O(max(length(s1), length(s2))), on top of the O(n^2) loop that checks all pairs. However, you can optimize the string comparison to some extent.
Since you mentioned that the strings have no duplicate characters, and assuming from your example that they consist only of uppercase letters, each string can be at most 26 characters long.
For each string, we can use a bitmask: for each character of the string, set the corresponding bit to 1 (bit 0 for A, bit 1 for B, and so on). For example:
ABCGH
11000111 (written from MSB to LSB, so the rightmost bit is A)
Thus, we have n bit-masks for n strings.
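A minimal sketch of building such a mask, assuming the strings contain only the uppercase letters A to Z (the function name `letter_mask` is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Build a 26-bit mask: bit 0 is set if 'A' occurs, bit 1 for 'B', and so on.
   Assumption: the input contains only uppercase ASCII letters. */
uint32_t letter_mask(const char *s) {
    uint32_t mask = 0;
    for (; *s != '\0'; s++) {
        assert(*s >= 'A' && *s <= 'Z');
        mask |= (uint32_t)1 << (*s - 'A');
    }
    return mask;
}
```

For "ABCGH" this sets bits 0, 1, 2, 6 and 7, which is 11000111 in binary.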
Way #1
Now you can check all possible pairs of strings with an O(n^2) loop, comparing two strings by ANDing their masks and counting the set bits of the result (its Hamming weight). This is an improvement over your version because the string comparison is now just an AND of two 32-bit integers, which is an O(1) operation.
For example, comparing two strings looks like this:
ABCDG
ABCEF
X1 = mask(ABCDG) => 1001111
X2 = mask(ABCEF) => 0110111
X1 AND X2 => 0000111
hamming weight(0000111) => 3 // number of set bits
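The pairwise loop from Way #1 could be sketched like this. It is still O(n^2) in the number of strings, only with a constant-time comparison. `popcount32` and `best_pair` are illustrative names; the bit count uses a portable loop rather than a compiler builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hamming weight: clear the lowest set bit until nothing is left. */
static int popcount32(uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Check every pair of masks and return the largest number of shared letters.
   The indices of the best pair are written to *best_i and *best_j. */
int best_pair(const uint32_t *masks, size_t n, size_t *best_i, size_t *best_j) {
    assert(n >= 2);
    int best = -1;
    for (size_t i = 0; i + 1 < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            int common = popcount32(masks[i] & masks[j]);
            if (common > best) {
                best = common;
                *best_i = i;
                *best_j = j;
            }
        }
    }
    return best;
}
```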
Way #2
Now, one observation is that the AND of two bits is 1 only when both bits are 1. So for every mask, we try to maximize the Hamming weight (total number of set bits) of its AND with another string's mask: the strings with the most matching characters have 1s in the same positions, and ANDing their masks keeps exactly those bits.
Now build a trie of all the masks, where every node holds 0 or 1 depending on whether the corresponding bit is set. Insert each mask from MSB to LSB. Before inserting the i-th mask into the trie (which already holds i - 1 masks), query it to maximize the Hamming weight of the AND, recursively following the branch with the same bit (to make that bit 1 in the final AND) and also the branch with the opposite bit, because deeper levels of that branch might yield more set bits.
Regarding this Trie part, for nice pictorial explanation, you can find a similar thread here (this works with XOR).
In the worst case we have to traverse many branches of the trie to maximize the Hamming weight. That takes around 6 * 10^6 operations (about 1 second on a typical machine), plus additional space for the trie. But if the total number of strings is 10^5, an O(n^2) algorithm needs 10^10 operations, which is too many, so the trie approach is still far better.
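A rough sketch of the trie idea from Way #2, under some assumptions of mine: 26-bit masks, a fixed-size node pool (`MAX_NODES` is an arbitrary capacity for the example), and a query that explores both children at every level, since a 0 branch may still lead to more matches further down. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_BITS 26
#define MAX_NODES 4096   /* arbitrary capacity chosen for this sketch */

/* child[b] is the pool index of the child for bit value b; 0 means "absent"
   (index 0 is the root, which is never anyone's child). */
typedef struct { int child[2]; } trie_node;

static trie_node pool[MAX_NODES];
static int pool_used = 1;   /* node 0 is the root */

/* Insert a mask bit by bit, from the most significant bit down. */
void trie_insert(uint32_t mask) {
    int node = 0;
    for (int bit = MASK_BITS - 1; bit >= 0; bit--) {
        int b = (int)((mask >> bit) & 1u);
        if (pool[node].child[b] == 0) {
            assert(pool_used < MAX_NODES);
            pool[node].child[b] = pool_used++;
        }
        node = pool[node].child[b];
    }
}

/* Best popcount(mask & stored) over all stored masks below this node.
   Returns -1 if the subtree is empty. Both branches are explored. */
static int query_from(int node, int bit, uint32_t mask) {
    if (bit < 0) return 0;
    int best = -1;
    for (int b = 0; b < 2; b++) {
        int next = pool[node].child[b];
        if (next == 0) continue;
        int sub = query_from(next, bit - 1, mask);
        if (sub < 0) continue;
        int gain = b & (int)((mask >> bit) & 1u);  /* 1 only if both bits are 1 */
        if (gain + sub > best) best = gain + sub;
    }
    return best;
}

int trie_best_overlap(uint32_t mask) {
    return query_from(0, MASK_BITS - 1, mask);
}
```

Because both branches are followed, the query can visit a large part of the trie in the worst case, which matches the caveat above.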
Let me know if you have problems with the implementation. Unfortunately I can only help you with code if you're a C/C++ or Java guy.
Thanks @JimMischel for pointing out a major flaw. I slightly misunderstood the statement at first.
If I want to combine two numbers (Int, Long, ...) n1, n2 in a non-commutative way, p*n1 + n2 where p is an arbitrary prime seems a reasonable enough choice.
As many hashing options return a byte array, though, I am now trying to substitute the numbers with byte arrays.
Assume a,b:Array[Byte] are of the same length.
+ simply becomes an xor
but what should I use as a "Multiplication"?
p:Long a(n arbitrary) prime, a:Array[Byte] of arbitrary length
I could, of course, convert a to a long, multiply, then convert the result back to an Array of Bytes. The problem with that is that I will need "p*a" to be of the same length as a for the subsequent xor to make sense. I could circumvent this by zero-extending the shorter of the two byte arrays, but then the byte arrays quickly grow in length.
I could, on the other hand, convert p to a byte array and xor it with a. Here, the issue is that then (p*(p*a+b)+c) becomes (a+b+c), which is commutative, which we don't want.
I could add p to every byte in the array (throwing away the overflow).
I could add p to every byte in the array (not throwing away the overflow).
I could circular shift a by some f(p) bits (and hope it doesn't end up becoming a again)
And I could think of a lot more nonsense. But what should I do? What actually makes sense?
If you want to mimic the original idea of multiplying by a prime, the obvious generalization is to do arithmetic in the Galois field GF(2^8) - see https://en.wikipedia.org/wiki/Finite_field_arithmetic and note that you can essentially use log and antilog tables of size 256 to replace multiplication with not much more than table lookup - https://en.wikipedia.org/wiki/Finite_field_arithmetic#Implementation_tricks. Arithmetic over a finite field of any sort will have many of the nice properties of arithmetic modulo a prime - arithmetic modulo p is GF(p), or GF(p^1) if you prefer.
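As a hedged illustration of what byte-wise "p*a + b" could look like in GF(2^8): the sketch below multiplies with the shift-and-XOR method instead of log/antilog tables, and it assumes the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the one used by AES). Any irreducible degree-8 polynomial would do. `gf256_mul` and `combine` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8).
   Assumption: reduction polynomial 0x11B (x^8 + x^4 + x^3 + x + 1). */
uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    while (b != 0) {
        if (b & 1) result ^= a;          /* "add" a if this bit of b is set */
        uint8_t carry = a & 0x80;        /* will a overflow 8 bits when doubled? */
        a = (uint8_t)(a << 1);
        if (carry) a ^= 0x1B;            /* reduce modulo the polynomial */
        b >>= 1;
    }
    return result;
}

/* out[i] = p * a[i] + b[i] in GF(2^8), where "+" is XOR.
   The result has the same length as the inputs. */
void combine(uint8_t p, const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n) {
    assert(p != 0);                      /* p = 0 would discard a entirely */
    for (size_t i = 0; i < n; i++) {
        out[i] = gf256_mul(p, a[i]) ^ b[i];
    }
}
```

With p = 1 the combination degenerates to a plain XOR, so p should be some other nonzero field element.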
However, this is all rather untried and perhaps a little high-flown. Other options include checksum algorithms such as https://en.wikipedia.org/wiki/Adler-32. Alternatively, if you already have a hash algorithm that maps long strings into a short array of bytes, you can simply concatenate the two byte arrays to be combined and run the result through the hash algorithm again, perhaps with some padding before and after to give you some parameters to play with if you need to vary or tune things.
tl;dr: What is the fastest way to sort an uint8x16_t?
I need to sort many arrays of exactly 16 unsigned bytes (in descending order, though the direction doesn't really matter), and I'm trying to optimize the sorting by means of ARM NEON vectorization.
I find it quite a fancy puzzle, as it seems that there "must" exist a short combination of NEON instructions (such as vmax/vpmax/vmin/vpmin, vzip/vuzp) that reliably results in a sorted array.
For example, if we transform a pair (A, B) of two 8-byte arrays into (vpmax(A,B), vpmin(A,B)), we obtain same 16 values, just in different order. If we repeat this operation four times, we reliably have the array maximum in the first cell and the array minimum in the last cell; we cannot be sure about the middle elements though.
Another example: if we first do (C,D)=(vmax(A,B),vmin(A,B)), then (E,F)=(vpmax(C,D),vpmin(C,D)), then (G,H)=vzip(E,F), we get our array split into four parts of four bytes, and in each part we already know the largest and the smallest element. Probably the next naive step would be to deinterleave this array so that the top four bytes are at the start of the array (these won't necessarily be the top 4 elements of the array, just the top bytes of their respective groups) and repeat; I'm not yet sure where that leads in the end.
Is there any known method for this particular problem or for other similar problems (for different array sizes or whatever)? Any ideas are appreciated :)
I'm writing a utility to calculate π to a million digits after the decimal. On a 32- or 64-bit consumer desktop system, what is the most efficient way to store and work with such a large number accurate to the millionth digit?
clarification: The language would be C.
Forget floating point, you need bit strings that represent integers
This takes a bit less than 1/2 megabyte per number. "Efficient" can mean a number of things. Space-efficient? Time-efficient? Easy-to-program with?
Your question is tagged floating-point, but I'm quite sure you do not want floating point at all. The entire idea of floating point is that our data is only known to a few significant figures and even the famous constants of physics and chemistry are known precisely to only a handful or two of digits. So there it makes sense to keep a reasonable number of digits and then simply record the exponent.
But your task is quite different. You must account for every single bit. Given that, no floating point or decimal arithmetic package is going to work unless it's a template you can arbitrarily size, and then the exponent will be useless. So you may as well use integers.
What you really, really need is a string of bits. This is simply an array of some convenient integer type. I suggest <stdint.h> and simply using uint32_t[125000] (or the 64-bit equivalent) to get started. This actually might be a great use of the more obscure types from that header, such as uint_fast32_t, that pick out sizes that are fast on a given platform.
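To make "an array of convenient types" a little more concrete, here is a minimal sketch of adding two such numbers limb by limb. It assumes the least-significant 32-bit word is stored first, which is my choice and not something the answer prescribes; `big_add` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sum = a + b, where each number is n 32-bit limbs, least significant first.
   Returns the carry out of the top limb (0 or 1). */
uint32_t big_add(uint32_t *sum, const uint32_t *a, const uint32_t *b, size_t n) {
    assert(n > 0);
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;  /* cannot overflow 64 bits */
        sum[i] = (uint32_t)t;                        /* low 32 bits */
        carry = t >> 32;                             /* high bit becomes the carry */
    }
    return (uint32_t)carry;
}
```

Subtraction, multiplication and division follow the same limb-by-limb pattern, with a 64-bit intermediate holding each partial result.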
To be more specific, we would need to know more about your goals. Is this for practice in a specific language? For some investigation into number theory? If the latter, why not just use a language that already supports bignums, like Ruby?
Then the storage is someone else's problem. But if what you really want to do is implement a big-number package, then I might suggest using BCD (4-bit) strings or even ordinary 8-bit ASCII strings with printable digits, simply because things will be easier to write and debug, and maximum space and time efficiency may not matter so much.
I'd recommend storing it as an array of short ints, one per digit, and then carefully write utility classes to add and subtract portions of the number. You'll end up moving from this array of ints to floats and back, but you need a 'perfect' way of storing the number - so use its exact representation. This isn't the most efficient way in terms of space, but a million ints isn't very big.
It's all in the way you use the representation. Decide how you're going to 'work with' this number, and write some good utility functions.
If you're willing to tolerate computing pi in hex instead of decimal, there's a very cute algorithm that allows you to compute a given hexadecimal digit without knowing the previous digits. This means, by extension, that you don't need to store (or be able to do computation with) million digit numbers.
Of course, if you want to get the nth decimal digit, you will need to know all of the hex digits up to that precision in order to do the base conversion, so depending on your needs, this may not save you much (if anything) in the end.
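I assume the algorithm meant here is the Bailey-Borwein-Plouffe (BBP) digit-extraction formula. A rough sketch with plain doubles follows; the function names are made up, the tail of each series is cut off after 16 extra terms, and the precision of double limits how far out the digits can be trusted (the assert bound is a conservative guess of mine, not a proven limit).

```c
#include <assert.h>
#include <stdint.h>

/* 16^e mod m by square-and-multiply. m stays small here, so the
   64-bit products cannot overflow. */
static uint64_t pow16_mod(uint64_t e, uint64_t m) {
    uint64_t result = 1 % m;
    uint64_t base = 16 % m;
    while (e > 0) {
        if (e & 1) result = (result * base) % m;
        base = (base * base) % m;
        e >>= 1;
    }
    return result;
}

/* Fractional part of the sum over k >= 0 of 16^(n-k) / (8k + j). */
static double bbp_series(unsigned j, unsigned n) {
    double s = 0.0;
    for (unsigned k = 0; k <= n; k++) {          /* terms with 16^(n-k) >= 1 */
        uint64_t m = 8ull * k + j;
        s += (double)pow16_mod(n - k, m) / (double)m;
        s -= (double)(uint64_t)s;                /* keep the fractional part */
    }
    double scale = 1.0 / 16.0;
    for (unsigned k = n + 1; k <= n + 16; k++) { /* rapidly shrinking tail */
        s += scale / (double)(8ull * k + j);
        scale /= 16.0;
    }
    return s - (double)(uint64_t)s;
}

/* Hex digit of pi at position n after the point (n = 0 gives the first one). */
int pi_hex_digit(unsigned n) {
    assert(n < 1000000u);
    double x = 4.0 * bbp_series(1, n) - 2.0 * bbp_series(4, n)
             - bbp_series(5, n) - bbp_series(6, n);
    x -= (double)(int64_t)x;                     /* truncate toward zero */
    if (x < 0.0) x += 1.0;                       /* bring into [0, 1) */
    return (int)(16.0 * x);
}
```

Note that no big-number storage is involved at all: each digit is computed independently from small integers and doubles.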
Unless you're writing this purely for fun and/or learning, I'd recommend using a library such as GNU Multiprecision. Look into the mpf_t data type and its associated functions for storing arbitrary-precision floating-point numbers.
If you are just doing this for fun/learning, then represent numbers as an array of chars, with each array element storing one decimal digit. You'll have to implement long addition, long multiplication, etc.
Try PARI/GP, see wikipedia.
You could store its decimals digits as text in a file and mmap it to an array.
I once worked on an application that used really large numbers (but didn't need good precision). What we did was store the numbers as logarithms, since you can store a pretty big number as a log10 within an int.
Think along these lines before resorting to bit stuffing or some complex bit representation.
I am not too good with complex math, but I reckon there are elegant solutions for storing numbers with millions of bits of precision.
IMO, anyone programming arbitrary-precision arithmetic needs to understand base conversion. It solves two problems at once: you can calculate pi in hex digits and convert the result to a decimal representation, and you can find the optimal container.
The dominant constraint is the number of correct bits in the multiplication instruction.
In JavaScript one always has 53 bits of accuracy, meaning that a Uint32Array holding numbers of at most 26 bits can be processed natively (a waste of 6 bits per word).
On a 32-bit architecture with C/C++ one can easily get A*B mod 2^32, which suggests a basic element of 16 bits. (These can be parallelized on many SIMD architectures, starting from MMX.) Each 16-bit element can also hold a 4-digit decimal number (wasting about 2.7 bits per word).
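To illustrate the 16-bit container with 4 decimal digits per word, here is a small sketch that multiplies such a number by a single limb using only 32-bit products. The base 10000, the least-significant-first layout and the name `mul_small` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BASE 10000u   /* each 16-bit limb holds 4 decimal digits */

/* Multiply a base-10000 number in place by a factor below 10000.
   Limbs are least significant first. Returns the carry out of the top limb.
   The largest intermediate is 9999 * 9999 + 9999, which fits in 32 bits. */
uint32_t mul_small(uint16_t *limbs, size_t n, uint16_t factor) {
    assert(factor < LIMB_BASE);
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (uint32_t)limbs[i] * factor + carry;
        limbs[i] = (uint16_t)(t % LIMB_BASE);
        carry = t / LIMB_BASE;
    }
    return carry;
}
```

A side benefit of this base is that printing the number in decimal needs no base conversion, only zero-padded 4-digit output per limb.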
I have a big number (integer, unsigned) stored in 2 variables (as you can see, the high and low part of number):
unsigned long long int high;
unsigned long long int low;
I know how to add or subtract another variable of that kind.
But I need to divide numbers of that kind. How can I do it? I know I can subtract N times, but maybe there are better solutions. ;-)
Language: C
Yes. It will involve shifts, and I don't recommend doing that in C. This is one of those rare examples where assembler can still prove its value, easily making things run hundreds of times faster (And I don't think I'm exaggerating this.)
I don't claim total correctness, but the following should get you going :
(1) Initialize result to zero.
(2) Shift divisor as many bits as possible to the left, without letting it become greater than the dividend.
(3) Subtract shifted divisor from dividend and add one to result.
(4) Now shift divisor to the right until once again, it is less than the remaining dividend, and for each right-shift, left-shift result by one bit. Go back to (3) unless stopping condition is satisfied. (Stopping condition must be something like "divisor has become zero", but I'm not certain about that.)
It really feels great to get back to some REAL programming problems :-)
Have you looked at any large-number libraries, such as GNU MP BigNum?
I know, I can subtract N times, but, maybe, there are more better solutions.
Subtracting N times may be slow when N is large.
Better (i.e. more complicated but faster) would be shift-and-subtract, using the algorithm you learned to do long division of decimal numbers in elementary school.
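A minimal sketch of shift-and-subtract (binary long division) on a high/low pair, in portable C: the `u128` struct and the function names are made up for the example. It processes one bit of the dividend per iteration, so it is simple rather than fast.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t high, low; } u128;   /* hypothetical 128-bit pair */

static int u128_ge(u128 a, u128 b) {
    return a.high > b.high || (a.high == b.high && a.low >= b.low);
}

static u128 u128_sub(u128 a, u128 b) {
    u128 r;
    r.low = a.low - b.low;
    r.high = a.high - b.high - (a.low < b.low);   /* borrow from the low word */
    return r;
}

/* Returns n / d and stores n % d in *rem. */
u128 u128_divmod(u128 n, u128 d, u128 *rem) {
    assert(d.high != 0 || d.low != 0);            /* no division by zero */
    u128 q = {0, 0}, r = {0, 0};
    for (int i = 127; i >= 0; i--) {
        /* Bring down bit i of the dividend: r = (r << 1) | bit. */
        uint64_t bit = (i >= 64) ? (n.high >> (i - 64)) & 1 : (n.low >> i) & 1;
        uint64_t overflow = r.high >> 63;         /* bit shifted out of r */
        r.high = (r.high << 1) | (r.low >> 63);
        r.low = (r.low << 1) | bit;
        /* If the divisor fits, subtract it and set quotient bit i. */
        if (overflow || u128_ge(r, d)) {
            r = u128_sub(r, d);
            if (i >= 64) q.high |= (uint64_t)1 << (i - 64);
            else         q.low  |= (uint64_t)1 << i;
        }
    }
    *rem = r;
    return q;
}
```

Some compilers also provide a built-in 128-bit integer type, which makes this unnecessary where it is available.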
[There may also be 3rd-party library and/or compiler-specific support for such numbers.]
Hmm. I suppose if you have some headroom in "high", you could shift it all up one digit, divide high by the number, then add the remainder to the top remaining digit in low and divide low by the number, then shift everything back.
Here's another library doing 128 bit arithmetic. GnuCash: Math128.
Per my commenters below, my previous answer was stupid.
Quickly, my new answer would be that when I've tried to do this in the past, it almost always involved shifting, because it's the only operation that can be applied across multiple "words", if you will, and have it look the same as if it were one large word (with the exception of having to track carryover bits).
There are a couple different approaches to it, but I don't know of any better general direction than using shifts, unless your hardware has some special operations.
You could implement a "BigInt" type algorithm that does divisions on string arrays. Create 1 string array for each high,low pair and do the division. Store the result in another string array, then convert back to high,low integer pair.
Since the language is C, the array would probably be a character array. Consider it analogous to the "string array" I was mentioning above.
You can do addition and subtraction of arbitrarily large binary objects using the assembler looping and "add/subtract with carry (adc/sbb)" instructions. You can implement the other operations using them. I've never investigated doing anything beyond those two personally.
If your processor (or your C library) has a fast 64-bit divide, you can break the 128-bit divide into pieces (the same way you'd do a 32-bit divide on processors that had 16-bit divisions).
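One way to read "break it into pieces", as a sketch: treat the 128-bit value as four 32-bit digits and do schoolbook long division, with one 64-bit divide per digit. To keep it short this assumes the divisor fits in 32 bits; a full 128-by-128 divide needs more work (for example Knuth's Algorithm D). The name `div128_by_u32` is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Divide the 128-bit value (*high, *low) in place by a 32-bit divisor.
   Returns the remainder. */
uint32_t div128_by_u32(uint64_t *high, uint64_t *low, uint32_t d) {
    assert(d != 0);
    /* Four 32-bit digits, most significant first. */
    uint32_t digits[4] = {
        (uint32_t)(*high >> 32), (uint32_t)*high,
        (uint32_t)(*low >> 32),  (uint32_t)*low
    };
    uint64_t rem = 0;
    for (int i = 0; i < 4; i++) {
        /* rem < d, so cur < d * 2^32 and the quotient digit fits in 32 bits. */
        uint64_t cur = (rem << 32) | digits[i];
        digits[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }
    *high = ((uint64_t)digits[0] << 32) | digits[1];
    *low  = ((uint64_t)digits[2] << 32) | digits[3];
    return (uint32_t)rem;
}
```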
By the way, there are all sorts of tricks you can use if you know what typical values will be for the dividend and divisor. What is the source of these numbers? If a lot of your cases can be solved quickly, it might be OK if the occasional case takes a long time.
Also, if you can find cases where an approximate answer is OK, that opens the door to a lot of speedy approximations.