I have a very large number, possibly up to 50 digits long. I am taking it as string input, but I need to perform operations on it, so I need to convert it to a proper base, let's say 256.
What will be the best algorithm to do so?
Multiple-precision arithmetic (a.k.a. bignums) is a difficult subject, and the good algorithms are non-intuitive (there are books about that).
There exist several libraries handling bignums, e.g. the GMP library (and there are other ones). Most of them take advantage of hardware instructions (e.g. add with carry) through carefully tuned small chunks of assembly code, so they perform better than what you would be able to code in a couple of months.
I strongly recommend using existing bignum libraries. Writing your own would take you years of work, if you want it to be competitive.
See also answers to this question.
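If you only need the conversion itself, here is a minimal sketch, not a substitute for a real bignum library. It assumes a non-negative decimal string and a caller-supplied byte buffer; the name dec_to_base256 is hypothetical. Each input digit is folded in by multiplying the number so far by 10 and adding the digit, propagating the carry through the base-256 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a decimal digit string to little-endian base-256 bytes
 * (out[0] is the least significant byte). Returns the number of bytes
 * used, or 0 if the input is empty, contains a non-digit, or does not
 * fit in cap bytes. */
size_t dec_to_base256(const char *s, uint8_t *out, size_t cap)
{
    size_t len = 0;                     /* bytes currently in use */

    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return 0;
        /* out = out * 10 + digit, one byte at a time */
        unsigned carry = (unsigned)(*s - '0');
        for (size_t i = 0; i < len; i++) {
            unsigned t = out[i] * 10u + carry;
            out[i] = (uint8_t)(t & 0xFFu);
            carry = t >> 8;             /* at most 9, fits in one byte */
        }
        if (carry != 0) {
            if (len == cap)
                return 0;
            out[len++] = (uint8_t)carry;
        }
    }
    if (len == 0) {                     /* the value was zero */
        if (cap == 0)
            return 0;
        out[len++] = 0;
    }
    return len;
}
```

A 50-digit decimal number needs at most 21 bytes in base 256, so a 32-byte buffer is enough for the sizes mentioned in the question.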
I wrote an ANSI C compiler for a friend's custom 16-bit stack-based CPU several years ago but I never got around to implementing all the data types. Now I would like to finish the job so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle 16-bit integer data types since they are native to the CPU and therefore I have all the math routines (i.e. +, -, *, /, %) done for them. However, since his CPU does not handle floating point, I have to implement floats/doubles myself. I also have to implement the 8-bit and 32-bit data types (both integer and floats/doubles). I'm pretty sure this has been done and redone many times, and since I'm not particularly looking forward to reinventing the wheel I would appreciate it if someone would point me at a library that can help me out.
Now I was looking at GMP but it seems to be overkill (library must be absolutely huge, not sure my custom compiler would be able to handle it) and it takes numbers in the form of strings which would be wasteful for obvious reasons. For example :
mpz_set_str(x, "7612058254738945", 10);
mpz_set_str(y, "9263591128439081", 10);
mpz_mul(result, x, y);
This seems simple enough, and I like the API, but I would rather pass in an array than a string. For example, if I wanted to multiply two 32-bit longs together I would like to be able to pass it two arrays of size two, where each array contains two 16-bit values that together represent a 32-bit long, and have the library place the output into an output array. If I needed floating point then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's divide the answer.
8-bit arithmetic
This one is very easy. In fact, C already talks about this under the term "integer promotion". This means that if you have 8-bit data and you want to do an operation on them, you simply widen them to 16 bits by padding the upper byte with zero bits (zero extension) or, if the value is signed and negative, with one bits (sign extension). Then you proceed with the normal 16-bit operation.
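A small sketch of that widening step, written in C only to show the bit patterns a compiler back end would have to produce. The function names are hypothetical.

```c
#include <stdint.h>

/* Zero-extend an unsigned 8-bit value to 16 bits: the high byte is 0. */
uint16_t widen_u8(uint8_t v)
{
    return (uint16_t)v;
}

/* Sign-extend an 8-bit two's-complement pattern to 16 bits: the high
 * byte is filled with copies of bit 7. */
uint16_t widen_s8(uint8_t bits)
{
    return (bits & 0x80u) ? (uint16_t)(0xFF00u | bits) : (uint16_t)bits;
}
```

For example, the signed byte 0xFE (-2) widens to 0xFFFE, and adding the widened 0x03 with ordinary 16-bit arithmetic gives 0x0001.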
32-bit arithmetic
Note: as far as the standard is concerned, int only needs to be 16 bits wide, but long must cover at least 32 bits, so a conforming compiler does need 32-bit integer arithmetic.
This could be a bit tricky, but it is still not worth using a library for. For each operation, you would need to take a look at how you learned to do it in elementary school in base 10, and then do the same in base 2^16 for two-digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base 10 math (and hence the algorithms), you would need to implement them in the assembly language of your CPU.
This basically means loading the most significant 16 bits into one register, and the least significant 16 bits into another register. Then follow the algorithm for each operation and perform it. You would most likely need help from the carry/overflow and other flags.
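As a sketch of the schoolbook approach in base 2^16 (written in C for readability; on the real target this would be assembly using the carry flag). The type u32_pair and the function names are hypothetical, and mul16_wide uses uint32_t only as a stand-in for whatever 16 x 16 -> 32 multiply the CPU or its runtime provides.

```c
#include <stdint.h>

/* A 32-bit unsigned value held as two base-2^16 digits. */
typedef struct {
    uint16_t lo;   /* least significant 16 bits */
    uint16_t hi;   /* most significant 16 bits */
} u32_pair;

/* Addition: add the low digits, then the high digits plus the carry. */
u32_pair add32(u32_pair a, u32_pair b)
{
    u32_pair r;
    r.lo = (uint16_t)(a.lo + b.lo);
    uint16_t carry = (r.lo < a.lo);     /* wrapped around => carry out */
    r.hi = (uint16_t)(a.hi + b.hi + carry);
    return r;
}

/* 16 x 16 -> 32 bit product. uint32_t is an assumption standing in for
 * a widening multiply instruction or routine on the target. */
static u32_pair mul16_wide(uint16_t x, uint16_t y)
{
    uint32_t p = (uint32_t)x * y;
    u32_pair r = { (uint16_t)(p & 0xFFFFu), (uint16_t)(p >> 16) };
    return r;
}

/* Low 32 bits of a * b. With B = 2^16:
 * (a.hi*B + a.lo) * (b.hi*B + b.lo)
 *   = a.lo*b.lo + (a.lo*b.hi + a.hi*b.lo)*B + a.hi*b.hi*B^2
 * The B^2 term lies entirely above bit 31 and is dropped. */
u32_pair mul32(u32_pair a, u32_pair b)
{
    u32_pair r = mul16_wide(a.lo, b.lo);
    r.hi = (uint16_t)(r.hi + mul16_wide(a.lo, b.hi).lo
                           + mul16_wide(a.hi, b.lo).lo);
    return r;
}
```

Subtraction follows the same pattern with a borrow instead of a carry; division is the long-division analogue and takes noticeably more code.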
Floating point arithmetic
Note: as far as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software emulated floating points. You may find this gcc wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.
I wrote two different algorithms that solve a particular case of string matching (implemented in C). I know that the theoretical complexity (big O) of these algorithms is the same, but I think that in practice one is better than the other.
My question is: could someone recommend a paper or some other reading that shows how to compare algorithms with a practical approach?
I have several test sets, and I'm interested in measuring execution time and memory usage. I need to take these measurements as independently as possible of the operating system and of other programs that could be running concurrently.
Thanks!!!
You could compare your algorithms by generating the assembly code for each and comparing the results.
You can generate the assembly code with the gcc -S mycode.c command.
I find that "looking at the code" is a good start. If one uses more variables and is more complicated than the other, it is probably slower.
However, there are of course clever tricks that can make a more complicated function actually run faster (for example, code that reads 8 bytes at a time; once it finds a difference, locating it is more complex, but for long strings that are largely similar there is a big win).
So, in the end, there is no substitute for actually running the code, using clock-cycle timing (RDTSC instruction on x86 processors, for example), or running a large loop to execute the code many times to give a reasonable length runtime.
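A rough sketch of the "large loop" approach using the standard clock() function, which reports CPU time used by the program rather than wall-clock time. The names time_per_call, cmp_case and run_case are hypothetical, and strcmp is only a placeholder for the routine under test.

```c
#include <string.h>
#include <time.h>

/* Average CPU seconds per call of fn(arg), measured over reps calls. */
double time_per_call(void (*fn)(void *), void *arg, long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / (double)reps;
}

/* One benchmark case: two inputs, a call counter, and a volatile sink
 * that keeps the result alive so the compiler cannot drop the call. */
struct cmp_case {
    const char *a;
    const char *b;
    volatile int sink;
    long calls;
};

/* Placeholder workload: replace strcmp with the routine under test. */
void run_case(void *p)
{
    struct cmp_case *c = p;
    c->sink = strcmp(c->a, c->b);
    c->calls++;
}
```

Calling time_per_call(run_case, &c, 1000000) for each algorithm and each test input, several times over, and comparing the smallest values is one common way to reduce noise from other processes. The resolution of clock() is coarse, so reps should be large enough for a run to take a noticeable fraction of a second.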
If your code isn't supposed to run on a single embedded target, you probably want to run the code on a set of different hardware to determine if the code that is faster on processor A is also faster on B, C and D type processors. Often this does work, but sometimes you can find that a particular processor model is faster for SOME operations, and another is faster for others (for example based on cache size, etc.).
It would also be very important, in the case of string operations, to try with different size inputs, different points of difference (e.g. a long string, but different "early", vs. long string with difference "late"). Sometimes, the different approaches will show different results for short/long strings or early/late point of difference (and of course "equal" strings that are long or short).
To round out the comments: I found a book called "A Guide to Experimental Algorithmics" by Catherine C. McGeoch, and a professor recommended a practical paper to me.
What would be the best programming language for very large arrays and very large numbers?
With arrays over 30,000 indexes
And numbers over 100 digits
Also it needs to be efficient, or easy to make efficient.
Thanks.
Almost any programming language worth its salt should have these characteristics, and frankly I don't think I'd want to use any language that can't handle arrays of 30,000 elements. I'll list a few that have good support for very large numbers:
Python. Python 3 has automatic support for large numbers, as the default integer type grows as necessary, and has some really awesome math libraries. Other languages may be ever so slightly faster, but unless for some reason you know for sure that Python won't be good enough I'd start there.
C#. This will mostly bind you to Windows, but it's very popular, fast, and meets your requirements.
Java. Cross platform, mature support with BigInteger.
Haskell. Pretty seamless conversions to large numbers and powerful math support. If you have a strong mathematics background Haskell will feel pretty natural. If you already know functional programming or don't mind devoting a fascinating few hours to learning it, this is a good choice.
C/C++. Very fast, but a little more complex to develop in. You'll probably get better results in large number support with something else. I'd only look into C++ if you've tried optimizing code in another language and it's still not fast enough, unless you have a specific reason to not use an intermediately compiled language.
The truth of the matter is that it's hard to find a programming language that doesn't support these things, and if you could I probably wouldn't use it for anything because it's probably not that mature. Do you have any other requirements that would help us narrow it down further for you? :D
The array is not the issue. Numbers consisting of 100 numerals (digits) are a huge issue. I don't have a good answer to the question (out of date as it is) but as this comes up readily in Google I'll mention that the built-in integer types of most languages only hold 32-bit to 64-bit numbers.
(I know that the primitive integer types of the C family of languages, PHP, AS3 and Java don't hold massive numbers; you need a library type such as Java's BigInteger for that.)
For example, an unsigned 32-bit number allows a range of 0 to 4,294,967,295 (2^32-1), which is only 10 numerals (actually more like 9, because the limit is by size, not numerals), far short of the 100 digits the questioner was after.
That said I know that there are cases of people implementing support for large numbers in C and AS3...
Python with NumPy is probably what you want.
I always found Fortran to be quite nice when dealing with arrays, esp. with multi-dimensional ones. If you are dealing with very large numbers, you will probably need to define your own data type or live with a loss of precision, though. Or use this: http://www.fortran.com/big_integer_module.f95 .
But it depends a bit on what you want to do. Fortran is nice for numerical computations, and not so nice for about everything else.
I'm doing some Project Euler problems and most of the time, the computations involve large numbers beyond int, float, double etc.
Firstly, I know that I should be looking for more efficient ways of calculation so as to avoid the large number problem. I've heard of the Bignum libraries.
But, for academics interests, I'd like to know how to code my own solution to this problem.
Can any expert please help me out? (My language is C)
You need to store the big numbers in a base that your computer can easily handle with its native types, and then store the digits in a variable length array. I'd suggest that for simplicity you start by storing the numbers in base 10 just to get the hang of how to do this. It will make debugging a lot easier.
Once you have a class that can store the numbers in this form, it's just a matter of implementing the operations add, subtract, multiply, etc. on this class. Each operation will have to iterate over digits of its operands and combine them, being careful to carry correctly so that your digits are never larger than the base. Addition and subtraction are simple. Multiplication requires a bit more work as the naive algorithm requires nested loops. Then once you have that working, you can try implementing exponentiation in an efficient manner (e.g. repeated squaring).
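As a rough illustration of the nested-loop multiplication described above, here is a sketch that works on plain arrays of base-10 digits, least significant digit first. The function name and the array layout are assumptions made for this example.

```c
#include <stddef.h>

/* Schoolbook multiplication of little-endian base-10 digit arrays
 * (index 0 is the ones place). out must have room for na + nb digits.
 * Returns the number of significant digits written. */
size_t mul_digits(const unsigned char *a, size_t na,
                  const unsigned char *b, size_t nb,
                  unsigned char *out)
{
    for (size_t i = 0; i < na + nb; i++)
        out[i] = 0;

    for (size_t i = 0; i < na; i++) {
        unsigned carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* at most 9 + 9*9 + 9 = 99, so the carry stays below 10 */
            unsigned t = out[i + j] + a[i] * b[j] + carry;
            out[i + j] = (unsigned char)(t % 10);
            carry = t / 10;
        }
        out[i + nb] = (unsigned char)carry;  /* this slot is still zero */
    }

    size_t n = na + nb;
    while (n > 1 && out[n - 1] == 0)         /* trim leading zeros */
        n--;
    return n;
}
```

With a = {2, 1} (the number 12) and b = {4, 3} (the number 34), out receives {8, 0, 4}, that is 408. Switching to a larger base only changes the 10s in the inner loop and the element type.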
If you are planning to write a serious bignum implementation, base 10 won't cut it. It's wasteful of memory and it will be slow. You should choose a base that is natural for the computer, such as 256 or the word size (2**32). However this will make simple operations more difficult as you will get overflows if you naively add two digits, so you will need to handle that very carefully.
C is not a good choice for Project Euler. The benefits of C are raw speed, machine portability (to an extent, with standard C), language interoperability (if some language communicates with another, C is a popular first choice), sticking close to a specific library or platform's API (because C is common, e.g. OS API), and a stable language & stdlib. None of these benefits apply to solving Project Euler problems. Not even raw speed, because most of the problems aren't about raw computation, but understanding the algorithm required, and you can sit there all day and wait before submission.
If you are attempting Project Euler problems to broaden your experience with C, that's perfectly fine, just realize this experience doesn't necessarily apply to long-lived and real-world C projects you may work on.
For this kind of short, one-off problem those languages commonly described as "scripting languages" will work better, faster (in dev time), and easier. Try Python, it stays close to C in many ways, including a C API, and out of the various popular "scripting languages" is possibly the one for which you will find the most use in conjunction with C projects.
This may become an unpopular answer, but it isn't a rant—plus I really like C and use C/C++ often—and there is an explicit answer here to your problem: "don't use C", with your final large number solution depending on which alternative you choose. Again picking on Python, integers do not have an upper bound (note below), and I use this to naturally code answers to Project Euler problems, where in other languages I have to use a painful-by-comparison alternative number library.
(Python integers: There are two integer types in 2.x, 'int' and 'long' (which have been completely unified in 3.x). The conversion between them is practically seamless, and 'long' allows arbitrarily large values, instead of just being a bigger 'int' type as C's long is.)
A popular bignum library for C/C++ is the GNU MP Bignum Library. I've used it for several Project Euler problems, but the fact remains that C isn't a very suitable language for Euler problems. If performance were more important C would have more to give, but as it is you're much better off using a language with built-in bignum support, such as Ruby (there are lots of others).
A simple way is to think of the number as its string representation in base b. Suppose b=10, simple arithmetic operation like addition on two such strings can be done using the same method we use when adding numbers by pen and paper. The same goes for other simple operations. For better results, you can take a larger base.
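A minimal sketch of the pen-and-paper addition on decimal strings, assuming non-negative inputs without leading zeros and an output buffer sized by the caller. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Add two non-negative decimal strings digit by digit, right to left.
 * out needs room for max(strlen(a), strlen(b)) + 2 chars. */
void add_decimal_strings(const char *a, const char *b, char *out)
{
    size_t i = strlen(a), j = strlen(b);
    size_t n = (i > j ? i : j) + 1;   /* digits, incl. a possible carry */
    size_t k = n;
    int carry = 0;

    out[n] = '\0';
    while (k > 0) {
        int sum = carry;
        if (i > 0) sum += a[--i] - '0';
        if (j > 0) sum += b[--j] - '0';
        out[--k] = (char)('0' + sum % 10);
        carry = sum / 10;
    }
    /* drop the leading zero if there was no final carry;
     * the n bytes moved include the terminator */
    if (n > 1 && out[0] == '0')
        memmove(out, out + 1, n);
}
```

Adding "999" and "1" this way yields "1000".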
A simple bignum implementation like that should be enough for most Project Euler problems (probably all, but I haven't solved much at Euler so can't be sure), but there are ways of using much faster algorithms for operations such as multiplication and division/mod.
Although I recommend writing your own bignum for practice, if you are really stuck you can take ideas from the code of already implemented bigint libraries. For a serious implementation something like GMP is the obvious choice. But you can also find small bigints coded by other people when solving similar practice problems online (e.g. Abednego's bigint.cpp).
Here's a nice and simple bignum module for C. You can learn from it for ideas. The C code isn't the highest quality, but the algorithm is well implemented and quite common.
For more advanced stuff, look up GMP.
If you want a nice C++ version (I know, you said C, but this is really interesting code), take a look at the internals of CGAL: http://www.cgal.org/
I am building a compiler for a C-like language, but I want it to parse integers bigger than 2^32. How is that possible? How have big integers been implemented in languages like Python and Ruby?
There are libraries to do this sort of thing.
Check out gmplib.
There are lots of big number libraries; see this Wikipedia article for a complete list.
GMP (GNU Multiple Precision Arithmetic Library) is sufficient for everything I have encountered. NTL is more of the same but is object-oriented.
Generally these libraries represent a number as an array of digits. If you want to roll your own, you can store each digit of the number as a character, but it is a lot of work.
If you want to write it yourself, follow my trip through memory lane ;-).
In the old days, when computers used 8 bits, we often needed to calculate with big numbers (like > 255), and we all had to write the routines ourselves. Take addition, for example.
If we needed to add numbers of two bytes to each other we used the following algorithm:
Add the least significant bytes.
If the result exceeded 8 bits, the carry bit was set.
Add the most significant bytes and the carry flag (if set).
If the result exceeded 8 bits you produced an overflow error (but you don't need to do this if you want more than 2 bytes).
You can extend this to more bytes/words/dwords/qwords and to other operators.
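The steps above, extended to any number of bytes, can be sketched in C as follows. The variable carry plays the role of the CPU's carry flag, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n bytes, least significant byte first.
 * Returns the final carry (1 means the sum did not fit in n bytes). */
unsigned add_bytes(uint8_t *r, const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned carry = 0;                 /* stands in for the carry flag */
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i] + carry;
        r[i] = (uint8_t)(sum & 0xFFu);
        carry = sum >> 8;               /* set if the result exceeded 8 bits */
    }
    return carry;
}
```

For example, adding the two-byte values 0x01FF and 0x0001 gives 0x0200 with a final carry of 0.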
I believe you'll need some sort of bigint library, which are available on the net, just do a bit of searching and you may find one that's suitable for your project.
That is because simply parsing the integers will, I believe, not be enough. Your users will want not only to store such numbers but probably also to perform operations on them.
There is a set of slides by Felix von Leitner that covers some bignum basics. Personally I think it is quite informative and technical.
C++ Big Integer Library from Matt McCutchen
https://mattmccutchen.net/bigint/
C++ source code only. Very simple to use.
You would have to use some sort of struct in C to achieve this. You will find this more difficult if you are on an x86 platform rather than x64. If you're on x86, prepare to get very familiar with assembly and the carry flag.
Good luck!