How to convert very large string to number in C?

How to convert very large string to number in C? - c

I am doing my 3x program and developing 3x I found this article about handling big numbers but final result is string and with string I can't do math operations. I use gcc compiler.
Also this program is not meant to solve problem, I created just to test performance.

In fact, you can't. C supports integers of limited length (64 bits, about 20 digits), and floats (about 15 significant digits), not larger.
So there is no better way than using a representation in a large radix (power of 2 or power of 10) and implementing the operations based on this representation. E.g. addition can be done digit by digit, with occasional carries.

Related

Specify Very large numbers use in C

I'm thinking of using big large integers one way i can think is using Gmplib, i worked with small examples, but can it work with numbers like 2 ^ (2 ^ (2 ^ 1024)) ??
My question is how to represent that big number because (not sure) calculators too might get an overflow.

I'm thinking of using big large integers one way i can think is using Gmplib, i worked with small examples, but can it work with numbers like 2 ^ (2 ^ (2 ^ 1024)) ??
No. GMP has two operating modes: large integers and large floating-point numbers. The first one can only operate on numbers whose integer value can be fully represented in memory; the second is limited to exponents that can be represented within about 64 bits. The number you're describing does not fit within either of those limits. (The exponent alone is too large to fit into memory!)
My approach : I'll try to reduce noise by storing them as binary numbers / bitvectors because it'll give me get away with > one 2^ step.
It's not entirely clear what you're trying to say here or in the following paragraph, but what you're describing sounds like a typical multiprecision integer implementation. It's no different from what GMP does to store large integers, and it won't work for this application.
Numbers of the scale you're describing are not easy to work with. Whether you find a library to work with them or write one yourself, it'll likely need to be designed specifically for the purpose of operating on numbers with this particular structure. They're simply too large to do anything else with.

Handling Decimals on Embedded C

I have my code below and I want to ask what's the best way in solving numbers (division, multiplication, logarithm, exponents) up to 4 decimals places? I'm using PIC16F1789 as my device.
float sensorValue;
float sensorAverage;
void main(){
//Get an average data by testing 100 times
for(int x = 0; x < 100; x++){
// Get the total sum of all 100 data
sensorValue = (sensorValue + ADC_GetConversion(SENSOR));
}
// Get the average
sensorAverage = sensorValue/100.0;
}

In general, on MCUs, floating point types are more costly (clocks, code) to process than integer types. While this is often true for devices which have a hardware floating point unit, it becomes a vital information on devices without, like the PIC16/18 controllers. These have to emulate all floating point operations in software. This can easily cost >100 clock cycles per addition (much more for multiplication) and bloats the code.
So, best is to avoid float (not to speak of double on such systems.
For your example, the ADC returns an integer type anyway, so the summation can be done purely with integer types. You just have to make sure the summand does not overflow, so it has to hold ~100 * for your code.
Finally, to calculate the average, you can either divide the integer by the number of iterations (round to zero), or - better - apply a simple "round to nearest" by:
#define NUMBER_OF_ITERATIONS 100
sensorAverage = (sensorValue + NUMBER_OF_ITERATIONS / 2) / NUMBER_OF_ITERATIONS;
If you really want to speed up your code, set NUMBER_OF_ITERATIONS to a power of two (64 or 128 here), if your code can tolerate this.
Finally: To get not only the integer part of the division, you can treat the sum (sensoreValue) as a fractional value. For the given 100 iterations, you can treat it as decimal fraction: when converting to a string, just print a decimal point left of the lower 2 digits. As you divide by 100, there will be no more than two significal digits of decimal fraction. If you really need 4 digits, e.g. for other operations, you can multiply the sum by 100 (actually, it is 10000, but you already have multipiled it by 100 by the loop).
This is called decimal fixed point. Faster for processing (replaces multiplication by shifts) would be to use binary fixed point, as I stated above.
On PIC16, I would strongly suggest to think of using binary fraction as multiplication and division are very costly on this platform. In general, they are not well suited for signal processing. If you need to sustain some performance, an ARM Cortex-M0 or M4 would be the far better choice - at similar prices.

In your example it is trivial to avoid non-integer representations altogether, however to answer your question more generally an ISO compliant compiler will support floating point arithmetic and the library, but for performance and code size reasons you may want to avoid that.
Fixed-point arithmetic is what you probably need. For simple calculations an ad-hoc approach to fixed point can be used whereby for example you treat the units of sensorAverage in your example as hundredths (1/100), and avoid the expensive division altogether. However if you want to perform full maths library operations, then a better approach is to use a fixed-point library. One such library is presented in Optimizing Applications with Fixed-Point Arithmetic by Anthony Williams. The code is C++ and PIC16 may lack a decent C++ compiler, but the methods can be ported somewhat less elegantly to C. It also uses a huge 64bit fixed-point 36Q28 format, which would be expensive and slow on PIC16; you might want to adapt it to use 16Q16 perhaps.

If you are really concerned about performance, stick to integer arithmetics, try to make the number of samples to average a power of two so the division can be made by means of bit shifts, however if it is not a power of two lets say 100 (as Olaf point out for fixed point) you can also use bit shifts and additions: How can I multiply and divide using only bit shifting and adding?
If you are not concerned about performace and still want to work with floats (you already got warned this may not be very fast in a PIC16 and may use a lot of flash), math.h has the following functions: http://en.cppreference.com/w/c/numeric/math including exponeciation: pow(base,exp) and logarithms* only base 2, base 10 and base e, for arbitrary base use the change of base logarithmic property

What is a convenient base for a bignum library & primality testing algorithm?

I am to program the Solovay-Strassen primality test presented in the original paper on RSA.
Additionally I will need to write a small bignum library, and so when searching for a convenient representation for bignum I came across this specification:
struct {
int sign;
int size;
int *tab;
} bignum;
I will also be writing a multiplication routine using the Karatsuba method.
So, for my question:
What base would be convenient to store integer data in the bignum struct?
Note: I am not allowed to use third party or built-in implementations for bignum such as GMP.
Thank you.

A power of 2.
For a simple implementation, probably half the size of a word on your machine, so that you can multiply two digits without overflow. So 65536 or 4294967296. Or possibly half the size of the largest integer type, for the same reason but maybe better performance over all.
But I've never actually implemented such a library: if you're using best known algorithms then you won't be doing school-style long multiplication. Karatsuba multiplication (and whatever other clever tricks you use) might benefit from being done in an integer that's more than twice the size of the digits, I really don't know how the performance works out. If so, then you'd be best off using 256 and 32 bit arithmetic, or 65536 and 64 bit arithmetic.
In any case if your representation is binary, then you can pick and choose larger power-of-two bases as convenient for each operation. For instance, you could treat the data as base 2^16 for multiplication, but base 2^32 for addition. It's all the same thing provided you're careful about endian-ness. I'd probably start with base 2^16 (since that forces me to get the endian-ness right to begin with, while 2^8 wouldn't), and see how I get on - as each operation is optimised, part of the optimisation is to identify the best base.
Using a size which isn't a multiple of bytes is a possibility, but then you have to use the same base for everything, because there are unused bits in the storage in specific places according to the base.

You will be doing the following operation a whole lot:
ab+cd...;
Either choose 1/4 the largest word size, or 1/2 the largest word size less a bit or two. That would be either 2^16 or 2^30 for 64 bit systems and 2^8 or 2^14 for 32 bit systems. Use the largest size the compiler supports, not the hardware.
If you choose 2^31 on a 64 bit system, that means you can add 4 products without overflow. If you choose 2^30 then you can add 16 products without overflow. The more you can add without overflow, the larger interim blocks you can use.
If you choose 1/4 the word size you will still have a native type so it will be easier to store results back out. You can pretty much ignore overflow too. This will basically make writing code faster and less error prone, and is slightly more memory efficient. I would recommend this unless you like lots of bit manipulation along with your math.
Choosing a larger base will make the big O numbers look better. In practice, while it would probably be faster to have a larger base, it will not be the 4x speed bump that you might hope for.

The base you use should be a power of 2. Since it looks like you're going to keep track of sign separately, you can use unsigned ints for storing the numbers themselves. You're going to need the ability to multiply 2 pieces/digits/units of these numbers at a time, so the size must be no more than half the word size you've got available. i.e. on x86 an unsigned int is 32 bits, so you'd want your digits to be not more than 16 bits. You may also use "long long" for the intermediate results of products of unsigned ints. Then you're looking at 2^32 for your base. One last thing to consider is that you may want to add sums of products, which will overflow unless you use fewer bits.
If performance is not a major concern, I'd just use base 256 and call it a day. You may want to use typedefs and defined constants so you can later change these parameters easily.

The integers in the tab array should be unsigned. They should be largest possible size (base) that you can multiply and still represent the product. If your compiler/processor supports 64 bit unsigned long long, for example, you might use uint32_t for the array of "digits." If your compiler/processor can only natively produce 32 bit products, you should use uint16_t.
When you sum two arrays you will need to deal with overflow; in assembly this is easy. In C you may opt to use one less bit (31 or 15) to make the overflow detection easier.
Also consider endianess, and the effect it and the algorithm will have on cache behavior.

What is the most efficient way to store and work with a floating point number with 1,000,000 significant digits in C?

I'm writing a utility to calculate π to a million digits after the decimal. On a 32- or 64-bit consumer desktop system, what is the most efficient way to store and work with such a large number accurate to the millionth digit?
clarification: The language would be C.

Forget floating point, you need bit strings that represent integers
This takes a bit less than 1/2 megabyte per number. "Efficient" can mean a number of things. Space-efficient? Time-efficient? Easy-to-program with?
Your question is tagged floating-point, but I'm quite sure you do not want floating point at all. The entire idea of floating point is that our data is only known to a few significant figures and even the famous constants of physics and chemistry are known precisely to only a handful or two of digits. So there it makes sense to keep a reasonable number of digits and then simply record the exponent.
But your task is quite different. You must account for every single bit. Given that, no floating point or decimal arithmetic package is going to work unless it's a template you can arbitrarily size, and then the exponent will be useless. So you may as well use integers.
What you really really need is a string of bits. This is simply an array of convenient types. I suggest <stdint.h> and simply using uint32_t[125000] (or 64) to get started. This actually might be a great use of the more obscure constants from that header that pick out bit sizes that are fast on a given platform.
To be more specific we would need to know more about your goals. Is this for practice in a specific language? For some investigation into number theory? If the latter, why not just use a language that already supports Bignum's, like Ruby?
Then the storage is someone else's problem. But, if what you really want to do is implement a big number package, then I might suggest using bcd (4-bit) strings or even ordinary ascii 8-bit strings with printable digits, simply because things will be easier to write and debug and maximum space and time efficiency may not matter so much.

I'd recommend storing it as an array of short ints, one per digit, and then carefully write utility classes to add and subtract portions of the number. You'll end up moving from this array of ints to floats and back, but you need a 'perfect' way of storing the number - so use its exact representation. This isn't the most efficient way in terms of space, but a million ints isn't very big.
It's all in the way you use the representation. Decide how you're going to 'work with' this number, and write some good utility functions.

If you're willing to tolerate computing pi in hex instead of decimal, there's a very cute algorithm that allows you to compute a given hexadecimal digit without knowing the previous digits. This means, by extension, that you don't need to store (or be able to do computation with) million digit numbers.
Of course, if you want to get the nth decimal digit, you will need to know all of the hex digits up to that precision in order to do the base conversion, so depending on your needs, this may not save you much (if anything) in the end.

Unless you're writing this purely for fun and/or learning, I'd recommend using a library such as GNU Multiprecision. Look into the mpf_t data type and its associated functions for storing arbitrary-precision floating-point numbers.
If you are just doing this for fun/learning, then represent numbers as an array of chars, which each array element storing one decimal digit. You'll have to implement long addition, long multiplication, etc.

Try PARI/GP, see wikipedia.

You could store its decimals digits as text in a file and mmap it to an array.

i once worked on an application that used really large numbers (but didnt need good precision). What we did was store the numbers as logarithms since you can store a pretty big number as a log10 within an int.
Think along this lines before resorting to bit stuffing or some complex bit representations.
I am not too good with complex math, but i reckon there are solutions which are elegant when storing numbers with millions of bits of precision.

IMO, any programmer of arbitrary precision arithmetics needs understanding of base conversion. This solves anyway two problems: being able to calculate pi in hex digits and converting the stuff to decimal representation and as well finding the optimal container.
The dominant constraint is the number of correct bits in the multiplication instruction.
In Javascript one has always 53-bits of accuracy, meaning that a Uint32Array with numbers having max 26 bits can be processed natively. (waste of 6 bits per word).
In 32-bit architecture with C/C++ one can easily get A*B mod 2^32, suggesting basic element of 16 bits. (Those can be parallelized in many SIMD architectures starting from MMX). Also each 16-bit result can contain 4-digit decimal numbers (wasting about 2.5 bits) per word.

Convert really big number from binary to decimal and print it

I know how to convert binary to decimal. I know at least 2 methods: table and power ;-)
I want to convert binary to decimal and print this decimal. Moreover, I'm not interested in this `decimal'; I want just to print it.
But, as I wrote above, I know only 2 methods to convert binary to decimal and both of them required addition. So, I'm computing some value for 1 or 0 in binary and add it to the remembered value. This is a thin place. I have a really-really big number (1 and 64 zeros). While converting I need to place some intermediate result in some 'variable'. In C, I have an `int' type, which is 4 bytes only and not more than 10^11.
So, I don't have enough memory to store intermedite result while converting from binary to decimal. As I wrote above, I'm not interested in THAT decimal, I just want to print the result. But, I don't see any other ways to solve it ;-( Is there any solution to "just print" from binary?
Or, maybe, I should use something like BCD (Binary Coded Decimal) for intermediate representation? I really don't want to use this, 'cause it is not so cross-platform (Intel's processors have a built-in feature, but for other I'll need to write own implementation).
I would glad to hear your thoughts. Thanks for patience.
Language: C.

I highly recommend using a library such as GMP (GNU multiprecision library). You can use the mpz_t data type for large integers, the various import/export routines to get your data into an mpz_t, and then use mpz_out_str() to print it out in base 10.

Biggest standard integral data type is unsigned long long int - on my system (32-bit Linux on x86) it has range 0 - 1.8*10^20 which is not enough for you, so you need to create your own type (struct or array) and write basic math (basically you just need an addition) for that type.
If I were you (and memory is not an issue), I'd use an array - one byte per decimal digit rather then BCD. BCD is more compact as it stores 2 decimal digits per byte but you need to put much more effort working with high and low nibbles separately.
And to print you just add '0' (character, not digit) to every byte of your array and you get a printable string.

Well, when converting from binary to decimal, you really don't need ALL the binary bits at the same time. You just need the bits you are currently calculating the power of and probably a double variable to hold the results.
You could put the binary value in an array, lets say i[64], iterate through it, get the power depending on its position and keep adding it to the double.

Converting to decimal really means calculating each power of ten, so why not just store these in an array of bytes? Then printing is just looping through the array.

Couldn't you allocate memory for, say, 5 int's, and store your number at the beginning of the array? Then manually iterate over the array in int-sized chunks. Perhaps something like:
int* big = new int[5];
*big = <my big number>;

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight