Best programming language for very large arrays and very large numbers? - arrays

What would be the best programming language for very large arrays and very large numbers?
With arrays over 30,000 indexes
And numbers over 100 digits
Also it needs to be efficient, or easy to make efficient.
Thanks.

Almost any programming language worth its salt should have these characteristics, and frankly I don't think I'd want to use any language that can't handle arrays of 30,000 elements. I'll list a few that have good support for very large numbers:
python. Python 3 has automatic support for large numbers as the default number type grows as necessary, and has some really awesome math libraries. Other languages may be ever so slightly faster, but unless for some reason you know for sure that python won't be good enough I'd start there.
C#. This will mostly bind you to windows, but its very popular, fast, and meets your requirements.
Java. Cross platform, mature support with BigInteger.
Haskell. Pretty seamless conversions to large numbers and powerful math support. If you have a strong mathematics background Haskell will feel pretty natural. If you already know functional programming or don't mind devoting a fascinating few hours to learning it, this is a good choice.
C/C++. Very fast, but a little more complex to develop in. You'll probably get better results in large number support with something else. I'd only look into C++ if you've tried optimizing code in another languages and its still not fast enough, unless you have a specific reason to not use an intermediately compiled language.
The truth of the matter is that its hard to find a programming language that doesn't support these things, and if you could I probably wouldn't use it for anything because its probably not that mature. Do you have any other requirements that would help us narrow it down further for you? :D

The array is not the issue. Numbers consisting of 100 numerals (digits) is a huge issue. I don't have a good answer to the question (out of date as it is) but as this comes up readily in Google I'll mention that most languages only support between 32 to 64 bit numbers.
(I know that the C family of languages, PHP, as3 and Java don't support massive numbers.)
For example a 32 bit number would allow a range of 0 to 4,294,967,295 (2^32-1) which is only 10 numerals (Actually more like 9 because the limit is by size, not numerals), a whole order of magnitude less than the required 100 digits the questioner was after.
That said I know that there are cases of people implementing support for large numbers in C and AS3...

Python with NumPy is probably what you want.

I always found Fortran to be quite nice when dealing with arrays, esp. with multi-dimensional ones. If you are dealing with very large numbers, you will probably need to define your own data type or live with a loss of precision, though. Or use this: http://www.fortran.com/big_integer_module.f95 .
But it depends a bit on what you want to do. Fortran is nice for numerical computations, and not so nice for about everything else.

Related

which encoding to use for genetic algorithm?

I want to code a genetic algorithm in C for optimizing a function of 10 variables (x1 to x10). However I am not able to figure out which encoding I should use. I have mostly seen binary encoding being used in example but the variables in my case can take real values. Also, is value encoding a good option for these types of problems?
For real valued problems I would suggest to try CMA-ES or another ES variant. CMA-ES certainly is the current state of the art for real-valued problems. It is designed to find good solutions in multidimensional problems quickly. There are implementations available on Hansen's page. There's also a C# implementation in the work for HeuristicLab. Evolution strategies are algorithms that were specifically designed for real-valued optimization problems. They are very similar to genetic algorithms (both were invented around the same time, but in different places). The main distinction is that for ES the main driver is mutation and it features a clever adaption of the mutation strength. Without this adaption the (local) optimum cannot be located in time. CMA-ES is easy to configure, all it needs is the initial standard deviation and optionally the population size (otherwise there's a formula that estimates this given the problem size).
Genetic algorithms can of course also be applied, but you have to use some specific operators which are able to mutate variables only with very small degree. For example there's the Breeder Genetic Algorithm from Mühlenbein. In general however genetic algorithms are more suited for problems that need a right combination of things. E.g. which items to include in a knapsack problem or which functions and terminals to combine to a formula (genetic programming). Less for problems, where you need to find the right value for something. Although of course there are variants of the genetic algorithm to solve these, look for Real coded Genetic Algorithm (RCGA or RGA).
Another algorithm suited for real-valued problems is Particle Swarm Optimization, but in my opinion it is harder to configure. I'd start with SPSO-2011 the 2011 standard PSO.
If your problem contains integer variables choices become more difficult. Evolution strategies do not perform so well when variables are discrete, because the adaptation schemes for integer variables are different. A genetic algorithm becomes an interesting first-choice algorithm again.
A genetic algorithm is best used when two answers that are pretty close to optimal will make something else pretty close to optimal when combined. The problem with a pure binary encoding is that if you don't check your crossover you end up getting two answers which may not have all that much to do with the original answers.
That said, this is only really an issue if your number of variables is very small and the amount of data in your variables is large. As far as picking an encoding, it's more of an art than a science and it depends on your problem. I would suggest going with an encoding that fits the amount of precision you want. With 10 variables you won't got that far wrong however you encode it, an 8-bit ASCII encoder would probably work fine.
Hope that helps.

Converting an arbitrary large number to base 256

I have a number of very large length may be upto 50 digits. I am taking that as string input. However, I need to perform operations on it. So, I need to convert them to a proper base, lets say, 256.
What will be the best algorithm to do so?
Multiple-precision arithmetic (a.k.a. bignums) is a difficult subject, and the good algorithms are non intuitive (there are books about that).
There exist several libraries handling bignums, like e.g. the GMP library (and there are other ones). And most of them take profit from some hardware instructions (e.g. add with carry) with carefully tuned small chunks of assembler code. So they perform better than what you would be able to code in a couple of months.
I strongly recommend using existing bignum libraries. Writing your own would take you years of work, if you want it to be competitive.
See also answers to this question.

How do algorithms differ from design patterns?

I am new to C programming; coming from an OOP PHP background.
I find C to be (no wonder) a much more difficult language. I had particularly lots of problems figuring out a couple of things on arrays at first: like there is no native associative array.
Now, this part I guess I'm figuring out little by little, but now I have a question regarding a conversation I had just yesterday with a C developer. She was explaining the binary search algorithm to me because I asked her whether there were libraries to do array related stuff in C or not because it seemed like a smarter solution than always re-inventing the wheel.
I would really love to learn more about algorithms in C, in particular what differences are there between algorithms and the design patterns I'm used to using in PHP?
Taking things in order: the extent of C's support for anything like an associative array would be qsort to sort an array of structures based on a key, and bsearch to find one based on a key. There are, of course, quite a few alternatives -- various other libraries have hash tables, balanced trees, etc. Exactly which will suit your purposes is hard to guess though.
Offhand, I don't know of many good books covering algorithms that use C as their primary vehicle for demonstration. A few obvious recommendations for books on algorithms in general (mostly language independent) would be:
The Art of Computer Programming by Donald Knuth. This is pretty much the class algorithms book. It's now (finally) up to four volumes. Knuth originally started on it in 1967, planning to write 7 volumes. Only three volumes were available for a long time. A fourth was added quite recently. At the rate he's going, it's only going to make it to 7 if Knuth lives to be well past 100 years old. Nonetheless, the parts that are there are extremely good -- but (warning!) he analyzes the algorithms in considerable detail; if you don't know at least a little calculus, a fair amount will probably be hard to follow.
Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein. IIRC, there's now a newer edition than I have, which adds yet another author. This is a large book (dropping it on your toes would be quite painful). It uses a fair amount of mathematical notation and such throughout, but if you're willing to work a little at looking up the notation, it's really pretty understandable. It covers quite a bit of important ground (e.g., graph algorithms) that are scheduled for later volumes of Knuth, but not (at least yet) available there.
Algorithms and Data Structures by Aho, Hopcraft and Ullman. This is (by a pretty fair margin) the smallest, lightest, and at least for most people probably the easiest of these to follow.
Though it's only available used anymore, if you can find a copy of Algorithms + Data Structures = Programs by Niklaus Wirth, that's what I'd really suggest. It uses Pascal (no surprise -- Niklaus Wirth invented Pascal), but that's enough like C that it doesn't cause a real problem. It doesn't go into as much depth as Knuth about each algorithm, but still enough to give a good feel for when one is likely to be a good choice versus another. For somebody in your position (some background in programming, but little in this area) it's my top recommendation.
Though I've said it before, I think it bears repeating: IMO, all of Robert Sedgewick's books on algorithms should be avoided. Algorithms in C++ is probably the worst of them, but the others are only marginally better. The code they include (again, especially the C++ version) is truly execrable, and the descriptions of algorithms are often incomplete and/or misleading. The most recent editions have fixed some of the problems, but (IMO) not nearly enough to qualify as something that should ever be recommended. If there was no alternative, you could probably get by with these, but given the number of alternatives that are dramatically superior, the only reason to read these at all is if somebody gives them to you, and you absolutely can't afford anything else.
As far as algorithms versus design patterns goes, the line can get blurry in places, but generally an algorithm is much more tightly defined. An algorithm will normally have a specific, tightly defined input which it processes in a specific way to produce an equally specific result/output. A design pattern tends to be more loosely defined, more generic. An algorithm can be generic as well (e.g., a sorting algorithms might require a type that defines a strict, weak ordering) but still has specific requirements on the type.
A design pattern tends to be somewhat more loosely defined. For example, the visitor pattern involves processing groups of objects -- but we don't want to modify the types of those objects when we decide we need to process them in a new and different way. We do that by defining the processes separately from the objects to be processed, along with how we'll traverse the groups of objects, and allow a process to work with each.
To look at it from a rather different direction, you can usually implement an algorithm with a function or a small group of functions. A design pattern tends to be oriented more toward the style in which you write your code, rather than just "here's a function, use it."
"Algorithms in C, Parts 1-5 (Bundle): Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition)"
Cannot stress how good that series is.

Higher level languages with C functions

Maybe this has been asked before, but I couldn't find it. My question is simple: Does it make sense to write an application in higher level languages (Java, C#, Python) and time/performance-critical functions in C? Or at this point unless you do very low level OS/game/sensor programming it is all the same to have a full, say, Java application?
It makes sense if you a) notice a performance issue, AND b) use performance measurements to locate where the problem occurs, AND c) can't achieve the desired performance by modifying the existing code.
If any of these items don't apply, then it's probably premature optimization.
If you are fluent and productive in a higher level language such a Python and Lua, then by all means start writing in that language. Look for bottlenecks if and when they exist.
speed can be quite similar with things like C#.
What is tricky is latency. So if you want to write something which you know takes < 10ms then C is reasonably predictable (ignoring whatever variability your operating system might introduce).
Having said that for very tight long loops (image processing for example), things like C/C++ can offer some speed up. You can get quite reasonable performance out of C#, you do have to be careful how you program it though, but I have found in general, you can still squeeze more out of C/C++
Usually your preferred language will do whatever you need it to in acceptable time (er, blazing fast).
Sure, critical time/performance functions can be written in a "more optimal/suitable" language like C or assembly - but whether it will actually make things faster is another story. There are laws that govern how much actual/overall speed-up that you'll get, specifically Amdahs Law and (diminishing returns) .
To answer your question, it only makes sense to rewrite these critical functions in lower languages if there is good enough speed-up to warrant the extra work.
I suggest you read Cliff Click's excellent Java vs. C Performance....Again.. It outlines many points of comparison between Java and C++.
Predictably, the conclusion is that it depends, but it's a worthwhile read.
You can only really answer this on a case by case basis and without reference to what you are doing it's impossible to answer.
But maybe what you actually want here is some kind of sanity check to ensure that this approach isn't crazy to consider. I have worked on tools ranging from a very large graphics application (~ million lines) to relatively small physical simulation engines (~10000 lines) there were written just as you describe: Python on the outside for interface (both for API and for GUI), C/C++ on the inside for the heavy lifting. They all benefited from this division of responsibility.
This is done especially with scripting languages. Things that come to mind are games made in Python. Most of the time Python is too slow for some of the more number crunching aspects of games and they make this a C module for speed. Be sure that you actually need the speed though and that number crunching is your performance bottleneck and not a general algorithm issue. Doing a brute-force search over a list is going to be slow in both C and Python.
I'd say that it depends a lot on your application.
What sort of performance is important? short startup-time? high throughput? low latency? Is it important that response time is always predictable?
Is the application short lived or does it run for long periods of time?
Java can give you high throughput, but occasional short freezes while doing garbage collection. C# is probably similar.
Python, well the performance there will often lag behind the others for anything not written in C (some things ARE written in C, even if you didn't do it yourself).
So as others said. It depends.
But as always with performance: Measure first, optimize when you know you need to.

How to Code a Solution To Deal With Large Numbers? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I'm doing some Project Euler problems and most of the time, the computations involve large numbers beyond int, float, double etc.
Firstly, I know that I should be looking for more efficient ways of calculation so as to avoid the large number problem. I've heard of the Bignum libraries.
But, for academics interests, I'd like to know how to code my own solution to this problem.
Can any expert please help me out? (My language is C)
You need to store the big numbers in a base that your computer can easily handle with its native types, and then store the digits in a variable length array. I'd suggest that for simplicity you start by storing the numbers in base 10 just to get the hang of how to do this. It will make debugging a lot easier.
Once you have a class that can store the numbers in this form, it's just a matter of implementing the operations add, subtract, multiply, etc. on this class. Each operation will have to iterate over digits of its operands and combine them, being careful to carry correctly so that your digits are never larger than the base. Addition and subtraction are simple. Multiplication requires a bit more work as the naive algorithm requires nested loops. Then once you have that working, you can try implementing exponentiation in an efficient manner (e.g. repeated squaring).
If you are planning to write a serious bignum implementation, base 10 won't cut it. It's wasteful of memory and it will be slow. You should choose a base that is natural for the computer, such as 256 or the word size (2**32). However this will make simple operations more difficult as you will get overflows if you naively add two digits, so you will need to handle that very carefully.
C is not a good choice for Project Euler. The benefits of C are raw speed, machine portability (to an extent, with standard C), language interoperability (if some language communicates with another, C is a popular first choice), sticking close to a specific library or platform's API (because C is common, e.g. OS API), and a stable language & stdlib. None of these benefits apply to solving Project Euler problems. Not even raw speed, because most of the problems aren't about raw computation, but understanding the algorithm required, and you can sit there all day and wait before submission.
If you are attempting Project Euler problems to broaden your experience with C, that's perfectly fine, just realize this experience doesn't necessarily apply to long-lived and real-world C projects you may work on.
For this kind of short, one-off problem those languages commonly described as "scripting languages" will work better, faster (in dev time), and easier. Try Python, it stays close to C in many ways, including a C API, and out of the various popular "scripting languages" is possibly the one for which you will find the most use in conjunction with C projects.
This may become an unpopular answer, but it isn't a rant—plus I really like C and use C/C++ often—and there is an explicit answer here to your problem: "don't use C", with your final large number solution depending on which alternative you choose. Again picking on Python, integers do not have an upper bound (note below), and I use this to naturally code answers to Project Euler problems, where in other languages I have to use a painful-by-comparison alternative number library.
(Python integers: There are two integer types in 2.x, 'int' and 'long' (which have been completely unified in 3.x). The conversion between them is practically seamless, and 'long' allows arbitrarily large values, instead of just being a bigger 'int' type as C's long is.)
A popular bignum library for C/C++ is the GNU MP Bignum Library. I've used it for several Project Euler problems, but fact remains that C isn't a very suitable language for Euler-problems. If performance was more important C would have more to give, but now you're much better off using a language which built in bignum support, such as Ruby (there are lots of others).
A simple way is to think of the number as its string representation in base b. Suppose b=10, simple arithmetic operation like addition on two such strings can be done using the same method we use when adding numbers by pen and paper. The same goes for other simple operations. For better results, you can take a larger base.
A simple bignum implementation like that should be enough for most Project Euler problems (probably all, but I haven't solved much at Euler so can't be sure), but there are ways of using much faster algorithms for operations such as multiplication and division/mod.
Although I recommend writing your own bignum for practice, if you are really stuck you can take ideas from the code of already implemented bigint libraries. For a serious implementation something like gmp is the obvious choice. But you cana also find small bigints coded by other people when solving similar practice problem online (e.g. Abednego's bigint.cpp).
Here's a nice and simple bignum module for C. You can learn from it for ideas. The C code isn't the highest quality, but the algorithm is well implemented and quite common.
For more advanced stuff, look up GMP.
If you want a nice C++ version (I know, you said C, but this is really interesting code), take a look at the internals of CGAL: http://www.cgal.org/

Resources