Looping through all character combinations with increasing number of elements - arrays

What I want to achieve:
I have a function where I want to loop through all possible combinations of printable ascii-characters, starting with a single character, then two characters, then three etc.
The part that makes this difficult for me is that I want this to work for as many characters as I can (leave it overnight).
For the record: I know that abc really is 97 98 99, so a numeric representation is fine if that's easier.
This works for few characters:
I could create a list of all possible combinations for n characters, and just loop through it, but that would require a huge amount of memory already when n = 4. This approach is literally impossible for n > 5 (at least on a normal desktop computer).
In the script below, all I do is increment a counter for each combination. My real function does more advanced stuff.
If I had unlimited memory I could do (thanks to Luis Mendo):
counter = 0;
some_function = @(x) 1;
number_of_characters = 1;
max_time = 60;
max_number_of_characters = 8;
tic;
while toc < max_time && number_of_characters < max_number_of_characters
    number_of_characters = number_of_characters + 1;
    vectors = [repmat({' ':'~'}, 1, number_of_characters)];
    n = numel(vectors);
    combs = cell(1,n);
    [combs{end:-1:1}] = ndgrid(vectors{end:-1:1});
    combs = cat(n+1, combs{:});
    combs = reshape(combs, [], n);
    for ii = 1:size(combs, 1)
        counter = counter + some_function(combs(ii, :));
    end
end
Now, I want to loop through as many combinations as possible in a certain amount of time, 5 seconds, 10 seconds, 2 minutes, 30 minutes, so I'm hoping to create a function that's only limited by the available time, and uses only some reasonable amount of memory.
Attempts I've made (and failed at) for more characters:
I've considered pre-computing the combinations for two or three letters using one of the approaches above, and use a loop only for the last characters. This would not require much memory, since it's only one (relatively small) array, plus one or more additional characters that gets looped through.
I manage to scale this up to 4 characters, but beyond that I start getting into trouble.
I've tried to use an iterator that just counts upwards. Every time I hit any(mod(number_of_ascii .^ 1:n, iterator) == 0) I increment the m'th character by one. So, the last character just repeats the cycle !"# ... ~, and every time it hits tilde, the second character increments. Every time the second character hits tilde, the third character increments etc.
Do you have any suggestions for how I can solve this?

It looks like you're basically trying to count in base 26 (or base 52 if you need CAPS). Each number in that base corresponds to a specific string of characters. For example,
0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,10,11,12,...
Here, the capitals A through P are just the digit symbols for 10 through 25 in the base-26 system. The sequence above simply represents these strings of characters:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,ba,bb,bc,...
Then, you can simply do this:
symbols = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E',...
    'F','G','H','I','J','K','L','M','N','O','P'];
characters = ['a','b','c','d','e','f','g','h','i','j','k','l',...
    'm','n','o','p','q','r','s','t','u','v','w','x','y','z'];
count = 0;
while true
    str_base26 = dec2base(count, 26);
    % char-by-char lookup of str_base26 to get the character string
    [~, idx] = ismember(str_base26, symbols);
    actual_str = characters(idx);
    count = count + 1;
end
Of course, this does not produce strings that begin with leading 0's (that is, leading a's, such as aa or ab). But that should be pretty simple to handle.
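One possible way to handle the leading-zero case is to first work out how long the string must be and only then convert the remainder to fixed-width digits. The sketch below is in Python rather than MATLAB, and nth_string is a hypothetical helper name, not something from the answer above. It assumes the order a, b, ..., z, aa, ab, ...

```python
import string


def nth_string(n, alphabet=string.ascii_lowercase):
    """Return the n-th string (n >= 0) in the order a, b, ..., z, aa, ab, ..."""
    base = len(alphabet)
    length = 1
    # Skip past all shorter strings: there are base**length of each length.
    while n >= base ** length:
        n -= base ** length
        length += 1
    # Convert the remainder to exactly `length` digits, so leading 'a's survive.
    chars = []
    for _ in range(length):
        n, digit = divmod(n, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


if __name__ == "__main__":
    print([nth_string(k) for k in range(24, 30)])
```

Because the length is fixed before the digits are extracted, strings such as aa and ab are produced in sequence instead of being skipped.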

You were not far off with your idea of using an iterator that just counts upward.
What you need with this idea is a map from the integers to ASCII characters. As StewieGriffin suggested, you'd just need to work in base 95 (94 characters plus whitespace).
Why whitespace : You need something that will be mapped to 0 and be equivalent to it. Whitespace is the perfect candidate. You'd then just skip the strings containing any whitespace. If you don't do that and start directly at !, you'll not be able to represent strings like !! or !ab.
First let's define a function that will map (1:1) integers to string :
function [outstring,toskip]=dec2ASCII(m)
    out=[];
    while m~=0
        out=[mod(m,95) out];
        m=(m-out(1))/95;
    end
    if any(out==0)
        toskip=1;
    else
        toskip=0;
    end
    outstring=char(out+32);
end
And then in your main script :
counter=1;
some_function = @(x) 1;
max_time = 60;
max_number_of_characters = 8;
currString='';
tic;
while numel(currString)<=max_number_of_characters&&toc<max_time
    [currString,toskip]=dec2ASCII(counter);
    if ~toskip
        some_function(currString);
    end
    counter=counter+1;
end
Some random outputs of the dec2ASCII function :
dec2ASCII(47)
ans =
O
dec2ASCII(145273)
ans =
0)2
In terms of performance I can't really elaborate as I don't know what you want to do with your some_function. The only thing I can say is that the running time of dec2ASCII is around 2*10^(-5) s
Side note : iterating like this will be very limited in terms of speed. With the function some_function doing nothing, you'd just be able to cycle through 4 characters in around 40 minutes, and 5 characters would already take up to 64 hours. Maybe you'd want to reduce the amount of stuff you want to pass through the function you iterate on.
This code, though, is easily parallelizable, so if you want to check more combinations, I'd suggest trying to do it in a parallel manner.
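For comparison, here is a minimal Python sketch of the same time-limited loop, using a lazy generator so that only one combination is held in memory at a time. It is not a translation of dec2ASCII. It assumes the 95 printable characters from space to tilde, as in the question, and the run function and its parameters are hypothetical names chosen for illustration.

```python
import itertools
import time

PRINTABLE = [chr(code) for code in range(32, 127)]  # ' ' .. '~', 95 characters


def run(some_function, max_time, max_chars, alphabet=PRINTABLE):
    """Sum some_function over strings of length 1..max_chars until time runs out."""
    total = 0
    deadline = time.monotonic() + max_time
    for length in range(1, max_chars + 1):
        # itertools.product yields one tuple at a time; nothing is precomputed.
        for combo in itertools.product(alphabet, repeat=length):
            if time.monotonic() >= deadline:
                return total
            total += some_function("".join(combo))
    return total


if __name__ == "__main__":
    print(run(lambda s: 1, max_time=5, max_chars=8))
```

Checking the clock on every iteration has a cost of its own; if some_function is cheap, checking only every few thousand iterations may be a reasonable refinement.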

Related

Looping through a set of sequences satisfying a certain property, without storing them

Below is recursive MATLAB code which takes a vector (l_1,l_2,...,l_r) of non-negative integers and an integer m, and returns all sequences (m_1,m_2,...,m_r) satisfying:
0 <= m_i <= l_i for all 1 <= i <= r and m_1 + m_2 + ... + m_r = m
The r is captured in the function definition by calling the size of the (l_i) array below:
function arr=sumseq(m,lims)
    arr=[];
    r=size(lims,2);
    if r==0 || m<0
        arr=[];
    elseif r==1 && lims(1)>=m
        arr=[m]; %#ok<NBRAK>
    else
        for i=0:lims(1)
            if(lims(1)<0)
                arr=[];
            else
                v=sumseq(m-i,lims(2:end));
                arr=[arr;[i*ones(size(v,1),1) v]];
            end
        end
    end
end
Here I have stored all of the sequences in an array and made that my output. Instead, I want to print them one by one without storing them in an array. This seems simple enough, as there is not much choice in which line(s) I need to change (I believe it is the contents of the else block inside the for loop), but I get stuck every time I try to achieve it.
(Also, MATLAB warned me that if I kept re-initializing the array with a larger array like in the statement:
arr=[arr;[i*ones(size(v,1),1) v]];
it reallocates a fresh array for all the contents of arr and spends a 'lot' of time doing so.)
In short: recursion or not, I want to save the trouble of storing it, and need an algorithm which is as efficient as or more efficient than what I have here.
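One way to avoid storing the results, sketched here in Python rather than MATLAB, is to yield each sequence as soon as it is complete. This is an illustrative sketch, not an answer from the original thread. It assumes all limits are non-negative, and unlike the MATLAB version it treats an empty limit vector with m = 0 as having one (empty) solution. The pruning of the loop range is an added optimisation.

```python
def sumseq(m, lims):
    """Yield each tuple (m_1, ..., m_r) with 0 <= m_i <= lims[i] and sum m."""
    if m < 0:
        return
    if not lims:
        if m == 0:
            yield ()
        return
    # Prune: the first entry must leave a remainder the other limits can reach.
    rest_max = sum(lims[1:])
    for first in range(max(0, m - rest_max), min(lims[0], m) + 1):
        for tail in sumseq(m - first, lims[1:]):
            yield (first,) + tail


if __name__ == "__main__":
    for seq in sumseq(3, [2, 1, 2]):
        print(seq)
```

Only the current partial sequence lives on the call stack, so memory use grows with r rather than with the number of solutions.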

Can you preallocate an array of random size?

The essential part of the code in question can be distilled into:
list=rand(1,x); % where x is some arbitrarily large integer
hitlist=[];
for n=1:1:x
    if rand(1) < list(n)
        hitlist=[hitlist n];
    end
end
list(hitlist)=[];
This program is running quite slowly and I suspect this is why, however I'm unaware how to fix it. The length of the hitlist will necessarily vary in a random way, so I can't simply preallocate a 'zeros' of the proper size. I contemplated making the hitlist a zeros the length of my list, but then I would have to remove all the superfluous zeros, and I don't know how to do that without having the same problem.
How can I preallocate an array of random size?
I'm unsure about preallocating 'random size', but you can preallocate in large chunks, e.g. 1e3, or however is useful for your use case:
list=rand(1,x); % where x is some arbitrarily large integer
a = 1e3; % Increment of preallocation
hitlist=zeros(1,a);
k=1; % counter
for n=1:1:x
    if rand(1) < list(n)
        hitlist(k) = n;
        k=k+1;
        if k > numel(hitlist) % preallocated block is full
            hitlist = [hitlist zeros(1,a)]; % extend
        end
    end
end
hitlist = hitlist(1:k-1); % trim excess
% hitlist(k:end) = []; % alternative trim, but might error
list(hitlist)=[];
This won't be the fastest possible, but it is at least a whole lot faster than growing the array on each iteration. Make sure to choose a suitable a; you can even base it on the available amount of RAM using the memory function and trim the excess afterwards, so that you don't need the in-loop extension at all.
As an aside: MATLAB works column-major, so running through matrices that way is faster. I.e. first the first column, then the second and so on. For a 1D array this doesn't matter, but for matrices it does. Hence I prefer to use list = rand(x,1), i.e. as column.
For this specific case, don't use this looped approach anyway, but use logical indexing:
list = rand(x,1);
list = list(list<rand(size(list)));
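To put a rough number on why chunked growth helps, the following Python sketch counts element copies under a deliberately simplified model in which every reallocation copies the whole existing array. The function elements_copied is hypothetical, and real MATLAB memory management may behave differently.

```python
def elements_copied(n_appends, chunk):
    """Count element copies when capacity grows by `chunk` slots at a time.

    Simplified model: whenever an append would exceed the current capacity,
    the array is reallocated and all existing elements are copied.
    """
    capacity = 0
    size = 0
    copied = 0
    for _ in range(n_appends):
        if size == capacity:
            copied += size      # reallocation copies the existing contents
            capacity += chunk
        size += 1
    return copied


if __name__ == "__main__":
    for chunk in (1, 100, 1000):
        print(chunk, elements_copied(100_000, chunk))
```

Under this model, growing by one element at a time costs on the order of n^2/2 copies, while growing in chunks of a divides that by roughly a.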

Implementing Radix sort in java - quite a few questions

Although it is not clearly stated in my exercise, I am supposed to implement Radix sort recursively. I've been working on the task for days, but so far I have only managed to produce garbage, unfortunately. We are required to work with two methods. The sort method receives a certain array with numbers ranging from 0 to 999 and the digit we are looking at. We are supposed to generate a two-dimensional matrix here in order to distribute the numbers inside the array. So, for example, 523 is positioned at the fifth row and 27 is positioned at the 0th row since it is interpreted as 027.
I tried to do this with the help of a switch-case-construct, dividing the numbers inside the array by 100, checking for the remainder and then position the number with respect to the remainder. Then, I somehow tried to build buckets that include only the numbers with the same digit, so for example, 237 and 247 would be thrown in the same bucket in the first "round". I tried to do this by taking the whole row of the "fields"-matrix where we put in the values before.
In the putInBucket-method, I am required to extend the bucket (which I managed to do right, I guess) and then return it.
I am sorry, I know that the code is total garbage, but maybe there's someone out there who understands what I am up to and can help me a little bit.
I simply don't see how I need to work with the buckets here, I don't even understand why I have to extend them, and I don't see any way to return them to the sort-method (which, I think, I am required to do).
Further description:
The whole thing is meant to work as follows: We take an array with integers ranging from 0 to 999. Every number is then sorted by its first digit, as mentioned above. Imagine you have buckets denoted with the numbers ranging from 0 to 9. You start the sorting by putting 523 in bucket 5, 672 in bucket 6 and so on. This is easy when there is only one number (or no number at all) in one of the buckets. But it gets harder (and that's where recursion might come in handy) when you want to put more than one number in one bucket. The mechanism now goes as follows: We put two numbers with the same first digit in one bucket, for example 237 and 245. Now, we want to sort these numbers again by the same algorithm, meaning we call the sort-method (somehow) again with an array that only contains these two numbers, but this time by looking at the second digit, so we would compare 3 and 4. We sort every number inside the array like this, and at the end, in order to get a sorted array, we start at the end, meaning at bucket 9, and then just put everything together. When we are at bucket 2, the algorithm would look into the recursive step and already receive the sorted array [237, 245] and deliver it in order to complete the whole thing.
My own problems:
I don't understand why we need to extend a bucket and I can't figure it out from the description. It is simply stated that we are supposed to do so. I'd imagine that we would do it to copy another element into it, because if we have the buckets from 0 to 9, putting two numbers inside the same bucket would just mean that we would overwrite the first value. This might be the reason why we need to return the new, extended bucket, but I am not sure about that. Plus, I don't know how to go further from there. Even if I have an extended bucket now, it's not like I can simply stick it to the old matrix and copy another element into it again.
public static int[] sort(int[] array, int digit) {
    if (array.length == 0)
        return array;
    int[][] fields = new int[10][array.length];
    int[] bucket = new int[array.length];
    int i = 0;
    for (int j = 0; j < array.length; j++) {
        switch (array[j] / 100) {
            case 0: i = 0; break;
            case 1: i = 1; break;
            ...
        }
        fields[i][j] = array[j];
        bucket[i] = fields[i][j];
    }
    return bucket;
}
private static int[] putInBucket(int [] bucket, int number) {
    int[] bucket_new = new int[bucket.length+1];
    for (int i = 1; i < bucket_new.length; i++) {
        bucket_new[i] = bucket[i-1];
    }
    return bucket_new;
}
public static void main (String [] argv) {
    int[] array = readInts("Please type in the numbers: ");
    int digit = 0;
    int[] bucket = sort(array, digit);
}
You don't use digit in sort, which is quite suspicious.
The switch/case looks like a rather convoluted way to write i = array[j] / 100.
I'd recommend reading the Wikipedia description of radix sort.
The expression to extract a digit from a base-10 number is (number / (int) Math.pow(10, digit)) % 10; the cast is needed because Math.pow returns a double.
Note that you can count digits from left to right or right to left; make sure you get this right.
I suppose you first want to sort for digit 0, then for digit 1, then for digit 2. So there should be a recursive call at the end of sort that does this.
Your buckets array needs to be 2-dimensional. You'll need to call it this way: buckets[i] = putInBucket(buckets[i], array[j]). If you handle null in putInBucket, you don't need to initialize it.
The reason why you need a 2D bucket array and putInBucket (instead of your fixed-size fields array) is that you don't know how many numbers will end up in each bucket.
The second phase (reading back from the buckets to the array) is missing before the recursive call.
Make sure to stop the recursion after 3 digits.
Good luck
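The points above can be combined into a short recursive sketch. It is written in Python rather than Java, and it follows the most-significant-digit-first order described in the question, with digit counted from the right (2 = hundreds). The names radix_sort and put_in_bucket mirror the exercise but are otherwise assumptions, not the required solution.

```python
def put_in_bucket(bucket, number):
    """Return a new bucket one slot longer, with number in the last slot."""
    return (bucket or []) + [number]   # a None bucket is treated as empty


def radix_sort(array, digit=2):
    """Sort integers in 0..999, most significant digit first.

    digit counts from the right: 2 = hundreds, 1 = tens, 0 = ones.
    """
    if len(array) <= 1 or digit < 0:
        return list(array)             # nothing left to distinguish
    buckets = [None] * 10              # buckets grow on demand
    for number in array:
        d = (number // 10 ** digit) % 10
        buckets[d] = put_in_bucket(buckets[d], number)
    result = []
    for bucket in buckets:             # read back in bucket order 0..9
        if bucket is not None:
            result.extend(radix_sort(bucket, digit - 1))
    return result


if __name__ == "__main__":
    print(radix_sort([523, 27, 237, 245, 672, 5]))
```

The buckets start out empty and are extended one element at a time because the number of values that will land in each bucket is not known in advance.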

very large loop counts in c

How can I run a loop in C for a very large count, e.g. 2^1000 times?
Also, nesting two loops that run a and b times gives a block that runs a*b times. Is there any smart method for running a loop a^b times?
You could loop recursively, e.g.
void loop( unsigned a, unsigned b ) {
    unsigned int i;
    if ( b == 0 ) {
        printf( "." );
    } else {
        for ( i = 0; i < a; ++i ) {
            loop( a, b - 1 );
        }
    }
}
...will print a^b . characters.
While I cannot answer your first question (although libgmp might help you work with large numbers), a way to perform an action a^b times would be to use recursion.
void function(unsigned a, unsigned b) {
    unsigned i;
    if (b == 0) {
        /* perform the action here */
        return;
    }
    for (i = 0; i < a; ++i) {
        function(a, b - 1);
    }
}
This will perform the loop a times for each step until b equals 0.
Regarding your answer to one of the comments: But if I have two lines of input and 2^n lines of trash between them, how do I skip past them? Can you tell me a real life scenario where you will see 2^1000 lines of trash that you have to monitor?
For a more reasonable (smaller) number of inputs, you may be able to solve what sounds to be your real need (i.e. handle only relevant lines of input), not by iterating an index, but rather by simply checking each line for the relevant component as it is processed in a while loop...
pseudo code:
BOOL criteriaMet = FALSE;
while(1)
{
    while(!criteriaMet)
    {
        //test next line of input
        //if criteria met, set criteriaMet = TRUE;
        //if criteria met, handle line of input
        //if EOF or similar, break out of loops
    }
    //criteria met, handle it here and continue
    criteriaMet = FALSE;//reset for more searching...
}
Use a b-sized array i[] where each cell holds values from 0 to a-1. For example, for 2^3 use a 3-sized array of booleans.
On each iteration, increment i[0]. If a==i[0], set i[0] to 0 and increment i[1]. If a==i[1], set i[1] to 0 and increment i[2], and so on until you increment a cell without reaching a. This can easily be done in a loop:
for(int j=0;j<b;++j){
    ++i[j];
    if(i[j]<a){
        break;
    }
    i[j]=0; // reached a: reset this cell and carry into the next one
}
After a iterations, i[0] will return to zero. After a^2 iterations, i[0] and i[1] will both be zero. After a^b iterations, all cells will be 0 and you can exit the loop. You don't need to check the whole array each time: the moment you reset i[b-1], you know that the entire array is back to zero.
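The counter-array scheme described above can be sketched end to end as follows. The sketch is in Python rather than C and assumes a >= 1. The function count_iterations is a hypothetical name, and it simply counts how many times the loop body would run.

```python
def count_iterations(a, b):
    """Run a loop body a**b times using b counters that each stay below a."""
    counters = [0] * b
    iterations = 0
    while True:
        iterations += 1          # the real loop body would go here
        j = 0
        while j < b:
            counters[j] += 1
            if counters[j] < a:
                break            # no carry needed
            counters[j] = 0      # wrap and carry into the next cell
            j += 1
        if j == b:               # the last cell wrapped: all cells are zero again
            return iterations


if __name__ == "__main__":
    print(count_iterations(2, 3), count_iterations(3, 4))
```

No counter ever exceeds a - 1, so the scheme works even when a^b itself is far too large to fit in a machine integer.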
Your question doesn't make sense. Even when your loop is empty you'd be hard pressed to do more than 2^32 iterations per second. Even in this best case scenario, processing 2^64 loop iterations which you can do with a simple uint64_t variable would take 136 years. This is when the loop does absolutely nothing.
Same thing goes for skipping lines as you later explained in the comments. Skipping or counting lines in text is a matter of counting newlines. In 2006 it was estimated that the world had around 10*2^64 bytes of storage. If we assume that all the data in the world is text (it isn't) and the average line is 10 characters including newline (it probably isn't), you'd still fit the count of numbers of lines in all the data in the world in one uint64_t. This processing would of course still take at least 136 years even if the cache of your cpu was fed straight from 4 10Gbps network interfaces (since it's inconceivable that your machine could have that much disk).
In other words, whatever problem you think you're solving is not a problem of looping more than a normal uint64_t in C can handle. The n in your 2^n can't reasonably be more than 50-55 on any hardware your code can be expected to run on.
So to answer your question: if looping a uint64_t is not enough for you, your best option is to wait at least 30 years until Moore's law has caught up with your problem and solve the problem then. It will go faster than trying to start running the program now. I'm sure we'll have a uint128_t at that time.
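The 136-year figure follows from simple arithmetic, which can be checked with a few lines of Python. The rate of 2^32 iterations per second is the optimistic assumption from the answer above; years_to_loop is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_loop(log2_iterations, log2_rate=32):
    """Years needed for 2**log2_iterations iterations at 2**log2_rate per second."""
    seconds = 2 ** (log2_iterations - log2_rate)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for exponent in (50, 55, 64, 1000):
        print(exponent, years_to_loop(exponent))
```

At that rate 2^64 iterations take 2^32 seconds, which is roughly 136 years, while 2^1000 iterations are out of reach by hundreds of orders of magnitude.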

Generating Strings

I am creating a distributed password cracker that uses a brute-force technique, so I need to generate every combination of characters.
For the sake of distribution, the server will give each client a range of strings, like from "aaaa" to "bxyz". I am assuming that the string length will be four. So I need to check every string between these two bounds.
I am trying to generate these strings in C. I have been trying to work out the logic for this but I'm failing; I also searched on Google, without success. Any ideas?
EDIT
Sorry brothers, I would like to edit it
I want the combinations of strings within a range. Let's suppose the range is between aaaa and aazz; that would be strings like aaaa aaab aaac aaad ..... aazx aazy aazz. My character space is just upper- and lowercase English letters, so 52 characters. I want to check every combination of 4 characters, but the server will distribute ranges of strings among its clients. My question is: if one client gets the range between aaaa and aazz, how do I generate just the strings between these bounds?
If your strings contain only characters from the (extended) ASCII table, you'll have, as an upper limit, 256 characters, or 2^8 characters.
Since your strings are 4 characters long, you'll have 2^8 * 2^8 * 2^8 * 2^8 combinations,
or (2^8)^4 = 2^32 combinations.
Simply split the range of numbers and start the combinations in each machine.
You'll probably be interested in this: Calculating Nth permutation step?
Edit:
Considering your edit, your space of combinations would be 52^4 = 7,311,616 combinations.
Then, you simply need to divide these "tasks" among the machines, so 7,311,616 / n = r, with r being the number of permutations calculated by each machine; the last machine may compute r + (7,311,616 % n) combinations.
Since you know the amount of combinations to build in each machine, you'll have to execute the following, in each machine:
function check_permutations(begin, end, chars) {
    for (i = begin; i < end; i++) {
        nth_perm = nth_permutation(chars, i);
        check_permutation(nth_perm); // your function of verification
    }
}
The function nth_permutation() is not hard to derive, and I'm quite sure you can get it in the link I've posted.
After this, you would simply start a process with such a function as check_permutations, giving the begin, end, and the vector of characters chars.
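As a rough illustration of the scheme above, the following Python sketch converts between strings and indices and splits the index space among clients. It is in Python rather than C, and it assumes the 52-letter ordering a..z followed by A..Z, which the question does not specify. All function names are hypothetical stand-ins for nth_permutation and the surrounding logic.

```python
import string

CHARS = string.ascii_lowercase + string.ascii_uppercase  # assumed order: a..z, A..Z


def string_to_index(s, chars=CHARS):
    """Interpret s as a number written in base len(chars)."""
    index = 0
    for ch in s:
        index = index * len(chars) + chars.index(ch)
    return index


def index_to_string(index, length, chars=CHARS):
    """Inverse of string_to_index for a fixed string length."""
    out = []
    for _ in range(length):
        index, digit = divmod(index, len(chars))
        out.append(chars[digit])
    return "".join(reversed(out))


def strings_between(first, last, chars=CHARS):
    """Yield every string from first to last inclusive (both of equal length)."""
    start = string_to_index(first, chars)
    stop = string_to_index(last, chars)
    for index in range(start, stop + 1):
        yield index_to_string(index, len(first), chars)


def split_work(total, n_clients):
    """Return (begin, end) index pairs; the last client takes the remainder."""
    size = total // n_clients
    bounds = [(k * size, (k + 1) * size) for k in range(n_clients)]
    bounds[-1] = (bounds[-1][0], total)
    return bounds


if __name__ == "__main__":
    print(list(strings_between("aaaa", "aaad")))
    print(split_work(52 ** 4, 3))
```

With this mapping the server only needs to send each client a pair of integers or the two corresponding boundary strings, and the client regenerates everything in between.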
You can generate a tree containing all the permutations. E.g., like in this pseudocode:
strings(root, len)
    if (len == 0) return
    for (c = 'a' to 'z')
        root->next[c] = c
        strings(&root->next[c], len - 1)
Invoke by strings(root, 4).
After that you can traverse the tree to get all the permutations.

Resources