How many possible sequences of 16 symbols are there given some restrictions - c

How many possible sequences can be formed that obey the following rules:
Each sequence is formed from the symbols 0-9a-f.
Each sequence is exactly 16 symbols long.
0123456789abcdef ok
0123456789abcde XXX
0123456789abcdeff XXX
Symbols may be repeated, but no more than 4 times.
00abcdefabcdef00 ok
00abcde0abcdef00 XXX
A symbol may not appear three times in a row.
00abcdefabcdef12 ok
000bcdefabcdef12 XXX
There can be at most two pairs.
00abcdefabcdef11 ok
00abcde88edcba11 XXX
Also, how long would it take to generate all of them?

In combinatorics, counting is usually pretty straightforward, and can be accomplished much more rapidly than exhaustive generation of each alternative (or worse, exhaustive generation of a superset of possibilities, in order to filter them). One common technique is to reduce a given problem to combinations of a small(ish) number of disjoint sub-problems, where it is possible to see how many times each subproblem contributes to the total. This sort of analysis can often result in dynamic programming solutions, or, as below, in a memoised recursive solution.
Because combinatoric results are usually enormous numbers, brute-force generation of every possibility, even if it can be done extremely rapidly for each sequence, is impractical in all but the most trivial of cases. For this particular question, for example, I made a rough back-of-the-envelope estimate in a comment (since deleted):
There are 18446744073709551616 possible 64-bit (16 hex-digit) numbers, which is a very large number, about 18 billion billion. So if I could generate and test one of them per nanosecond, it would take me 18 billion seconds, or about 571 years. So with access to a cluster of 1000 96-core servers, I could do it all in about 54 hours, just a bit over two days. Amazon will sell me one 96-core server for just under a dollar an hour (spot prices), so a thousand of them for 54 hours would cost a little under 50,000 US dollars. Perhaps that's within reason. (But that's just to generate.)
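The arithmetic behind that estimate can be reproduced with a few lines of Python. This is only a sketch: it assumes a rate of one candidate per nanosecond per core (the rate that the 18-billion-second figure implies) and perfect scaling across cores, and the function name `brute_force_seconds` is made up for illustration. With the exact value of 2^64 rather than the rounded 18 billion, the single-core figure comes out nearer 585 years and the cluster figure nearer 53 hours.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_seconds(bits=64, rate_per_core=1e9, cores=1):
    """Seconds needed to enumerate 2**bits candidates.

    rate_per_core is an assumed rate (candidates per second per core);
    perfect parallel scaling is assumed as well.
    """
    return 2**bits / (rate_per_core * cores)


single_core = brute_force_seconds()
cluster = brute_force_seconds(cores=1000 * 96)  # 1000 servers, 96 cores each

print(f"one core:     {single_core / SECONDS_PER_YEAR:.0f} years")
print(f"96,000 cores: {cluster / 3600:.1f} hours")
```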
Undoubtedly, the original question is part of an exploration of the possibility of trying every possible sequence by way of cracking a password, and it's not really necessary to produce a precise count of the number of possible passwords to demonstrate the impracticality of that approach (or its practicality for organizations which have a budget sufficient to pay for the necessary computing resources). As the above estimate shows, a password with 64 bits of entropy is not really that secure if what it is protecting is sufficiently valuable. Take that into account when generating a password for things you treasure.
Still, it can be interesting to compute precise combinatoric counts, if for no reason other than the intellectual challenge.
The following is mostly a proof-of-concept; I wrote it in Python because Python offers some facilities which would have been time-consuming to reproduce and debug in C: hash tables with tuple keys and arbitrary precision integer arithmetic. It could be rewritten in C (or, more easily, in C++), and the Python code could most certainly be improved, but given that it only takes 70 milliseconds to compute the count requested in the original question, the effort seems unnecessary.
This program carefully groups the possible sequences into different partitions and caches the results in a memoisation table. For the case of sequences of length 16, as in the OP, the cache ends up with 2540 entries, which means that the core computation is only done 2540 times:
# The basis of the categorization are symbol usage vectors, which count the
# number of symbols used (that is, present in a prefix of the sequence)
# `i` times, for `i` ranging from 1 to the maximum number of symbol uses
# (4 in the case of this question). I tried to generalise the code for different
# parameters (length of the sequence, number of distinct symbols, maximum
# use count, maximum number of pairs). Increasing any of these parameters will,
# of course, increase the number of cases that need to be checked and thus slow
# the program down, but it seems to work for some quite large values.
# Because constantly adjusting the index was driving me crazy, I ended up
# using 1-based indexing for the usage vectors; the element with index 0 always
# has the value 0. This creates several inefficiencies but the practical
# consequences are insignificant.
### Functions to manipulate usage vectors

def noprev(used, prevcnt):
    """Decrements the use count of the previous symbol"""
    return used[:prevcnt] + (used[prevcnt] - 1,) + used[prevcnt + 1:]

def bump1(used, count):
    """Registers that one symbol (with supplied count) is used once more."""
    return ( used[:count]
           + (used[count] - 1, used[count + 1] + 1)
           + used[count + 2:]
           )

def bump2(used, count):
    """Registers that one symbol (with supplied count) is used twice more."""
    return ( used[:count]
           + (used[count] - 1, used[count + 1], used[count + 2] + 1)
           + used[count + 3:]
           )

def add1(used):
    """Registers a new symbol used once."""
    return (0, used[1] + 1) + used[2:]

def add2(used):
    """Registers a new symbol used twice."""
    return (0, used[1], used[2] + 1) + used[3:]

def count(NSlots, NSyms, MaxUses, MaxPairs):
    """Counts the number of sequences of length NSlots over an alphabet
       of NSyms symbols where no symbol is used more than MaxUses times,
       no symbol appears three times in a row, and there are no more than
       MaxPairs pairs of symbols.
    """
    cache = {}

    # Canonical description of the problem, used as a cache key
    #   pairs: the number of pairs still allowed after the prefix
    #   prevcnt: the use count of the last symbol in the prefix
    #   used: for i in [1, MaxUses], the number of symbols used i times
    #         Note: used[0] is always 0. This problem is naturally 1-based
    def helper(pairs, prevcnt, used):
        key = (pairs, prevcnt, used)
        if key not in cache:
            # avail_slots: Number of remaining slots.
            avail_slots = NSlots - sum(i * count for i, count in enumerate(used))
            if avail_slots == 0:
                total = 1
            else:
                # avail_syms: Number of unused symbols.
                avail_syms = NSyms - sum(used)
                # We can't use the previous symbol (which means we need
                # to decrease the number of symbols with prevcnt uses).
                adjusted = noprev(used, prevcnt)[:-1]
                # First, add single repeats of already used symbols
                total = sum(count * helper(pairs, i + 1, bump1(used, i))
                            for i, count in enumerate(adjusted)
                            if count)
                # Then, a single instance of a new symbol
                if avail_syms:
                    total += avail_syms * helper(pairs, 1, add1(used))

                # If we can add pairs, add the already not-too-used symbols
                if pairs and avail_slots > 1:
                    total += sum(count * helper(pairs - 1, i + 2, bump2(used, i))
                                 for i, count in enumerate(adjusted[:-1])
                                 if count)
                    # And a pair of a new symbol
                    if avail_syms:
                        total += avail_syms * helper(pairs - 1, 2, add2(used))
            cache[key] = total
        return cache[key]

    rv = helper(MaxPairs, MaxUses, (0,)*(MaxUses + 1))
    # print("Cache size: ", len(cache))
    return rv

# From the command line, run this with the command:
#   python3 -m count SLOTS SYMBOLS USE_MAX PAIR_MAX
# There are defaults for all four arguments.
if __name__ == "__main__":
    from sys import argv
    NSlots, NSyms, MaxUses, MaxPairs = 16, 16, 4, 2
    if len(argv) > 1: NSlots = int(argv[1])
    if len(argv) > 2: NSyms = int(argv[2])
    if len(argv) > 3: MaxUses = int(argv[3])
    if len(argv) > 4: MaxPairs = int(argv[4])
    print(NSlots, NSyms, MaxUses, MaxPairs,
          count(NSlots, NSyms, MaxUses, MaxPairs))
Here's the result of using this program to compute the count of valid sequences for every length from 1 to 65 (a sequence longer than 64 is impossible given the constraints, so the count for 65 is 0), taking less than 11 seconds in total:
$ time for i in $(seq 1 65); do python3 -m count $i 16 4; done
1 16 4 2 16
2 16 4 2 256
3 16 4 2 4080
4 16 4 2 65040
5 16 4 2 1036800
6 16 4 2 16524000
7 16 4 2 263239200
8 16 4 2 4190907600
9 16 4 2 66663777600
10 16 4 2 1059231378240
11 16 4 2 16807277588640
12 16 4 2 266248909553760
13 16 4 2 4209520662285120
14 16 4 2 66404063202640800
15 16 4 2 1044790948722393600
16 16 4 2 16390235567479693920
17 16 4 2 256273126082439298560
18 16 4 2 3992239682632407024000
19 16 4 2 61937222586063601795200
20 16 4 2 956591119531904748877440
21 16 4 2 14701107045788393912922240
22 16 4 2 224710650516510785696509440
23 16 4 2 3414592455661342007436384000
24 16 4 2 51555824538229409502827923200
25 16 4 2 773058043102197617863741843200
26 16 4 2 11505435580713064249590793862400
27 16 4 2 169863574496121086821681298457600
28 16 4 2 2486228772352331019060452730124800
29 16 4 2 36053699633157440642183732148192000
30 16 4 2 517650511567565591598163978874476800
31 16 4 2 7353538304042081751756339918288153600
32 16 4 2 103277843408210067510518893242552998400
33 16 4 2 1432943471827935940003777587852746035200
34 16 4 2 19624658467616639408457675812975159808000
35 16 4 2 265060115658802288611235565334010714521600
36 16 4 2 3527358829586230228770473319879741669580800
37 16 4 2 46204536626522631728453996238126656113459200
38 16 4 2 595094456544732751483475986076977832633088000
39 16 4 2 7527596027223722410480884495557694054538752000
40 16 4 2 93402951052248340658328049006200193398898022400
41 16 4 2 1135325942092947647158944525526875233118233702400
42 16 4 2 13499233156243746249781875272736634831519281254400
43 16 4 2 156762894800798673690487714464110515978059412992000
44 16 4 2 1774908625866508837753023260462716016827409668608000
45 16 4 2 19556269668280714729769444926596793510048970792448000
46 16 4 2 209250137714454234944952304185555699000268936613376000
47 16 4 2 2169234173368534856955926000562793170629056490849280000
48 16 4 2 21730999613085754709596718971411286413365188258316288000
49 16 4 2 209756078324313353775088590268126891517374425535395840000
50 16 4 2 1944321975918071063760157244341119456021429461885104128000
51 16 4 2 17242033559634684233385212588199122289377881249323872256000
52 16 4 2 145634772367323301463634877324516598329621152347129008128000
53 16 4 2 1165639372591494145461717861856832014651221024450263064576000
54 16 4 2 8786993110693628054377356115257445564685015517718871715840000
55 16 4 2 61931677369820445021334706794916410630936084274106426433536000
56 16 4 2 404473662028342481432803610109490421866960104314699801413632000
57 16 4 2 2420518371006088374060249179329765722052271121139667645435904000
58 16 4 2 13083579933158945327317577444119759305888865127012932088217600000
59 16 4 2 62671365871027968962625027691561817997506140958876900738150400000
60 16 4 2 259105543035583039429766038662433668998456660566416258886520832000
61 16 4 2 889428267668414961089138119575550372014240808053275769482575872000
62 16 4 2 2382172342138755521077314116848435721862984634708789861244239872000
63 16 4 2 4437213293644311557816587990199342976125765663655136187709235200000
64 16 4 2 4325017367677880742663367673632369189388101830634256108595793920000
65 16 4 2 0
real 0m10.924s
user 0m10.538s
sys 0m0.388s
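For small parameters, the counts in the table can be cross-checked by brute force. The sketch below is not part of the program above; `is_valid` and `brute_count` are names made up for this illustration. It reads "pair" as a run of exactly two equal adjacent symbols, which is an assumption, although it does reproduce the table rows for lengths 3 and 4.

```python
from itertools import groupby, product


def is_valid(seq, max_uses=4, max_pairs=2):
    """Check one sequence against the use, triple and pair rules."""
    # No symbol may be used more than max_uses times.
    if any(seq.count(sym) > max_uses for sym in set(seq)):
        return False
    # Lengths of the runs of equal adjacent symbols.
    runs = [len(list(group)) for _, group in groupby(seq)]
    # No symbol may appear three times in a row.
    if any(run >= 3 for run in runs):
        return False
    # A "pair" is taken to be a run of exactly two (assumption).
    return sum(1 for run in runs if run == 2) <= max_pairs


def brute_count(nslots, nsyms, max_uses=4, max_pairs=2):
    """Count valid sequences by testing every candidate (tiny inputs only)."""
    return sum(is_valid(seq, max_uses, max_pairs)
               for seq in product(range(nsyms), repeat=nslots))


print(brute_count(3, 16))  # 4080, the table row for length 3
print(brute_count(4, 16))  # 65040, the table row for length 4
```

The same `is_valid` function accepts strings, so the examples from the question can be checked directly, e.g. `is_valid("00abcde88edcba11")` is false because that sequence contains three pairs.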

This program counts 16,390,235,567,479,693,920 passwords.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
enum { RLength = 16 }; // Required length of password.
enum { NChars = 16 }; // Number of characters in alphabet.
typedef struct
{
/* N[i] counts how many instances of i are left to use, as constrained
by rule 3.
*/
unsigned N[NChars];
/* NPairs counts how many more pairs are allowed, as constrained by
rule 5.
*/
unsigned NPairs;
/* Used counts how many characters have been distinguished by choosing
them as a representative. Symmetry remains unbroken for NChars - Used
characters.
*/
unsigned Used;
} Supply;
/* Count the number of passwords that can be formed starting with a string
(in String) of length Length, with state S.
*/
static uint64_t Count(int Length, Supply *S, char *String)
{
/* If we filled the string, we have one password that obeys the rules.
Return that. Otherwise, consider suffixing more characters.
*/
if (Length == RLength)
return 1;
// Initialize a count of the number of passwords beginning with String.
uint64_t C = 0;
// Consider suffixing each character distinguished so far.
for (unsigned Char = 0; Char < S->Used; ++Char)
{
/* If it would violate rule 3, limiting how many times the character
is used, do not suffix this character.
*/
if (S->N[Char] == 0) continue;
// Does the new character form a pair with the previous character?
unsigned IsPair = String[Length-1] == Char;
if (IsPair)
{
/* If it would violate rule 4, a character may not appear three
times in a row, do not suffix this character.
*/
if (String[Length-2] == Char) continue;
/* If it would violate rule 5, limiting how many times pairs may
appear, do not suffix this character.
*/
if (S->NPairs == 0) continue;
/* If it forms a pair, and our limit is not reached, count the
pair.
*/
--S->NPairs;
}
// Count the character.
--S->N[Char];
// Suffix the character.
String[Length] = Char;
// Add as many passwords as we can form by suffixing more characters.
C += Count(Length+1, S, String);
// Undo our changes to S.
++S->N[Char];
S->NPairs += IsPair;
}
/* Besides all the distinguished characters, select a representative from
the pool (we use the next unused character in numerical order), count
the passwords we can form from it, and multiply by the number of
characters that were in the pool.
*/
if (S->Used < NChars)
{
/* A new character cannot violate rule 3 (has not been used 4 times
yet), rule 4 (has not appeared three times in a row), or rule 5
(does not form a pair that could pass the pair limit). So we know,
without any tests, that we can suffix it.
*/
// Use the next unused character as a representative.
unsigned Char = S->Used;
/* By symmetry, we could use any of the remaining NChars - S->Used
characters here, so the total number of passwords that can be
formed from the current state is that number times the number that
can be formed by suffixing this particular representative.
*/
unsigned Multiplier = NChars - S->Used;
// Record another character is being distinguished.
++S->Used;
// Decrement the count for this character and suffix it to the string.
--S->N[Char];
String[Length] = Char;
// Add as many passwords as can be formed by suffixing a new character.
C += Multiplier * Count(Length+1, S, String);
// Undo our changes to S.
++S->N[Char];
--S->Used;
}
// Return the computed count.
return C;
}
int main(void)
{
/* Initialize our "supply" of characters. There are no distinguished
characters, two pairs may be used, and each character may be used at
most 4 times.
*/
Supply S = { .Used = 0, .NPairs = 2 };
for (unsigned Char = 0; Char < NChars; ++Char)
S.N[Char] = 4;
/* Prepare space for string of RLength characters preceded by a sentinel
(-1). The sentinel permits us to test for a repeated character without
worrying about whether the indexing goes outside array bounds.
*/
char String[RLength+1] = { -1 };
printf("There are %" PRIu64 " possible passwords.\n",
Count(0, &S, String+1));
}

The number of possibilities is fixed. You could either come up with an algorithm that generates only valid combinations, or you could just iterate over the entire problem space and check each combination with a simple validity function.
How long it takes depends on the computer and on the efficiency of the implementation. You could easily make it a multithreaded application.

Related

Why is the 17th digit of version 4 GUIDs limited to only 4 possibilities?

I understand that this doesn't take a significant chunk off of the entropy involved, and that even if a whole nother character of the GUID was reserved (for any purpose), we still would have more than enough for every insect to have one, so I'm not worried, just curious.
As this great answer shows, the Version 4 algorithm for generating GUIDs has the following format:
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
x is random
4 is constant, this represents the version number.
y is one of: 8, 9, A, or B
The RFC spec for UUIDs says that these bits must be set this way, but I don't see any reason given.
Why is the third bullet (the 17th digit) limited to only those four digits?
Bits, not hex
Focusing on hexadecimal digits is confusing you.
A UUID is not made of hex. A UUID is made of 128 bits.
Humans would resent reading a series of 128 bits presented as a long string of 1 and 0 characters. So for the benefit of reading and writing by humans, we present the 128-bits in hex.
Always keep in mind that when you see the series of 36 hex characters with hyphens, you are not looking at a UUID. You are looking at some text generated to represent the 128 bits that are actually in the UUID.
Version & Variant
The first special meaning you mention, the “version” of UUID, is recorded using 4 bits. See section 4.1.3 of your linked spec.
The second special meaning you indicate is the “variant”. This value takes 1-3 bits. See section 4.1.1 of your linked spec.
A hex character represents 4 bits (half an octet).
The Version number, being 4 bits, takes an entire hex character to itself.
Version 4 specifically uses the bits 01 00, which is 4 in hex just as it is in decimal (base 10).
The Variant, being 1-3 bits, does not take an entire hex character.
Outside the Microsoft world of GUIDs, the rest of the industry nowadays uses two bits: 10, for a decimal value of 2, as the variant. This pair of bits lands in the most significant bits of octet # 8 (counting from zero). That octet looks like this, where ‘n’ means 0 or 1: 10 nn nn nn. A pair of hex characters represents the two halves of that octet. So your 17th hex digit, the first half of that octet, 10 nn, can only have four possible values:
10 00 (hex 8)
10 01 (hex 9)
10 10 (hex A)
10 11 (hex B)
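The bit layout described above can be checked with Python's standard `uuid` module. This is a small sketch, and the helper name `variant_digit_info` is made up for illustration. It pulls out the 17th hex digit, the two variant bits and the version nibble from the 128-bit integer value of a UUID.

```python
import uuid


def variant_digit_info(u):
    """Return (17th hex digit, top two variant bits, version nibble)."""
    digit17 = u.hex[16]                 # u.hex is 32 hex digits, no hyphens
    variant_bits = (u.int >> 62) & 0b11  # two most significant bits of octet 8
    version = (u.int >> 76) & 0xF        # high nibble of octet 6
    return digit17, variant_bits, version


seen = set()
for _ in range(1000):
    digit17, variant_bits, version = variant_digit_info(uuid.uuid4())
    assert variant_bits == 0b10 and version == 4
    seen.add(digit17)

# Only digits whose top two bits are 10 can occur: 8, 9, a or b.
print(sorted(seen))
```

With 4 bits fixed for the version and 2 bits fixed for the variant, 128 - 6 = 122 bits remain to be chosen at random.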
Quoting the estimable Mr. Lippert
First off, what bits are we talking about when we say “the bits”? We already know that in a “random” GUID the first hex digit of the third section is always 4....there is additional version information stored in the GUID in the bits in the fourth section as well; you’ll note that a GUID almost always has 8, 9, a or b as the first hex digit of the fourth section. So in total we have six bits reserved for version information, leaving 122 bits that can be chosen at random.
(from https://ericlippert.com/2012/05/07/guid-guide-part-three/)
tl;dr - it's more version information. To get more specific than that I suspect you're going to have to track down the author of the spec.
Based upon what I've tried to learn today, I've attempted to put together a C#/.NET 'LINQPad' snippet/script to (for some small part) break down the GUID/UUID, in case it helps:
void Main()
{
var guid =
Guid.Parse(
//@"08c8fbdc-ff38-402e-b0fd-353392a407af" // v4 - Microsoft/.NET
@"7c2a81c7-37ce-4bae-ba7d-11123200d59a" // v4
//@"493f6528-d76a-11ec-9d64-0242ac120002" // v1
//@"5f0ad0df-99d4-5b63-a267-f0f32cf4c2a2" // v5
);
Console.WriteLine(
$"UUID = '{guid}' :");
Console.WriteLine();
var guidBytes =
guid.ToByteArray();
// Version # - 8th octet
const int timeHiAndVersionOctetIdx = 7;
var timeHiAndVersionOctet =
guidBytes[timeHiAndVersionOctetIdx];
var versionNum =
(timeHiAndVersionOctet & 0b11110000) >> 4; // 0xF0
// Variant # - 9th octet
const int clkSeqHiResOctetIdx = 8;
var clkSeqHiResOctet =
guidBytes[clkSeqHiResOctetIdx];
var msVariantNum =
(clkSeqHiResOctet & 0b11100000) >> 5; // 0xE0/3 bits
var variantNum =
(clkSeqHiResOctet & 0b11000000) >> 5; // 0xC0/2bits - 0x8/0x9/0xA/0xB
//
Console.WriteLine(
$"\tVariant # = '{variantNum}' ('0x{variantNum:X}') - '0b{((variantNum & 0b00000100) > 0 ? '1' : '0')}{((variantNum & 0b00000010) > 0 ? '1' : '0')}{((variantNum & 0b00000001) > 0 ? '1' : '0')}'");
Console.WriteLine();
if (variantNum < 4)
{
Console.WriteLine(
$"\t\t'0 x x' - \"Reserved, NCS backward compatibility\"");
}
else
{
if (variantNum == 4 ||
variantNum == 5)
{
Console.WriteLine(
$"\t\t'1 0 x' - \"The variant specified in this {{RFC4122}} document\"");
}
else
{
if (variantNum == 6)
{
Console.WriteLine(
$"\t\t'1 1 0' - \"Reserved, Microsoft Corporation backward compatibility\"");
}
else
{
if (variantNum == 7)
{
Console.WriteLine(
$"\t\t'1 1 1' - \"Reserved for future definition\"");
}
}
}
}
Console.WriteLine();
Console.WriteLine(
$"\tVersion # = '{versionNum}' ('0x{versionNum:X}') - '0b{((versionNum & 0b00001000) > 0 ? '1' : '0')}{((versionNum & 0b00000100) > 0 ? '1' : '0')}{((versionNum & 0b00000010) > 0 ? '1' : '0')}{((versionNum & 0b00000001) > 0 ? '1' : '0')}'");
Console.WriteLine();
string[] versionDescriptions =
new[]
{
@"The time-based version specified in this {{RFC4122}} document",
@"DCE Security version, with embedded POSIX UIDs",
@"The name-based version specified in this {{RFC4122}} document that uses MD5 hashing",
@"The randomly or pseudo-randomly generated version specified in this {{RFC4122}} document",
@"The name-based version specified in this {{RFC4122}} document that uses SHA-1 hashing"
};
Console.WriteLine(
$"\t\t'{versionNum}' = \"{versionDescriptions[versionNum - 1]}\"");
Console.WriteLine();
Console.WriteLine(
$"'RFC4122' document - <https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.1>");
Console.WriteLine();
}

Converting from Source-based Indices to Destination-based Indices

I'm using AVX2 instructions in some C code.
The VPERMD instruction takes two 8-integer vectors a and idx and generates a third one, dst, by permuting a based on idx. This seems equivalent to dst[i] = a[idx[i]] for i in 0..7. I'm calling this source based, because the move is indexed based on the source.
However, I have my calculated indices in destination based form. This is natural for setting an array, and is equivalent to dst[idx[i]] = a[i] for i in 0..7.
How can I convert from source-based form to destination-based form? An example test case is:
{2 1 0 5 3 4 6 7} source-based form.
{2 1 0 4 5 3 6 7} destination-based equivalent
For this conversion, I'm staying in ymm registers, so that means that destination-based solutions don't work. Even if I were to insert each separately, since it only operates on constant indexes, you can't just set them.
I guess you're implicitly saying that you can't modify your code to calculate source-based indices in the first place? I can't think of anything you can do with x86 SIMD, other than AVX512 scatter instructions that take dst-based indices. (But those are not very fast on current CPUs, even compared to gather loads. https://uops.info/)
Storing to memory, inverting, and reloading a vector might actually be best. (Or transferring to integer registers directly, not through memory, maybe after a vextracti128 / packusdw so you only need two 64-bit transfers from vector to integer regs: movq and pextrq).
But anyway, then use them as indices to store a counter into an array in memory, and reload that as a vector. This is still slow and ugly, and includes a store-forwarding failure delay. So it's probably worth your while to change your index-generating code to generate source-based shuffle vectors if at all possible.
To benchmark the solution, I modified the code from my other answer to compare the performance of the scatter instruction (USE_SCATTER defined) with a store and sequential permute (USE_SCATTER undefined). I had to copy the result back to the permutation pattern perm in order to prevent the compiler from optimizing the loop body away:
#ifdef TEST_SCATTER
#define REPEATS 1000000001
#define USE_SCATTER
__m512i ident = _mm512_set_epi32(15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
__m512i perm = _mm512_set_epi32(7,9,3,0,5,8,13,11,4,2,15,1,12,6,10,14);
uint32_t outA[16] __attribute__ ((aligned(64)));
uint32_t id[16], in[16];
_mm512_storeu_si512(id, ident);
for (int i = 0; i < 16; i++) printf("%2d ", id[i]); puts("");
_mm512_storeu_si512(in, perm);
for (int i = 0; i < 16; i++) printf("%2d ", in[i]); puts("");
#ifdef USE_SCATTER
puts("scatter");
for (long t = 0; t < REPEATS; t++) {
_mm512_i32scatter_epi32(outA, perm, ident, 4);
perm = _mm512_load_si512(outA);
}
#else
puts("store & permute");
uint32_t permA[16] __attribute__ ((aligned(64)));
for (long t = 0; t < REPEATS; t++) {
_mm512_store_si512(permA, perm);
for (int i = 0; i < 16; i++) outA[permA[i]] = i;
perm = _mm512_load_si512(outA);
}
#endif
for (int i = 0; i < 16; i++) printf("%2d ", outA[i]); puts("");
#endif
Here's the output for the two cases (using the builtin time command of tcsh, the u output is user-space time in seconds):
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
14 10 6 12 1 15 2 4 11 13 8 5 0 3 9 7
store & permute
12 4 6 13 7 11 2 15 10 14 1 8 3 9 0 5
10.765u 0.001s 0:11.22 95.9% 0+0k 0+0io 0pf+0w
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
14 10 6 12 1 15 2 4 11 13 8 5 0 3 9 7
scatter
12 4 6 13 7 11 2 15 10 14 1 8 3 9 0 5
10.740u 0.000s 0:11.19 95.9% 0+0k 40+0io 0pf+0w
The runtime is about the same (Intel(R) Xeon(R) W-2125 CPU @ 4.00GHz, clang++-6.0, -O3 -funroll-loops -march=native). I checked the generated assembly code. With USE_SCATTER defined, the compiler generates vpscatterdd instructions; without it, the compiler generates complex code using vpextrd, vpextrq, and vextracti32x4.
Edit: I was worried that the compiler may have found a specific solution for the fixed permutation pattern I used. So I replaced it with a randomly generated pattern from std::random_shuffle(), but the time measurements are about the same.
Edit: Following the comment by Peter Cordes, I wrote a modified benchmark that hopefully measures something like throughput:
#define REPEATS 1000000
#define ARRAYSIZE 1000
#define USE_SCATTER
std::srand(unsigned(std::time(0)));
// build array with random permutations
uint32_t permA[ARRAYSIZE][16] __attribute__ ((aligned(64)));
for (int i = 0; i < ARRAYSIZE; i++)
_mm512_store_si512(permA[i], randPermZMM());
// vector register
__m512i perm;
#ifdef USE_SCATTER
puts("scatter");
__m512i ident = _mm512_set_epi32(15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
for (long t = 0; t < REPEATS; t++)
for (long i = 0; i < ARRAYSIZE; i++) {
perm = _mm512_load_si512(permA[i]);
_mm512_i32scatter_epi32(permA[i], perm, ident, 4);
}
#else
uint32_t permAsingle[16] __attribute__ ((aligned(64)));
puts("store & permute");
for (long t = 0; t < REPEATS; t++)
for (long i = 0; i < ARRAYSIZE; i++) {
perm = _mm512_load_si512(permA[i]);
_mm512_store_si512(permAsingle, perm);
uint32_t *permAVec = permA[i];
for (int e = 0; e < 16; e++)
permAVec[permAsingle[e]] = e;
}
#endif
FILE *f = fopen("testperm.dat", "w");
fwrite(permA, ARRAYSIZE, 64, f);
fclose(f);
I use an array of permutation patterns which are modified sequentially without dependencies.
These are the results:
scatter
4.241u 0.002s 0:04.26 99.5% 0+0k 80+128io 0pf+0w
store & permute
5.956u 0.002s 0:05.97 99.6% 0+0k 80+128io 0pf+0w
So throughput is better when using the scatter instruction.
I had the same problem, but in the opposite direction: destination indices were easy to compute, but source indices were required for the application of SIMD permute instructions. Here's a solution for AVX-512 using a scatter instruction as suggested by Peter Cordes; it should also apply to the opposite direction:
__m512i ident = _mm512_set_epi32(15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
__m512i perm = _mm512_set_epi32(7,9,3,0,5,8,13,11,4,2,15,1,12,6,10,14);
uint32_t id[16], in[16], out[16];
_mm512_storeu_si512(id, ident);
for (int i = 0; i < 16; i++) printf("%2d ", id[i]); puts("");
_mm512_storeu_si512(in, perm);
for (int i = 0; i < 16; i++) printf("%2d ", in[i]); puts("");
_mm512_i32scatter_epi32(out, perm, ident, 4);
for (int i = 0; i < 16; i++) printf("%2d ", out[i]); puts("");
An identity mapping ident is distributed to the out array according to the index pattern perm. The idea is basically the same as the one described for inverting a permutation. Here's the output:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
14 10 6 12 1 15 2 4 11 13 8 5 0 3 9 7
12 4 6 13 7 11 2 15 10 14 1 8 3 9 0 5
Note that I have permutations in the mathematical sense (no duplicates). With duplicates, the out store needs to be initialized since some elements could remain unwritten.
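As a scalar model of what the scatter does, the following Python sketch scatters the identity through an index vector, i.e. `out[idx[i]] = i`. The function name `invert_permutation` is chosen for illustration only. It reproduces both the 16-element output above and the 8-element test case from the question.

```python
def invert_permutation(idx):
    """Scatter the identity through idx: out[idx[i]] = i.

    For a true permutation (no duplicates) this yields the inverse,
    which converts source-based indices to destination-based ones
    and vice versa.
    """
    out = [None] * len(idx)  # unwritten slots stay None if idx has duplicates
    for i, target in enumerate(idx):
        out[target] = i
    return out


# perm in memory order (element 0 first), as printed in the output above
perm = [14, 10, 6, 12, 1, 15, 2, 4, 11, 13, 8, 5, 0, 3, 9, 7]
print(invert_permutation(perm))

# the 8-element test case from the question
print(invert_permutation([2, 1, 0, 5, 3, 4, 6, 7]))
```

Because inverting twice returns the original vector, the same routine serves both directions of the conversion.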
I also see no easy way to accomplish this within registers. I thought about cycling through the given permutation by repeatedly applying a permute instruction. As soon as the identity pattern is reached, the one before is the inverse permutation (this goes back to the idea by EOF on unzip operations). However, the cycles can be long. The maximum number of iterations that may be required is given by Landau's function, which for 16 elements is 140, see this table. I could show that it is possible to shorten this to a maximum of 16 if the individual permutation subcycles are frozen as soon as they coincide with the identity elements. This shortens the average from 28 to 9 permute instructions for a test on random permutation patterns. However, it is still not an efficient solution (much slower than the scatter instruction in the throughput benchmark described in my other answer).
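The basic cycling idea (without the subcycle-freezing refinement) can be modelled in scalar Python. This is a sketch under the assumption that one permute step computes `dst[i] = src[idx[i]]`, as VPERMD does; `invert_by_cycling` is a made-up name.

```python
def invert_by_cycling(perm):
    """Invert perm by repeatedly permuting it with itself.

    Returns (inverse, number_of_permute_steps). Each loop iteration
    stands in for one register permute: dst[i] = src[perm[i]].
    """
    identity = list(range(len(perm)))
    previous = identity          # covers the case where perm is the identity
    current = list(perm)
    steps = 0
    while current != identity:
        previous = current
        current = [previous[j] for j in perm]  # one permute step
        steps += 1
    # The pattern seen just before the identity is the inverse.
    return previous, steps


print(invert_by_cycling([2, 1, 0, 5, 3, 4, 6, 7]))

# A 16-element permutation with cycle lengths 4, 5 and 7 has order 140.
worst = [1, 2, 3, 0, 5, 6, 7, 8, 4, 10, 11, 12, 13, 14, 15, 9]
print(invert_by_cycling(worst)[1])
```

For a permutation of order k the loop needs k - 1 permute steps, so the worst case for 16 elements takes 139 steps in this plain form.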

Incorrect output with counting and tallying bits in C

This is only my 2nd programming class. There are 30 rooms. We have to see what is in each room and tally it. I already used the for loop to go through the 30 rooms and I know I have tried to use a bit counter to see what is in each room. I am not getting the correct sample output after I redirect the sample output. When I printf("%d", itemCnt[loc]);, my output is 774778414trolls
When I printf("%d", itemCnt[0]);, my output is 0trolls. I'm just trying to get one output right so I can figure out how to get the rest of the 8 outputs. From the sample output, the first number is supposed to be 6, followed by 6, 1, 4, 4 ... and so on. Below are sample inputs/outputs and what I have so far in code.
Sample input:
1 20 ##
2 21 #A
3 22 ##
4 23 #1
5 22 ##
6 22 ##
7 22 ##
8 22 ##
9 23 #Z Here be trolls - not!
10 23 #+
12 23 ##
13 24 ##
11 22 ##
14 22 #2
15 21 #1
16 20 ##
17 19 ##
18 20 ##
19 19 ##
20 18 ##
21 17 #*
22 16 #*
23 15 #%
0 14 #7
0 gold_bar
1 silver_bar
2 diamond
3 copper_ring
4 jumpy_troll
5 air
6 angry_troll
7 plutonium_troll
Sample Output:
6 gold_bar
6 silver_bar
1 diamond
4 copper_ring
4 jumpy_troll
8 air
15 angry_troll
0 plutonium_troll
code
int main()
{
// contains x and y coordinate
int first, second;
char third[100];
char fourth[100];
char Map[30][30];
// map initialization
for(int x=0; x<30; x++){
for(int y=0; y<30; y++){
Map[x][y] = '.';
}
}
while(scanf("%d %d %s",&first, &second, third) != -1) {
// Condition 1: a zero coordinate
if (first==0 || second==0) exit(0);
// Condition 2: coordinate out of range
if (first<0 || first>30 || second<0 || second>30){
printf("Error: out of range 0-30!\n");
exit(1);
}
Map[second-1][first-1] = third[1];
fgets(fourth, 100, stdin);
// bit counter
int itemCnt[8] = {0}; // array to hold count of items, index is item type
unsigned char test; // holds contents of room.
int loc;
for(loc = 0; loc < 8; loc++) // loop over every bit and see if it is set
{
unsigned char bitPos = 1 << loc; // generate a bit-mask
if((test & bitPos) == bitPos)
++itemCnt[loc];
}
// print the map
for(int h=0; h<30; h++){
for(int v=0; v<30; v++){
printf("%c", Map[h][v]);
}
printf("\n");
}
// print values
printf("%d", itemCnt[0]);
}
return 0;
}
test is not initialized. It looks like you intended to assign 'third[1]' to test.
Also, 774778414 = 0x2E2E2E2E in hex, and 0x2E is the numeric value of ASCII '.', your initial value for map locations. (Tip: when you see wild decimals like that, try Google. I entered, "774778414 in hex" without the quotes.)
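The same decoding can be done without a search engine. A short Python sketch, assuming the stray value was read from four consecutive bytes of the map as a 4-byte integer:

```python
value = 774778414

# Hexadecimal view: four identical bytes, 0x2E each.
print(hex(value))

# The same four bytes interpreted as ASCII characters.
raw = value.to_bytes(4, "little")
print(raw.decode("ascii"))
```

Since all four bytes are equal, the byte order does not matter here.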
I would also suggest breaking down the code into two functions: the first reads from stdin to populate Map (like you do), and the second reads from stdin to populate 8 C strings to describe your objects. It's important to note, the first loop should not go until end of input, because your posted input continues with descriptions, not strictly 3 fields like the beginning.

how do I generate set of all tokens of length exactly equal to 8

I need to generate the complete set of tokens of size exactly equal to 8. Each position in the token can assume values from 0-9 and A-Z. For example:
The following are valid tokens:
00000000
0000000A
000000H1
Z00000XA
So basically I want to generate all tokens from 00000000 to ZZZZZZZZ.
How do I do this in C?
You caught me on a day where I don't want to do what I'm supposed to be doing1.
The code below only generates token output; it doesn't attempt to store tokens anywhere. You can redirect the output to a file, but as others have pointed out, you're going to need a bigger hard drive to store 36^8 strings. There's probably a way to do this without the nested loops, but this method is pretty straightforward. The first loop updates the counter in each position, while the second maps the counter to a symbol and writes the symbol to standard output.
You can set LEN to a smaller value like 3 to verify that the program does what you want without generating terabytes of output. Alternately, you could use a smaller set of characters for your digits. Ideally, both LEN and digs should be command-line parameters rather than constants, but I've already spent too much time on this.
Edit
Okay, I lied. Apparently I haven't spent too much time on this, because I've cleaned up a minor bug (the first string isn't displayed correctly because I was doing the update before the display) and I've made the length and character set command-line inputs.
Note that this code assumes C99.
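#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define DEFAULT_LEN 8

int main( int argc, char **argv )
{
    const char *default_digs = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t len = DEFAULT_LEN;
    const char *digs = default_digs;

    /* optional arguments: token length, then character set */
    if ( argc > 2 )
        digs = argv[2];
    if ( argc > 1 )
        len = strtoul( argv[1], NULL, 10 );

    size_t diglen = strlen( digs );
    if ( len == 0 || diglen == 0 )  /* nothing to generate; avoids a zero-length VLA */
        return EXIT_FAILURE;

    size_t idx[len];                /* idx[0] is the least significant position */
    memset( idx, 0, sizeof idx );

    for(;;)
    {
        /* display the current token, most significant position first */
        size_t j = len;
        while( j )
            putchar( digs[idx[--j]] );
        putchar( '\n' );

        /* advance the counter: reset positions already at the last digit, then carry */
        while ( j < len && idx[j] == diglen - 1 )
            idx[j++] = 0;
        if ( j == len )
            break;
        idx[j]++;
    }
    return 0;
}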
Sample output:
[fbgo448#n9dvap997]~/prototypes/tokgen: ./tokgen 2 01
00
01
10
11
[fbgo448#n9dvap997]~/prototypes/tokgen: ./tokgen 3 01
000
001
010
011
100
101
110
111
[fbgo448#n9dvap997]~/prototypes/tokgen: ./tokgen 2 012
00
01
02
10
11
12
20
21
22
[1] Which, to be fair, is pretty much any day ending in a 'y'.
You don't need recursion or a nested loop here at all.
You just need a counter from 0 through 36^8 - 1, followed by converting the results to an output in base 36.
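A minimal sketch of that counter approach, assuming the full count fits in a 64-bit integer (36^8 does). The function names `counter_to_token` and `print_all_tokens` are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write n as exactly `len` digits in base strlen(digs), most significant
   digit first. `digs` must be non-empty; `out` needs len + 1 bytes. */
static void counter_to_token(uint64_t n, size_t len, const char *digs, char *out)
{
    size_t base = strlen(digs);

    for (size_t i = len; i-- > 0; ) {   /* fill from the rightmost position */
        out[i] = digs[n % base];
        n /= base;
    }
    out[len] = '\0';
}

/* Print every token of length `len` (assumed <= 64) and return how many
   were printed. Assumes base^len does not overflow 64 bits. */
static uint64_t print_all_tokens(size_t len, const char *digs)
{
    uint64_t base = strlen(digs);
    uint64_t total = 1;
    char token[65];

    for (size_t i = 0; i < len; i++)
        total *= base;                  /* total = base^len */

    for (uint64_t n = 0; n < total; n++) {
        counter_to_token(n, len, digs, token);
        puts(token);
    }
    return total;
}
```

Calling `print_all_tokens(8, "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")` would walk the whole range from 00000000 to ZZZZZZZZ; a small call such as `print_all_tokens(2, "01")` is a quicker way to check the output order.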
That said, 36^8 strings, each with a length of 8 bytes, is 22,568,879,259,648 bytes, or approximately 20.5 terabytes of data (counting a terabyte as 2^40 bytes).
Assuming a sustained rate of 100 megabytes per second, it'll take approximately 63 hours to write all that data to some hard drives.
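These figures can be reproduced with a few lines of integer arithmetic. A small sketch; the helper names `ipow`, `total_bytes` and `hours_to_write` are made up for illustration, and the 100 MB/s rate is taken as 100,000,000 bytes per second.

```c
#include <stdint.h>

/* Integer power; assumed not to overflow 64 bits for the inputs used here. */
static uint64_t ipow(uint64_t base, unsigned exp)
{
    uint64_t result = 1;

    while (exp-- > 0)
        result *= base;
    return result;
}

/* Bytes needed to store every token of `len` symbols drawn from `base`
   symbols, one byte per symbol and no separators. */
static uint64_t total_bytes(uint64_t base, unsigned len)
{
    return ipow(base, len) * len;
}

/* Hours needed to write `bytes` at a sustained `bytes_per_sec`. */
static double hours_to_write(uint64_t bytes, double bytes_per_sec)
{
    return (double)bytes / bytes_per_sec / 3600.0;
}
```

Here `total_bytes(36, 8)` gives 22,568,879,259,648, and `hours_to_write` of that amount at 100e6 bytes per second comes to a little under 63 hours. Writing a newline after each token would raise the size to 9 bytes per token.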

What is a fast way to compare a big list of pairs of integers?

Thing is, we have N pairs of integers, as an example:
23 65
45 66
22 65
80 20
30 11
11 20
We say one pair is bigger than another one if both numbers from one pair are greater than the other two, or if the first number is equal and the other one is bigger, or vice-versa. Otherwise, if you can't compare them that way, then you can't establish which one is bigger.
The idea is to know, for each pair, how many pairs it is bigger than (in the example, the first pair is bigger than the third and the last one, therefore the answer for the first is 2).
The trivial solution would be O(n^2), which is simply comparing every pair to every other one and adding one to a counter for each positive match.
Can anybody come up with a faster idea?
I have implemented the simple O(N^2) solution, which reads its input from "sumos.in":
#include <iostream>
#include <fstream>
#define forn(i, x, N) for(i=x; i<N; i++)
using namespace std;
ifstream fin("sumos.in");
ofstream fout("sumos.out");
struct sumo{
int peso, altura;
};
bool operator < (sumo A, sumo B) {
if( A.altura == B.altura )
if( A.peso < B.peso )
return true;
else
return false;
else
if( A.peso == B.peso )
if( A.altura < B.altura )
return true;
else
return false;
else
if( (A.altura < B.altura) && (A.peso < B.peso) )
return true;
else
return false;
}
int L;
sumo T[100000];
int C[100000];
int main()
{
int i, j;
fin >> L;
forn(i, 0, L)
fin >> T[i].peso >> T[i].altura;
forn(i, 0, L)
forn(j, 0, L)
if( j!=i )
if( T[j]<T[i] )
C[i]++;
forn(i, 0, L)
fout << C[i] << endl;
return 0;
}
Example of input:
10
300 1500
320 1500
299 1580
330 1690
330 1540
339 1500
298 1700
344 1570
276 1678
289 1499
Outputs:
1
2
1
6
3
3
2
5
0
0
I solved this problem by using a segment tree. If you wish to see the implementation: http://pastebin.com/Q3AEF1WY
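The linked code is not reproduced here. As a rough sketch of how an O(n log n) solution of this kind can work, the version below swaps in a Fenwick (binary indexed) tree, which is simpler than a segment tree and sufficient for prefix counts. It sorts by peso, then altura, and for each pair counts the earlier pairs whose altura is not larger. All names (`count_dominated`, `rank_of`, and so on) are made up for illustration.

```c
#include <stdlib.h>

struct sumo { int peso, altura, id; };

/* Order by peso, then by altura. */
static int cmp_sumo(const void *a, const void *b)
{
    const struct sumo *x = a, *y = b;
    if (x->peso != y->peso) return x->peso < y->peso ? -1 : 1;
    if (x->altura != y->altura) return x->altura < y->altura ? -1 : 1;
    return 0;
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Fenwick tree over positions 1..n. */
static void bit_add(int *tree, int n, int pos)
{
    for (; pos <= n; pos += pos & -pos) tree[pos]++;
}

static int bit_sum(const int *tree, int pos)
{
    int s = 0;
    for (; pos > 0; pos -= pos & -pos) s += tree[pos];
    return s;
}

/* Number of values <= v in the sorted array: a rank in 1..n when v is present. */
static int rank_of(const int *sorted, int n, int v)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid] <= v) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Fill count[i] with how many pairs pair i is bigger than.
   Returns 0 on success, -1 if memory could not be allocated. */
static int count_dominated(const int *peso, const int *altura, int n, int *count)
{
    if (n <= 0) return 0;
    struct sumo *s = malloc((size_t)n * sizeof *s);
    int *alts = malloc((size_t)n * sizeof *alts);
    int *tree = calloc((size_t)n + 1, sizeof *tree);
    if (!s || !alts || !tree) { free(s); free(alts); free(tree); return -1; }

    for (int i = 0; i < n; i++) {
        s[i] = (struct sumo){ peso[i], altura[i], i };
        alts[i] = altura[i];
    }
    qsort(s, (size_t)n, sizeof *s, cmp_sumo);
    qsort(alts, (size_t)n, sizeof *alts, cmp_int);

    for (int i = 0; i < n; ) {
        int j = i;                      /* [i, j) is a run of identical pairs */
        while (j < n && cmp_sumo(&s[j], &s[i]) == 0) j++;
        int r = rank_of(alts, n, s[i].altura);
        int below = bit_sum(tree, r);   /* earlier pairs with altura <= this one */
        for (int k = i; k < j; k++) {   /* identical pairs do not count each other */
            count[s[k].id] = below;
            bit_add(tree, n, r);
        }
        i = j;
    }
    free(s); free(alts); free(tree);
    return 0;
}
```

Identical pairs are handled as a group so that they are not counted against each other, which matches the behaviour of the `operator <` in the question. On the ten-sumo example input this should reproduce the output listed there.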
I think I came up with a solution to this but it is rather complex. The basic idea is that there are these groups where the pairs can be arranged in dominated order for example:
11 20      30 11
22 65      80 20
23 65
45 65
If you start thinking about taking your pairs and trying to create these groupings, you realize you will end up with a tree structure. For example, imagine we add the pair 81 19 to the list, along with a root pair (-∞, -∞):
       (-∞, -∞)
       /      \
   11 20      30 11 ---\
   22 65      80 20   81 19
   23 65
   45 65
If you follow the path from a node to the root you will count how many pairs the current pair dominates. From this example it kind of looks like you can use binary search to figure out where to insert a pair into the structure. This is where the complexity troubles start. You can't do a binary search/insertion on a linked list. However there is a very neat data structure called a skip list you might use. You can basically search and insert in O(logn) time.
There's still one problem. What if there are tons of these groupings? Imagine a list like
11 20
12 19
13 18
14 17
Your tree structure will look like:
           (-∞, -∞)
       /    /    \    \
   11 20  12 19  13 18  14 17
Again, use skip lists to order these nodes. I think this will require two different kinds of nodes in the tree: a horizontal type like above and a vertical type like in the first examples. When you are done constructing the tree, traverse it with DFS while recording the current depth, to associate each pair with the number of nodes it dominates.
If the above algorithm is correct, you could insert a pair into the tree in O(log n) time and thus all the pairs in O(n log n) time. The DFS part will take O(n) time, so constructing the tree and associating each pair with the number it dominates will take O(n log n) time. You can sort the pairs based on the number of dominations in O(n log n) time, so the whole process will take O(n log n) time.
Again there is probably a simpler way to do this. Good luck!
You can use a sort, like this:
int z[] = {23,65, 45,66, 22,65, 80,20, 30,11, 11,20};
int n = sizeof z / sizeof z[0];   /* number of values in z */
int i, j, k, tmp;
for (i=1; i<n; i++){
    j= n-i-1;
    for (k=0; k<=j; k++)
        //Pay attention to this block.
        if (z[k]>z[k+1]){
            tmp= z[k];
            z[k]= z[k+1];
            z[k+1]= tmp;
        }
}

Resources