Data structure for fixed length string lookup - c

I have a bunch of strings as keys. Something like...
AAAA ABBA ACEA ALFG
...
...
ZURF [AAA _JFS aKDJ
They are all unique combinations of any 4 characters, and they are all the same length. There are hundreds of thousands of them. I want to perform a lookup and retrieve the value associated with each string.
I currently have it implemented as a hash table, but the main concern is collisions (I've implemented all of the strategies on Wikipedia).
I am thinking of implementing this as a prefix tree. Given the parameters, though (unique, fixed length), I'm wondering if there is an out-of-the-box data structure I can't think of that would be best suited for this...
EDIT: Additionally, all possible combinations are populated once from a data file. Afterwards, lookups happen at wire speed.

Since you know all of the strings ahead of time, you can use gperf to generate a perfect hash function, which has no collisions. For example, with the four input strings AAAA, ABBA, ACEA and ALFG, it generated the following hash function (using the command line gperf -L ANSI-C input.txt):
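#include <string.h> /* strcmp */
/* Constants gperf emits alongside these functions (values for the four keys above) */
#define MIN_WORD_LENGTH 4
#define MAX_WORD_LENGTH 4
#define MAX_HASH_VALUE 11
static unsigned int
hash (register const char *str, register unsigned int len)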
{
static unsigned char asso_values[] =
{
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 7, 2, 5, 12, 12,
12, 12, 12, 12, 12, 12, 0, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12
};
return len + asso_values[(unsigned char)str[1]];
}
const char *
in_word_set (register const char *str, register unsigned int len)
{
static const char * wordlist[] =
{
"", "", "", "",
"ALFG",
"",
"ABBA",
"", "",
"ACEA",
"",
"AAAA"
};
if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
{
register int key = hash (str, len);
if (key <= MAX_HASH_VALUE && key >= 0)
{
register const char *s = wordlist[key];
if (*str == *s && !strcmp (str + 1, s + 1))
return s;
}
}
return 0;
}
This requires a single table lookup, a length check, and a string comparison. If you know for sure that the word you're hashing is one of your source words, you can skip the string comparison.
Expanding the input from 4 to 10,000 randomly generated strings grows the hash function to just 4 table lookups, plus the length check and the string comparison. However, because the string comparison needs every source string stored in the word list, this produces a very large table in the compiled object file (1.4 MB). If you don't need the string comparison, you can omit that table.
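The question also asks how to retrieve the value associated with each key. One simple option, sketched below, is a second array parallel to wordlist and indexed by the same hash. To keep the sketch self-contained, the 256-entry asso_values table is replaced by an equivalent switch (for these four keys only str[1] affects the hash), and the values 10, 20, 30 and 40 are made up for illustration; hash4, valuelist and lookup_value are hypothetical names. gperf can also attach extra fields to each keyword itself through its struct-type mode (-t).

```c
#include <stddef.h>
#include <string.h>

#define MIN_WORD_LENGTH 4
#define MAX_WORD_LENGTH 4
#define MAX_HASH_VALUE 11

/* Same mapping as the generated asso_values table for these keys:
   'A' -> 7, 'B' -> 2, 'C' -> 5, 'L' -> 0, every other byte -> 12. */
static unsigned int hash4(const char *str, size_t len)
{
    switch ((unsigned char)str[1]) {
    case 'A': return (unsigned int)len + 7;
    case 'B': return (unsigned int)len + 2;
    case 'C': return (unsigned int)len + 5;
    case 'L': return (unsigned int)len + 0;
    default:  return (unsigned int)len + 12; /* always out of range */
    }
}

static const char *const wordlist[MAX_HASH_VALUE + 1] = {
    "", "", "", "", "ALFG", "", "ABBA", "", "", "ACEA", "", "AAAA"
};

/* Hypothetical values stored at the same slots as their keys. */
static const int valuelist[MAX_HASH_VALUE + 1] = {
    0, 0, 0, 0, 40, 0, 20, 0, 0, 30, 0, 10
};

/* Returns 1 and writes *value if key is one of the source strings. */
static int lookup_value(const char *key, int *value)
{
    size_t len = strlen(key);
    if (len < MIN_WORD_LENGTH || len > MAX_WORD_LENGTH)
        return 0;
    unsigned int slot = hash4(key, len);
    if (slot > MAX_HASH_VALUE)
        return 0;
    if (strcmp(key, wordlist[slot]) != 0) /* drop this check if keys are trusted */
        return 0;
    *value = valuelist[slot];
    return 1;
}
```

A lookup such as lookup_value("ABBA", &v) sets v to 20, while a non-member like "ABCD" hashes to the ABBA slot and is rejected by the final comparison.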

A hash table, even with collisions, will outperform anything else, and you can tune it to reduce collisions.

First, convert each string into an integer. If your alphabet contains 64 symbols (for example), each character fits in 6 bits, so you can use 4*6 = 24-bit integers as keys.
Now, if more than half of the possible keys are in use (as you say, there are hundreds of thousands of these), maybe the simplest solution will do: just build an array and access it by index (the integer derived from the string).
If possible, implement this with a single memory allocation. It may even save memory compared with the overhead of hundreds of thousands of small allocations.
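A minimal sketch of that direct-indexed array, assuming (hypothetically) that every key character falls in the 64-symbol range 0x40..0x7F, which covers all the example keys such as ZURF, [AAA, _JFS and aKDJ. The table has 2^24 slots, so with 32-bit values it takes 64 MB in a single calloc call. A slot holding 0 means "no entry", so the sketch also assumes stored values are never 0. All names are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed alphabet: the 64 byte values 0x40..0x7F, 6 bits per character. */
#define SYMBOL_BASE 0x40u
#define BITS_PER_SYMBOL 6
#define KEY_LEN 4
#define TABLE_SLOTS (1u << (KEY_LEN * BITS_PER_SYMBOL)) /* 2^24 */

/* Pack a 4-character key into a 24-bit index.
   Returns 0 if a character falls outside the assumed alphabet. */
static int key_to_index(const char *key, uint32_t *index)
{
    uint32_t idx = 0;
    for (int k = 0; k < KEY_LEN; k++) {
        unsigned char c = (unsigned char)key[k];
        if (c < SYMBOL_BASE || c > 0x7Fu)
            return 0;
        idx = (idx << BITS_PER_SYMBOL) | (c - SYMBOL_BASE);
    }
    *index = idx;
    return 1;
}

/* One zero-filled allocation for the whole key space. */
static int32_t *table_create(void)
{
    return calloc(TABLE_SLOTS, sizeof(int32_t));
}

static int table_put(int32_t *table, const char *key, int32_t value)
{
    uint32_t idx;
    if (!key_to_index(key, &idx))
        return 0;
    table[idx] = value;
    return 1;
}

/* Returns the stored value, or 0 if the key is absent or invalid. */
static int32_t table_get(const int32_t *table, const char *key)
{
    uint32_t idx;
    return key_to_index(key, &idx) ? table[idx] : 0;
}
```

Each lookup is a few shifts and one array access, and collisions cannot occur. The trade-off is the fixed 64 MB footprint, which pays off mainly when a large share of the 2^24 possible keys is populated.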

Related

Get all days from a month

I wish to have all the days from the current month in an array. For example this month (April 2022) has 30 days so I wish to have an array of integers like so:
const monthDays = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 ]
My attempt :
Array.from(Array(moment('2022-04').daysInMonth()).keys())
And the output is :
//  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
I have an idea of how moment works, and that 0 is always the first day or the first month, but how can I achieve the result I want from the example above?
Basically, I want this array to be generated automatically when I fetch the current month. How can we achieve that?
1. Create a moment object
2. Set the month to the desired month
3. Use daysInMonth() to get the number of days
4. Create an array from 1 to the result of step 3
const mom = moment(); // current date; moment() does not need `new`
mom.set('month', 3); // 0-indexed, so 3 --> 4 --> April
const daysInApril = mom.daysInMonth();
const aprilDays = Array.from({length: daysInApril}, (_, i) => i + 1);
console.log(aprilDays);
<script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.24.0/moment.min.js"></script>
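daysInMonth() hides the leap-year arithmetic behind steps 3 and 4. As a cross-check outside moment.js, here is a minimal C sketch of the same two steps using the Gregorian leap-year rule. Note that the month parameter here is 1-based (1 = January), unlike moment's 0-based month; the function names are hypothetical.

```c
/* Gregorian rule: every 4th year, except centuries not divisible by 400. */
static int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Step 3: number of days; month is 1-based here. Returns 0 for a bad month. */
static int days_in_month(int year, int month)
{
    static const int days[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    if (month < 1 || month > 12)
        return 0;
    if (month == 2 && is_leap_year(year))
        return 29;
    return days[month - 1];
}

/* Step 4: fill out[] with 1, 2, ..., n and return n (out needs room for 31). */
static int fill_month_days(int year, int month, int out[31])
{
    int n = days_in_month(year, month);
    for (int d = 0; d < n; d++)
        out[d] = d + 1;
    return n;
}
```

For April 2022, fill_month_days(2022, 4, days) returns 30 and fills days with 1 through 30, matching the desired monthDays array.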

Reverse-bit iteration in 2D

I use this reverse-bit method of iteration for rendering tasks in one dimension. The goal is to iterate through an array with the bits of the iterator reversed, so that instead of computing the array slowly from left to right, the order is spread out. I use this, for instance, when rendering the graph of a 1D function: because this reversed-bit iteration first computes values at well-spaced intervals, a representative image appears after only a very small fraction of all the values are computed.
So after only a partial rendering, we already have a good idea of how the final graph will look. Now I want to apply the same principle to 2D rendering (think raytracing and such), the idea being to have a good overall view of the image even at an early stage. The problem is that making the same idea work as a 2D iteration isn't trivial.
Here's how I do it in 1D:
#include <stdio.h>
#include <stdint.h>
#include <math.h>
static const uint8_t ffo_lut[2048] = {
0,
1,
2, 2,
3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11
};
int32_t log2_ffo32(uint32_t x) // returns the number of bits up to the most significant set bit so that 2^return > x >= 2^(return-1)
{
uint32_t y;
y = x >> 21; if (y) return ffo_lut[y] + 21;
y = x >> 10; if (y) return ffo_lut[y] + 10;
return ffo_lut[x];
}
uint32_t reverse_bits32(uint32_t v)
{
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
return (v >> 16 ) | ( v << 16);
}
uint32_t reverse_n_bits32(uint32_t v, int n)
{
return reverse_bits32(v) >> (32 - n);
}
uint32_t reverse_iterator_bits32(int *i, uint32_t count) // i should not be the for loop iterator but rather a variable that starts from 0 and isn't touched outside of this function
{
uint32_t ir;
ir = reverse_n_bits32(*i, log2_ffo32(count-1)); // reverse the correct number of bits
(*i)++; // iterate i for the next call to this function
if (ir >= count) // if ir is too large
ir = reverse_iterator_bits32(i, count); // get the next ir in the sequence
return ir;
}
int main()
{
int i, i2, ir, count = 13;
for (i2 = i = 0; i < count; i++)
{
ir = reverse_iterator_bits32(&i2, count);
printf("%d -> %d\n", i, ir);
}
return 0;
}
So there's the main iterator i, which counts normally from 0 to count-1; there's i2, which counts from 0 up to (at most) the next power of 2 above count, minus 1, advancing extra steps whenever a reversed value is out of range; and there's ir, the reverse-bit iterator. For a count of 13 we get this ir sequence: 0 8 4 12 2 10 6 1 9 5 3 11 7. Note that it's the same sequence as for a count of 16 with the three highest values (13, 14 and 15) removed.
But sadly, a naive approach to 2D iteration leaves much to be desired: one axis completes entire lines in one stretch, whereas I want the points to be well spread out in 2D. I tried making a 1D iterator over the full pixel count with its bits reversed, and then using division and modulo to turn it into 2D coordinates, but the quality of the results depends on the dimensions; with power-of-2 dimensions this solves nothing.
For an image of 8x8, ideally the first pixel calculated would be (0,0), then (4,4), (2,2), (2,6), (6,2), (6,6), (1,1), (1,5), (5,1) and so on, but I just can't figure out an elegant way to make a loop that iterates in 2D in such a sequence.
Reversing the bits achieves the expected effect in 1D. You could combine this shuffling technique with another one, where you get the x and y coordinates by selecting the even and odd bits, respectively, of the resulting number. Combining both methods in a single shuffle is highly desirable to avoid costly bit-twiddling operations.
You could also use Gray codes to shuffle values with n significant bits into a scrambled (though far from random) order. Here is a trivial function to produce Gray codes:
uint32_t gray(uint32_t x) { return x ^ (x >> 1); }
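One way to do the combined shuffle without a per-bit loop, sketched in C under these assumptions: for a 2^k x 2^k grid, reverse the 2k-bit counter, then gather its odd-position bits into x and its even-position bits into y with the usual Morton-decode mask-and-shift sequence. The function names are hypothetical, k must be between 1 and 16, and non-square or non-power-of-two images would still need out-of-range coordinates skipped.

```c
#include <stdint.h>

/* Reverse the lowest n bits of v; valid for 1 <= n <= 32. */
static uint32_t reverse_low_bits(uint32_t v, int n)
{
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
    v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);
    v = (v >> 16) | (v << 16);
    return v >> (32 - n);
}

/* Gather the even-position bits of v (0, 2, 4, ...) into the low 16 bits:
   the standard Morton-decode "compact" step. */
static uint32_t compact_even_bits(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* Step i of a coarse-to-fine walk over a 2^k x 2^k grid (1 <= k <= 16):
   bit 0 of i lands in the top bit of x, bit 1 in the top bit of y, etc. */
static void coarse_to_fine_xy(uint32_t i, int k, uint32_t *x, uint32_t *y)
{
    uint32_t r = reverse_low_bits(i, 2 * k);
    *x = compact_even_bits(r >> 1); /* odd bits of r */
    *y = compact_even_bits(r);      /* even bits of r */
}
```

On an 8x8 grid (k = 3), steps 0 to 3 visit (0,0), (4,0), (0,4) and (4,4), and the 64 steps cover every pixel exactly once.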
Based on chqrlie's idea, I used one iterator and then distributed its bits, in reverse order, to the x and y coordinates. I used a pretty dumb loop to do the shuffling; maybe that could be improved, but I can't think of anything obvious.
#include <stdio.h>
#include <stdint.h>
#include <math.h>
static const uint8_t ffo_lut[2048] = {
0,
1,
2, 2,
3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11 };
int32_t log2_ffo32(uint32_t x) // returns the number of bits up to the most significant set bit so that 2^return > x >= 2^(return-1)
{
uint32_t y;
y = x>>21; if (y) return ffo_lut[y]+21;
y = x>>10; if (y) return ffo_lut[y]+10;
return ffo_lut[x];
}
typedef struct { int x, y; } xyi_t;
#define MAXN(x, y) (((x) > (y)) ? (x) : (y))
#define get_bit(word, pos) (((word) >> (pos)) & 1)
xyi_t reverse_iterator_bits_2d(uint64_t *i, xyi_t dim)
{
xyi_t ir;
int ib, sh, shift;
shift = log2_ffo32(MAXN(dim.x, dim.y) - 1) - 1; // position of the most significant bit of each coordinate
do
{
// Shuffle bits from i into ir.x and ir.y: bit 0 of i goes to the top bit of x,
// bit 1 to the top bit of y, bit 2 to the next bit of x, and so on
ir.x = 0;
ir.y = 0;
sh = shift;
for (ib=0; *i >> ib; ib++, sh--)
{
ir.x |= get_bit(*i, ib) << sh;
ib++;
ir.y |= get_bit(*i, ib) << sh;
}
(*i)++; // iterate i for the next call to this function
}
while (ir.x >= dim.x || ir.y >= dim.y); // if ir is out of bounds, get the next ir in the sequence
return ir;
}
int main()
{
xyi_t ip, dim = { 7, 5 };
uint64_t i, i2;
uint64_t count = dim.x*dim.y;
for (i2 = i = 0; i < count; i++)
{
ip = reverse_iterator_bits_2d(&i2, dim);
printf("(%d,%d) ", ip.x, ip.y);
if ((i&7)==7) printf("\n");
}
return 0;
}
The resulting pattern gives a fairly uniform image. Here's how it looks on a 320x240 render as it progresses:
Epilogue
I found it useful to fill in the blanks as it renders: every newly calculated pixel is drawn as a block whose size matches the current stride, so that, unlike in the animation above, no black background is visible. Here's an example of a loop that does this:
int max_bits = log2_ffo32(MAXN(dim.x, dim.y) - 1);
int stride = 4096; // initial value only used for first pixel at (0,0)
for (i2=i=0; i < pix_count; i++)
{
ip = reverse_iterator_bits_2d(&i2, dim);
source_pixel = pixel_rendering_function(ip);
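// When we change strides (block sizes), i.e. when i2-1 reaches a power of 4.
// The i2 >= 2 guard skips the first pixel, where i2-2 would wrap around and 1 << 32 would be undefined.
if (i2 >= 2)
{
int i2_bits = log2_ffo32(i2-2); // log2_ffo64 would be better
if (i2-1 == (1ULL << i2_bits) && (i2_bits & 1) == 0)
{
int stride_shift = max_bits - (i2_bits>>1) - 1;
//stride_shift--; // a cool fractal pattern emerges
stride = 1 << stride_shift;
}
}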
// Fill block of unset pixels around the new pixel
if (stride > 1)
{
xyi_t ib, start, end;
start = sub_xyi(ip, set_xyi(stride >> 1));
end = add_xyi(start, set_xyi(stride));
start = max_xyi(start, XYI0);
end = min_xyi(end, dim);
for (ib.y = start.y; ib.y < end.y; ib.y++)
for (ib.x = start.x; ib.x < end.x; ib.x++)
image_buffer[ib.y * dim.x + ib.x] = source_pixel;
}
}
Functions ending in _xyi to handle 2D vectors of int are not included but are fairly obvious.
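For completeness, here is one plausible set of those helpers as used in the loop above. These are hypothetical definitions inferred from how the loop calls them (component-wise arithmetic, clamping, and a zero vector XYI0), not the author's code.

```c
/* Hypothetical 2D int-vector helpers matching their use in the fill loop. */
typedef struct { int x, y; } xyi_t;

static const xyi_t XYI0 = { 0, 0 };

static xyi_t set_xyi(int v)            { xyi_t r = { v, v }; return r; }
static xyi_t add_xyi(xyi_t a, xyi_t b) { xyi_t r = { a.x + b.x, a.y + b.y }; return r; }
static xyi_t sub_xyi(xyi_t a, xyi_t b) { xyi_t r = { a.x - b.x, a.y - b.y }; return r; }

/* Component-wise maximum and minimum, used to clamp the block to the image. */
static xyi_t max_xyi(xyi_t a, xyi_t b)
{
    xyi_t r = { a.x > b.x ? a.x : b.x, a.y > b.y ? a.y : b.y };
    return r;
}

static xyi_t min_xyi(xyi_t a, xyi_t b)
{
    xyi_t r = { a.x < b.x ? a.x : b.x, a.y < b.y ? a.y : b.y };
    return r;
}
```

With stride 4 and a pixel at (1,1), the block starts at (-1,-1) and ends at (3,3); clamping with max_xyi(start, XYI0) and min_xyi(end, dim) keeps it inside the image.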

fprintf results in a utf-16 text file when lines are over 100 characters long [duplicate]

If you type the following string into a text file encoded as UTF-8 (without BOM) and open it with notepad.exe, you will see some weird characters on screen. But Notepad decodes the string correctly if the last 'a' is removed. Very strange behavior. I am using Windows 10 1809.
[19, 16, 12, 14, 15, 15, 12, 17, 18, 15, 14, 15, 19, 13, 20, 18, 16, 19, 14, 16, 20, 16, 18, 12, 13, 14, 15, 20, 19, 17, 14, 17, 18, 16, 13, 12, 17, 14, 16, 13, 13, 12, 15, 20, 19, 15, 19, 13, 18, 19, 17, 14, 17, 18, 12, 15, 18, 12, 19, 15, 12, 19, 18, 12, 17, 20, 14, 16, 17, 18, 15, 12, 13, 19, 18, 17, 18, 14, 19, 18, 16, 15, 18, 17, 15, 15, 19, 16, 15, 14, 19, 13, 19, 15, 17, 16, 12, 12, 18, 12, 14, 12, 16, 19, 12, 19, 12, 17, 19, 20, 19, 17, 19, 20, 16, 19, 16, 19, 16, 12, 12, 18, 19, 17, 18, 16, 12, 17, 13, 18, 20, 19, 18, 20, 14, 16, 13, 12, 12, 14, 13, 19, 17, 20, 18, 15, 12, 15, 20, 14, 16, 15, 16, 19, 20, 20, 12, 17, 13, 20, 16, 20, 13a
I wonder if this is a Windows bug or whether there is something I can do to solve it.
Did more research and figured it out.
It seems like a variation of the classic "Bush hid the facts" case.
https://en.wikipedia.org/wiki/Bush_hid_the_facts
It looks like Notepad has a different character encoding default for saving a file than it does for opening a file. Yes, this does seem like a bug.
But there is an actual explanation for what is occurring:
Notepad checks for a BOM byte sequence. If it does not find one, it has two options: the encoding is either UTF-16 Little Endian (without BOM) or plain ASCII. It checks for UTF-16 LE first, using a function called IsTextUnicode.
IsTextUnicode runs a series of tests to guess whether the given text is Unicode or not. One of these tests is IS_TEXT_UNICODE_STATISTICS, which uses statistical analysis. If the test is true, then the given text is probably Unicode, but absolute certainty is not guaranteed.
https://learn.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-istextunicode
If IsTextUnicode returns true, Notepad decodes the file as UTF-16 LE, producing the strange output you saw.
We can confirm this with the character ㄠ. It corresponds to the ASCII characters ' 1' (space, one), whose hex values are 0x20 for the space and 0x31 for the one. Since the byte order is little-endian, the first byte (0x20) is the low byte and the second (0x31) is the high byte, giving the code point U+3120, which you can confirm by looking it up.
https://unicode-table.com/en/3120/
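The byte-pairing arithmetic can be checked with a small C sketch that reinterprets ASCII bytes as UTF-16 LE code units, roughly what happens once IsTextUnicode returns true. The function names are hypothetical, and a trailing odd byte is simply ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-16 little-endian: the first byte of each pair is the LOW byte. */
static uint16_t utf16le_unit(unsigned char first, unsigned char second)
{
    return (uint16_t)(first | (second << 8));
}

/* Reinterpret a byte string as UTF-16 LE code units; returns the unit count. */
static size_t decode_utf16le(const char *bytes, size_t nbytes, uint16_t *out)
{
    size_t n = 0;
    for (size_t k = 0; k + 1 < nbytes; k += 2)
        out[n++] = utf16le_unit((unsigned char)bytes[k], (unsigned char)bytes[k + 1]);
    return n;
}
```

Decoding the first seven bytes of the question's text, "[19, 16", yields the code units 0x315B, 0x2C39 and 0x3120, the last of which is ㄠ.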
If you want to solve the issue, you need to break the pattern which helps IsTextUnicode determine if the given text is Unicode. You can insert a newline before the text to break the pattern.
Hope that helped!

Maxima: define a function that returns a random integer in a range, such that the value is distinct from another value or a list of other values

N.B. the question in the title is addressed in the "Edit: Larger Problem" section, below
Is there a function that will return the type of a variable, in Maxima?
I'm not sure if type is the correct word (I'm very new to this but get the impression it may have a specific technical sense), what I'm looking for is a function that can return true or false if a variable x is a number or an array, e.g. if x : 6;
IsArray(x) = false
IsNumber(x) = true
or, e.g. if y : [1,2,3];
IsArray(y) = true;
IsNumber(y) = false;
I've tried searching the Maxima documentation but haven't been able to find anything. Any help would be appreciated.
Edit: Larger Problem.
I wrote a function that returns a random value y from a range while ensuring that y is distinct from another value x:
DistinctValue(x,y,LowerLim,UpperLim):= block(
[newY:y],
if x = newY
then newY:DistinctValue(x,rand_range(LowerLim,UpperLim),LowerLim,UpperLim)
else newY:y, return(newY));
where rand_range(LowerLim,UpperLim) is another custom function that chooses a random integer r with LowerLim ≤ r ≤ UpperLim.
It didn't take long for me to realize that sometimes I will need several such distinct values, so I tweaked the above code so that it can take an array as argument:
DistinctValue(x,y,LowerLim,UpperLim):= block([newY:y],
for i:1 thru length(x)
do if x[i] = newY
then newY:DistinctValue(x,rand_range(LowerLim,UpperLim),LowerLim,UpperLim),
return(newY));
While I know the latter can be used when there is a single number to exclude from the range, simply by placing it in square brackets, I was hoping to learn how to write a function that could take x as either a number or an array. I figured the easiest way to do this would be an if / else statement that checks what type of variable x is, e.g.
DistinctValue(x,y,LowerLim,UpperLim):= block([newY:y],
/* if it's a list, run the list version of the function */
if IsList(x)
then
(for i:1 thru length(x)
do if x[i] = newY
then newY : DistinctValue(x, rand_range(LowerLim,UpperLim), LowerLim, UpperLim))
/* otherwise run the number version of the function */
else
if x = newY
then newY : DistinctValue(x, rand_range(LowerLim,UpperLim), LowerLim,UpperLim)
else newY:y,
return(newY));
While this may seem superfluous, we're implementing Maxima in another, fairly complicated environment, and it'll be used by folks who have even less experience than I. Moreover, I expect to encounter other cases in the near future where it will be more of a necessity than an option.
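About the function DistinctValue: to test whether x is a list you can use Maxima's built-in predicate listp (similarly, numberp tests whether its argument is a number). Here's how I would implement a function that returns a random value from a range that is distinct from a single value or from every value in a list.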
DistinctValue(x, LowerLim, UpperLim) :=
if listp(x)
then block([y: rand_range(LowerLim, UpperLim)],
if member(y, x) /* need to try again */
then DistinctValue(x, LowerLim, UpperLim)
else y)
else DistinctValue([x], LowerLim, UpperLim);
This is somewhat different from what's shown above; that might mean I've misunderstood the requirements. I'll let you be the judge of that.
rand_range can be expressed as just
rand_range(LowerLim, UpperLim) := LowerLim + random(UpperLim - LowerLim + 1);
The + 1 in UpperLim - LowerLim + 1 ensures that UpperLim itself can be returned: random(n) returns an integer from 0 to n - 1, so without the + 1 the largest value rand_range could return would be UpperLim - 1 (assuming LowerLim and UpperLim are integers).
EDIT: Seems to work. Here I've already run load(descriptive); to get discrete_freq.
(%i32) makelist (DistinctValue ([13, 15, 17], 12, 18), 100);
(%o32) [18, 18, 14, 16, 18, 16, 12, 12, 18, 18, 12, 14, 12, 12,
18, 18, 14, 12, 12, 14, 16, 18, 12, 16, 12, 16, 14, 18, 16, 12,
14, 16, 14, 16, 16, 12, 14, 18, 14, 14, 14, 12, 16, 18, 14, 18,
18, 14, 14, 18, 12, 16, 18, 12, 16, 16, 12, 14, 16, 18, 16, 14,
16, 12, 16, 12, 14, 18, 16, 14, 12, 18, 14, 12, 16, 18, 12, 12,
14, 14, 18, 16, 18, 14, 14, 18, 16, 14, 12, 12, 14, 12, 18, 18,
12, 18, 12, 18, 18, 18]
(%i33) discrete_freq (%);
(%o33) [[12, 14, 16, 18], [26, 25, 21, 28]]

Ruby search for the same values in multidimensional array

I have an array of arrays (2D); in this example there are four of them:
[
[13, 15, 18, 23, 23, 11, 14, 19, 19, 5, 10, 10, 8, 8],
[8, 15, 19, 21, 21, 12, 16, 18, 18, 11, 13, 13, 6, 6],
[9, 15, 21, 23, 23, 7, 13, 15, 15, 12, 14, 14, 8, 8],
[2, 8, 14, 16, 16, 7, 13, 15, 15, 12, 14, 14, 8, 8]
]
I need to find whether any element in any of these arrays is equal to, and at the same index as, an element in another array. I need to get all those numbers and their indexes.
For example, first_array[1] = 15, second_array[1] = 15 and third_array[1] = 15, so I need these, with their indexes.
Also, the matching values must come from adjacent arrays, i.e. the array directly before or after. For example, array_one[3] = 23, array_two[3] = 21 and array_three[3] = 23. I don't need these, since array_two has a different value and separates array_one from array_three.
I have the length of the arrays (they are all the same length) and the number of arrays available as variables.
I hope you got my point :)
Looks like I am a bit closer to my goal. This seems to work for the second array (so only the first two arrays are checked, but once that works, the rest should be much easier). And don't judge me, judge just the code :D I know it's ugly, it's just a prototype:
array.each do |c|
c.each do |v|
c.each_with_index do |k, i|
next_array = array[i + 1]
if next_array.include? v
its_index = next_array.index(v)
if c.index(v) == its_index
p v
end
end
break
end
end
return
end
arr = [[13, 15, 18, 23, 23, 11, 14, 19, 19, 5, 10, 10, 8, 8],
[ 8, 15, 19, 21, 23, 12, 16, 18, 19, 11, 13, 13, 6, 8],
[ 9, 15, 21, 23, 16, 12, 13, 15, 15, 12, 14, 14, 8, 8],
[ 2, 8, 14, 21, 16, 7, 13, 15, 15, 12, 14, 14, 8, 8]]
I've modified arr in a few places.
arr.transpose.each_with_index.with_object({}) do |(col,j),h|
i = 0
h[j] = col.chunk(&:itself).each_with_object({}) do |(x,arr),g|
count = arr.size
g.update(i=>{ value: x, number: count }) if count > 1
i += count
end
end
#=> {0=>{},
# 1=>{0=>{:value=>15, :number=>3}},
# 2=>{},
# 3=>{},
# 4=>{0=>{:value=>23, :number=>2}, 2=>{:value=>16, :number=>2}},
# 5=>{1=>{:value=>12, :number=>2}},
# 6=>{2=>{:value=>13, :number=>2}},
# 7=>{2=>{:value=>15, :number=>2}},
# 8=>{0=>{:value=>19, :number=>2}, 2=>{:value=>15, :number=>2}},
# 9=>{2=>{:value=>12, :number=>2}},
# 10=>{2=>{:value=>14, :number=>2}},
# 11=>{2=>{:value=>14, :number=>2}},
# 12=>{2=>{:value=> 8, :number=>2}},
# 13=>{0=>{:value=> 8, :number=>4}}}
The keys of this hash are the column indices of arr. Each value is a hash whose keys are the row indices at which a run of two or more equal, vertically adjacent elements begins, and whose values give that element and the length of the run. The columns at indices 0, 2 and 3 are the only ones that contain no vertically adjacent duplicate values. The column at index 1 contains three 15's beginning at row index 0; the column at index 4 contains two 23's beginning at row index 0 and two 16's beginning at row index 2.
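To make the chunk(&:itself) step concrete, here is a minimal sketch that runs the same inner loop on a single column. The helper name runs_in_column is hypothetical, not part of the answer above. chunk groups consecutive equal values, so each run's size is its count, and the running total gives the row where the next run starts.

```ruby
# Hypothetical helper mirroring the inner loop above for one column.
# Returns { start_row => { value: v, number: n } } for every run of two or
# more equal, vertically adjacent values.
def runs_in_column(col)
  row = 0
  col.chunk(&:itself).each_with_object({}) do |(value, run), runs|
    runs[row] = { value: value, number: run.size } if run.size > 1
    row += run.size # first row of the next run
  end
end

# Column 4 of the modified arr: 23, 23, 16, 16
p runs_in_column([23, 23, 16, 16])
# Column 3 of the modified arr: no adjacent duplicates
p runs_in_column([23, 21, 23, 21])
```

The first call finds runs starting at rows 0 and 2, matching key 4 in the result above; the second returns an empty hash, like key 3.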
matrix = [
[13, 15, 18, 23, 23, 11, 14, 19, 19, 5, 10, 10, 8, 8],
[ 8, 15, 19, 21, 21, 12, 16, 18, 18, 11, 13, 13, 6, 6],
[ 9, 15, 21, 23, 23, 7, 13, 15, 15, 12, 14, 14, 8, 8],
[ 2, 8, 14, 16, 16, 7, 13, 15, 15, 12, 14, 14, 8, 8]
]
equal_surround = matrix
.each_with_index.map do |v,i|
v.each_with_index.map do |k,j|
if (i-1>=0 && k == matrix[i-1][j])
k
elsif (i+1 < matrix.length && k == matrix[i+1][j])
k
else
nil
end
end
end
=> [
[nil, 15, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil],
[nil, 15, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil],
[nil, 15, nil, nil, nil, 7, 13, 15, 15, 12, 14, 14, 8, 8],
[nil, nil, nil, nil, nil, 7, 13, 15, 15, 12, 14, 14, 8, 8]
]
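If you need the positions rather than a nil-padded copy, the same neighbour test can collect [row, column, value] triples directly. A sketch, with matches_with_neighbours as a hypothetical method name:

```ruby
# Hypothetical helper: applies the same "equal to the element above or below"
# test as equal_surround, but returns [row, column, value] triples.
def matches_with_neighbours(matrix)
  matrix.each_with_index.flat_map do |row, i|
    hits = row.each_index.select do |j|
      (i > 0 && matrix[i - 1][j] == row[j]) ||
        (i + 1 < matrix.length && matrix[i + 1][j] == row[j])
    end
    hits.map { |j| [i, j, row[j]] }
  end
end

matrix = [
  [13, 15, 18, 23, 23, 11, 14, 19, 19, 5, 10, 10, 8, 8],
  [ 8, 15, 19, 21, 21, 12, 16, 18, 18, 11, 13, 13, 6, 6],
  [ 9, 15, 21, 23, 23, 7, 13, 15, 15, 12, 14, 14, 8, 8],
  [ 2, 8, 14, 16, 16, 7, 13, 15, 15, 12, 14, 14, 8, 8]
]
matches_with_neighbours(matrix).each { |i, j, v| puts "row #{i}, col #{j}: #{v}" }
```

The triples come out in row-major order and correspond one-to-one with the non-nil entries of equal_surround.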
You didn't show any code, so I won't write any either.
I can tell you that Array#transpose should make this problem much more manageable, though.
You'll just need to iterate on the rows (former columns) and look for any repeating number.
You can either do it FORTRAN style with a loop or with fancier Enumerable methods, like each_with_index, map or chunk.
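For reference, the transpose idea with a plain loop over adjacent pairs could look roughly like this. It assumes the goal is every value that equals the value at the same index in the previous or next array; the method name adjacent_matches and the [value, array_index, position] result shape are illustrative choices, not anything from the question.

```ruby
# Hypothetical helper: after transpose, column j of the input becomes one
# array, so vertically adjacent values become neighbours in that array.
# Returns [value, array_index, position] for every element equal to the
# element at the same position in the previous or next array.
def adjacent_matches(arrays)
  found = []
  arrays.transpose.each_with_index do |column, j|
    column.each_cons(2).with_index do |(upper, lower), i|
      next unless upper == lower
      found << [upper, i, j]     # element in array i
      found << [lower, i + 1, j] # equal element in array i + 1
    end
  end
  # A run of three equal values adds its middle element twice; drop repeats,
  # then order by position and array index.
  found.uniq.sort_by { |_value, i, j| [j, i] }
end

arrays = [
  [13, 15, 18, 23, 23, 11, 14, 19, 19, 5, 10, 10, 8, 8],
  [8, 15, 19, 21, 21, 12, 16, 18, 18, 11, 13, 13, 6, 6],
  [9, 15, 21, 23, 23, 7, 13, 15, 15, 12, 14, 14, 8, 8],
  [2, 8, 14, 16, 16, 7, 13, 15, 15, 12, 14, 14, 8, 8]
]
adjacent_matches(arrays).each { |value, i, j| puts "arrays[#{i}][#{j}] = #{value}" }
```

Because only consecutive pairs are compared, the 23's at index 3 of the first and third arrays are not reported, which matches the adjacency rule in the question.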
test_array = [
[13, 15, 18, 23, 23, 11, 14, 19, 19, 5, 10, 10, 8, 8],
[8, 15, 19, 21, 21, 12, 16, 18, 18, 11, 13, 13, 6, 6],
[9, 15, 21, 23, 23, 7, 13, 15, 15, 12, 14, 14, 8, 8],
[2, 8, 14, 16, 16, 7, 13, 15, 15, 12, 14, 14, 8, 8]
]
final_res = Hash.new {|h,k| h[k] = Array.new }
test_array.each_cons(2).to_a.each_with_index do |(a,b),i|
final_match = Hash.new {|h,k| h[k] = Array.new }
res = a & b
res.each do |ele|
a_index = a.each_index.select{|i| a[i] == ele}
b_index = b.each_index.select{|i| b[i] == ele}
common = a_index & b_index
final_match[ele] << common unless common.empty?
end
final_match.each_value {|v| v.flatten!}
final_res[:"Match Values Between Array #{i+1} and Array #{i+2}"] << final_match
end
final_res.each do |a|
puts a
end
OUTPUT:
Match Values Between Array 1 and Array 2
{15=>[1]}
Match Values Between Array 2 and Array 3
{15=>[1]}
Match Values Between Array 3 and Array 4
{15=>[7, 8], 7=>[5], 13=>[6], 12=>[9], 14=>[10, 11], 8=>[12, 13]}
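Since a match has to be at the same index in both arrays, the a & b intersection and the two index searches can be replaced by one positional comparison per adjacent pair. A sketch, with pairwise_matches as a hypothetical method name, that builds the same value => indexes hash for each pair:

```ruby
# Hypothetical helper: for each adjacent pair of arrays, map every shared
# value to the indexes where both arrays hold that value at the same position.
def pairwise_matches(arrays)
  arrays.each_cons(2).map do |a, b|
    same_position = a.each_index.select { |j| a[j] == b[j] }
    same_position.group_by { |j| a[j] } # value => [indexes]
  end
end

test_array = [
  [13, 15, 18, 23, 23, 11, 14, 19, 19, 5, 10, 10, 8, 8],
  [8, 15, 19, 21, 21, 12, 16, 18, 18, 11, 13, 13, 6, 6],
  [9, 15, 21, 23, 23, 7, 13, 15, 15, 12, 14, 14, 8, 8],
  [2, 8, 14, 16, 16, 7, 13, 15, 15, 12, 14, 14, 8, 8]
]
pairwise_matches(test_array).each_with_index do |matches, i|
  puts "Matches between array #{i + 1} and array #{i + 2}: #{matches}"
end
```

The hashes have the same contents as the output above, though the keys are listed in index order, so they may appear in a different order.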
